Java class file


A Java class file is a file containing Java bytecode that can be executed on the Java Virtual Machine. A Java class file is usually produced by a Java compiler from Java programming language source files containing Java classes. If a source file has more than one class, each class is compiled into a separate class file.
JVMs are available for many platforms, and a class file compiled on one platform will execute on a JVM of another platform. This makes Java applications platform-independent.

History

On 11 December 2006, the class file format was modified under Java Specification Request 202.

File layout and structure

Sections

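There are 10 basic sections to the Java class file structure:
Magic number: 0xCAFEBABE
Version of class file format: the minor and major versions of the class file
Constant pool: pool of constants for the class
Access flags: for example, whether the class is public, final, or abstract
This class: the name of the current class
Super class: the name of the superclass
Interfaces: any interfaces in the class
Fields: any fields in the class
Methods: any methods in the class
Attributes: any attributes of the class (for example, the name of the source file)
Class files are identified by the following 4-byte header (in hexadecimal): CA FE BA BE. The history of this magic number was explained by James Gosling referring to a restaurant in Palo Alto: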

"We used to go to lunch at a place called St Michael's Alley. According to local legend, in the deep dark past, the Grateful Dead used to perform there before they made it big. It was a pretty funky place that was definitely a Grateful Dead Kinda Place. When Jerry died, they even put up a little Buddhist-esque shrine. When we used to go there, we referred to the place as Cafe Dead. Somewhere along the line it was noticed that this was a HEX number. I was re-vamping some file format code and needed a couple of magic numbers: one for the persistent object file, and one for classes. I used CAFEDEAD for the object file format, and in grepping for 4 character hex words that fit after "CAFE" I hit on BABE and decided to use it.
At that time, it didn't seem terribly important or destined to go anywhere but the trash-can of history. So CAFEBABE became the class file format, and CAFEDEAD was the persistent object format. But the persistent object facility went away, and along with it went the use of CAFEDEAD - it was eventually replaced by RMI."

General layout

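Because the class file contains variable-sized items and does not also contain embedded file offsets (or pointers), it is typically parsed sequentially, from the first byte toward the end. At the lowest level the file format is described in terms of a few fundamental data types:
u1: an unsigned 8-bit integer
u2: an unsigned 16-bit integer in big-endian byte order
u4: an unsigned 32-bit integer in big-endian byte order
table: an array of variable-length items of some type. The number of items in the table is given by a preceding count (a u2), but the size of the table in bytes can only be determined by examining each of its items.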
Some of these fundamental types are then re-interpreted as higher-level values, depending on context.
There is no enforcement of word alignment, and so no padding bytes are ever used.
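Because every multi-byte value is big-endian and there is no alignment padding, reading the fundamental types takes only a few shifts. The sketch below assumes the whole file is already in memory and that the caller has checked that enough bytes remain; read_u1, read_u2, and read_u4 are hypothetical helper names, not part of any library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers: each reads one big-endian value starting at *pos
 * and advances the cursor past it. No bounds checking is done here. */
static uint8_t read_u1(const uint8_t *buf, size_t *pos) {
    return buf[(*pos)++];
}

static uint16_t read_u2(const uint8_t *buf, size_t *pos) {
    uint16_t hi = read_u1(buf, pos);   /* most significant byte first */
    uint16_t lo = read_u1(buf, pos);
    return (uint16_t)((hi << 8) | lo);
}

static uint32_t read_u4(const uint8_t *buf, size_t *pos) {
    uint32_t hi = read_u2(buf, pos);   /* high 16 bits come first */
    uint32_t lo = read_u2(buf, pos);
    return (hi << 16) | lo;
}
```

Reading the first four bytes of any class file with read_u4 yields 0xCAFEBABE.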
The overall layout of the class file is as shown in the following table.
byte offset | size | type or value | description
0 | 4 bytes | u1 = 0xCA, u1 = 0xFE, u1 = 0xBA, u1 = 0xBE | magic number used to identify the file as conforming to the class file format
4 | 2 bytes | u2 | minor version number of the class file format being used
6 | 2 bytes | u2 | major version number of the class file format being used (values listed after this table)
8 | 2 bytes | u2 | constant pool count: number of entries in the following constant pool table. This count is at least one greater than the actual number of entries; see following discussion.
10 | cpsize (variable) | table | constant pool table: an array of variable-sized constant pool entries, containing items such as literal numbers, strings, and references to classes or methods. Indexed starting at 1; valid indexes run from 1 to constant pool count - 1.
10+cpsize | 2 bytes | u2 | access flags, a bitmask
12+cpsize | 2 bytes | u2 | identifies this class: index into the constant pool to a "Class"-type entry
14+cpsize | 2 bytes | u2 | identifies the superclass: index into the constant pool to a "Class"-type entry
16+cpsize | 2 bytes | u2 | interface count: number of entries in the following interface table
18+cpsize | isize (variable) | table | interface table: a variable-length array of constant pool indexes describing the interfaces implemented by this class
18+cpsize+isize | 2 bytes | u2 | field count: number of entries in the following field table
20+cpsize+isize | fsize (variable) | table | field table: a variable-length array of fields; each element is a field_info structure defined in https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.5
20+cpsize+isize+fsize | 2 bytes | u2 | method count: number of entries in the following method table
22+cpsize+isize+fsize | msize (variable) | table | method table: a variable-length array of methods; each element is a method_info structure defined in https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.6
22+cpsize+isize+fsize+msize | 2 bytes | u2 | attribute count: number of entries in the following attribute table
24+cpsize+isize+fsize+msize | asize (variable) | table | attribute table: a variable-length array of attributes; each element is an attribute_info structure defined in https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.7

Major version numbers: Java SE 14 = 58, Java SE 13 = 57, Java SE 12 = 56, Java SE 11 = 55, Java SE 10 = 54, Java SE 9 = 53, Java SE 8 = 52, Java SE 7 = 51, Java SE 6.0 = 50, Java SE 5.0 = 49, JDK 1.4 = 48, JDK 1.3 = 47, JDK 1.2 = 46, JDK 1.1 = 45.
For details of earlier version numbers see footnote 1 at
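The fixed-size prefix of the layout (offsets 0 to 9) can be read directly, before any variable-length table has to be walked. A minimal C sketch, assuming the file is already loaded into a byte buffer; struct class_header, parse_class_header, and java_release_for_major are hypothetical names. The release mapping (major version minus 44) is only an observation about the values listed above for Java SE 5.0 through 14, not a rule stated by the format.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct covering only the items at offsets 0 to 9. */
struct class_header {
    uint32_t magic;
    uint16_t minor_version;
    uint16_t major_version;
    uint16_t constant_pool_count;
};

/* Big-endian decoders for u2 and u4 values. */
static uint16_t be16(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }

static uint32_t be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 0 on success, -1 if the buffer is too short or the magic is wrong. */
static int parse_class_header(const uint8_t *buf, size_t len, struct class_header *out) {
    if (len < 10) return -1;
    out->magic = be32(buf);
    if (out->magic != 0xCAFEBABEu) return -1;
    out->minor_version = be16(buf + 4);
    out->major_version = be16(buf + 6);
    out->constant_pool_count = be16(buf + 8);
    return 0;
}

/* Assumed mapping for major versions 49 (Java SE 5.0) and later:
 * release = major - 44, e.g. 52 -> 8. Returns 0 for older versions. */
static int java_release_for_major(uint16_t major) {
    return major >= 49 ? major - 44 : 0;
}
```

A buffer starting CA FE BA BE 00 00 00 34 parses as minor version 0 and major version 52, which the mapping reports as Java SE 8.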

Representation in a C-like programming language

Since C doesn't support multiple variable length arrays within a struct, the code below won't compile and only serves as a demonstration.

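struct Class_File_Format {
   u4 magic;
   u2 minor_version;
   u2 major_version;
   u2 constant_pool_count;
   cp_info constant_pool[constant_pool_count - 1];
   u2 access_flags;
   u2 this_class;
   u2 super_class;
   u2 interfaces_count;
   u2 interfaces[interfaces_count];
   u2 fields_count;
   field_info fields[fields_count];
   u2 methods_count;
   method_info methods[methods_count];
   u2 attributes_count;
   attribute_info attributes[attributes_count];
};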

The constant pool

The constant pool table is where most of the literal constant values are stored. This includes values such as numbers of all sorts, strings, identifier names, references to classes and methods, and type descriptors. All indexes, or references, to specific constants in the constant pool table are given by 16-bit numbers, where index value 1 refers to the first constant in the table.
Due to historic choices made during the file format development, the number of constants in the constant pool table is not actually the same as the constant pool count which precedes the table. First, the table is indexed starting at 1, but the count should actually be interpreted as the maximum index plus one. Additionally, two types of constants (longs and doubles) take up two consecutive slots in the table, although the second such slot is a phantom index that is never directly used.
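To make the count arithmetic concrete, the sketch below computes the constant pool count that would precede a given sequence of constants. It assumes the tag values 5 (Long) and 6 (Double) from the tag table in this section, the two types that occupy two slots; pool_count_for is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>

/* Tags of the two constant types that occupy two slots. */
enum { TAG_LONG = 5, TAG_DOUBLE = 6 };

/* Hypothetical helper: given the tags of the constants in table order,
 * return the constant_pool_count value that would precede them. */
static uint16_t pool_count_for(const uint8_t *tags, size_t n) {
    unsigned next_index = 1;             /* the first usable index is 1 */
    for (size_t i = 0; i < n; i++) {
        /* A Long or Double also consumes the following (phantom) index. */
        next_index += (tags[i] == TAG_LONG || tags[i] == TAG_DOUBLE) ? 2 : 1;
    }
    return (uint16_t)next_index;         /* maximum index plus one */
}
```

For example, a Class reference, a UTF-8 string, a Long, and another UTF-8 string (tags 7, 1, 5, 1) use indexes 1, 2, 3, and 5, with index 4 as the phantom slot after the Long, so the preceding count is 6 even though there are only four constants.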
The type of each item in the constant pool is identified by an initial byte tag. The number of bytes following this tag and their interpretation are then dependent upon the tag value. The valid constant types and their tag values are:
Tag byte | Additional bytes | Description of constant | Version introduced
1 | 2+x bytes | UTF-8 string: a character string prefixed by a 16-bit number indicating the number of bytes (x) in the encoded string which immediately follows. Note that the encoding used is not actually UTF-8, but involves a slight modification of the Unicode standard encoding form. | 1.0.2
3 | 4 bytes | Integer: a signed 32-bit two's complement number in big-endian format | 1.0.2
4 | 4 bytes | Float: a 32-bit single-precision IEEE 754 floating-point number | 1.0.2
5 | 8 bytes | Long: a signed 64-bit two's complement number in big-endian format (occupies two constant pool slots) | 1.0.2
6 | 8 bytes | Double: a 64-bit double-precision IEEE 754 floating-point number (occupies two constant pool slots) | 1.0.2
7 | 2 bytes | Class reference: an index within the constant pool to a UTF-8 string containing the fully qualified class name | 1.0.2
8 | 2 bytes | String reference: an index within the constant pool to a UTF-8 string | 1.0.2
9 | 4 bytes | Field reference: two indexes within the constant pool, the first pointing to a Class reference, the second to a Name and Type descriptor | 1.0.2
10 | 4 bytes | Method reference: two indexes within the constant pool, the first pointing to a Class reference, the second to a Name and Type descriptor | 1.0.2
11 | 4 bytes | Interface method reference: two indexes within the constant pool, the first pointing to a Class reference, the second to a Name and Type descriptor | 1.0.2
12 | 4 bytes | Name and type descriptor: two indexes to UTF-8 strings within the constant pool, the first representing a name and the second a specially encoded type descriptor | 1.0.2
15 | 3 bytes | Method handle: represents a method handle and consists of one byte giving the kind of method handle (its reference kind), followed by a 2-byte index within the constant pool | 7
16 | 2 bytes | Method type: represents a method type and consists of an index within the constant pool | 7
17 | 4 bytes | Dynamic: specifies a dynamically computed constant produced by invocation of a bootstrap method | 11
18 | 4 bytes | InvokeDynamic: used by an invokedynamic instruction to specify a bootstrap method, the dynamic invocation name, the argument and return types of the call, and optionally, a sequence of additional constants called static arguments to the bootstrap method | 7
19 | 2 bytes | Module: identifies a module | 9
20 | 2 bytes | Package: identifies a package exported or opened by a module | 9
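Since the class file contains no offsets, finding where the access flags begin means walking every constant pool entry using the sizes in the table above. A sketch under the assumption that the buffer holds the whole file and that start is the offset of the constant pool (10 in a real file); fixed_payload_size and skip_constant_pool are hypothetical names, and unknown tags are simply treated as malformed input.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes following the tag byte for each fixed-size constant type, per the
 * table above. Returns -1 for tag 1 (variable length) and unknown tags. */
static long fixed_payload_size(uint8_t tag) {
    switch (tag) {
    case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
        return 4;
    case 5: case 6:
        return 8;
    case 7: case 8: case 16: case 19: case 20:
        return 2;
    case 15:
        return 3;
    default:
        return -1;
    }
}

/* Hypothetical helper: walk the constant pool starting at buf + start and
 * return the offset just past it, or 0 on malformed or truncated input
 * (a real pool starts at offset 10, so 0 is never a valid result).
 * count is the constant pool count read at offset 8. */
static size_t skip_constant_pool(const uint8_t *buf, size_t len,
                                 size_t start, uint16_t count) {
    size_t pos = start;
    for (unsigned index = 1; index < count; index++) {
        if (pos >= len) return 0;
        uint8_t tag = buf[pos++];
        size_t payload;
        if (tag == 1) {
            if (pos + 2 > len) return 0;
            /* UTF-8 entry: big-endian u2 length, then that many bytes. */
            payload = 2 + (((size_t)buf[pos] << 8) | buf[pos + 1]);
        } else {
            long n = fixed_payload_size(tag);
            if (n < 0) return 0;
            payload = (size_t)n;
        }
        if (pos + payload > len) return 0;
        pos += payload;
        if (tag == 5 || tag == 6) index++;  /* skip the phantom slot */
    }
    return pos;
}
```

The returned offset is where the access flags u2 starts, so the remaining fixed-size items can be read from there.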

There are only two integral constant types, integer and long. Other integral types appearing in the high-level language, such as boolean, byte, char, and short, must be represented as an integer constant.
Class names in Java, when fully qualified, are traditionally dot-separated, such as "java.lang.Object". However, within the low-level Class reference constants, an internal form appears that uses slashes instead, such as "java/lang/Object".
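Converting between the two forms is a plain character substitution. A minimal sketch; to_internal_form is a hypothetical helper that rewrites the name in place and performs no validation of the name.

```c
#include <string.h>

/* Hypothetical helper: rewrite a dot-separated class name in place into
 * the slash-separated internal form used by Class reference constants. */
static char *to_internal_form(char *name) {
    for (char *p = strchr(name, '.'); p != NULL; p = strchr(p + 1, '.'))
        *p = '/';
    return name;
}
```

Passing a writable copy of "java.lang.Object" turns it into "java/lang/Object"; a name without dots is returned unchanged.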
The Unicode strings, despite the moniker "UTF-8 string", are not actually encoded according to the Unicode standard, although the encoding is similar. There are two differences. The first is that the code point U+0000 is encoded as the two-byte sequence C0 80 instead of the standard single-byte encoding 00. The second difference is that supplementary characters are encoded using a surrogate-pair construction similar to UTF-16 rather than being directly encoded using UTF-8. In this case each of the two surrogates is encoded separately in UTF-8. For example, U+1D11E is encoded as the 6-byte sequence ED A0 B4 ED B4 9E, rather than the standard 4-byte UTF-8 encoding F0 9D 84 9E.
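Both differences from standard UTF-8 show up in a small encoder for a single code point. This is a sketch, assuming the input is a valid Unicode code point no larger than U+10FFFF; encode_unit and encode_modified_utf8 are hypothetical names. For U+1D11E it produces the surrogates D834 and DD1E and hence the 6-byte sequence given above.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one 16-bit code unit (a BMP character or a single surrogate)
 * in 1 to 3 bytes; U+0000 takes the two-byte form C0 80. */
static size_t encode_unit(uint16_t u, uint8_t *out) {
    if (u != 0 && u < 0x80) {
        out[0] = (uint8_t)u;
        return 1;
    }
    if (u < 0x800) {                       /* includes U+0000 -> C0 80 */
        out[0] = (uint8_t)(0xC0 | (u >> 6));
        out[1] = (uint8_t)(0x80 | (u & 0x3F));
        return 2;
    }
    out[0] = (uint8_t)(0xE0 | (u >> 12));
    out[1] = (uint8_t)(0x80 | ((u >> 6) & 0x3F));
    out[2] = (uint8_t)(0x80 | (u & 0x3F));
    return 3;
}

/* Hypothetical helper: encode code point cp into out (room for 6 bytes).
 * Supplementary characters are split into a UTF-16 surrogate pair and
 * each surrogate is encoded separately. Returns the number of bytes. */
static size_t encode_modified_utf8(uint32_t cp, uint8_t *out) {
    if (cp < 0x10000)
        return encode_unit((uint16_t)cp, out);
    cp -= 0x10000;
    uint16_t high = (uint16_t)(0xD800 | (cp >> 10));
    uint16_t low = (uint16_t)(0xDC00 | (cp & 0x3FF));
    size_t n = encode_unit(high, out);
    return n + encode_unit(low, out + n);
}
```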