GOFF
The GOFF specification was developed for IBM's MVS operating system to supersede the IBM OS/360 Object File Format to compensate for weaknesses in the older format.
Background
The original IBM OS/360 Object File Format was developed in 1964 for the new IBM System/360 mainframe computer. The format was also used by makers of plug-compatible and workalike mainframes, including the Univac 90/60, 90/70 and 90/80 and the Fujitsu B2800. The format was later expanded to add symbolic records and more information about modules, plus support for procedures and functions with names longer than 8 characters. While this helped, it did not provide the enhanced information needed by today's more complicated programming languages and by more advanced features such as objects, properties and methods, Unicode support, and virtual methods.
The GOFF object file format was developed by IBM in approximately 1995 to overcome these problems. The earliest mention of this format was in the introductory information about the new High Level Assembler. Note that the OS/360 Object File Format was superseded by the GOFF format but was not deprecated; it is still in use by assemblers and language compilers where the language can tolerate the limitations of the older format.
Conventions
This article will use the term "module" to refer to any name or equivalent symbol that provides an identifier for a piece of code or data external to the scope from which it is referenced. A module may refer to a subroutine, a function, Fortran Common or Block Data, an object or class, a method or property of an object or class, or any other named routine or identifier external to the scope referencing the external name.
The terms "assembler", for a program that converts assembly language to machine code, and "compiler", for a program that does the same thing for high-level languages, should be considered interchangeable for the purposes of this article, as should the terms for the process of using one; thus where "compile" and "compiler" are used, substitute "assemble" and "assembler" as needed.
Numbers used in this article are expressed as follows: unless specified as hexadecimal, all numbers are in decimal. When a number must be expressed in hexadecimal, the standard mainframe assembler format is used: the capital letter X precedes the number, any hexadecimal letters in the number are in upper case, and the number is enclosed in single quotes, e.g. the hexadecimal number 15deadbeef would be expressed as X'15DEADBEEF'.
A "byte" as used in this article is 8 bits, and unless otherwise specified, a "byte" and a "character" are the same thing; characters in EBCDIC are also 8 bits. When multi-byte character sets are used in user programs, each character uses two bytes.
Requirements and restrictions
The format is similar to the OS/360 Object File Format but adds information for use in building applications.
- GOFF files consist of either fixed- or variable-length records.
- A GOFF record must completely fit within a single record of the underlying file system. A GOFF file is not a stream-type file.
- Fixed-length records must be 80 bytes. The minimum size of a variable-length record is 56 bytes. In the case of fixed-length records, there will be unused bytes at the end of a record. These bytes must be set to binary zero.
- The program reading GOFF records is not to make assumptions about how records are stored internally; the operating system is presumed to be able to provide fixed- or variable-length records without the program reading them needing to be aware of the operating system's internal file management. The length of a record is not part of the record itself.
- Binary values are stored in big endian format, e.g. the value 1 is X'01' for an 8-bit value, X'0001' for a 16-bit value, X'00000001' for a 32-bit value, and X'0000000000000001' for a 64-bit value.
- Bits are counted from left to right; bit 0 is the left-most bit in a byte or word.
- Fixed-length records are required for GOFF files deployed on Unix systems.
- A record may be continued on a subsequent record. Where a record is continued, no intervening record shall occur between the record being continued and the continuation record.
- A GOFF object file starts with an HDR record and ends with an END record. The END record should include the number of GOFF records in the file.
- A language compiler or assembler can produce multiple GOFF files in one compilation/assembly, but the individual GOFF files must be separate from each other.
- Module and Class names are case sensitive. A module named "exit" need not be the same as "EXIT" used by the Fortran language.
- Some conventions applicable to the OS/360 Object File Format are carried over to the GOFF Object File Format, including:
- * Unless otherwise specified, all characters are in the EBCDIC character set, except for external names, as stated below.
- * ESD items must be numbered starting with 1 and each new item is to have the next number in sequence, without any 'gaps' in the numbering sequence.
- * An ESD item must be defined before any other record references it.
- * Each ESD record contains exactly one ESD item.
- * An RLD record may contain one or more items, and an RLD record may be continued to a subsequent record.
- * To ensure future compatibility, fields indicated as 'reserved' should be set to binary zero.
- * Character sets used for external names are not defined by the GOFF standard, but there is a provision for a file to indicate what character set is being used. Some IBM products, however, restrict the characters allowed in external names and other identifiers to a limited range, typically hexadecimal values X'41' through X'FE' plus the shift-in and shift-out characters, X'0F' and X'0E', respectively.
- The new format supports Class names, of which there are two types, reserved and user supplied or non-reserved. All class names have a maximum length of 16 characters.
- Reserved Class names consist of a single letter, an underscore, and 1 to 14 characters. Reserved Class names beginning with B_ are reserved for the binder; Reserved Class names beginning with C_ marked as loadable are reserved for programs created for use with IBM's Language Environment. Class names beginning with C_ which are not marked as loadable, as well as classes beginning with X_, Y_ or Z_ are available for general use as non-reserved.
- User Supplied class names may be lower-case.
- Class names are not external symbols.
- The SYM object file symbolic table information from the 360 Object File format record is not available for GOFF object files; the ADATA record must be used instead.
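Three of the record-level conventions in the list above (big-endian binary values, bits counted from the left, and zero padding of fixed-length records) can be sketched in a few lines of Python. The helper names `pack_big_endian`, `get_bit` and `pad_fixed_record` are illustrative assumptions and are not part of any GOFF tooling.

```python
FIXED_RECORD_LENGTH = 80  # fixed-length GOFF records are 80 bytes


def pack_big_endian(value: int, size: int) -> bytes:
    """Encode an unsigned integer in `size` bytes, most significant byte first."""
    return value.to_bytes(size, byteorder="big")


def get_bit(byte: int, position: int) -> int:
    """Return bit `position` of an 8-bit value, where bit 0 is the left-most bit."""
    if not 0 <= position <= 7:
        raise ValueError("bit position must be 0-7")
    return (byte >> (7 - position)) & 1


def pad_fixed_record(payload: bytes) -> bytes:
    """Pad a record to 80 bytes with binary zeros (hypothetical helper)."""
    if len(payload) > FIXED_RECORD_LENGTH:
        raise ValueError("payload does not fit in one fixed-length record")
    return payload.ljust(FIXED_RECORD_LENGTH, b"\x00")
```

With this numbering, `get_bit(0x80, 0)` is 1 because the left-most bit of X'80' is set, which is the opposite of the least-significant-bit-first numbering common on other platforms.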
Record Types
- The HDR record must occur first; it defines the header for the object file.
- ESD records define main programs, subroutines, functions, dummy sections, Fortran Common, methods and properties, and any module or routine that can be called by another module. They are used to define the program or program segments that were compiled in this execution of the compiler, and external routines used by the program (such as exit in C; CALL EXIT in Fortran; new and dispose in Pascal). ESD records should occur before any reference to an ESD symbol.
- TXT records have been expanded, and in addition to containing the machine instructions or data which is held by the module, they also contain Identification Data records, Associated Data records, and additional information related to the module.
- RLD records are used to relocate addresses. For example, a program referencing an address located 500 bytes inside the module will internally store the address as 500, but when the module is loaded into memory it will almost always be placed at some other address, so an RLD record informs the linkage editor or loader which addresses to change. Also, when a module references an external symbol, it will usually set the value of the symbol to zero, then include an RLD entry for that symbol to allow the loader or linkage editor to alter the address to the correct value.
- LEN records are new, and supply certain length information.
- END records indicate the end of a module, and optionally where the program is to begin execution. This must be the last record in the file.
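The relocation idea described for RLD records above can be illustrated with a deliberately simplified sketch. It assumes 4-byte big-endian address constants and a plain list of offsets that need patching; real RLD items carry more information than this, and the function name `relocate` is hypothetical.

```python
def relocate(image: bytearray, rld_offsets: list, load_address: int) -> None:
    """Add the load address to each 4-byte address constant, in place."""
    for offset in rld_offsets:
        stored = int.from_bytes(image[offset:offset + 4], "big")
        patched = (stored + load_address) & 0xFFFFFFFF  # wrap to 32 bits
        image[offset:offset + 4] = patched.to_bytes(4, "big")


# A 16-byte module whose address constant at offset 8 refers to
# location 500 inside the module itself.
module = bytearray(16)
module[8:12] = (500).to_bytes(4, "big")

# Loading the module at X'20000' turns the stored 500 into X'20000' + 500.
relocate(module, [8], load_address=0x20000)
```

A reference to an external symbol would follow the same pattern, except that the stored value starts at zero and the amount added is the resolved address of that symbol.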
Format
PTV
The PTV field represents the first 3 bytes of every GOFF record.
Byte | Bits | Value | Purpose |
0 | All | 03 | Indicates start of a GOFF record |
1 | 0-3 | 0 | ESD record |
1 | 0-3 | 1 | TXT record |
1 | 0-3 | 2 | RLD record |
1 | 0-3 | 3 | LEN record |
1 | 0-3 | 4 | END record |
1 | 0-3 | X'5'-X'E' | Reserved |
1 | 0-3 | X'F' | HDR record |
1 | 4-5 | | Reserved |
1 | 6-7 | 00 | Initial record that is not continued on the next record. This should be the only value used for variable-length GOFF records |
1 | 6-7 | 01 | Initial record which is continued on next record |
1 | 6-7 | 10 | Continuation record not continued on next record |
1 | 6-7 | 11 | Continuation record which is continued on the next record |
2 | All | 00 | Version Number of the object file format. All values except X'00' are reserved |
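A minimal sketch of decoding the PTV bytes according to the table above follows. The function name `parse_ptv` and the keys of the returned dictionary are assumptions made for illustration.

```python
RECORD_TYPES = {0x0: "ESD", 0x1: "TXT", 0x2: "RLD", 0x3: "LEN", 0x4: "END", 0xF: "HDR"}


def parse_ptv(record: bytes) -> dict:
    """Decode the 3-byte PTV prefix of a GOFF record."""
    if len(record) < 3:
        raise ValueError("record is shorter than the PTV field")
    if record[0] != 0x03:
        raise ValueError("byte 0 is not X'03'; not a GOFF record")
    type_code = record[1] >> 4        # bits 0-3: record type
    flags = record[1] & 0x03          # bits 6-7: continuation flags
    if type_code not in RECORD_TYPES:
        raise ValueError("reserved record type")
    return {
        "type": RECORD_TYPES[type_code],
        "is_continuation": bool(flags & 0x02),  # bit 6: continues a prior record
        "is_continued": bool(flags & 0x01),     # bit 7: continued on next record
        "version": record[2],
    }
```

For example, `parse_ptv(bytes.fromhex("034100"))` reports an END record that is an initial record continued on the next record.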
HDR
The HDR record is required, and must be the first record.
Byte | Size | Field | Value | Purpose |
0-2 | 3 | PTV | X'03F000' | Only allowed value; HDR record currently cannot be continued |
3-47 | 45 | | 0 | Reserved |
48-51 | 4 | Architecture Level | Binary 0 or 1 | GOFF Architecture level; all values except 0 and 1 are reserved |
52-53 | 2 | Module Properties Size | binary | Length of Module Properties Field |
54-59 | 6 | | 0 | Reserved |
60- | 0+ | Module Properties | | Module Properties List |
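Reading the HDR layout above might look like the following sketch. The function name `parse_hdr` is hypothetical, and the module properties are returned as raw bytes because their internal format is not described here.

```python
import struct

HDR_PTV = bytes.fromhex("03F000")  # the only PTV value allowed for HDR


def parse_hdr(record: bytes) -> dict:
    """Extract the architecture level and module properties from an HDR record."""
    if len(record) < 60:
        raise ValueError("HDR record is too short")
    if record[:3] != HDR_PTV:
        raise ValueError("not an HDR record")
    # Bytes 48-51: architecture level; bytes 52-53: module properties size.
    architecture_level, properties_size = struct.unpack_from(">IH", record, 48)
    if architecture_level not in (0, 1):
        raise ValueError("reserved architecture level")
    return {
        "architecture_level": architecture_level,
        "module_properties": record[60:60 + properties_size],
    }
```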
ESD
An ESD record gives the public name for a module, a main program, a subroutine, procedure, function, property or method in an object, Fortran Common or alternate entry point. An ESD record for a public name must be present in the file before any reference to that name is made by any other record.
Continuation
In the case of fixed-length records where the name requires continuation records, the following is used:
Behavior Attributes
ADATA records
ADATA records are used to provide additional symbol information about a module. They replaced the older SYM records in the 360 object file format. To create an ADATA record:
- Create an ESD record of type ED for the class name that the records are part of
- Set all fields in the Behavioral Attributes record to 0 except
- * Class Loading is X'10'
- * Binding Algorithm is 0
- * Text Record Style is X'0010'
- * Optionally set the Read Only and Not Executable values if appropriate
- Create a TXT record for each ADATA item
- * Element ESDID is the value of the ADATA ED record for that particular ADATA entry
- * Offset is zero
- * Data Length is the length of the ADATA record
- * Data field contains the actual ADATA record itself
Class names assigned to ADATA records are translated by IBM programs by converting the binary value to hexadecimal text and appending it to the name C_ADATA, so an item numbered X'0033' would become the text string C_ADATA0033.
Type | Description |
Translator records. | |
Program Management records | |
Reserved | |
Reserved for compilers and assemblers not released by IBM. | |
Available for User Records. IBM will not use these values. |
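The class-name translation described above for ADATA records can be sketched as follows. The function name `adata_class_name` is hypothetical, and the use of exactly four upper-case hexadecimal digits is an assumption inferred from the single C_ADATA0033 example.

```python
def adata_class_name(record_type: int) -> str:
    """Build the class name for an ADATA record type held in a 16-bit value."""
    if not 0 <= record_type <= 0xFFFF:
        raise ValueError("ADATA record type must fit in 16 bits")
    # Assumption: four upper-case hex digits, as in the C_ADATA0033 example.
    return f"C_ADATA{record_type:04X}"
```

The resulting name is 11 characters long, which fits within the 16-character limit on class names.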
TXT
TXT records specify the machine code instructions and data to be placed at a specific address location in the module. Note that wherever a "length" must be specified for this record, the length value must include any continuations to this record.
Continuation
Compression Table
A compression table is used if bytes 20-21 of the TXT record are nonzero. The R value is used to determine the number of times to repeat the string; the L value indicates the length of the text to be repeated "R" times. This could be used for pre-initializing tables or arrays to blanks or zero, or for any other purpose where it is useful to express repeated data as a repeat count and a value.
IDR Data Table
IDR Format 1
Note that unlike most number values stored in a GOFF file, the "version", "release" and "trans_date" values are numbers stored as text characters instead of binary.
Byte | Size | Field | Value | Purpose |
0-9 | 10 | Translator | Any text | This value is what the assembler or compiler identifies itself as; IBM calls this the "PID value" or "Program ID value" from IBM's catalog numbers of various programs, e.g. the Cobol Compiler for OS/VS1 is called "IKFCBL00" |
10-11 | 2 | Version | two digits | This is the version number of the assembler or compiler, 0 to 99. |
12-13 | 2 | Release | two digits | This is the release number subpart of the version number above, also 0 to 99 |
14-18 | 5 | Trans_Date | YYDDD | 5 text characters indicating the 2-digit year and the 3-digit day of the year this module was compiled or assembled; years 01-65 are presumed to be 2001-2065, while year 00 and years greater than 65 are presumed to be 2000 and 1966-1999 respectively. The three-digit day starts at 001 for January 1; 032 is February 1; 060 is March 1 in standard years and February 29 in leap years; and so on through 365 for December 31 in standard years and 366 in leap years. |
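The century rule and day-of-year encoding in the table above can be sketched as below. The function names are hypothetical, and two things are assumptions rather than statements from the format description: that the text is EBCDIC code page 037 (Python's `cp037` codec) and that the Translator field is padded with trailing blanks.

```python
from datetime import date, timedelta


def parse_trans_date(yyddd: str) -> date:
    """Convert a YYDDD string to a calendar date using the 00/66-99 rule."""
    if len(yyddd) != 5 or not (yyddd.isascii() and yyddd.isdigit()):
        raise ValueError("Trans_Date must be five decimal digits")
    yy, ddd = int(yyddd[:2]), int(yyddd[2:])
    # 00 -> 2000, 01-65 -> 2001-2065, 66-99 -> 1966-1999
    year = 2000 + yy if yy <= 65 else 1900 + yy
    if ddd < 1:
        raise ValueError("day of year starts at 001")
    result = date(year, 1, 1) + timedelta(days=ddd - 1)
    if result.year != year:
        raise ValueError("day of year is out of range for this year")
    return result


def parse_idr_format1(data: bytes) -> dict:
    """Decode the 19-byte IDR Format 1 entry (assumes code page 037)."""
    text = data[:19].decode("cp037")
    return {
        "translator": text[0:10].rstrip(),  # assumes blank padding
        "version": int(text[10:12]),
        "release": int(text[12:14]),
        "translated": parse_trans_date(text[14:19]),
    }
```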
IDR Format 2
Normally compilers and assemblers do not generate this format record; it is typically created by the binder.
Byte | Size | Field | Value | Purpose |
0-3 | 4 | Date | Packed decimal form YYYYDDDF | Date module was assembled or compiled, with the year and day of the year |
4-5 | 2 | Data_Length | Binary value | Actual length of next field, an unsigned, nonzero value |
6-85 | 80 | IDR_Data | | Format of this data has not been disclosed |
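The packed decimal date in IDR Format 2 stores one decimal digit per 4-bit nibble, followed by a final sign nibble, so YYYYDDDF occupies exactly 4 bytes. A minimal sketch of decoding it follows; the function names are hypothetical, and treating any final nibble other than X'F' as an error is an assumption based on the YYYYDDDF pattern shown in the table.

```python
def parse_packed_date(field: bytes) -> tuple:
    """Decode a 4-byte packed decimal YYYYDDDF value into (year, day_of_year)."""
    if len(field) != 4:
        raise ValueError("packed date must be 4 bytes")
    digits = field.hex()  # eight characters, one per nibble, lower case
    if digits[7] != "f" or not digits[:7].isdecimal():
        raise ValueError("not a valid YYYYDDDF packed decimal value")
    return int(digits[:4]), int(digits[4:7])


def parse_idr_format2(data: bytes) -> dict:
    """Split an IDR Format 2 entry into its date and undisclosed data."""
    year, day_of_year = parse_packed_date(data[0:4])
    data_length = int.from_bytes(data[4:6], "big")
    if data_length == 0:
        raise ValueError("Data_Length must be nonzero")
    return {
        "year": year,
        "day_of_year": day_of_year,
        "idr_data": data[6:6 + data_length],  # contents left uninterpreted
    }
```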
IDR Format 3
All text in this item is character data; no binary information is used.
Byte | Size | Field | Value | Purpose |
0-9 | 10 | Translator | Any text value the compiler/assembler writer wishes to use to identify itself | |
10-11 | 2 | Version | 00 to 99 | Version number of the assembler or compiler |
12-13 | 2 | Release | 00 to 99 | Release number of above version |
14-20 | 7 | Compile_Date | YYYYDDD | Year and day of year the program was compiled or assembled. |
21-29 | 9 | Compile_Time | HHMMSSTTT | Hour, minute, second and thousandth of second that the program was compiled or assembled |
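Because IDR Format 3 is entirely character data, its date and time can be combined into a single timestamp. The sketch below assumes the characters are EBCDIC code page 037 (Python's `cp037` codec) and that the Translator field is blank padded; the function name `parse_idr_format3` is hypothetical.

```python
from datetime import datetime, timedelta


def parse_idr_format3(data: bytes) -> dict:
    """Decode the 30-byte IDR Format 3 entry (assumes code page 037)."""
    if len(data) < 30:
        raise ValueError("IDR Format 3 entry is 30 bytes")
    text = data[:30].decode("cp037")
    year, day_of_year = int(text[14:18]), int(text[18:21])
    hour, minute, second = int(text[21:23]), int(text[23:25]), int(text[25:27])
    thousandths = int(text[27:30])
    # Start from January 1 of the year and add the day-of-year offset.
    compiled = datetime(year, 1, 1, hour, minute, second, thousandths * 1000)
    compiled += timedelta(days=day_of_year - 1)
    return {
        "translator": text[0:10].rstrip(),  # assumes blank padding
        "version": int(text[10:12]),
        "release": int(text[12:14]),
        "compiled": compiled,
    }
```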
RLD
RLD records allow a module to show where it references an address that must be relocated, such as references to specific locations in itself, or to external modules.
Relocation Data
If R_Pointer is omitted this field starts 4 bytes lower, in bytes 8-11.
If R_Pointer or P_Pointer is omitted, this field starts 4 bytes lower. If both fields are omitted, this field starts 8 bytes lower.
If R_Pointer, P_Pointer, or Offset are omitted, this field starts 4 bytes lower. If any two of them are omitted, this field starts 8 bytes lower. If all of them are omitted, this field starts 12 bytes lower.
To clarify, if a module in a C program named "Basura" were to issue a call to the "exit" function to terminate itself, the R_Pointer address would be the ESDID of the routine "exit" while the P_Pointer would be the ESDID of "Basura". If the address were in the same module, R_Pointer and P_Pointer would be the same.
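The notes above about fields starting "4 bytes lower" when an earlier field is omitted amount to packing the optional fields one after another. The sketch below is an illustration only. It assumes that R_Pointer, P_Pointer and Offset are each 4 bytes and that the first of them starts at byte 8, which is inferred from the "bytes 8-11" remark rather than taken from a full field table. The function name `rld_field_offsets` is hypothetical.

```python
def rld_field_offsets(has_r_pointer=True, has_p_pointer=True, has_offset=True) -> dict:
    """Compute where each optional 4-byte field starts in a relocation item."""
    position = 8  # assumed start of the first optional field
    offsets = {}
    fields = (
        ("R_Pointer", has_r_pointer),
        ("P_Pointer", has_p_pointer),
        ("Offset", has_offset),
    )
    for name, present in fields:
        if present:
            offsets[name] = position
            position += 4  # every omitted field shifts the rest 4 bytes lower
    offsets["next_field"] = position
    return offsets
```

With all three fields present the following field starts at byte 20; with all three omitted it starts at byte 8, which is 12 bytes lower, matching the last note.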
Flags
LEN
LEN records are used to declare the length of a module where it was not known at the time the ESD record was created, e.g. for one-pass compilers.
Field | Offset | Size | Description |
PTV | 0-2 | 3 | Record Type X'033000' |
(reserved) | 3-5 | 3 | Reserved |
Length | 6-7 | 2 | Length of items following this field; value must be non-zero |
Elements | 8- | | Element length data; see Elements table below |
REM | | | Trailing data to end of record for fixed-length records, must contain binary zeroes; not present for variable-length records. |
Elements
A deferred-length element entry cannot be continued or split.
Field | Offset | Size | Description |
ESDID | 0-3 | 4 | ESDID of element this value applies to |
(reserved) | 4-7 | 4 | Reserved |
Length | 8-11 | 4 | Length of the item referenced |
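Putting the LEN record and Elements tables together, a reader could walk the element entries as sketched below. The function name `parse_len_record` is hypothetical, ESDID and length values are treated as unsigned (an assumption), and the check that the Length field is a multiple of 12 follows from each element entry being 12 bytes.

```python
import struct

LEN_PTV = bytes.fromhex("033000")
ELEMENT_SIZE = 12  # ESDID (4) + reserved (4) + length (4)


def parse_len_record(record: bytes) -> list:
    """Return (esdid, length) pairs from a LEN record."""
    if record[:3] != LEN_PTV:
        raise ValueError("not a LEN record")
    (data_length,) = struct.unpack_from(">H", record, 6)
    if data_length == 0 or data_length % ELEMENT_SIZE:
        raise ValueError("Length must be a non-zero multiple of 12")
    if 8 + data_length > len(record):
        raise ValueError("element data runs past the end of the record")
    elements = []
    for start in range(8, 8 + data_length, ELEMENT_SIZE):
        esdid, _reserved, item_length = struct.unpack_from(">III", record, start)
        elements.append((esdid, item_length))
    return elements
```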
END
END must be the last record for a module. An 'Entry Point' is used when an address other than the beginning of the module is to be used as the start point for its execution. This is used either because the program has non-executable data appearing before the start of the module, or because the module calls an external module first, such as a run-time library to initialize itself.
Field | Offset | Size | Bits | Description |
PTV | 0-2 | 3 | | X'034000' - Not continued |
PTV | 0-2 | 3 | | X'034100' - Continued on next record |
(reserved) | 3 | 6 bits | 0-5 | Reserved |
Flags | 3 | 2 bits | 6-7 | Declarations regarding the presence or absence of an entry point |
Flags | 3 | 2 bits | 6-7 | 00 - No entry point given; all other values in this record are invalid |
Flags | 3 | 2 bits | 6-7 | 01 - Entry point specified by ESDID |
Flags | 3 | 2 bits | 6-7 | 10 - Entry point specified by name |
Flags | 3 | 2 bits | 6-7 | 11 - Reserved |
AMODE | 4 | 1 | | Addressing Mode value of entry point; the values are as specified in field 0 of the Behavior Attributes table in the ESD record. |
(reserved) | 5-7 | 3 | | Reserved |
Record Count | 8-11 | 4 | | Number of GOFF records in this module |
ESDID | 12-15 | 4 | | Value of ESDID if entry point is referenced by ESDID; binary zero if referenced by name |
(reserved) | 16-19 | 4 | | Reserved |
Offset | 20-23 | 4 | | Address offset of module entry point; this cannot be specified for an external entry point |
Name Length | 24-25 | 2 | | Length of name; this must be zero if the entry point was specified by ESDID. |
Name | 26- | | | The name of the external symbol used as the entry point for this module; binary zeros if the entry point was specified by ESDID; if this record is continued, this is the initial 54 characters of the name. This is the only non-binary value in the record; it is a text field representing the public name for the entry point |
REM | | | | Trailer extending to the end of the record; should be binary zeros to end of record for fixed-length records; omitted for variable-length records |
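The END record layout above can be read with a single `struct` call for the binary fields. This is a sketch under stated assumptions: the function name `parse_end_record` is hypothetical, continued records are not reassembled, and the entry point name is decoded as EBCDIC code page 037 even though the character set of external names is not fixed by the format.

```python
import struct


def parse_end_record(record: bytes) -> dict:
    """Decode a non-continued END record."""
    if len(record) < 26 or record[0] != 0x03 or record[1] >> 4 != 0x4:
        raise ValueError("not an END record")
    entry_kind = record[3] & 0x03  # bits 6-7 of byte 3
    # Bytes 8-25: record count, ESDID, reserved, offset, name length.
    record_count, esdid, _reserved, offset, name_length = struct.unpack_from(
        ">IIIIH", record, 8
    )
    if entry_kind == 0b00:
        entry = None  # no entry point given
    elif entry_kind == 0b01:
        entry = {"esdid": esdid, "offset": offset}
    elif entry_kind == 0b10:
        # Assumption: the name is in code page 037.
        entry = {"name": record[26:26 + name_length].decode("cp037")}
    else:
        raise ValueError("reserved entry point flag value")
    return {"amode": record[4], "record_count": record_count, "entry": entry}
```

A caller could compare `record_count` with the number of records actually read to detect a truncated file.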