At its simplest, the Compound File Binary Format is a container with little restriction on what can be stored within it. A CFBF file structure loosely resembles a FAT filesystem. The file is partitioned into Sectors, which are chained together by a File Allocation Table that records the chain of sectors belonging to each file. A Directory holds information about the contained files, including the Sector ID of the starting sector of each file's chain.
Structure
The CFBF file consists of a 512-Byte header record followed by a number of sectors whose size is defined in the header. The literature defines Sectors to be either 512 or 4096 bytes in length, although the format is potentially capable of supporting sectors ranging in size from 128-Bytes upwards in powers of 2. The lower limit of 128 is the minimum required to fit a single directory entry in a Directory Sector. There are several types of sector that may be present in a CFBF:
File Allocation Table Sector – contains chains of sector indices much as a FAT does in the FAT/FAT32 filesystems
MiniFAT Sectors – similar to the FAT but storing chains of mini-sectors within the Mini-Stream
Double-Indirect FAT Sector – contains chains of FAT sector indices
Directory Sector – contains directory entries
Stream Sector – contains arbitrary file data
Range Lock Sector – contains the byte-range locking area of a large file
More detail is given below for the header and each sector type.
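The sector arithmetic described above can be sketched in C. This is a minimal illustration, not a reference implementation: the function names are hypothetical, and the offset formula assumes the convention in Microsoft's published specification that sector 0 begins one full sector into the file (so with 4096-byte sectors the 512-byte header is padded out to 4096 bytes).

```c
#include <stdint.h>

/* Size in bytes of one directory entry; this is why 128 is the
 * smallest sector size that can hold a directory entry. */
#define CFBF_DIRENT_SIZE 128u

/* Sector size in bytes for a power-of-two exponent (9 -> 512, 12 -> 4096). */
static inline uint32_t cfbf_sector_size(unsigned sector_shift)
{
    return (uint32_t)1 << sector_shift;
}

/* Byte offset of sector `sect` within the file.
 * Assumption: sector 0 starts one full sector after the start of the file. */
static inline uint64_t cfbf_sector_offset(uint32_t sect, unsigned sector_shift)
{
    return ((uint64_t)sect + 1) << sector_shift;
}

/* Number of directory entries that fit in one Directory Sector. */
static inline uint32_t cfbf_dirents_per_sector(unsigned sector_shift)
{
    return cfbf_sector_size(sector_shift) / CFBF_DIRENT_SIZE;
}
```

With 512-byte sectors, for example, sector 0 starts immediately after the header, and each Directory Sector holds four entries.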
CFBF Header format
The CFBF Header occupies the first 512 bytes of the file and contains the information required to interpret the rest of the file. The C-style structure declaration below shows the members of the CFBF header and their purpose:
typedef unsigned long ULONG; // 4 Bytes
typedef unsigned short USHORT; // 2 Bytes
typedef short OFFSET; // 2 Bytes
typedef ULONG SECT; // 4 Bytes
typedef ULONG FSINDEX; // 4 Bytes
typedef USHORT FSOFFSET; // 2 Bytes
typedef USHORT WCHAR; // 2 Bytes
typedef ULONG DFSIGNATURE; // 4 Bytes
typedef unsigned char BYTE; // 1 Byte
typedef unsigned short WORD; // 2 Bytes
typedef unsigned long DWORD; // 4 Bytes
typedef ULONG SID; // 4 Bytes
typedef GUID CLSID; // 16 Bytes
struct StructuredStorageHeader ;
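The byte widths noted in the type definitions hold on Windows, but `unsigned long` is 8 bytes on many 64-bit Unix-like platforms. A portable sketch, assuming a C11 compiler, can pin the widths with `<stdint.h>` and check them at compile time. The `GUID` struct here is a stand-in that mirrors the Windows layout, since that type is not available outside the Windows headers.

```c
#include <stdint.h>

/* Fixed-width equivalents of the type definitions above. */
typedef uint32_t ULONG;       /* 4 bytes */
typedef uint16_t USHORT;      /* 2 bytes */
typedef int16_t  OFFSET;      /* 2 bytes */
typedef ULONG    SECT;        /* 4 bytes */
typedef ULONG    FSINDEX;     /* 4 bytes */
typedef USHORT   FSOFFSET;    /* 2 bytes */
typedef USHORT   WCHAR;       /* 2 bytes */
typedef ULONG    DFSIGNATURE; /* 4 bytes */
typedef uint8_t  BYTE;        /* 1 byte  */
typedef uint16_t WORD;        /* 2 bytes */
typedef uint32_t DWORD;       /* 4 bytes */
typedef ULONG    SID;         /* 4 bytes */

/* Stand-in for the Windows GUID type (assumption: not using <windows.h>). */
typedef struct {
    uint32_t Data1;
    uint16_t Data2;
    uint16_t Data3;
    uint8_t  Data4[8];
} GUID;
typedef GUID CLSID;           /* 16 bytes */

/* Fail the build if any width differs from what the on-disk format expects. */
_Static_assert(sizeof(ULONG) == 4, "ULONG must be 4 bytes");
_Static_assert(sizeof(USHORT) == 2, "USHORT must be 2 bytes");
_Static_assert(sizeof(CLSID) == 16, "CLSID must be 16 bytes");
```

Pinning the widths this way means a structure built from these types does not silently change size when the code is compiled on a different platform.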
File Allocation Table (FAT) Sectors
When taken together as a single stream, the collection of FAT sectors defines the status and linkage of every sector in the file. Each entry in the FAT is 4 bytes in length and contains either the sector number of the next sector in a FAT chain or one of the following special values:
FREESECT – denotes an unused sector
ENDOFCHAIN – marks the last sector in a FAT chain
FATSECT – marks a sector used to store part of the FAT
DIFSECT – marks a sector used to store part of the DIFAT
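Following a FAT chain can be sketched as below. The numeric values of the four special markers are taken from Microsoft's published specification rather than from this article, and the function name and error convention are hypothetical. The sketch assumes the whole FAT has already been loaded into memory as an array of 4-byte entries in host byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Special FAT entry values (assumed from Microsoft's specification). */
#define DIFSECT    0xFFFFFFFCu  /* sector holds part of the DIFAT */
#define FATSECT    0xFFFFFFFDu  /* sector holds part of the FAT   */
#define ENDOFCHAIN 0xFFFFFFFEu  /* last sector in a chain         */
#define FREESECT   0xFFFFFFFFu  /* unused sector                  */

/* Follow the chain that begins at `start` through an in-memory FAT of
 * `fat_len` entries, writing each visited sector number to `out`
 * (capacity `out_cap`). Returns the chain length, or -1 if the chain is
 * malformed (out of range, hits a non-chain marker, loops) or does not
 * fit in `out`. */
static inline long cfbf_walk_chain(const uint32_t *fat, size_t fat_len,
                                   uint32_t start,
                                   uint32_t *out, size_t out_cap)
{
    size_t count = 0;
    uint32_t sect = start;

    while (sect != ENDOFCHAIN) {
        /* A chain must never step onto a marker or outside the FAT. */
        if (sect == FREESECT || sect == FATSECT || sect == DIFSECT ||
            sect >= fat_len)
            return -1;
        /* More links than FAT entries means the chain loops. */
        if (count >= fat_len || count >= out_cap)
            return -1;
        out[count++] = sect;
        sect = fat[sect];   /* each entry names the next sector */
    }
    return (long)count;
}
```

The loop bound doubles as cycle detection: a well-formed chain cannot contain more sectors than the FAT has entries, so exceeding that count indicates corruption.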
Range Lock Sector
The Range Lock Sector must exist in files greater than 2GB in size, and must not exist in files smaller than 2GB. The Range Lock Sector must contain the byte range 0x7FFFFF00 to 0x7FFFFFFF in the file. This area is reserved by Microsoft's COM implementation for storing byte-range locking information for concurrent access.
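Which sector holds the reserved byte range can be worked out from the sector size. The sketch below uses hypothetical function names and assumes that sector 0 begins one full sector into the file, as in Microsoft's published specification. It also assumes the sector size is at least 256 bytes, so that the 256-byte range fits in a single sector.

```c
#include <stdint.h>

/* First and last file offsets of the reserved byte-range locking area. */
#define RANGE_LOCK_FIRST 0x7FFFFF00u
#define RANGE_LOCK_LAST  0x7FFFFFFFu

/* Sector number whose bytes include offset RANGE_LOCK_FIRST.
 * Assumption: sector n occupies file offsets (n + 1) << sector_shift onward. */
static inline uint32_t cfbf_range_lock_sector(unsigned sector_shift)
{
    return (uint32_t)(RANGE_LOCK_FIRST >> sector_shift) - 1u;
}

/* Nonzero if sector `sect` covers the entire reserved range. */
static inline int cfbf_sector_covers_range_lock(uint32_t sect,
                                                unsigned sector_shift)
{
    uint64_t first = ((uint64_t)sect + 1) << sector_shift;
    uint64_t last  = first + ((uint64_t)1 << sector_shift) - 1;
    return first <= RANGE_LOCK_FIRST && last >= RANGE_LOCK_LAST;
}
```

Under these assumptions, the reserved range falls in the final sector below the 2 GB boundary for both 512-byte and 4096-byte sectors.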
Glossary
FAT – File Allocation Table, also known as: SAT – Sector Allocation Table
DIFAT – Double-Indirect File Allocation Table
FAT Chain – a group of FAT entries which indicate the sectors allocated to a Stream in the file
Stream – a virtual file which occupies a number of sectors within the CFBF
Sector – the unit of allocation within the CFBF, usually 512 or 4096 Bytes in length