Program database


Program database (PDB) is a proprietary file format for storing debugging information about a program. PDB files commonly have a .pdb extension. A PDB file is typically created from source files during compilation. It stores a list of all symbols in a module with their addresses and possibly the name of the file and the line on which each symbol was declared. This symbol information is not stored in the module itself, because it takes up a lot of space.

Applications

When a program is debugged, the debugger loads debugging information from the PDB file and uses it to locate symbols or to relate the current execution state of the program to its source code. Microsoft Visual Studio uses PDB files as its primary file format for debugging information.
Another use of PDB files is in services that collect crash data from users and relate it to the specific parts of the source code that cause the crash.
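Relating an address from a debugger or a crash report to a symbol can be pictured as a search over symbols sorted by address. The following is a minimal sketch, assuming a hypothetical in-memory symbol list; the names, addresses, and the find_symbol helper are illustrative and are not read from a real PDB.

```python
import bisect

# Hypothetical symbol table: (start address, name), sorted by address.
SYMBOLS = [
    (0x1000, "main"),
    (0x1040, "parse_args"),
    (0x10A0, "run"),
]

def find_symbol(address, symbols=SYMBOLS):
    """Return (name, offset) for the nearest symbol at or below address.

    Returns None when the address lies before the first known symbol.
    """
    starts = [start for start, _ in symbols]
    index = bisect.bisect_right(starts, address) - 1
    if index < 0:
        return None
    start, name = symbols[index]
    return name, address - start
```

For example, find_symbol(0x1044) reports an offset of 4 bytes into parse_args. A real tool would also need symbol sizes and line tables, which this sketch leaves out.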
Microsoft compilers will, under appropriate options, store information in a single PDB about types found in the compiled sources. Debug information specific to each source is stored in the compiled object file, and contains references to types in the PDB. Each compilation will add to the PDB any types that are not already found there, so that references in already compiled object files remain valid.
The Microsoft linker, under appropriate options, builds a complete new PDB which combines the debug information found in its input modules, the types referenced by those modules, and other information generated by the linker. If the link is performed incrementally, an existing PDB is modified by adding or replacing only the information pertaining to added or replaced modules, and adding any new types not already in the PDB.
PDB files are usually removed from a program's distribution package. They are used by developers during debugging to save time and gain insight.

Extracting information

The PDB format is documented, and information can be extracted from a PDB file using the DIA interfaces, available on Microsoft Windows. There are also third-party tools that can extract information from PDB files, such as radare2.

Multiple stream format

The PDB is a single file which is logically composed of several sub-files, called streams. It is designed to optimize the process of making changes to the PDB, as performed by compiles and incremental links. Streams can be removed, added, or replaced without rewriting any other streams, and the changes to the metadata which describes the streams are minimized as well.
The PDB is organized in fixed-size pages, typically 1K, 2K, or 4K, numbered consecutively starting at 0.
Note: It is presumed that all numeric information is stored in little-endian form, the native form for Intel x86 based processors. The pdbparse Python code makes this assumption.
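Under these assumptions, locating a page and reading a number from it takes only a little arithmetic. The sketch below is illustrative: the helper names page_offset and read_u32_le are hypothetical, and the 32-bit field width is chosen only for the example.

```python
import struct

def page_offset(page_number, page_size):
    """Byte offset of a page within the file; pages are numbered from 0."""
    return page_number * page_size

def read_u32_le(data, offset):
    """Read an unsigned 32-bit little-endian integer from a bytes object."""
    return struct.unpack_from("<I", data, offset)[0]
```

With 4K pages, page 3 starts at byte 12288, and the bytes 00 10 00 00 decode to 4096.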

Stream

Each stream in the PDB occupies several pages, which aren't necessarily consecutively numbered. The stream has a number and a length. The stream content is the concatenation of its pages, truncated to the stream's length.
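The concatenate-and-truncate rule can be sketched as follows. This is a minimal illustration, assuming the whole file is already in memory as a bytes object and that the page list and length have been obtained from the metadata; read_stream is a hypothetical helper name.

```python
def read_stream(file_bytes, page_size, page_numbers, length):
    """Concatenate the listed pages in order and truncate to the stream length."""
    chunks = []
    for page in page_numbers:
        start = page * page_size
        chunks.append(file_bytes[start:start + page_size])
    return b"".join(chunks)[:length]
```

The page list need not be ascending: a stream stored in pages 3 and 1 is read in that order, and only the first bytes of the last page are kept when the length is not a multiple of the page size.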

Metadata format

The function of the PDB metadata is to identify all of the component streams, giving the length and the sequence of pages for each stream. Streams are numbered consecutively starting with 0. There is also a root stream, unnumbered, which contains some of the metadata.

Header

The PDB begins with a header, which may be longer than a single page.
Microsoft tools use two PDB formats, each identified by the signature at the start of the header. For Version 7, the signature is "Microsoft C/C++ MSF 7.00\r\n\x1ADS\0\0\0"; the remainder of the header follows the signature.
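Checking the signature and reading the fields after it might look like the sketch below. The signature is the one quoted in the text. The field layout after it (six 32-bit little-endian values) is an assumption taken from LLVM's description of the MSF 7.00 container, not from this article, and the field names are hypothetical.

```python
import struct

PDB7_SIGNATURE = b"Microsoft C/C++ MSF 7.00\r\n\x1ADS\0\0\0"

# Assumed layout of the six 32-bit fields that follow the signature.
V7_FIELDS = (
    "page_size",
    "free_page_map",
    "page_count",
    "root_stream_size",
    "reserved",
    "root_page_list_page",
)

def parse_v7_header(data):
    """Validate the Version 7 signature and decode the assumed header fields."""
    if not data.startswith(PDB7_SIGNATURE):
        raise ValueError("not a Version 7 PDB signature")
    values = struct.unpack_from("<6I", data, len(PDB7_SIGNATURE))
    return dict(zip(V7_FIELDS, values))
```

The signature is 32 bytes long, so under this assumed layout the fields occupy bytes 32 to 55 of the file.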
The root stream describes all of the PDB streams starting with stream 0. Its contents vary with the PDB format version (Version 2 or Version 7).
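As an illustration for Version 7, the sketch below decodes a root stream into a length and page list per stream. The layout it assumes (a 32-bit stream count, then one 32-bit length per stream, then the page numbers of each stream in turn) follows LLVM's description of the MSF stream directory rather than this article, and the function name is hypothetical. It ignores any special length values that real files may use.

```python
import struct

def parse_v7_root_stream(root, page_size):
    """Return a list of (length, page_numbers) tuples, one per stream."""
    (stream_count,) = struct.unpack_from("<I", root, 0)
    lengths = struct.unpack_from(f"<{stream_count}I", root, 4)
    offset = 4 + 4 * stream_count
    streams = []
    for length in lengths:
        # A stream needs enough whole pages to hold its length (round up).
        page_count = (length + page_size - 1) // page_size
        pages = struct.unpack_from(f"<{page_count}I", root, offset)
        offset += 4 * page_count
        streams.append((length, list(pages)))
    return streams
```

The index of each tuple in the returned list is the stream number, so the first entry describes stream 0.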
Microsoft tools store different sorts of information in different numbered streams. Some stream numbers have a fixed information type associated with them, and other streams are identified in the aforementioned fixed type streams.
Stream 1 is used to verify that the PDB is the same file referred to in an executable or object file stream.
Stream 2 and stream 4 hold type information. Actual type records define types used in the program. The structure of these records can be found in the file cvinfo.h provided by Microsoft. There are two flavors of records, each with its own set of index numbers: type IDs and types. Only types are stored in stream 2 and only type IDs are stored in stream 4. The indices are used to refer to these records from within symbol records and other type records.
Stream 3 is a directory for other streams. Note that it is not present in Version 2, nor in a PDB produced by a compiler. The stream starts with a header, padded to 64 bytes in total, with the following layout:
Offset  Size  Name              Description
0       4     Signature         Header identifier, 0xFFFFFFFF
4       4     HeaderVersion     Version of the header
8       4     Age
12      2     snGSSyms
14      2     usVerAll          union
16      2     snPSSyms
18      2     usVerPdbDllBuild  Build version of the pdb dll that built this pdb last
20      2     snSymRecs
22      2     VerPdbDllRBld     rbld version of the pdb dll that built this pdb last
24      4     cbGpModi          Size of rgmodi substream
28      4     cbSC              Size of Section Contribution substream
32      4     cbSecMap          Size of section map
36      4     cbFileInfo        Size of file info stream
40      4     cbTSMap           Size of the Type Server Map substream
44      4     iMFC              MFC index
48      4     cbDbgHdr          Size of optional DbgHdr info appended to the end of the stream
52      4     cbECInfo          Number of bytes in EC substream, or 0 if no EC enabled Mods
56      2     flags             struct _flags
58      2     wMachine          Machine identifier, same as used in COFF object format, e.g., hex 8664 for Intel x86 64-bit
60      4     RESERVED          Future expansion, pad to 64 bytes
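The header layout above maps directly onto a fixed-size binary record. The following is a minimal sketch of decoding it, assuming the stream 3 content is already available as a bytes object and treating every field as an unsigned little-endian integer (the signedness is an assumption made for the example). The names DbiHeader and parse_stream3_header are hypothetical.

```python
import struct
from collections import namedtuple

# 3 x 4-byte, 6 x 2-byte, 8 x 4-byte, 2 x 2-byte, 1 x 4-byte = 64 bytes.
STREAM3_HEADER_FORMAT = "<3I6H8I2HI"

DbiHeader = namedtuple("DbiHeader", [
    "Signature", "HeaderVersion", "Age",
    "snGSSyms", "usVerAll", "snPSSyms",
    "usVerPdbDllBuild", "snSymRecs", "VerPdbDllRBld",
    "cbGpModi", "cbSC", "cbSecMap", "cbFileInfo",
    "cbTSMap", "iMFC", "cbDbgHdr", "cbECInfo",
    "flags", "wMachine", "RESERVED",
])

def parse_stream3_header(stream3):
    """Decode the 64-byte header at the start of stream 3."""
    size = struct.calcsize(STREAM3_HEADER_FORMAT)
    if len(stream3) < size:
        raise ValueError("stream 3 is shorter than its 64-byte header")
    header = DbiHeader._make(
        struct.unpack_from(STREAM3_HEADER_FORMAT, stream3, 0))
    if header.Signature != 0xFFFFFFFF:
        raise ValueError("unexpected header identifier")
    return header
```

A caller could then compare header.wMachine against 0x8664 to recognize a 64-bit x86 module, or use the cb fields to step through the substreams that follow the header.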