Printf format string
printf format string refers to a control parameter used by a class of functions in the input/output libraries of C and many other programming languages. The string is written in a simple template language: characters are usually copied literally into the function's output, but format specifiers, which start with a % character, indicate the location and method to translate a piece of data to characters.
"printf" is the name of one of the main C output functions, and stands for "print formatted". printf format strings are complementary to scanf format strings, which provide formatted input. In both cases these provide simple functionality and fixed format compared to more sophisticated and flexible template engines or parsers, but are sufficient for many purposes.
Many languages other than C copy the printf format string syntax closely or exactly in their own I/O functions.
Mismatches between the format specifiers and the types of the data can cause crashes and other vulnerabilities. The format string itself is very often a string literal, which allows static analysis of the function call. However, it can also be the value of a variable, which allows for dynamic formatting but also opens the door to a security vulnerability known as an uncontrolled format string exploit.
History
Early programming languages such as Fortran used special statements, with a syntax completely different from other calculations, to build formatting descriptions:

WRITE OUTPUT TAPE 6, 601, IA, IB, IC, AREA
601 FORMAT
ALGOL 68 had a more function-like API, but still used special syntax:
printf);
Using ordinary function calls and data types, however, simplifies the language and compiler, and allows the input/output implementation to be written in the same language. These advantages outweigh the disadvantages, and in most newer languages I/O is not part of the syntax.
C's printf has its origins in BCPL's writef function. In comparison to printf, *N is a newline and the order of field width and type is swapped in writef:

WRITEF
Probably the first adoption of the syntax outside the C language was the Unix printf shell command, which first appeared in Version 4 Unix as part of the port to C.
Format placeholder specification
Formatting takes place via placeholders within the format string. For example, a program that prints a person's age could prefix the output with "Your age is " and use the signed decimal specifier character d to show the integer age immediately after that message. The format string would then be used as follows:

printf("Your age is %d", age);
Syntax
The syntax for a format placeholder is

%[parameter][flags][width][.precision][length]type

Parameter field
This is a POSIX extension and not in C99. The Parameter field can be omitted or can be n$, where n is the number of the argument to display using this format specifier.

This feature is mainly used in localization, where the order in which parameters occur varies with language-dependent conventions.
Flags field
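The Flags field can be zero or more of:

- minus (-): Left-align the output of this placeholder. The default is to right-align the output.
- plus (+): Prepend a plus sign for positive values of signed numeric types.
- space: Prepend a space for positive values of signed numeric types. This flag is ignored if the plus flag is present.
- zero (0): When the Width field is specified, pad numeric types with zeros instead of spaces.
- apostrophe ('): Apply the thousands grouping separator to integers and to the integer part of floating-point numbers.
- hash (#): Use the alternate form. For the g and G types, trailing zeros are not removed. For the f, F, e, E, g and G types, the output always contains a decimal point. For the o, x and X types, the text 0, 0x or 0X, respectively, is prepended to non-zero numbers.

Width field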
The Width field specifies a minimum number of characters to output. It is typically used to pad fixed-width fields in tabulated output, where the fields would otherwise be smaller; it does not cause truncation of oversized fields.

The width field may be omitted, given as an integer value, or supplied dynamically as an additional argument when indicated by an asterisk *. For example, printf("%*d", 5, 10) will result in 10 being printed, with a total width of 5 characters.

Though not part of the width field, a leading zero is interpreted as the zero-padding flag mentioned above, and a negative value is treated as the positive value in conjunction with the left-alignment - flag also mentioned above.

Precision field
The Precision field usually specifies a maximum limit on the output, depending on the particular formatting type. For floating point numeric types, it specifies the number of digits to the right of the decimal point to which the output should be rounded. For the string type, it limits the number of characters that should be output, after which the string is truncated.

The precision field may be omitted, given as an integer value, or supplied dynamically as an additional argument when indicated by an asterisk *. For example, printf("%.*s", 3, "abcdef") will result in abc being printed.

Length field
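The Length field can be omitted or be any of:

- hh: For integer types, causes printf to expect an int-sized integer argument which was promoted from a char.
- h: For integer types, causes printf to expect an int-sized integer argument which was promoted from a short.
- l: For integer types, causes printf to expect a long-sized integer argument. For floating point types, this has no effect.
- ll: For integer types, causes printf to expect a long long-sized integer argument.
- L: For floating point types, causes printf to expect a long double argument.
- z: For integer types, causes printf to expect a size_t-sized integer argument.
- j: For integer types, causes printf to expect an intmax_t-sized integer argument.
- t: For integer types, causes printf to expect a ptrdiff_t-sized integer argument.

Additionally, several platform-specific length options came to exist prior to widespread use of the ISO C99 extensions.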
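ISO C99 includes the inttypes.h header file, which defines a number of macros for use in platform-independent printf coding. Each macro expands to a string literal, so it must be placed outside the double quotes of the format string, e.g. printf("%" PRId64 "\n", t); for an int64_t value t.

Example macros include PRId32, PRId64, PRIu32, PRIu64 and PRIx64.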
Type field
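The Type field can be any of:

- %: Prints a literal % character. This type does not accept any flags, width, precision or length fields.
- d, i: int as a signed decimal integer. The two are synonymous for output, but differ when used with scanf for input.
- u: unsigned int as a decimal integer.
- f, F: double in fixed-point notation.
- e, E: double in exponential notation.
- g, G: double in either fixed-point or exponential notation, whichever is more appropriate for its magnitude.
- x, X: unsigned int as a hexadecimal number, using lower-case or upper-case letters respectively.
- o: unsigned int in octal.
- s: null-terminated string.
- c: char (character).
- p: void * (pointer to void) in an implementation-defined format.
- a, A: double in hexadecimal notation.
- n: Prints nothing, but writes the number of characters written so far into an integer pointer parameter.

Custom format placeholders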
There are a few implementations of printf-like functions that allow extensions to the escape-character-based mini-language, thus allowing the programmer to have a specific formatting function for non-builtin types. One of the most well-known is glibc's register_printf_function. However, it is rarely used because it conflicts with static format string checking. Another such mechanism allows adding multi-character format names.

Some applications include their own printf-like function, and embed extensions into it. However, these all tend to have the same problems that register_printf_function has.

The Linux kernel printk function supports a number of ways to display kernel structures using the generic %p specification, by appending additional format characters. For example, %pI4 prints an IPv4 address in dotted-decimal form. This allows static format string checking at the expense of full compatibility with normal printf.

Most non-C languages that have a printf-like function work around the lack of this feature by just using the %s format and converting the object to a string representation. C++ offers a notable exception, in that it has a printf function inherited from its C history, but also has a completely different mechanism that is preferred.

Vulnerabilities
Invalid conversion specifications
If the syntax of a conversion specification is invalid, behavior is undefined, and can cause program termination. If there are too few function arguments provided to supply values for all the conversion specifications in the template string, or if the arguments are not of the correct types, the results are also undefined. Excess arguments are ignored. In a number of cases, the undefined behavior has led to "Format string attack" security vulnerabilities.

Some compilers, like the GNU Compiler Collection, will statically check the format strings of printf-like functions and warn about problems. GCC will also warn about user-defined printf-style functions if the non-standard "format" __attribute__ is applied to the function.

Field width versus explicit delimiters in tabular output
Using only field widths to provide for tabulation, as with a format like %8d%8d%8d for three integers in three 8-character columns, will not guarantee that field separation is retained if large numbers occur in the data. Loss of field separation can easily lead to corrupt output. In systems that encourage the use of programs as building blocks in scripts, such corrupt data can be forwarded into, and corrupt, further processing, regardless of whether the original programmer expected the output to be read only by human eyes. Such problems can be eliminated by including explicit delimiters, even spaces, in all tabular output formats. Simply changing the example above to %7d %7d %7d addresses this: the output is formatted identically until numbers become larger, and the explicitly included spaces then prevent the numbers from being merged. Similar strategies apply to string data.

Programming languages with printf
Languages that use format strings that deviate from the style in this article, languages that inherit their implementation from the JVM or other environment, and languages that do not have a standard native printf implementation but have external libraries which emulate printf behavior are not included in this list.

- awk
- C
  - C++
  - Objective-C
- D
- F#
- G
- GNU MathProg
- GNU Octave
- Go
- Haskell
- J
- Java and JVM languages
- Lua
- Maple
- MATLAB
- Max
- Mythryl
- PARI/GP
- Perl
- PHP
- Python
- R
- Raku
- Red/System
- Ruby
- Tcl
- Transact-SQL
- Vala (via FileStream.printf)
- The printf utility command, sometimes built into the shell, as in some implementations of the Korn shell, Bourne again shell, or Z shell. These commands usually interpret C escapes in the format string.