C character classification
C character classification is an operation provided by a group of functions in the ANSI C Standard Library for the C programming language. These functions test characters for membership in a particular class of characters, such as alphabetic characters or control characters. Both single-byte and wide characters are supported.
History
Early C-language programmers working on the Unix operating system developed programming idioms for classifying characters into different types. For example, for the ASCII character set, the following expression identifies a letter when its value is true:
('A' <= c && c <= 'Z') || ('a' <= c && c <= 'z')
As this may be expressed in multiple formulations, it became desirable to introduce short, standardized forms of such tests that were placed in the system-wide header file ctype.h.
Implementation
Unlike the above example, the character classification routines are not written as comparison tests. In most C libraries, they are implemented as static table lookups, whether they are exposed as macros or as functions. For example, an array of 256 eight-bit integers, used as bit fields, is created, where each bit corresponds to a particular property of the character, e.g., isdigit, isalpha. If the lowest-order bit of each integer corresponds to the isdigit property, the code could be written as
#define isdigit(x) (TABLE[x] & 1)
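The table-lookup idea can be sketched as follows. This is a minimal illustration rather than any particular library's implementation: the names `ct_table`, `ct_init`, `my_isdigit`, `my_isalpha` and the bit layout are hypothetical, the table is filled at run time for brevity (real libraries typically ship a precomputed static table), and the letter ranges assume an ASCII-compatible character set.

```c
/* Hypothetical property bits; real libraries choose their own layout. */
enum {
    CT_DIGIT = 1 << 0, /* lowest-order bit: decimal digit */
    CT_UPPER = 1 << 1, /* uppercase letter */
    CT_LOWER = 1 << 2  /* lowercase letter */
};

/* One entry per possible unsigned char value, zero-initialized. */
static unsigned char ct_table[256];

/* Fills the table. Assumes ASCII, where each range is contiguous. */
static void ct_init(void)
{
    int c;
    for (c = '0'; c <= '9'; c++) ct_table[c] |= CT_DIGIT;
    for (c = 'A'; c <= 'Z'; c++) ct_table[c] |= CT_UPPER;
    for (c = 'a'; c <= 'z'; c++) ct_table[c] |= CT_LOWER;
}

/* Each test is a single lookup and mask; the argument is evaluated once. */
#define my_isdigit(x) (ct_table[(unsigned char)(x)] & CT_DIGIT)
#define my_isalpha(x) (ct_table[(unsigned char)(x)] & (CT_UPPER | CT_LOWER))
```

After calling `ct_init()` once, `my_isdigit('7')` yields a nonzero value and `my_isdigit('a')` yields zero. Because several properties share one table entry, a combined test such as `my_isalpha` costs the same as a single-property test.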
Early versions of Linux used a potentially faulty method similar to the first code sample:
#define isdigit(x) ((x) >= '0' && (x) <= '9')
This can cause problems if the argument x has a side effect, for example if one calls isdigit(x++) or isdigit(run_some_program()). It is not immediately evident that the argument to isdigit is evaluated twice. For this reason, the table-based approach is generally used.
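The double evaluation can be made visible with a small experiment. The sketch below is illustrative only: `NAIVE_ISDIGIT`, `next_char`, `eval_count` and `evaluations_for` are hypothetical names, and the macro merely imitates the comparison-based definition described above.

```c
/* Hypothetical macro modeled on the comparison-based definition. */
#define NAIVE_ISDIGIT(x) ((x) >= '0' && (x) <= '9')

/* Counts how many times the macro argument is evaluated. */
static int eval_count = 0;

/* Side-effecting argument: returns the current character and advances. */
static int next_char(const char **p)
{
    eval_count++;
    return (unsigned char)*(*p)++;
}

/* Returns the number of argument evaluations for one macro invocation. */
static int evaluations_for(const char *s)
{
    eval_count = 0;
    (void)NAIVE_ISDIGIT(next_char(&s));
    return eval_count;
}
```

With this sketch, `evaluations_for("5x")` returns 2: the first comparison consumes `'5'` and the second consumes `'x'`, so the macro ends up testing two different characters. For `"!x"` the `&&` short-circuits and the argument is evaluated only once, which makes the behavior data-dependent as well as surprising.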
Overview of functions
The functions that operate on single-byte characters are declared in the ctype.h header file. The functions that operate on wide characters are declared in the wctype.h header file.
The classification is evaluated according to the effective locale.
Byte character | Wide character | Description |
isalnum | iswalnum | checks whether the operand is alphanumeric |
isalpha | iswalpha | checks whether the operand is alphabetic |
islower | iswlower | checks whether the operand is lowercase |
isupper | iswupper | checks whether the operand is uppercase |
isdigit | iswdigit | checks whether the operand is a digit |
isxdigit | iswxdigit | checks whether the operand is a hexadecimal digit |
iscntrl | iswcntrl | checks whether the operand is a control character |
isgraph | iswgraph | checks whether the operand is a graphical character |
isspace | iswspace | checks whether the operand is a whitespace character |
isblank | iswblank | checks whether the operand is a blank character |
isprint | iswprint | checks whether the operand is a printable character |
ispunct | iswpunct | checks whether the operand is a punctuation character |
tolower | towlower | converts the operand to lowercase |
toupper | towupper | converts the operand to uppercase |
N/A | iswctype | checks whether the operand falls into a specific class |
N/A | towctrans | converts the operand using a specific mapping |
N/A | wctype | returns a wide character class to be used with iswctype |
N/A | wctrans | returns a transformation mapping to be used with towctrans |
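A short usage sketch of the functions listed above follows. The helper names (`count_digits`, `upcase`, `wide_is_alpha`, `wide_upper`) are hypothetical; only the ctype.h and wctype.h calls are standard. The results described afterwards assume the default "C" locale.

```c
#include <ctype.h>
#include <stddef.h>
#include <wctype.h>

/* Counts the decimal digits in a byte string. */
static size_t count_digits(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        /* Cast to unsigned char: a negative char value other than EOF
           is not a valid argument for the ctype.h functions. */
        if (isdigit((unsigned char)*s))
            n++;
    }
    return n;
}

/* Converts a byte string to uppercase in place. */
static void upcase(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)toupper((unsigned char)*s);
}

/* Generic wide-character test: class looked up by name with wctype. */
static int wide_is_alpha(wint_t wc)
{
    return iswctype(wc, wctype("alpha")) != 0;
}

/* Generic wide-character mapping: looked up by name with wctrans. */
static wint_t wide_upper(wint_t wc)
{
    return towctrans(wc, wctrans("toupper"));
}
```

In the "C" locale, `count_digits("a1b22")` returns 3 and `wide_upper(L'a')` returns `L'A'`. The name-based pair `wctype`/`iswctype` is equivalent to calling the dedicated function (here `iswalpha`), but lets the class be selected at run time.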