C++ Character Types
The standard char type in C++ is a scant 1 byte wide and can represent only 256 different values. This is plenty for English and most Western European languages but not nearly enough for languages whose writing systems use thousands of symbols, such as Japanese with its kanji.
Several encodings have arisen to extend the character set to handle the demands of these languages. UTF-8 encodes each character as a sequence of one to four 8-bit units, which lets it represent almost every kanji or hieroglyph you can think of while remaining compatible with ASCII, whose characters keep their single-byte encoding. UTF-16 uses one or two 16-bit units per character, and UTF-32 uses a single 32-bit unit for every character.
UTF stands for Unicode Transformation Format. Unicode itself is the character set that these formats encode, which is why all three are commonly referred to simply as Unicode.
The following table describes the different character types supported by C++. At first, C++ tried to get by with a vaguely defined wide character type, wchar_t. This type was intended to be the wide character type native to the application program’s environment. C++11 introduced specific types for UTF-16 and UTF-32.
|Type||Example||What It Is|
|char||'c'||ASCII character or one byte of a UTF-8 character|
|wchar_t||L'c'||Character in wide format|
|char16_t||u'c'||UTF-16 character|
|char32_t||U'c'||UTF-32 character|
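UTF-16 is the standard encoding for Windows applications. With the Code::Blocks/gcc compiler on Windows, wchar_t is 16 bits wide and holds UTF-16. The size is not fixed by the language, however: gcc on Linux and macOS typically uses a 32-bit wchar_t instead.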
Any of the character types in the table can be combined into strings as well:
const wchar_t* wideString = L"this is a wide string";