What is u32string in modern C++? How can I use u32string in C++ software? Is std::u32string the same as std::string? Why do I get an error when I define a std::u32string? Which literal should I use with std::u32string? Let's answer these questions in this post.
What is u32string?
The u32string type (std::u32string, or std::pmr::u32string) is the string class for 32-bit characters, defined in the std and std::pmr namespaces.
It is an instantiation of the basic_string class template that uses char32_t as the character type, with the default char_traits and allocator types. For example, std::string uses one byte (8 bits) per character, while std::u32string uses four bytes (32 bits) per character of the text string. In terms of basic_string, std::u32string is defined as std::basic_string<char32_t>. The type definition looks like this:
```cpp
typedef basic_string<char32_t> u32string;
```
Note that several string types for common character types are provided as basic_string instantiations, as shown below:
String Type | Basic String Definition | Standard |
std::string | std::basic_string<char> | |
std::wstring | std::basic_string<wchar_t> | |
std::u8string | std::basic_string<char8_t> | (C++20) |
std::u16string | std::basic_string<char16_t> | (C++11) |
std::u32string | std::basic_string<char32_t> | (C++11) |
A simple example of using std::u32string in modern C++ software
Here is a simple example of using u32string:
```cpp
std::u32string str4 = U"This is a String";

std::pmr::u32string pstr4 = U"This is a String";
```
As you can see, the literal carries a U prefix. Different string data types require different literal prefixes, such as L, u, and U.
L, u, and U are string literal prefixes that determine the character type of the literal. They are not optional and do not depend on editor or compiler defaults: the prefix must match the character type of the string. Initializing a std::u32string from an unprefixed literal such as "abcd" is a compile error, because an array of const char does not convert to a string of char32_t. A string literal is a sequence of characters surrounded by double quotes, optionally prefixed by R, u8, u8R, u, uR, U, UR, L, or LR, as in "…", R"(…)", u8"…", u8R"(…)", u"…", uR"~(…)~", U"…", UR"zzz(…)zzz", L"…", or LR"(…)", respectively. Please see the String Literals section in the Working Draft, Standard for Programming Language C++. Below we summarize some of these forms as used in C++.
Examples of String Literals for String Definitions
- str = "abcd"; an ordinary string literal. An ordinary string literal has type "array of n const char", including the null terminator
- str = u8"abcd"; a UTF-8 string literal, initialized with the given characters as encoded in UTF-8, including the null terminator
- str = u"abcd"; a char16_t string literal. A char16_t string literal has type "array of n const char16_t", including the null terminator
- str = U"abcd"; a char32_t string literal. A char32_t string literal has type "array of n const char32_t", including the null terminator
- str = L"abcd"; a wide string literal. A wide string literal has type "array of n const wchar_t", including the null terminator
- str = R"(abcd)"; a raw string literal. The text between the parentheses is taken as-is, so escape sequences such as \n are not processed
What is the difference between the L"", U"" and u"" literals in C++?
- L is a wide string literal, an array of n const wchar_t. The size and encoding of wchar_t depend on the compiler and platform: typically 16 bits (UTF-16) on Windows and 32 bits (UTF-32) on Linux and macOS
- u is a char16_t string literal, in UTF-16 format
- U is a char32_t string literal, in UTF-32 format
Here is a full C++ software example of using std::u32string
```cpp
#include <iostream>
#include <memory_resource>  // std::pmr allocator support (C++17)
#include <string>           // std::string, std::u32string, std::pmr::u32string

int main()
{
    // Strings that use the default allocator
    std::string    str1 = "This is a String";
    std::wstring   str2 = L"This is a String";
    std::u16string str3 = u"This is a String";
    std::u32string str4 = U"This is a String";

    // The same types with the polymorphic allocator
    std::pmr::string    pstr1 = "This is a String";
    std::pmr::wstring   pstr2 = L"This is a String";
    std::pmr::u16string pstr3 = u"This is a String";
    std::pmr::u32string pstr4 = U"This is a String";

    return 0;
}
```
In C++ Builder, use the awesome UnicodeString
In the latest versions of C++ Builder (10 and above), strings are Unicode strings. Unicode strings come with many methods and make it easy to work with text in the world's languages. The Unicode standard behind UnicodeString assigns a unique number to every character (stored in 8-, 16-, or 32-bit units), covering far more characters than ASCII. UnicodeStrings are widely used because they support languages worldwide as well as emojis. Nowadays two kinds of strings are in common use: arrays of chars (char strings) and UnicodeStrings (WideStrings and AnsiStrings are older and are not compatible with all current features). Clang, C++ Builder, GNU C, and VC++ compilers and IDEs use this standard for GUI forms so that applications can support all languages globally. More information about the structure of Unicode strings can be found here. RAD Studio, Delphi, and C++ Builder use Unicode-based strings: that is, the type String is a Unicode string (System.UnicodeString) instead of an ANSI string. If you want to convert your code to Unicode strings, we recommend this article.
C++ Builder is the easiest and fastest C and C++ IDE for building simple or professional applications on the Windows, MacOS, iOS & Android operating systems. It is also easy for beginners to learn with its wide range of samples, tutorials, help files, and LSP support for code. RAD Studio’s C++ Builder version comes with the award-winning VCL framework for high-performance native Windows apps and the powerful FireMonkey (FMX) framework for cross-platform UIs.
There is a free C++ Builder Community Edition for students, beginners, and startups; it can be downloaded from here. For professional developers, there are Professional, Architect, or Enterprise versions of C++ Builder and there is a trial version you can download from here.