Although C++ is a mature and actively evolving language, in 2024 it still has rough edges around strings and conversions. One of them concerns the newer std::basic_string types, which are hard to display and to convert. For example, there are no perfect conversion methods between basic_string types, especially from strings with a wider character type to strings with a narrower one, such as from std::u32string to std::wstring on platforms where wchar_t is 16 bits. In this post, we explain these string types and how you can convert a u32string to a wstring.
What is a wstring (std::wstring) in C++?
A wide string, represented by std::wstring, is the string class for ‘wide’ characters. It stores text in wchar_t elements, which are 2 bytes wide on Windows and 4 bytes wide on most Linux and macOS toolchains. std::wstring is the instantiation of the basic_string class template that uses wchar_t as the character type. Its type definition is given below:
    typedef basic_string<wchar_t> wstring; // declared inside namespace std
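Because the width of wchar_t depends on the platform, it can help to check it before relying on any conversion. The following is a minimal sketch; the helper names wchar_bits() and wstring_bytes() are hypothetical and used only for illustration.

```cpp
#include <climits>
#include <cstddef>
#include <string>

// Bits per wchar_t on the platform being compiled for
// (typically 16 on Windows, 32 on Linux and macOS).
constexpr std::size_t wchar_bits()
{
    return sizeof(wchar_t) * CHAR_BIT;
}

// Bytes occupied by the characters of `s` (null terminator not counted).
std::size_t wstring_bytes(const std::wstring& s)
{
    return s.size() * sizeof(wchar_t);
}
```

The same wide string therefore occupies a different number of bytes depending on the toolchain, which is why conversions to and from wstring cannot be written once and assumed to behave identically everywhere.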
What is u32string (std::u32string)?
The u32string (std::u32string or std::pmr::u32string) is the string class for 32-bit characters, defined in the std and std::pmr namespaces.
It instantiates the basic_string class template with char32_t as the character type, along with its default char_traits and allocator types. For example, std::string uses one byte (8 bits) per character, while std::u32string uses four bytes (32 bits) per character of the text. In other words, std::u32string is defined as std::basic_string<char32_t>. The type definition is shown below:
    typedef basic_string<char32_t> u32string;
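To make the difference in element size concrete, the sketch below counts how many 16-bit code units a u32string would need if stored as UTF-16. The helper name utf16_length() is an assumption made for this illustration, not a standard function.

```cpp
#include <cstddef>
#include <string>

// Number of 16-bit code units needed to store `s` as UTF-16.
// Code points above U+FFFF need two units (a surrogate pair).
std::size_t utf16_length(const std::u32string& s)
{
    std::size_t n = 0;
    for (char32_t cp : s)
    {
        n += (cp > 0xFFFF) ? 2 : 1;
    }
    return n;
}
```

A u32string always holds one element per code point, so U"A\U0001F600" has a size of 2, while the same text needs 3 code units in a 16-bit string type.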
Note that we covered the conversion from u16string to wstring in a separate post.
Now, let’s see how we can convert a u32string to a wstring.
Is there a simple example of how to use u32string and wstring in C++?
Here is a simple example of how to use std::u32string and std::wstring in C++.
    #include <iostream>
    #include <memory_resource> // std::pmr::polymorphic_allocator (C++17)
    #include <string>

    int main()
    {
        std::wstring   str2 = L"This is a wstring";
        std::u32string str4 = U"This is a u32string";

        std::pmr::wstring   pstr2 = L"This is a wstring";
        std::pmr::u32string pstr4 = U"This is a u32string";

        return 0;
    }
How can we convert u32string to a wstring in C++?
First of all, we do not recommend converting a string with a wider character type to one with a narrower character type, because characters that do not fit are lost. Still, suppose a developer wants to extract the ASCII characters from a u32string and does not care about losing the others. In this situation, the developer may need to convert the u32string to a wstring, or later to a string or a char array.
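The ASCII-only scenario described above could be sketched as follows. The function name u32string_to_ascii() and the '?' replacement character are assumptions chosen for this example; the conversion is lossy by design.

```cpp
#include <string>

// Keeps code points in the ASCII range (U+0000..U+007F) and substitutes
// `replacement` for everything else.
std::string u32string_to_ascii(const std::u32string& str, char replacement = '?')
{
    std::string out;
    out.reserve(str.size());
    for (char32_t cp : str)
    {
        out.push_back(cp < 0x80 ? static_cast<char>(cp) : replacement);
    }
    return out;
}
```

Making the replacement explicit keeps the loss visible in the result instead of silently producing unrelated characters.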
We can simply convert a u32string to a wstring as below (thank you, Remy Lebeau):
    std::u32string str32 = U"This is a u32string";

    std::wstring wstr = std::wstring( str32.begin(), str32.end() );

    std::wcout << "wstring: " << wstr << std::endl;
Or, we can create our own u32string_to_wstring() function as described below.
    std::wstring u32string_to_wstring (const std::u32string& str)
    {
        return std::wstring(str.begin(), str.end());
    }
We can use this conversion function like so:
    std::u32string str32 = U"This is a u32string";

    std::wstring wstr = u32string_to_wstring(str32);

    std::wcout << "wstring: " << wstr << std::endl;
The output for both options will be:
    wstring: This is a u32string
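The range constructor copies element by element, so where wchar_t is 16 bits any code point above U+FFFF is truncated. A minimal sketch of a check that can be run before the simple conversion is shown here; fits_in_wchar() is a hypothetical helper, not part of the standard library.

```cpp
#include <limits>
#include <string>

// Returns true when every element of `str` fits in a single wchar_t, i.e.
// when std::wstring(str.begin(), str.end()) copies the text without truncation.
bool fits_in_wchar(const std::u32string& str)
{
    const unsigned long max_wchar =
        static_cast<unsigned long>(std::numeric_limits<wchar_t>::max());
    for (char32_t cp : str)
    {
        if (static_cast<unsigned long>(cp) > max_wchar)
        {
            return false;
        }
    }
    return true;
}
```

If the check fails, a conversion that produces UTF-16 surrogate pairs is the safer choice on that platform.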
How can we correctly convert u32string to a wstring in C++?
Another way to convert them is to use the std::wstring_convert and std::codecvt_utf16 class templates. We can use std::wstring_convert to convert a u32string to a wstring. The wstring_convert class template performs conversions between byte strings and wide strings using an individual code conversion facet, Codecvt. Besides std::codecvt_utf16, the following standard facets are suitable for use with std::wstring_convert:
- std::codecvt_utf8 for the UTF-8/UCS2 and UTF-8/UCS4 conversions
- std::codecvt_utf8_utf16 for the UTF-8/UTF-16 conversions.
Note that codecvt_utf16 and wstring_convert are deprecated in C++17 and removed in C++26. The standard library offers no direct replacement, but most C++ compilers still ship them, so on Windows we can keep using them when targeting a standard earlier than C++26.
Syntax of the std::wstring_convert class template (deprecated in C++17):
    template<
        class Codecvt,
        class Elem = wchar_t,
        class Wide_alloc = std::allocator<Elem>,
        class Byte_alloc = std::allocator<char>
    > class wstring_convert;
Syntax of the std::codecvt_utf16 class template (deprecated in C++17):
    template<
        class Elem,
        unsigned long Maxcode = 0x10ffff,
        std::codecvt_mode Mode = (std::codecvt_mode)0
    > class codecvt_utf16 : public std::codecvt<Elem, char, std::mbstate_t>;
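Alternatively, we can use the following code (thank you, Remy Lebeau).
    std::wstring u32string_to_wstring (const std::u32string& str)
    {
    #ifdef MSWINDOWS // macro used in the original; other toolchains typically test _WIN32
        // wchar_t is 16 bits on Windows: encode as little-endian UTF-16 bytes
        // (the facet defaults to big-endian), then reinterpret them as wchar_t units.
        std::wstring_convert<std::codecvt_utf16<char32_t, 0x10ffff, std::little_endian>, char32_t> conv;
        std::string bytes = conv.to_bytes(str);
        return std::wstring(reinterpret_cast<const wchar_t*>(bytes.c_str()), bytes.length()/sizeof(wchar_t));
    #else
        // wchar_t is assumed wide enough to hold each code point directly.
        return std::wstring(str.begin(), str.end());
    #endif
    }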
In my tests, std::wstring(str.begin(), str.end()) runs well on Windows with bcc32, bcc64, and the latest Windows Modern 64-bit bcc64x compiler. However, the #ifdef MSWINDOWS branch is recommended on Windows, because it encodes characters outside the 16-bit range as UTF-16 surrogate pairs instead of truncating them. Please check which one is suitable for your compiler.
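Since the facets above are deprecated, one possible alternative is to encode the surrogate pairs by hand. The following is only a sketch under that assumption; the names append_utf16(), u32string_to_u16string(), and u32string_to_wstring_portable() are hypothetical, and invalid code points are replaced with U+FFFD as a design choice of this example.

```cpp
#include <string>

// Appends one code point to `out` as UTF-16. Code points above U+FFFF become
// a surrogate pair; invalid values (surrogates, > U+10FFFF) become U+FFFD.
template <class Char16>
void append_utf16(char32_t cp, std::basic_string<Char16>& out)
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    {
        cp = 0xFFFD;
    }
    if (cp < 0x10000)
    {
        out.push_back(static_cast<Char16>(cp));
    }
    else
    {
        cp -= 0x10000;
        out.push_back(static_cast<Char16>(0xD800 + (cp >> 10)));   // high surrogate
        out.push_back(static_cast<Char16>(0xDC00 + (cp & 0x3FF))); // low surrogate
    }
}

// Encodes into char16_t, which is 16 bits wide on every platform.
std::u16string u32string_to_u16string(const std::u32string& str)
{
    std::u16string out;
    out.reserve(str.size());
    for (char32_t cp : str)
    {
        append_utf16(cp, out);
    }
    return out;
}

// Copies directly when wchar_t can hold a full code point, otherwise
// falls back to UTF-16 encoding.
std::wstring u32string_to_wstring_portable(const std::u32string& str)
{
    if (sizeof(wchar_t) >= 4)
    {
        return std::wstring(str.begin(), str.end());
    }
    std::wstring out;
    out.reserve(str.size());
    for (char32_t cp : str)
    {
        append_utf16(cp, out);
    }
    return out;
}
```

This version needs neither the locale nor the codecvt header and selects the branch from sizeof(wchar_t) rather than a platform macro, at the cost of maintaining the encoding logic yourself.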
How can we convert wstring to u32string in C++?
If you are looking for a conversion from wstring to u32string, here is how we can do it in two different ways:
    std::wstring wstr = L"This is a wstring";
    std::u32string str32;

    str32.assign( wstr.begin(), wstr.end());

    // or

    str32 = std::u32string( wstr.begin(), wstr.end() );
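Both forms above copy one element at a time, so where wchar_t is 16 bits a surrogate pair ends up as two separate u32string elements. A sketch of a decoder that combines such pairs is given below; decode_utf16() and wstring_to_u32string_portable() are hypothetical names, and replacing unpaired surrogates with U+FFFD is an assumption of this example.

```cpp
#include <cstddef>
#include <string>

// Decodes UTF-16 code units into code points. A valid surrogate pair becomes
// one code point; an unpaired surrogate becomes U+FFFD.
template <class Char16>
std::u32string decode_utf16(const std::basic_string<Char16>& in)
{
    std::u32string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        const char32_t u = static_cast<char32_t>(in[i]);
        if (u >= 0xD800 && u <= 0xDBFF && i + 1 < in.size())
        {
            const char32_t lo = static_cast<char32_t>(in[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF)
            {
                out.push_back(0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00));
                ++i; // the low surrogate has been consumed
                continue;
            }
        }
        out.push_back((u >= 0xD800 && u <= 0xDFFF) ? char32_t(0xFFFD) : u);
    }
    return out;
}

// Copies directly when wchar_t already holds full code points, otherwise
// decodes the wide string as UTF-16.
std::u32string wstring_to_u32string_portable(const std::wstring& str)
{
    if (sizeof(wchar_t) >= 4)
    {
        return std::u32string(str.begin(), str.end());
    }
    return decode_utf16(str);
}
```

For text limited to the Basic Multilingual Plane, the result is the same as with the simple assign() call.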
Is there a full example of how to convert u32string to a wstring in C++?
Here is a full example that converts a u32string to a wstring:
    #include <iostream>
    #include <string>
    #include <locale>
    #include <codecvt>
    #include <cstddef>
    #include <cstdlib>

    std::wstring u32string_to_wstring (const std::u32string& str)
    {
    #ifdef MSWINDOWS
        // little_endian matches the in-memory layout of wchar_t on Windows
        std::wstring_convert<std::codecvt_utf16<char32_t, 0x10ffff, std::little_endian>, char32_t> conv;
        std::string bytes = conv.to_bytes(str);
        return std::wstring(reinterpret_cast<const wchar_t*>(bytes.c_str()), bytes.length()/sizeof(wchar_t));
    #else
        return std::wstring(str.begin(), str.end());
    #endif
    }

    int main()
    {
        std::u32string str32 = U"This is a u32string";

        // Converting u32string to wstring
        std::wcout << L"Converting u32string to wstring ..." << std::endl;
        std::wstring wstr = std::wstring(str32.begin(), str32.end());
        std::wcout << L"wstring: " << wstr << std::endl;

        // or we can print directly as a wstring
        std::wcout << L"wstring: " << std::wstring(str32.begin(), str32.end()) << std::endl;

        // or we can use our function as below
        std::wstring ws = u32string_to_wstring(str32);
        std::wcout << L"wstring: " << ws << std::endl;

        // Listing all characters with their numeric values
        for (std::size_t i = 0; i < wstr.length(); ++i)
        {
            std::wcout << wstr[i] << L" - " << (int)wstr[i] << std::endl;
        }

        // Converting wstring to u32string
        std::wcout << L"Converting wstring to u32string ..." << std::endl;
        str32.assign( wstr.begin(), wstr.end());
        std::wcout << L"wstring: " << std::wstring(str32.begin(), str32.end()) << std::endl;

        // or
        str32 = std::u32string( wstr.begin(), wstr.end() );
        std::wcout << L"wstring: " << std::wstring(str32.begin(), str32.end()) << std::endl;

        std::system("pause"); // Windows-only: keeps the console window open
        return 0;
    }
Here is the output.
    Converting u32string to wstring ...
    wstring: This is a u32string
    wstring: This is a u32string
    wstring: This is a u32string
    T - 84
    h - 104
    i - 105
    s - 115
      - 32
    i - 105
    s - 115
      - 32
    a - 97
      - 32
    u - 117
    3 - 51
    2 - 50
    s - 115
    t - 116
    r - 114
    i - 105
    n - 110
    g - 103
    Converting wstring to u32string ...
    wstring: This is a u32string
    wstring: This is a u32string