How To Convert Char Array String To Unicode String Correctly In C++?

I think one of the biggest problems in C++ today is that there is no perfect library for converting between strings across different locales. Problems often appear when displaying or converting characters in many languages, and conversions between UTF-8, UTF-16, and UTF-32 strings add more. There are solutions, but I find them neither as modern nor as simple as today's programming world expects. UnicodeString is one of the most powerful string types in use today. A few weeks ago, I saw that the char arrays of a struct had conversion problems when I read them from a file and displayed them in a TMemo box. In this post, I want to give you an example of how you can convert this kind of char array to a UnicodeString correctly.

What is a string, a basic_string, and a UnicodeString?

The basic_string (std::basic_string and std::pmr::basic_string) is a class template that stores and manipulates sequences of character-like objects (char, wchar_t, ...). basic_string is used to define the string, wstring, u8string, u16string, and u32string data types.
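As a small illustration of that relationship, the sketch below checks at compile time that the standard string types are aliases of std::basic_string. The helper name code_unit_size is made up for this example, and u8string is left out because it requires C++20.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// Each standard string type is std::basic_string instantiated
// with a particular character type.
static_assert(std::is_same<std::string, std::basic_string<char>>::value,
              "string stores char");
static_assert(std::is_same<std::wstring, std::basic_string<wchar_t>>::value,
              "wstring stores wchar_t");
static_assert(std::is_same<std::u16string, std::basic_string<char16_t>>::value,
              "u16string stores char16_t");
static_assert(std::is_same<std::u32string, std::basic_string<char32_t>>::value,
              "u32string stores char32_t");

// Hypothetical helper: size in bytes of one code unit of a string type.
template <typename String>
constexpr std::size_t code_unit_size() {
    return sizeof(typename String::value_type);
}
```

On common platforms, code_unit_size<std::u16string>() is 2 and code_unit_size<std::u32string>() is 4, while the size of wchar_t depends on the platform.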

UnicodeString is the default String type of RAD Studio, C++ Builder, and Delphi. It is stored in UTF-16 format, which means a character may take 2 or 4 bytes. In C++ Builder and Delphi, the Char and PChar types are now WideChar and PWideChar, respectively. There is a good article about Unicode in RAD Studio, and here is a good post about Basic String and Unicode String.
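The "2 or 4 bytes" point can be shown with standard C++ alone. The sketch below uses std::u16string as a stand-in for a UTF-16 string (it is not UnicodeString itself), and the two helper names are assumptions made for illustration.

```cpp
#include <cstddef>
#include <string>

// Number of bytes a UTF-16 string occupies: each code unit is one char16_t.
inline std::size_t utf16_byte_count(const std::u16string& s) {
    return s.size() * sizeof(char16_t);
}

// True if the code unit is the first half (high surrogate) of a
// two-unit surrogate pair, i.e. the start of a 4-byte character.
inline bool is_high_surrogate(char16_t unit) {
    return unit >= 0xD800 && unit <= 0xDBFF;
}
```

For example, utf16_byte_count(u"A") is 2, while a character outside the Basic Multilingual Plane, such as u"\U0001F600", is encoded as a surrogate pair and occupies 4 bytes.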

How to convert char array string to UnicodeString correctly in C++?

Assume that we have a struct with some char arrays, such as the example header below,

[crayon-67995260ef79d543545346/]

When we read this header from a file, we need to obtain the char array members, such as chunk_ID and format, correctly. These fixed-size arrays have no null terminator, so when we treat them as ordinary C strings and display them, we may get wrong output. To avoid this, there are 3 different solutions.
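The underlying issue can be sketched in standard C++. The HeaderSketch struct below is a hypothetical, cut-down layout (only chunk_ID and format are names from this post), and field_to_string and make_sample_header are helper names invented for the example. The idea is to always copy exactly the size of the array rather than scanning for a terminator.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical, simplified header layout for illustration only.
struct HeaderSketch {
    char chunk_ID[4];    // e.g. 'R','I','F','F' with no trailing '\0'
    char format[4];      // e.g. 'W','A','V','E' with no trailing '\0'
    char next_field[4];  // whatever bytes follow in the file
};

// Copy exactly N bytes of a fixed-size field. No terminator is needed,
// because the length comes from the array type, not from scanning for '\0'.
template <std::size_t N>
std::string field_to_string(const char (&field)[N]) {
    return std::string(field, N);
}

// Build a sample header the way a file read would fill it: raw bytes only.
inline HeaderSketch make_sample_header() {
    HeaderSketch h;
    std::memcpy(h.chunk_ID, "RIFF", 4);
    std::memcpy(h.format, "WAVE", 4);
    std::memcpy(h.next_field, "fmt ", 4);
    return h;
}
```

With this sample, field_to_string(h.format) yields the 4-character string WAVE. Passing h.format to code that expects a null-terminated C string would instead keep reading into the following bytes.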

1. How to convert a char string array to UnicodeString in a single line in C++?

In C++ Builder, the UnicodeString type is very convenient whenever you want to work with strings. We can convert a char string array to a UnicodeString in a single line in 3 different ways.

First, we can use this syntax to convert a char array in a struct, or a char* array. (Thank you, Remy Lebeau, Embarcadero MVP)

[crayon-67995260ef7a7558138279/]

We can use this as below,

[crayon-67995260ef7a9554837293/]

or we can define as below,

[crayon-67995260ef7ab567040245/]

then we can display it in our Memo component as below,

[crayon-67995260ef7ac040733504/]

2. How can we use the printf method of UnicodeString to convert a char string array in a single line in C++?

Second, we can use the printf() method of UnicodeString. Here we should use the "%.4hs" format specifier as below,

[crayon-67995260ef7b7315063914/]

In the code above, the printf() method of System::UnicodeString takes a wide format string, and we are passing wav.format, which is a narrow string. When we use a wide printf with narrow input, we should use the "%hs" format specifier: the h tells printf that we are passing narrow data in a context that expects wide data. The .4 precision limits the output to 4 characters. Likewise, we would use %ls when sending wide data to a version of printf that expects %s to mean narrow. (Thank you, Bruneau Babet, Embarcadero Developer)
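The role of the precision can be tried out with the standard narrow std::snprintf, where plain %s already means narrow data and no h modifier is needed. This is only an analog of the UnicodeString call, not the same API, and format_field is a helper name invented for this sketch.

```cpp
#include <cstdio>
#include <string>

// Format a fixed-size field that has no null terminator.
// The "%.*s" precision tells printf to read at most `size` bytes,
// so the missing terminator is never needed.
inline std::string format_field(const char* field, int size) {
    char buffer[64];  // large enough for the short fields in this sketch
    int written = std::snprintf(buffer, sizeof buffer, "%.*s", size, field);
    if (written < 0) {
        return std::string();  // formatting error
    }
    return std::string(buffer);  // snprintf always terminates the buffer
}
```

Given a 4-byte array holding 'W','A','V','E', format_field(raw, 4) produces WAVE, and a smaller precision such as 2 produces WA.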

3. How to convert a char string array to UnicodeString using std::string in C++?

Third, if you want to do this using std::string, you can write it as below,

[crayon-67995260ef7b9460245450/]

You can do the same thing step by step. First, convert the format member to a std::string as below; note that it has a size of 4 bytes,

[crayon-67995260ef7bd330205850/]

Now you can convert it to const char * as below,

[crayon-67995260ef7bf274657930/]

Finally, you can safely convert char* to UnicodeString as below,

[crayon-67995260ef7c1268427713/]
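The same three steps can be sketched in portable C++. Here std::u16string stands in for UnicodeString, which is specific to C++ Builder, and the widening step assumes the field holds plain ASCII text. Both function names are invented for the example.

```cpp
#include <cstddef>
#include <string>

// Widen null-terminated ASCII text to UTF-16 code units.
// Assumption: the input is 7-bit ASCII, as in tags like "WAVE".
inline std::u16string ascii_to_utf16(const char* text) {
    std::u16string out;
    for (; *text != '\0'; ++text) {
        out.push_back(static_cast<char16_t>(static_cast<unsigned char>(*text)));
    }
    return out;
}

inline std::u16string convert_field(const char* field, std::size_t size) {
    // Step 1: copy exactly `size` bytes into a std::string.
    std::string bounded(field, size);
    // Step 2: c_str() returns a pointer that is guaranteed null-terminated.
    const char* terminated = bounded.c_str();
    // Step 3: a terminated string can now be converted safely.
    return ascii_to_utf16(terminated);
}
```

The intermediate std::string is what makes this safe: it records the length explicitly and supplies the terminator that the raw array lacks.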

Why are other methods not correct?

Assume that we read wave file info into a struct, and we try to display some of the members of this struct in a Memo component. Let's do this in 4 different ways. The compiler will compile all of the lines below without errors, but the outputs will be different.

[crayon-67995260ef7c3316450182/]

Normally, the output of wav.format should be "WAVE". Here is what the outputs actually look like,

[crayon-67995260ef7c7530190691/]

From these outputs above,

Is there an example of converting a char string array to UnicodeString correctly in C++?

Here is a full example of converting a char string array to UnicodeString in C++ Builder,

[crayon-67995260ef7c9962340042/]

C++ Builder is the easiest and fastest C and C++ compiler and IDE for building simple or professional applications on the Windows operating system. It is also easy for beginners to learn with its wide range of samples, tutorials, help files, and LSP support for code. RAD Studio’s C++ Builder version comes with the award-winning VCL framework for high-performance native Windows apps and the powerful FireMonkey (FMX) framework for UIs.

There is a free C++ Builder Community Edition for students, beginners, and startups; it can be downloaded from here. For professional developers, there are Professional, Architect, or Enterprise versions of C++ Builder and there is a trial version you can download from here.
