How To Convert Char Array String To Unicode String Correctly In C++?

I think one of the biggest problems in C++ today is that there is no perfect library for converting strings across the many locales used around the world. Displaying or converting characters from many languages often causes problems, and there are further conversion problems between UTF-8, UTF-16, and UTF-32 strings. Solutions exist, but I find them neither modern nor as simple as today's programming world expects. UnicodeString is one of the most powerful string formats in use today. A few weeks ago, I saw that the char arrays of a struct had some conversion problems when I read them from a file and displayed them in a TMemo box. In this post, I want to give you an example of how you can convert this kind of char array to a UnicodeString correctly.

What is a string, a basic_string, and a UnicodeString?

The basic_string (std::basic_string and std::pmr::basic_string) is a class template that stores and manipulates sequences of character-like objects (char, wchar_t, char16_t, char32_t, …). It is used to define the string, wstring, u16string and u32string types, and, since C++20, u8string.
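To make the relationship concrete, here is a small standard C++17 sketch (not specific to C++ Builder) that checks the aliases at compile time and uses one function template for every specialization. The helper `length_in_code_units` is a hypothetical name used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Each standard string alias is std::basic_string with a different
// character type; these checks fail to compile if that were not so.
static_assert(std::is_same_v<std::string,    std::basic_string<char>>);
static_assert(std::is_same_v<std::wstring,   std::basic_string<wchar_t>>);
static_assert(std::is_same_v<std::u16string, std::basic_string<char16_t>>);
static_assert(std::is_same_v<std::u32string, std::basic_string<char32_t>>);
// std::u8string (std::basic_string<char8_t>) needs C++20, so it is omitted.

// Hypothetical helper: one template serves every specialization.
// size() counts code units (char, char16_t, ...), not user-visible characters.
template <typename CharT>
std::size_t length_in_code_units(const std::basic_string<CharT>& s) {
    return s.size();
}
```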

UnicodeString is the default String type of RAD Studio, C++ Builder and Delphi. It is in UTF-16 format, which means each code unit is 2 bytes and a character takes 2 or 4 bytes (4 bytes when it needs a surrogate pair). In C++ Builder and Delphi, the Char and PChar types are now WideChar and PWideChar, respectively. There is a good article about Unicode in RAD Studio, and here is a good post about Basic String and Unicode String.
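The 2-or-4-byte rule can be checked with standard C++ `std::u16string`, which, like UnicodeString, stores UTF-16 code units. This is a portable sketch rather than UnicodeString itself, and the helper names are hypothetical; it assumes `char16_t` is 2 bytes, as it is on mainstream platforms.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Bytes occupied by a UTF-16 string, excluding the terminator.
// Each char16_t code unit is 2 bytes on mainstream platforms.
std::size_t utf16_bytes(const std::u16string& s) {
    return s.size() * sizeof(char16_t);
}

// A high surrogate (U+D800..U+DBFF) starts a 4-byte surrogate pair,
// which UTF-16 uses for characters outside the Basic Multilingual Plane.
bool is_high_surrogate(char16_t c) {
    return c >= 0xD800 && c <= 0xDBFF;
}
```

For example, `u"A"` and `u"\u20AC"` (the euro sign) each take 2 bytes, while `u"\U0001F600"` (an emoji) takes 4 bytes as a surrogate pair.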

How to convert char array string to UnicodeString correctly in C++?

Assume that we have a struct with some char arrays, such as the example header below:
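A sketch of such a header, assuming the canonical 44-byte RIFF/WAVE PCM layout, might look like the following. Only `chunk_ID`, `format` and `fmt` are names used in this post; the other member names are hypothetical. The 4-character tags are plain `char[4]` arrays with no room for a nul terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sketch of a canonical 44-byte PCM WAV header. Member names other than
// chunk_ID, format and fmt are hypothetical.
struct WavHeader {
    char          chunk_ID[4];     // "RIFF" (no nul terminator)
    std::uint32_t chunk_size;
    char          format[4];       // "WAVE" (no nul terminator)
    char          fmt[4];          // "fmt " (note the trailing space)
    std::uint32_t fmt_size;        // 16 for PCM
    std::uint16_t audio_format;    // 1 = PCM
    std::uint16_t num_channels;
    std::uint32_t sample_rate;
    std::uint32_t byte_rate;
    std::uint16_t block_align;
    std::uint16_t bits_per_sample;
    char          data_ID[4];      // "data"
    std::uint32_t data_size;
};

// The members line up with the file layout without padding bytes.
static_assert(sizeof(WavHeader) == 44, "unexpected padding in WavHeader");
static_assert(offsetof(WavHeader, format) == 8);
static_assert(offsetof(WavHeader, fmt) == 12);
```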

When we read this struct from a file, we want to obtain its char array members, such as chunk_ID and format, correctly. These arrays have no nul terminator, so if we treat them as C strings when we obtain and display them, we may get wrong output. To avoid this, there are 3 different solutions.

1. How to convert char string array to UnicodeString in a single line in C++?

In C++ Builder, the UnicodeString type is very convenient whenever you work with strings. We can convert a char string array to a UnicodeString in a single line in 3 different ways.

First, we can use this syntax to convert a char array in a struct, or a char* array. (Thank you, Remy Lebeau, Embarcadero MVP)

We can use this as below,

or we can define as below,

then we can display it in our Memo component as below,
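The key idea behind this one-liner is to pass the field's size along with its pointer, so the conversion never searches for a nul terminator. Here is a portable sketch of that idea that uses `std::string`'s (pointer, length) constructor as a stand-in, since UnicodeString is only available in C++ Builder; the `Tags` struct and the `field_to_string` helper are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Minimal stand-in for the header struct: 4-byte tags, no nul terminator.
struct Tags {
    char chunk_ID[4];
    char format[4];
};

// Hypothetical helper: the array size N is deduced from the field itself,
// so exactly N bytes are copied and no nul terminator is needed.
template <std::size_t N>
std::string field_to_string(const char (&field)[N]) {
    return std::string(field, N);
}
```

Writing the size out by hand, as in `std::string(t.format, sizeof(t.format))`, is equivalent; deducing `N` simply removes the chance of passing the wrong length.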

2. How can we use the printf method of UnicodeString to convert a char string array in a single line in C++?

Second, we can use the printf() method of UnicodeString with the “%.4hs” format specifier, as below. The .4 precision limits the conversion to the 4 characters of the field, so no nul terminator is needed.

Here, the printf() method of System::UnicodeString takes a wide format string, but we are passing wav.format, which is a narrow string. When we use a wide printf with narrow inputs, we should use the “%hs” format specifier: the h tells printf that we are passing narrow data in a context that expects wide. Likewise, we would use %ls when sending wide data to a version of printf that expects %s to mean narrow. (Thank you, Bruneau Babet, Embarcadero Developer)
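Note that standard C and C++ printf does not define the `h` modifier for `%s`; `%hs` belongs to Microsoft-style wide printf implementations like the one described above. The precision part works the same way in standard narrow printf, though: `%.4s`, or `%.*s` with the length passed as an argument, reads at most that many bytes, so the array does not need a nul terminator. A sketch with `std::snprintf`, where the `Tags` struct and the `format_field` helper are hypothetical:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Minimal stand-in for the header struct: a 4-byte tag, no nul terminator.
struct Tags {
    char format[4];
};

// Hypothetical helper: "%.*s" takes the precision from the int argument,
// so snprintf reads at most sizeof(t.format) bytes from the array.
std::string format_field(const Tags& t) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "Format: %.*s",
                  static_cast<int>(sizeof t.format), t.format);
    return buf;
}
```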

3. How to convert a char string array to UnicodeString using std::string in C++?

Third, if you want to do this with std::string, you can write it as below:

You can also do the same thing step by step. First, convert the format field to a std::string as below, passing its size (4 bytes) explicitly:

Now you can convert it to a const char* as below:

Finally, you can safely convert the const char* to a UnicodeString as below:
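The first two steps can be sketched in portable C++ as follows. The last step needs C++ Builder's UnicodeString, so the sketch stops at the nul-terminated `const char*` that would be handed to it; `Tags`, `tag_to_std_string` and `terminated_length` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Minimal stand-in for the header struct: a 4-byte tag, no nul terminator.
struct Tags {
    char format[4];
};

// Step 1: copy exactly sizeof(t.format) bytes into a std::string.
std::string tag_to_std_string(const Tags& t) {
    return std::string(t.format, sizeof(t.format));
}

// Step 2: c_str() yields a nul-terminated const char* owned by the string.
// Step 3 (C++ Builder only, not shown): hand that pointer to UnicodeString
// while the std::string is still alive; the pointer dangles afterwards.
std::size_t terminated_length(const std::string& s) {
    const char* p = s.c_str();
    return std::strlen(p);  // stops at the terminator that c_str() guarantees
}
```

When the steps are chained in a single expression, the temporary std::string lives until the end of that full expression, so the pointer from c_str() stays valid while the UnicodeString is being constructed.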

Why are other methods not correct?

Assume that we read the header of a WAVE file into a struct and try to display some of its members in a Memo component. Let’s try several different ways, labelled (a) to (g). The compiler accepts all of these lines, but their outputs differ.

The output of wav.format should be “WAVE”. Here is what the outputs look like:

From the outputs above:

  • As you can see, only lines (c), (f) and (g) obtain and display the data correctly.
  • (a) and (b) fail because the printf member of System::UnicodeString takes a wide format string and expects %s to be wide as well, while wav.format is narrow.
  • (d) and (e) fail because format is not nul-terminated, so the logic keeps reading into the next property, fmt, which happens to follow WAVE. Luckily a "\0" byte comes shortly after the f, m, t characters, so it stops there.
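The failure in (d) and (e) can be reproduced safely by placing the raw bytes of a canonical PCM WAV header (bytes 8 to 19: "WAVE", "fmt ", then the little-endian 32-bit value 16) in one array, so reading past the tag stays inside the array instead of being undefined behaviour. The byte values are an assumption based on the standard PCM layout, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Bytes 8..19 of a canonical PCM WAV header: "WAVE", "fmt ", then the
// 32-bit little-endian sub-chunk size 16 (0x10 0x00 0x00 0x00).
const char header_bytes[] = {'W', 'A', 'V', 'E', 'f', 'm', 't', ' ',
                             0x10, 0x00, 0x00, 0x00};

// Wrong: treats the 4-byte tag as a nul-terminated C string, so it keeps
// reading until the first zero byte, which lies beyond the tag.
std::string tag_as_c_string() {
    return std::string(header_bytes);
}

// Right: copies exactly the 4 bytes that belong to the tag.
std::string tag_with_length() {
    return std::string(header_bytes, 4);
}
```

Here `tag_as_c_string()` returns 9 characters, "WAVEfmt " followed by the byte 0x10, while `tag_with_length()` returns "WAVE".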

Is there an example to convert char string array to UnicodeString correctly in C++?

Here is a full example that converts char string arrays to UnicodeString in C++ Builder:
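As a portable C++17 sketch of the same flow end to end, the code below reads the header from a stream and then converts each tag with an explicit length. It builds a std::string where the C++ Builder version would display a UnicodeString in a TMemo, it assumes a little-endian host for the numeric members, and all names other than `chunk_ID`, `format` and `fmt` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Canonical 44-byte PCM WAV header (most member names are hypothetical).
struct WavHeader {
    char          chunk_ID[4];   // "RIFF"
    std::uint32_t chunk_size;
    char          format[4];     // "WAVE"
    char          fmt[4];        // "fmt "
    std::uint32_t fmt_size;
    std::uint16_t audio_format;
    std::uint16_t num_channels;
    std::uint32_t sample_rate;
    std::uint32_t byte_rate;
    std::uint16_t block_align;
    std::uint16_t bits_per_sample;
    char          data_ID[4];    // "data"
    std::uint32_t data_size;
};
static_assert(sizeof(WavHeader) == 44, "unexpected padding in WavHeader");

// Reads the raw header bytes; numeric members assume a little-endian host.
bool read_wav_header(std::istream& in, WavHeader& h) {
    return static_cast<bool>(in.read(reinterpret_cast<char*>(&h), sizeof h));
}

// Convenience wrapper for data already loaded into memory.
bool parse_wav_header(const std::string& bytes, WavHeader& h) {
    std::istringstream in(bytes);
    return read_wav_header(in, h);
}

// Converts a fixed-size tag using its length, never a nul terminator.
template <std::size_t N>
std::string tag(const char (&field)[N]) {
    return std::string(field, N);
}

// Builds the text that would be shown in the memo, one tag per line.
std::string describe(const WavHeader& h) {
    return "chunk_ID: " + tag(h.chunk_ID) + "\n" +
           "format: "   + tag(h.format)   + "\n" +
           "fmt: "      + tag(h.fmt)      + "\n";
}
```

In C++ Builder, the strings returned by `tag` would be the point where a UnicodeString is built and added to the memo.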

C++ Builder is the easiest and fastest C and C++ compiler and IDE for building simple or professional applications on the Windows operating system. It is also easy for beginners to learn with its wide range of samples, tutorials, help files, and LSP support for code. RAD Studio’s C++ Builder version comes with the award-winning VCL framework for high-performance native Windows apps and the powerful FireMonkey (FMX) framework for UIs.

There is a free C++ Builder Community Edition for students, beginners, and startups; it can be downloaded from here. For professional developers, there are Professional, Architect, or Enterprise versions of C++ Builder and there is a trial version you can download from here.

About author

Dr. Yilmaz Yoru has more than 35 years of coding experience in more than 30 programming languages, mostly C++ on Windows, Android, macOS, iOS, Linux, and some other operating systems. He graduated and received his MSc and PhD degrees from the Department of Mechanical Engineering of Eskisehir Osmangazi University. He is the founder and CEO of ESENJA LLC Company. His interests are Programming, Thermodynamics, Fluid Mechanics, Artificial Intelligence, 2D & 3D Designs, and high-end innovations.