
Unicode Strings in C++ On Windows

The Unicode standard assigns a unique number (a code point) to every character, far beyond the 128 characters of ASCII. Code points are stored in one of three encodings, UTF-8, UTF-16 or UTF-32, which are built from 8-, 16- or 32-bit code units respectively. Unicode strings are widely used because they support the world's languages as well as emojis. In modern C++ Builder code two kinds of strings are common: arrays of chars (char strings) and UnicodeStrings; WideStrings and AnsiStrings are older and are not compatible with every current feature. The Clang, C++ Builder, GNU C and VC++ compilers and their IDEs use this standard for GUI forms so that applications can be offered worldwide in any language. More information about the structure of Unicode Strings can be found here. RAD Studio, Delphi and C++ Builder use Unicode-based strings: the type String is a Unicode string (System.UnicodeString) instead of an ANSI string. If you want to convert your code to Unicode strings, we recommend this article.

In general, there are four types of alphanumeric string declarations in C++ Builder:

  • Array of chars (See Fundamental Types)
    Each char occupies 1 byte (8 bits), so it can hold 256 distinct values (0 to 255); the ASCII character set itself uses only the first 128.
  • AnsiString
    Previously, String was an alias for AnsiString. In RAD Studio, C++ Builder and Delphi, the format of AnsiString has changed: CodePage and ElemSize fields have been added, which makes the format of AnsiString identical to that of the new UnicodeString.
  • WideString
    WideString was previously used for Unicode character data. Its format is essentially the same as the Windows BSTR. WideString is still appropriate for use in COM applications.
  • UnicodeString (String)
    UnicodeString data is in UTF-16 format, which means a character may take 2 or 4 bytes. There is a good article about Unicode in RAD Studio. The default string type in RAD Studio, C++ Builder and Delphi is UnicodeString. In C++ Builder and Delphi, the Char and PChar types are now WideChar and PWideChar, respectively.

C++ Examples about Unicode Strings (based on CLANG / C++ Builder Compiler)

How to declare Unicode Strings

L is a string literal prefix here; it denotes a wchar_t literal. The u8, u and U prefixes can be used too. One of these may be the default in your editor and/or compiler options, in which case you do not need to add a prefix if you know the default. A string literal is a sequence of characters surrounded by double quotes, optionally prefixed by R, u8, u8R, u, uR, U, UR, L, or LR, as in "...", R"(...)", u8"...", u8R"(...)", u"...", uR"~(...)~", U"...", UR"zzz(...)zzz", L"...", or LR"(...)", respectively. Please see the String Literals section of the Working Draft, Standard for Programming Language C++. Below we summarize some of these forms as used in C++.

Examples of String Literals for String Definitions

  • str="abcd"; an ordinary string literal; its encoding depends on compiler/IDE options
  • str=u8"abcd"; a UTF-8 string literal, initialized with the given characters as encoded in UTF-8, including the null terminator
  • str=u"abcd"; a char16_t string literal. A char16_t string literal has type "array of n const char16_t", including the null terminator
  • str=U"abcd"; a char32_t string literal. A char32_t string literal has type "array of n const char32_t", including the null terminator
  • str=L"abcd"; a wide string literal. A wide string literal has type "array of n const wchar_t", including the null terminator
  • str=R"(abcd)"; a raw string literal. The text between the parentheses is taken verbatim, so backslashes are not treated as escape characters
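
The element type behind each prefix can be checked at compile time. The following is a minimal sketch in standard C++ (nothing here is specific to C++ Builder); the sample path inside the raw literal is an illustrative value only.

```cpp
#include <type_traits>

// A string literal is an lvalue array, so decltype yields a reference to it.
static_assert(std::is_same<decltype("abcd"),  const char     (&)[5]>::value, "narrow literal");
static_assert(std::is_same<decltype(u"abcd"), const char16_t (&)[5]>::value, "UTF-16 literal");
static_assert(std::is_same<decltype(U"abcd"), const char32_t (&)[5]>::value, "UTF-32 literal");
static_assert(std::is_same<decltype(L"abcd"), const wchar_t  (&)[5]>::value, "wide literal");

// u8 yields char before C++20 and char8_t from C++20 on; both are one byte wide.
static_assert(sizeof(u8"abcd") == 5, "four UTF-8 code units plus the terminator");

// Raw literal: backslashes are kept verbatim, with no escape processing.
static_assert(sizeof(R"(C:\new\tab)") == sizeof("C:\\new\\tab"), "raw equals escaped form");
```

If the file compiles, every claim in the list above holds for that compiler.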

What is the difference between L"", U"" and u"" literals in C++

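  • L denotes a wide string literal of type "array of n const wchar_t". Its encoding depends on the compiler and platform: wchar_t is 16 bits (UTF-16) on Windows and usually 32 bits (UTF-32) on Linux and macOS
  • u denotes a char16_t literal in UTF-16 format
  • U denotes a char32_t literal in UTF-32 format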
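
The difference shows up as soon as a character outside the Basic Multilingual Plane is stored. This sketch, in standard C++, counts the code units one emoji (U+1F600) needs under each prefix; the function names are hypothetical and chosen for illustration only.

```cpp
#include <cstddef>
#include <string>

// U+1F600 is written as a universal character name so that the result
// does not depend on the encoding of the source file.
std::size_t units_utf16() { return std::u16string(u"\U0001F600").size(); }  // surrogate pair
std::size_t units_utf32() { return std::u32string(U"\U0001F600").size(); }  // single unit
std::size_t units_wide()  { return std::wstring(L"\U0001F600").size(); }    // platform dependent
```

units_utf16() returns 2 and units_utf32() returns 1 everywhere, while units_wide() follows the width of wchar_t on the platform in use.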

Length of Unicode String
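
In C++ Builder the length is read with the Length() method of UnicodeString, which counts UTF-16 code units. The sketch below uses std::u16string from the standard library as a stand-in (an assumption made so the example compiles anywhere) and adds a hypothetical helper, count_code_points, for cases where the number of characters rather than code units is wanted.

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-16 string by skipping low surrogates
// (0xDC00-0xDFFF), each of which is the second half of a surrogate pair.
// Assumes well-formed UTF-16 input.
std::size_t count_code_points(const std::u16string& s) {
    std::size_t n = 0;
    for (char16_t c : s) {
        if (c < 0xDC00 || c > 0xDFFF) ++n;
    }
    return n;
}
```

For the string u"a\U0001F600b", size() reports 4 code units while count_code_points reports 3 characters.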

Size of Unicode String
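
The size in bytes is the number of code units multiplied by the size of one unit. A minimal sketch with standard strings standing in for UnicodeString; byte_size is a hypothetical helper name, and the terminating null is not counted.

```cpp
#include <cstddef>
#include <string>

// Bytes occupied by the character data of any std::basic_string,
// excluding the terminating null.
template <typename CharT>
std::size_t byte_size(const std::basic_string<CharT>& s) {
    return s.size() * sizeof(CharT);
}
```

The same four letters therefore take 4 bytes as a char string, 8 bytes in UTF-16 and 16 bytes in UTF-32.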

How to reach / read characters of Unicode String
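
Individual elements are read by index. Note that the standard strings used in this sketch are indexed from 0, whereas C++ Builder's UnicodeString on Windows is indexed from 1. The helper name unit_at is hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Returns the UTF-16 code unit at a 0-based index. at() throws
// std::out_of_range for an invalid index, which operator[] does not check.
char16_t unit_at(const std::u16string& s, std::size_t index) {
    return s.at(index);
}
```

Each element is a code unit, so a character stored as a surrogate pair occupies two consecutive indexes.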

How to change characters of Unicode String
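
Characters are changed by assigning to an indexed element. The sketch below shows this with std::u16string (a standard-library stand-in, indexed from 0) and a hypothetical helper, replace_unit, that swaps every occurrence of one code unit for another.

```cpp
#include <algorithm>
#include <string>

// Replaces every occurrence of the code unit `from` with `to`, in place.
void replace_unit(std::u16string& s, char16_t from, char16_t to) {
    std::replace(s.begin(), s.end(), from, to);
}

// Overwrites the first element, if there is one.
void set_first(std::u16string& s, char16_t c) {
    if (!s.empty()) s[0] = c;
}
```

Replacing single code units is safe for characters in the Basic Multilingual Plane; a surrogate pair has to be replaced as a pair.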

How to find position of a string in Unicode String
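
UnicodeString reports positions through its Pos() method, counting from 1 and returning 0 when nothing is found. The standard find() counts from 0 and returns npos instead. The sketch below wraps find() in a hypothetical helper, pos_1based, to show how the two conventions map onto each other.

```cpp
#include <cstddef>
#include <string>

// Returns the 1-based position of `needle` in `haystack`, or 0 if absent.
std::size_t pos_1based(const std::u16string& haystack, const std::u16string& needle) {
    const std::size_t i = haystack.find(needle);
    return i == std::u16string::npos ? 0 : i + 1;
}
```

Searching u"Unicode String" for u"String" gives 9 under this convention.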

Converting Unicode String to Integer
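
UnicodeString has its own conversion members such as ToInt(). As a portable sketch, the standard std::stoi accepts a std::wstring directly; the hypothetical helper to_int_or below returns a fallback value instead of throwing when the text is not a complete integer.

```cpp
#include <cstddef>
#include <exception>
#include <string>

// Parses a whole wide string as a decimal integer. Returns `fallback` when
// the text is empty, not numeric, out of range, or has trailing characters.
int to_int_or(const std::wstring& s, int fallback) {
    try {
        std::size_t used = 0;
        const int value = std::stoi(s, &used);
        return used == s.size() ? value : fallback;
    } catch (const std::exception&) {
        return fallback;
    }
}
```

Checking the number of characters consumed rejects inputs such as L"12x", which std::stoi alone would accept as 12.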

Converting Unicode String to Double
Converting Unicode String to Float
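
For floating-point values the standard library offers std::stod and std::stof, both of which accept std::wstring. This is a sketch with standard strings rather than UnicodeString; the wrapper names are hypothetical, and the decimal separator recognized depends on the active C locale (a dot in the default "C" locale).

```cpp
#include <stdexcept>
#include <string>

// Both functions throw std::invalid_argument when no number can be parsed
// and std::out_of_range when the value does not fit the target type.
double to_double(const std::wstring& s) { return std::stod(s); }
float  to_float(const std::wstring& s)  { return std::stof(s); }
```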

Converting Unicode String to LowerCase
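
UnicodeString provides a LowerCase() method. A comparable sketch in standard C++ applies std::towlower to each element of a std::wstring; to_lower is a hypothetical helper name, and the result for non-ASCII letters depends on the current locale.

```cpp
#include <algorithm>
#include <cwctype>
#include <string>

// Returns a lowercase copy; works element by element, so characters
// stored as surrogate pairs are left unchanged.
std::wstring to_lower(std::wstring s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](wchar_t c) { return static_cast<wchar_t>(std::towlower(c)); });
    return s;
}
```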

Converting Unicode String to UpperCase
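
The uppercase counterpart in UnicodeString is UpperCase(). The standard-library sketch below uses std::towupper; to_upper is a hypothetical helper name, and only ASCII letters are guaranteed to convert in every locale.

```cpp
#include <algorithm>
#include <cwctype>
#include <string>

// Returns an uppercase copy, converting one wchar_t element at a time.
std::wstring to_upper(std::wstring s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](wchar_t c) { return static_cast<wchar_t>(std::towupper(c)); });
    return s;
}
```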

Converting Unicode String to char String

Converting from a wider character type to a narrower one is not recommended. If you find you need to convert to a narrower type (char), that usually means the target variable should itself be a wider (Unicode) type; otherwise you can lose some Unicode characters, which results in missing or wrong characters in your char strings.
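
When char storage cannot be avoided, one way to keep every character is to encode the text as UTF-8 rather than cutting each element down to one byte. The following is a sketch of such an encoder in standard C++, not the conversion C++ Builder performs; utf16_to_utf8 is a hypothetical helper, and an unpaired surrogate is replaced with U+FFFD by assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Encodes UTF-16 as UTF-8 so that no character is lost in char storage.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Combine a surrogate pair into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired surrogate
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

ASCII text comes out byte for byte identical, while other characters expand to two, three or four bytes.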

Converting Unicode String to ANSI String

Converting Unicode String to Wide String

Substring of a Unicode String
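
UnicodeString extracts a part with SubString(index, count), where the index starts at 1. The standard substr(pos, count) starts at 0. The sketch below bridges the two with a hypothetical helper, sub_string_1based, built on std::u16string.

```cpp
#include <cstddef>
#include <string>

// Returns `count` code units starting at the 1-based `index`.
// An index of 0 or past the end yields an empty string; a count that
// runs past the end is clipped by substr.
std::u16string sub_string_1based(const std::u16string& s,
                                 std::size_t index, std::size_t count) {
    if (index == 0 || index > s.size()) return std::u16string();
    return s.substr(index - 1, count);
}
```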

Insert a String to UnicodeString
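
Insertion places one string inside another at a given position. This sketch uses the standard insert() with a 0-based position on std::u16string; insert_at is a hypothetical helper that clamps the position so that a value past the end appends instead of throwing.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Returns a copy of `s` with `piece` inserted before the 0-based `pos`.
std::u16string insert_at(std::u16string s, std::size_t pos, const std::u16string& piece) {
    s.insert(std::min(pos, s.size()), piece);
    return s;
}
```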

Deleting / Trimming Part of Unicode String
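
Removing a range of characters maps to erase() in the standard library. The sketch below works on std::u16string with a 0-based position; erase_range is a hypothetical helper that ignores a start position outside the string.

```cpp
#include <cstddef>
#include <string>

// Returns a copy of `s` without the `count` code units starting at `pos`.
// erase() clips `count` at the end of the string.
std::u16string erase_range(std::u16string s, std::size_t pos, std::size_t count) {
    if (pos < s.size()) s.erase(pos, count);
    return s;
}
```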

Compare Unicode Strings
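
Standard strings compare with the usual operators or with compare(), which orders by code unit value rather than by language rules. The sketch below normalizes the result to -1, 0 or 1; sign_compare is a hypothetical helper name.

```cpp
#include <string>

// Returns -1, 0 or 1 according to the code-unit order of `a` and `b`.
int sign_compare(const std::u16string& a, const std::u16string& b) {
    const int r = a.compare(b);
    return (r > 0) - (r < 0);
}
```

Because the order is by code unit, every uppercase ASCII letter sorts before every lowercase one, so u"Zebra" comes before u"apple".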

Trimming spaces and control characters from a Unicode String
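
UnicodeString offers a Trim() method for this. A comparable sketch in standard C++ strips every leading and trailing code unit up to and including the space character (U+0020), which covers the ASCII control characters as well; the helper name trim and this exact cut-off are assumptions of the example.

```cpp
#include <cstddef>
#include <string>

// Returns a copy of `s` without leading and trailing spaces and
// control characters (any code unit <= U+0020).
std::u16string trim(const std::u16string& s) {
    std::size_t first = 0;
    while (first < s.size() && s[first] <= u' ') ++first;
    std::size_t last = s.size();
    while (last > first && s[last - 1] <= u' ') --last;
    return s.substr(first, last - first);
}
```

Spaces inside the text are kept; only the two ends are affected.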

For more details about the commands and properties listed above, please check UnicodeString Methods & Properties and UnicodeString Types in the UnicodeString wiki.



About author

33+ years of coding with more than 30 programming languages, mostly C++ on Windows, Android, macOS, iOS, Linux and some other operating systems. Dr. Yilmaz Yoru was born in 1974 in Eskisehir, Turkey. He graduated from the Department of Mechanical Engineering of Eskisehir Osmangazi University in 1997. One year later he started to work at the same university as an assistant. He received his MSc and PhD degrees from the same department of the same university. He is married and is the father of a son. Some of his interests are Programming, Thermodynamics, Fluid Mechanics and Artificial Intelligence. He also likes graphical 2D & 3D design and high-end innovations.