
Unicode Strings in C++ On Windows

Text strings are among the most frequently used variables in programming, and they matter most when you store and retrieve valuable data. That data should be stored safely in its own language and localization, yet many programming languages have trouble storing text and letters from every language. In C++, there is the old, well-known string type (arrays of chars) and the modern std::basic_string types such as std::string and std::wstring. In addition to these, C++ Builder has another string type, UnicodeString. In this post, we explain what UnicodeString is in modern C++ and how to use it.

What are the string types in C++?

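In general, there are three kinds of string declarations in C++: arrays of chars (char strings), the std::basic_string types such as std::string and std::wstring, and, in C++ Builder, UnicodeString.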

In addition, there were some older string types that we used in C++ Builder and Delphi before, such as WideString and AnsiString.

What is UnicodeString (String) in C++ Builder?

The Unicode standard assigns a unique number (a code point) to every character, covering far more characters than ASCII (7 bits, usually stored in an 8-bit char). Depending on the encoding, a code point is stored as one or more 8-, 16- or 32-bit code units (UTF-8, UTF-16 or UTF-32). Unicode strings are widely used because they support languages worldwide as well as emojis. In modern C++ there are two broad kinds of strings: arrays of chars (char strings) and modern strings such as std::string, std::wstring or UnicodeString (the default type behind String in C++ Builder). Most compilers and IDEs use these newer string types in their GUI forms and components so that applications can support all languages globally.

In C++ Builder, there were other string types, such as WideString and AnsiString. They are older and are not compatible with all features of modern programming. More information about the structure of Unicode Strings can be found here. RAD Studio, Delphi and C++ Builder use Unicode-based strings: that is, the type String is a Unicode string (System.UnicodeString) instead of an ANSI string. If you want to convert your code to Unicode strings, we recommend this article.

What is basic_string in modern C++?

The basic_string (std::basic_string and std::pmr::basic_string) is a class template that stores and manipulates sequences of character-like objects (char, wchar_t, …). For example, std::string and std::wstring are type aliases for std::basic_string<char> and std::basic_string<wchar_t>. In other words, basic_string is not a single string type and it is not a namespace; it is a general template from which the concrete string types are defined: string, wstring, u8string, u16string and u32string.

The basic_string class is dependent neither on the character type nor on the nature of operations on that type. The definitions of the operations are supplied via the Traits template parameter (i.e. a specialization of std::char_traits or a compatible traits class). A basic_string stores its elements contiguously.

Several string types for common character types are provided as basic_string definitions, as listed below:

String Type      Basic String Definition         Standard
std::string      std::basic_string<char>
std::wstring     std::basic_string<wchar_t>
std::u8string    std::basic_string<char8_t>      (C++20)
std::u16string   std::basic_string<char16_t>     (C++11)
std::u32string   std::basic_string<char32_t>     (C++11)
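
The relationship in the table can be checked at compile time. The following is a minimal sketch in standard C++ (C++11 or later; std::u8string is left out because it needs C++20). The helper name is_contiguous is ours, chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Each alias in the table is std::basic_string instantiated for one character type.
static_assert(std::is_same<std::string,    std::basic_string<char>>::value,     "string");
static_assert(std::is_same<std::wstring,   std::basic_string<wchar_t>>::value,  "wstring");
static_assert(std::is_same<std::u16string, std::basic_string<char16_t>>::value, "u16string");
static_assert(std::is_same<std::u32string, std::basic_string<char32_t>>::value, "u32string");

// Hypothetical helper: elements are stored contiguously, so the address of
// element i is always the address of element 0 plus i.
template <class CharT>
bool is_contiguous(const std::basic_string<CharT>& s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (&s[0] + i != &s[i]) return false;
    }
    return true;
}
```

Because the aliases are the same types as the template instantiations, any function written for std::basic_string<CharT> works for all of them.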

Several string types in the std::pmr namespace are also provided for the common character types. Here are more details about basic string types and their literals.

Note that in C++ Builder you can use both the std::basic_string types (std::string, std::wstring, std::u16string, …, including their std::pmr variants) and the UnicodeString type.

How can we use UnicodeStrings in C++ Builder?

Here are some examples of how you can use strings with the UnicodeString type.

How to declare Unicode Strings

L is a string literal prefix here; it marks a wchar_t literal. The u8, u and U prefixes can be used too. One of these may be the default in your editor or compiler options, which means you do not need to add the prefix if you know the default. A string literal is a sequence of characters surrounded by double quotes, optionally prefixed by R, u8, u8R, u, uR, U, UR, L, or LR, as in "…", R"(…)", u8"…", u8R"(…)", u"…", uR"~(…)~", U"…", UR"zzz(…)zzz", L"…", or LR"(…)", respectively. Please see the String Literals section of the Working Draft, Standard for Programming Language C++. Below we summarize some of these forms as they are used in C++.

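
As a rough illustration of the prefixes, here is a sketch in standard C++ that uses the std::basic_string types rather than UnicodeString, so it also compiles outside C++ Builder. The function names are ours and purely illustrative. In C++ Builder, a UnicodeString is normally initialized from an L"…" literal in the same way.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// The prefix selects the character type of the literal's elements.
static_assert(std::is_same<decltype(L"a"[0]), const wchar_t&>::value,  "L -> wchar_t");
static_assert(std::is_same<decltype(u"a"[0]), const char16_t&>::value, "u -> char16_t");
static_assert(std::is_same<decltype(U"a"[0]), const char32_t&>::value, "U -> char32_t");

std::wstring   wide_greeting()  { return L"Hello"; }   // wchar_t elements
std::u16string utf16_greeting() { return u"Hello"; }   // UTF-16 code units
std::u32string utf32_greeting() { return U"Hello"; }   // UTF-32 code units

// Raw literal: between R"( and )" a backslash is an ordinary character.
std::wstring raw_path() { return LR"(C:\temp\new)"; }
```

The raw form is handy for Windows paths, where every backslash would otherwise have to be doubled.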

Examples to String Literals for Strings Definitions

What is the difference between the L"", U"" and u"" literals in C++?
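
One practical difference is how many code units the same character occupies. The sketch below counts them for U+1F600 (an emoji), written as a universal character name so the result does not depend on the source file encoding. The function names are ours. The result for L"" depends on the platform: wchar_t is 16 bits on Windows and usually 32 bits elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// UTF-8: sizeof counts the bytes of the literal including the terminator.
std::size_t utf8_units()  { return sizeof(u8"\U0001F600") - 1; }
// UTF-16: characters above U+FFFF need a surrogate pair.
std::size_t utf16_units() { return std::u16string(u"\U0001F600").size(); }
// UTF-32: always one code unit per code point.
std::size_t utf32_units() { return std::u32string(U"\U0001F600").size(); }
// wchar_t: behaves like UTF-16 on Windows, like UTF-32 on most other systems.
std::size_t wide_units()  { return std::wstring(L"\U0001F600").size(); }
```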

Length of Unicode String

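
UnicodeString reports its length through Length(), which counts 16-bit elements, just as std::u16string::size() does. That number can be larger than the number of visible characters. The following sketch, in standard C++ and with a helper name of our own, counts code points instead. It assumes well-formed UTF-16 input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a UTF-16 string by
// skipping the second (low) half of every surrogate pair.
std::size_t count_code_points(const std::u16string& s) {
    std::size_t n = 0;
    for (char16_t c : s) {
        if (c < 0xDC00 || c > 0xDFFF) ++n;  // everything except low surrogates
    }
    return n;
}
```

For the string u"A\U0001F600", size() reports 3 code units while count_code_points reports 2 code points.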

Size of Unicode String

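
The size in bytes follows from the length and the width of one element. Here is a small sketch for the standard string types; byte_size is a name we made up, and the terminating null character is not counted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: bytes occupied by the characters of a string,
// i.e. number of elements times the size of one element.
template <class CharT>
std::size_t byte_size(const std::basic_string<CharT>& s) {
    return s.size() * sizeof(CharT);
}
```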

How to read characters of a Unicode String

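
A point to watch when reading characters: UnicodeString is conventionally indexed from 1 in C++ Builder, while the standard string types are indexed from 0 (please check this against your compiler's documentation). The sketch below mirrors 1-based access on a std::wstring; char_at_1based is our own illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the character at a 1-based position.
// Throws std::out_of_range for position 0 or a position past the end.
wchar_t char_at_1based(const std::wstring& s, std::size_t pos) {
    if (pos == 0) throw std::out_of_range("position is 1-based");
    return s.at(pos - 1);  // at() performs the bounds check
}
```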

How to change characters of Unicode String

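
Characters are changed by assigning through the index. As a sketch with the standard std::wstring (0-based), the function below replaces every occurrence of one character with another; with_replaced is a name we chose for the example.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns a copy of s in which every `from`
// character has been overwritten with `to`.
std::wstring with_replaced(std::wstring s, wchar_t from, wchar_t to) {
    for (wchar_t& c : s) {
        if (c == from) c = to;  // assignment through a reference changes the string
    }
    return s;
}
```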

How to find position of a string in Unicode String

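
UnicodeString has a Pos() method that returns a 1-based position, or 0 when the text is not found. The standard equivalent is find(), which is 0-based and returns npos on failure. This sketch wraps find() to give the 1-based convention; pos_1based is our own name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: 1-based position of `needle` in `haystack`, 0 if absent.
std::size_t pos_1based(const std::wstring& haystack, const std::wstring& needle) {
    const std::size_t i = haystack.find(needle);
    return i == std::wstring::npos ? 0 : i + 1;
}
```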

Converting Unicode String to Integer

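
UnicodeString provides ToInt() and ToIntDef() for this. With the standard wide strings the same job can be done with std::stoi. The sketch below returns a fallback value instead of throwing; the name to_int_or is ours.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: parses the whole string as a base-10 int and
// returns `fallback` when the text is not a valid number or is out of range.
int to_int_or(const std::wstring& text, int fallback) {
    try {
        std::size_t used = 0;
        const int value = std::stoi(text, &used);
        return used == text.size() ? value : fallback;  // reject trailing characters
    } catch (const std::invalid_argument&) {
        return fallback;
    } catch (const std::out_of_range&) {
        return fallback;
    }
}
```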

Converting Unicode String to Double
Converting Unicode String to Float

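
For floating-point values, UnicodeString offers ToDouble(). The standard counterparts for wide strings are std::stod and std::stof, sketched below with wrapper names of our own. Both throw std::invalid_argument when no number can be parsed, and the decimal separator follows the current C locale.

```cpp
#include <cassert>
#include <string>

// Thin illustrative wrappers around the standard conversions.
double to_double(const std::wstring& text) { return std::stod(text); }
float  to_float(const std::wstring& text)  { return std::stof(text); }
```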

Converting Unicode String to LowerCase

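
UnicodeString has a LowerCase() method. A rough standard-library equivalent for std::wstring applies std::towlower to each element, as in this sketch (to_lower is our name). The result for non-ASCII letters depends on the active locale, so only ASCII behaviour should be relied on here.

```cpp
#include <algorithm>
#include <cassert>
#include <cwctype>
#include <string>

// Hypothetical helper: lower-cases each wide character independently.
std::wstring to_lower(std::wstring s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](wchar_t c) { return static_cast<wchar_t>(std::towlower(c)); });
    return s;
}
```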

Converting Unicode String to UpperCase

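
UnicodeString has an UpperCase() method. The sketch below does the same for std::wstring with std::towupper; to_upper is our own name. Because it maps one character to one character, it cannot handle letters whose upper-case form is longer, such as the German ß.

```cpp
#include <algorithm>
#include <cassert>
#include <cwctype>
#include <string>

// Hypothetical helper: upper-cases each wide character independently.
std::wstring to_upper(std::wstring s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](wchar_t c) { return static_cast<wchar_t>(std::towupper(c)); });
    return s;
}
```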

Converting Unicode String to char String


Converting from wider characters to narrower characters is not recommended. If you find that you need to convert to a narrow (char) string, that usually means the target variable should itself be a wider (Unicode) type; otherwise you can lose some Unicode characters, which results in missing or wrong characters in your char strings.
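
One way to keep every character when the target must be a char string is to encode it as UTF-8 instead of truncating each element. The following is a hand-written sketch of that conversion for std::u16string, not the routine C++ Builder uses internally. The name utf16_to_utf8 is ours, and unpaired surrogates are replaced with U+FFFD as an assumption about the desired error handling.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: encodes UTF-16 as UTF-8 so that no character is lost.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {  // high surrogate: needs a low one next
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;  // unpaired high surrogate
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;      // stray low surrogate
        }
        if (cp < 0x80) {                     // 1 byte
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {             // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {           // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                             // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The resulting std::string holds bytes, not characters, so its size() is the UTF-8 byte count.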

Converting Unicode String to ANSI String

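
Converting to a single-byte string loses every character that the target code page cannot represent. The sketch below makes that loss visible by narrowing to 7-bit ASCII, which we use here only as a stand-in for a real ANSI code page; to_ascii_lossy is our own name.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: keeps ASCII characters and substitutes '?' for
// every UTF-16 code unit outside the 7-bit range.
std::string to_ascii_lossy(const std::u16string& in) {
    std::string out;
    out.reserve(in.size());
    for (char16_t c : in) {
        out += c < 0x80 ? static_cast<char>(c) : '?';
    }
    return out;
}
```

Note that a character stored as a surrogate pair turns into two question marks, because each code unit is replaced separately.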

Converting Unicode String to Wide String


Substring of a Unicode String

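
UnicodeString extracts part of a string with SubString(), which takes a 1-based start index and a count. The standard substr() takes a 0-based position instead. This sketch adapts one to the other; substring_1based is a name we made up.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: `count` characters starting at the 1-based `index`.
// Returns an empty string when the index is outside the string.
std::wstring substring_1based(const std::wstring& s, std::size_t index, std::size_t count) {
    if (index == 0 || index > s.size()) return std::wstring();
    return s.substr(index - 1, count);  // substr clamps count at the end of s
}
```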

Insert a String to UnicodeString

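
UnicodeString inserts text with Insert(), again using a 1-based index. Here is a sketch of the same operation on std::wstring. The name insert_1based and the choice to clamp out-of-range positions to the nearest end are our assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: inserts `piece` so that it starts at the 1-based `index`.
std::wstring insert_1based(std::wstring s, const std::wstring& piece, std::size_t index) {
    if (index == 0) index = 1;                        // clamp to the front
    if (index > s.size() + 1) index = s.size() + 1;   // clamp to "append"
    s.insert(index - 1, piece);
    return s;
}
```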

Deleting / Trimming Part of Unicode String

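
UnicodeString removes a range with Delete(), given a 1-based index and a count. The standard counterpart is erase(). This sketch wraps it with the 1-based convention; delete_1based is our name, and an out-of-range index leaves the string unchanged by our choice.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: removes `count` characters starting at the 1-based `index`.
std::wstring delete_1based(std::wstring s, std::size_t index, std::size_t count) {
    if (index >= 1 && index <= s.size()) {
        s.erase(index - 1, count);  // erase clamps count at the end of s
    }
    return s;
}
```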

Compare Unicode Strings

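
Standard strings are compared with the relational operators or with compare(), both of which are case-sensitive. As a sketch of a case-insensitive variant for std::wstring, the function below folds each character with std::towlower before comparing. The name compare_ignore_case is ours, and the folding is locale-dependent for non-ASCII letters.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cwctype>
#include <string>

// Hypothetical helper: returns a negative value, 0, or a positive value,
// like std::wstring::compare, but ignoring letter case.
int compare_ignore_case(const std::wstring& a, const std::wstring& b) {
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i) {
        const auto x = std::towlower(a[i]);
        const auto y = std::towlower(b[i]);
        if (x != y) return x < y ? -1 : 1;
    }
    if (a.size() == b.size()) return 0;
    return a.size() < b.size() ? -1 : 1;  // the shorter string sorts first
}
```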

Trimming spaces and control characters from a Unicode String

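
UnicodeString has a Trim() method for this. The standard library has no direct equivalent, so the sketch below implements one for std::wstring, treating the space and every character below it (the control characters) as removable. The name trim is ours.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: removes leading and trailing spaces and control
// characters (everything up to and including L' ').
std::wstring trim(const std::wstring& s) {
    auto is_removable = [](wchar_t c) { return c <= L' '; };
    std::size_t first = 0;
    while (first < s.size() && is_removable(s[first])) ++first;
    std::size_t last = s.size();
    while (last > first && is_removable(s[last - 1])) --last;
    return s.substr(first, last - first);
}
```

Spaces inside the text are kept; only the two ends are trimmed.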

For more details about the methods and properties listed above, please check UnicodeString Methods & Properties and UnicodeString Types in the UnicodeString wiki.

C++ Builder is the easiest and fastest C and C++ compiler and IDE for building simple or professional applications on the Windows operating system. It is also easy for beginners to learn with its wide range of samples, tutorials, help files, and LSP support for code. RAD Studio’s C++ Builder version comes with the award-winning VCL framework for high-performance native Windows apps and the powerful FireMonkey (FMX) framework for UIs.

There is a free C++ Builder Community Edition for students, beginners, and startups; it can be downloaded from here. For professional developers, there are Professional, Architect, or Enterprise versions of C++ Builder and there is a trial version you can download from here.


