
How To Operate On The Words In A String In C++

How can I operate on the words in a given string? How can I split a string with multiple delimiters in C++?

What is a word in programming?

Today the digital world holds billions of texts and books, each made up of many sentences. Sometimes we need to operate on these sentences at the word level rather than on individual characters. For example, we may need to analyze sentences to determine the frequency of words. One of the great benefits of programming is that we can develop applications that analyze millions of sentences at the word level. First, we should define what a word is. Note: we are talking about lexical words here, the words humans speak, read and write, not words in the computer science sense.

Google Translate defines a ‘word’ as:

  1. a single distinct meaningful element of speech or writing, used with others (or sometimes alone) to form a sentence and typically shown with a space on either side when written or printed. Example: “I don’t like the word ‘unofficial’”
  2. a command, password, or signal

In programming, the definition of a word can differ from one application to another. Generally, words are separated by spaces, commas, dots, and some other characters. If you are looking only for words composed of letters, you are looking for alphabetic words. If you are looking only for numbers, you are looking for numeric words. You may also be looking for alphanumeric words, which mix both.
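The three kinds of words described above can be told apart by checking each character. The following is a minimal sketch in standard C++ (not C++ Builder specific), assuming plain ASCII input; the names `WordKind` and `classify_word` are hypothetical and chosen only for this illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical categories for this sketch; "Other" covers everything else.
enum class WordKind { Alpha, Numeric, Alphanumeric, Other };

// Classifies a single token by inspecting every character.
// Assumes plain ASCII input; std::isalpha and friends depend on the C locale.
WordKind classify_word(const std::string& word) {
    if (word.empty()) return WordKind::Other;
    auto is_alpha = [](unsigned char c) { return std::isalpha(c) != 0; };
    auto is_digit = [](unsigned char c) { return std::isdigit(c) != 0; };
    auto is_alnum = [](unsigned char c) { return std::isalnum(c) != 0; };
    if (std::all_of(word.begin(), word.end(), is_alpha)) return WordKind::Alpha;
    if (std::all_of(word.begin(), word.end(), is_digit)) return WordKind::Numeric;
    if (std::all_of(word.begin(), word.end(), is_alnum)) return WordKind::Alphanumeric;
    return WordKind::Other;
}
```

With this sketch, `classify_word("hello")` gives `WordKind::Alpha`, `classify_word("2024")` gives `WordKind::Numeric`, and `classify_word("abc123")` gives `WordKind::Alphanumeric`.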

When operating on words, the language of the input and its localization are also important. For example, in some locale settings the decimal symbol is a dot and the digit grouping symbol is a comma, while in other locale settings they are reversed. In short, a developer should examine each character of the given string and should know the locale settings for that input well. I think there is no single good method for operating on words in every language.
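To illustrate how the decimal and grouping symbols can swap between locales, here is a small sketch in standard C++ that uses a custom `std::numpunct` facet. The facet `CommaDecimal` and the two helper functions are hypothetical names for this example; a real application would more likely use a named system locale, which may or may not be installed on a given machine.

```cpp
#include <iomanip>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: comma as decimal symbol, dot as digit grouping symbol.
struct CommaDecimal : std::numpunct<char> {
protected:
    char do_decimal_point() const override { return ','; }
    char do_thousands_sep() const override { return '.'; }
    std::string do_grouping() const override { return "\3"; }  // groups of three digits
};

// Formats a value with two decimals using the facet above.
std::string format_comma_decimal(double value) {
    std::ostringstream out;
    // The locale takes ownership of the facet and deletes it when done.
    out.imbue(std::locale(std::locale::classic(), new CommaDecimal));
    out << std::fixed << std::setprecision(2) << value;
    return out.str();
}

// Formats the same value with the classic "C" locale conventions.
std::string format_classic(double value) {
    std::ostringstream out;
    out.imbue(std::locale::classic());
    out << std::fixed << std::setprecision(2) << value;
    return out.str();
}
```

The same number, 1234.5, comes out as `1.234,50` from `format_comma_decimal` and as `1234.50` from `format_classic`, so a word splitter that treats the dot or comma as a delimiter has to know which convention the input follows.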

Generally, alphabetic words can be separated by a defined set of delimiter characters such as space, comma, dot and so on, and that approach is fairly universal.

How to operate on a string with multiple delimiters?

[crayon-6763ea2c3e02c063439919/]
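As a rough illustration of splitting on several delimiters at once, the sketch below uses only the standard library (`std::string::find_first_of` and `find_first_not_of`). It is not the article's original listing; the function name `split_words` and the default delimiter set are assumptions made for this example.

```cpp
#include <string>
#include <vector>

// Splits text at any character found in delimiters; empty pieces are skipped.
// The default delimiter set is an assumption chosen for this sketch.
std::vector<std::string> split_words(const std::string& text,
                                     const std::string& delimiters = " ,.;:!?\t\n") {
    std::vector<std::string> words;
    std::size_t start = text.find_first_not_of(delimiters);
    while (start != std::string::npos) {
        std::size_t end = text.find_first_of(delimiters, start);
        // When end is npos, substr simply takes the rest of the string.
        words.push_back(text.substr(start, end - start));
        start = text.find_first_not_of(delimiters, end);
    }
    return words;
}
```

For instance, `split_words("Hello, world. This is C++!")` yields the five words `Hello`, `world`, `This`, `is` and `C++`, because `+` is not in the delimiter set.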

How to use Unicode Strings in word operations?

In modern C++ Builder applications, we mostly use UnicodeString as the string type. The Unicode standard assigns a unique number to every character, stored in 8-, 16- or 32-bit units depending on the encoding, and covers far more characters than 7-bit ASCII. UnicodeStrings are widely used because they support languages from around the world as well as emojis. In C++ Builder, two kinds of strings are used: arrays of chars (char strings) and UnicodeStrings (WideStrings and AnsiStrings are older and are not compatible with all current features). More information about the structure of Unicode strings can be found here. RAD Studio, Delphi and C++ Builder use Unicode-based strings: that is, the type String is a Unicode string (System.UnicodeString) instead of an ANSI string. If you want to convert your code to Unicode strings, we recommend this article.


How to use the UnicodeString Pos() method to search a string?

The UnicodeString class has many useful properties and methods for working with modern strings. The Pos() method is one of these; it searches for a UnicodeString inside another UnicodeString.

Pos returns the character index in the UnicodeString instance at which the substring subStr begins, where 1 is the index of the first character, 2 is the index of the second character, and so on. If the substring is not contained in the UnicodeString, Pos returns 0.

Here is an example of how to use it:

[crayon-6763ea2c3e034300166970/]
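For readers without C++ Builder at hand, the 1-based indexing described for Pos() can be imitated in standard C++. The helper `pos_1based` below is hypothetical and is built on `std::wstring::find`; it is a sketch of the indexing convention only, not the UnicodeString implementation.

```cpp
#include <string>

// Hypothetical helper imitating the 1-based result described for Pos():
// returns the index of the first character of sub in text, or 0 if absent.
int pos_1based(const std::wstring& text, const std::wstring& sub) {
    std::size_t found = text.find(sub);  // 0-based index, or npos when not found
    return found == std::wstring::npos ? 0 : static_cast<int>(found) + 1;
}
```

With this helper, `pos_1based(L"This is a string", L"is")` gives 3, because the first "is" starts at the third character of "This", and a substring that does not occur gives 0.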

How to use delimiters in Unicode strings?

Delimiters are characters that separate the individual strings (often called tokens) in a group of strings. For example, spaces, commas and other symbols can be used as delimiter characters. Traditionally, strtok() is used for this with char arrays. In C++ Builder, strings are now UnicodeStrings, and TStringList has properties that can be used to extract the delimited strings. If you are new to UnicodeStrings please check here

The example procedure below shows how to split a UnicodeString into a TStringList that holds the delimited UnicodeStrings:
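For comparison, the classic strtok() approach can be sketched in standard C++ as follows. The wrapper name `split_with_strtok` is hypothetical; it works on a private copy of the input because `std::strtok` overwrites delimiters in its buffer and keeps hidden static state between calls.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Tokenizes a copy of the input with std::strtok.
// std::strtok modifies its buffer and is not reentrant, hence the copy.
std::vector<std::string> split_with_strtok(const std::string& text,
                                           const char* delimiters) {
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0');  // strtok needs a null-terminated char array
    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buffer.data(), delimiters); tok != nullptr;
         tok = std::strtok(nullptr, delimiters)) {
        tokens.emplace_back(tok);
    }
    return tokens;
}
```

Calling `split_with_strtok("one, two;three", ", ;")` yields `one`, `two` and `three`. Note that this works on single-byte chars only, which is one reason a Unicode-aware split is preferable for international text.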

[crayon-6763ea2c3e038584041517/]

We can easily use this in C++ Builder. To do so, create a new “Multi-Device C++ Builder Project” and add a Memo and a Button to the form of the project. Double-click the Button, then add the split procedure and the Button's OnClick() event handler as shown below.

[crayon-6763ea2c3e039417468535/]

We can use any Unicode character as a delimiter, written as a wide character literal in the L' ' format shown above. Because this method supports UnicodeStrings, it is better suited to modern text than the classic strtok() function. Instead of this split(…) definition, we can also define the function as a member of the form, like this:

[crayon-6763ea2c3e03a730839918/]

This time, the split procedure should be declared as a public member in the public: section of the Form1 class, as below:

[crayon-6763ea2c3e03b188633198/]

C++ Builder is the easiest and fastest C and C++ IDE for building simple or professional applications on the Windows, macOS, iOS and Android operating systems. It is also easy for beginners to learn, with its wide range of samples, tutorials, help files, and LSP support for code. RAD Studio’s C++ Builder version comes with the award-winning VCL framework for high-performance native Windows apps and the powerful FireMonkey (FMX) framework for cross-platform UIs.

There is a free C++ Builder Community Edition for students, beginners, and startups; it can be downloaded from here. For professional developers, there are Professional, Architect, or Enterprise versions of C++ Builder and there is a trial version you can download from here.
