**Artificial Intelligence** models are developing rapidly today. These models learn by applying mathematical functions to the signals that flow between artificial neurons, both on the forward pass and during back-propagation. One of these functions is the **Activation Function**. Also called a **transfer function** or **threshold function**, it determines the activation value of a neuron as an output calculated from the sum of the weighted input values. In this post, we describe the most commonly used activation functions and show how they can be implemented in C++.


## What is an AI activation function?

An **Activation Function** ( phi() ), also called a **transfer function** or **threshold function**, determines the activation value ( a = phi(sum) ) from the value (sum) produced by the **Net Input Function**. In the **Net Input Function**, **sum** is the sum of the input signals multiplied by their weights. The activation function transforms this sum into a new activation value by applying a function or a set of conditions; in other words, it is a way to map the sum of all weighted signals to the neuron's output signal. There are many activation functions; some common ones are the linear (identity), bipolar, and logistic (sigmoid) functions.

In C++ you can create your own AI activation function. Note that **sum** is the result of the Net Input Function, which calculates the sum of all weighted signals; we will use sum as that result throughout. The activation value of an artificial neuron (its output value) can then be written with the activation function as

**output = phi( sum )**, where **sum = w1*x1 + w2*x2 + ... + wn*xn**

By using this **sum** (the Net Input Function value) and the **phi() activation function**, we can calculate the output.
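As a minimal sketch (the helper name neuronOutput and the sample values below are our own, not from a specific library), here is how the Net Input Function and phi() combine to produce a neuron's output, using the sigmoid as the example activation:

```cpp
#include <cmath>

// Example activation function: the standard sigmoid
double phi(double sum)
{
    return 1.0 / (1.0 + std::exp(-sum));
}

// Net Input Function followed by the Activation Function for one neuron
double neuronOutput(const double* inputs, const double* weights, int n)
{
    double sum = 0.0;                  // sum of weighted input signals
    for (int i = 0; i < n; ++i)
        sum += inputs[i] * weights[i];
    return phi(sum);                   // activation value (neuron output)
}
```

For example, inputs {0.5, 0.3, 0.2} with weights {0.4, 0.7, 0.2} give sum = 0.45 and an output of about 0.61.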

## How can we develop special activation functions in AI?

In AI development, there are different types of activation functions for different purposes. We can use one of them, or we can develop a special activation function of our own. Let's look at the activation function types used in AI development.

### 1. Identity Function ( y=x )

An **Identity Function**, also called an **Identity Relation**, **Identity Map**, or **Identity Transformation**, is a function in mathematics that always returns the same value that was used as its argument. Briefly, it is the y = x function, or *f*(*x*) = *x*. This function can also be used as an activation function in some AI applications.

This is a very simple Activation Function which is also an Identity Function,

```cpp
float phi(float sum)
{
    return sum; // Identity Function, linear transfer function f(sum)=sum
}
```

The return value of this function should be a floating-point number ( **float**, **double**, or **long double** ) because weights are generally between 0 and 1.0.

### 2. Binary Step Function (Heaviside Step Function)

The **Binary Step Function**, also called the **Heaviside step function** or the **unit step function** and named after Oliver Heaviside (1850–1925), is a step function whose value is zero for negative arguments and one for positive arguments. In other words, it returns 0 or 1 as a Boolean. This function is an example of the general class of step functions, all of which can be represented as linear combinations of translations of this one.

Thus, our Binary Step Function should be as shown below.

```cpp
bool phi(float sum)
{
    return (sum > 0); // Binary Step Function, Heaviside Step Function, Unit Step Function
}
```

This activation function returns 1 (true) if sum>0 otherwise returns 0 (false).


### 3. Logistic Function (Logistic Curve) and Sigmoid Function

The **Logistic Function**, or **Logistic Curve**, is a common S-shaped curve (sigmoid curve) with the equation

**f(x) = L / ( 1 + exp( -k*(x - x0) ) )**

Here, **L** is the maximum value of the curve, **x0** is the x value of the curve's midpoint, and **k** is the logistic growth rate, or steepness, of the curve.

The most used logistic function is the **Standard Logistic Function**, known as the **Sigmoid Function**, where L = 1, k = 1, and x0 = 0. Thus our function can be written in either of these forms,

**phi(x) = 1 / ( 1 + exp(-x) ) = exp(x) / ( exp(x) + 1 )**

and in C++ the Sigmoid activation function can be written as below:

```cpp
double phi(double sum)
{
    return 1 / (1 + std::exp(-1 * sum)); // Standard Logistic Function, Sigmoid Function
}
```

Note that a division costs more CPU time than a multiplication, so instead of the function given above we can use the mathematically equivalent version based on tanh():

```cpp
double phi(double sum)
{
    return 0.5 * (1 + std::tanh(0.5 * sum)); // Standard Logistic Function, Sigmoid Function
}
```

As you see, we now have only multiplication and addition around the tanh() function. If the network's sum values fall in a known range, e.g. between 0 and 10, pre-calculated array values can be used to obtain faster approximate results. For example, a y array with 10000 members could hold pre-calculated values, so that y[1001] holds the value of phi(1.001). This will make your neural network faster, but it may also introduce more error or make it harder to reach the desired epoch numbers. It should be tested against one of the exact sigmoid versions above.
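A rough sketch of that lookup-table idea (the table size, range, and names here are arbitrary illustrative choices):

```cpp
#include <cmath>
#include <vector>

// Exact sigmoid, used once to fill the table
double sigmoid(double sum)
{
    return 1.0 / (1.0 + std::exp(-sum));
}

const int TABLE_SIZE = 10000; // covers sums in [0, 10) with step 0.001

// Pre-calculate sigmoid values so y[1001] holds phi(1.001), etc.
std::vector<double> buildTable()
{
    std::vector<double> y(TABLE_SIZE);
    for (int i = 0; i < TABLE_SIZE; ++i)
        y[i] = sigmoid(i / 1000.0);
    return y;
}

// Approximate activation: look up the nearest pre-calculated value
double phi(const std::vector<double>& y, double sum)
{
    int i = static_cast<int>(sum * 1000.0);
    if (i < 0) i = 0;                        // clamp below the range
    if (i >= TABLE_SIZE) i = TABLE_SIZE - 1; // clamp above the range
    return y[i];
}
```

The clamping means all sums at or above 10 map to the last entry, which is one source of the extra error mentioned above.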


### 4. Hyperbolic Tangent Function (tanh)

The hyperbolic tangent is the function tanh(), defined as

**tanh(x) = ( exp(x) - exp(-x) ) / ( exp(x) + exp(-x) )**

This function is the unique solution to the differential equation *f* ′ = 1 − *f* ^{2} with *f* (0) = 0.

An activation function can use the hyperbolic tangent function as below,

```cpp
double phi(double sum)
{
    return std::tanh(sum); // Hyperbolic Tangent Function
}
```


### 5. Exponential Linear Unit (ELU)

The **Exponential Linear Unit (ELU)** is another activation function, developed and published by Djork-Arné Clevert, Thomas Unterthiner, and Sepp Hochreiter in the paper “Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)”.

According to their study, the exponential linear unit (ELU) speeds up learning in deep neural networks and leads to higher classification accuracies. Like rectified linear units (ReLUs), leaky ReLUs (LReLUs), and parametrized ReLUs (PReLUs), the ELU activation function alleviates the vanishing gradient problem via the identity for positive values. The authors also show that, in contrast to ReLUs, ELUs have improved learning characteristics compared to units with other activation functions.

The Exponential Linear Unit (ELU) can be written as

**f(x) = x if x > 0, and f(x) = alpha * ( exp(x) - 1 ) if x <= 0**

and the derivative of this function can be written as

**f'(x) = 1 if x > 0, and f'(x) = f(x) + alpha if x <= 0**

In C and C++, the Exponential Linear Unit function can simply be written as below,

```cpp
double alpha = 0.1; // ranges from 0 to 1.0

double phi(double sum)
{
    return sum > 0 ? sum : alpha * (std::exp(sum) - 1); // ELU Function
}
```


### 6. Scaled Exponential Linear Unit (SELU)

The **Scaled Exponential Linear Unit (SELU)** is another activation function; it is a scaled version of ELU that uses a λ parameter. It was developed and released in the 2017 paper “Self-Normalizing Neural Networks” by Günter Klambauer, Thomas Unterthiner, Andreas Mayr, and Sepp Hochreiter. They introduced self-normalizing neural networks (SNNs) to enable high-level abstract representations. Neuron activations of SNNs automatically converge towards zero mean and unit variance, whereas batch normalization requires explicit normalization.

SELU is a scaled version of the ELU activation function, obtained by multiplying it with the λ parameter. So we can simply say,

**SELU(x) = λ * ELU(x)**

The SELU activation function can therefore be written as

**f(x) = λ * x if x > 0, and f(x) = λ * α * ( exp(x) - 1 ) if x <= 0**

They solved for α and λ and obtained the solutions α01 ≈ 1.6733 and λ01 ≈ 1.0507, where the subscript 01 indicates that these are the parameters for the fixed point (0, 1). According to this explanation, each node may have different α and λ parameters, so we can define the alpha and lambda parameters in the neuron structure and calculate SELU as below.

```cpp
double alpha  = 1.6733; // alpha01 for the fixed point (0, 1)
double lambda = 1.0507; // lambda01 for the fixed point (0, 1)

double phi(double sum)
{
    return sum > 0 ? lambda * sum : lambda * alpha * (std::exp(sum) - 1); // SELU Function
}
```
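Following the idea of storing the parameters in the neuron structure, here is a hedged sketch (the Neuron struct layout is our own illustration, not from the paper):

```cpp
#include <cmath>

// Illustrative neuron structure carrying its own SELU parameters
struct Neuron
{
    double alpha  = 1.6733; // alpha01 for the fixed point (0, 1)
    double lambda = 1.0507; // lambda01 for the fixed point (0, 1)

    double phi(double sum) const
    {
        return sum > 0 ? lambda * sum
                       : lambda * alpha * (std::exp(sum) - 1.0); // SELU
    }
};
```

Each neuron can then be given different α and λ values if needed.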


### 7. Rectified Linear Unit (ReLU)

In artificial neural networks, the **Rectified Linear Unit Function**, or **ReLU Activation Function**, is an activation function defined as the positive part of its argument. It can be written as f(x) = max(0, x), where *x* is the sum of the weighted input signals to an artificial neuron. The ReLU function is also known as a Ramp Function and is analogous to half-wave rectification in electrical engineering.

A more general, max-based form of ReLU is

**f(x) = max( Beta*x, x )**

When Beta is a learned parameter, this is called the **Parametric ReLU function**. If Beta is 0.01, it is called the **Leaky ReLU function**.

If Beta is 0, then f(x) = max(x, 0), the plain ReLU, which always returns non-negative numbers. Let's write this ReLU function in C++,

Here is the example,

```cpp
double phi(double sum)
{
    return std::max(0.0, sum); // ReLU Function (both arguments of std::max must be double)
}
```


### 8. Gaussian Error Linear Unit (GELU)

The **Gaussian Error Linear Unit (GELU)** is an alternative to the ReLU and ELU functions, defined and published by Dan Hendrycks and Kevin Gimpel in 2016. It can be seen as a smooth version of the ReLU and ELU activations.

The Gaussian Error Linear Unit (GELU) is a high-performing neural network activation function. The GELU activation is xΦ(x), where Φ(x) is the standard Gaussian cumulative distribution function. The GELU nonlinearity weights inputs by their value, rather than gating inputs by their sign as ReLUs do (x·1(x>0)). An empirical evaluation of the GELU nonlinearity against the ReLU and ELU activations showed performance improvements across all considered computer vision, natural language processing, and speech tasks.

The GELU function can be written as

**GELU(x) = x * Φ(x) = x * 0.5 * ( 1 + erf( x / sqrt(2) ) )**

We can approximate the GELU with

**GELU(x) ≈ 0.5 * x * ( 1 + tanh( sqrt(2/π) * ( x + 0.044715 * x³ ) ) )**

or, if greater feedforward speed is worth the cost of exactness, we can use the approximation

**GELU(x) ≈ x * σ( 1.702 * x )**

We can also use different CDFs: for example, using the logistic function's cumulative distribution function σ(x) to obtain the activation value gives the **Sigmoid Linear Unit (SiLU)**, x·σ(x).

Using the tanh approximation, we can code our phi() activation function with GELU as below,

```cpp
double sqrt_2divPI = std::sqrt(2.0 / M_PI);

double phi(double sum)
{
    return 0.5 * sum * (1 + std::tanh(sqrt_2divPI * (sum + 0.044715 * std::pow(sum, 3)))); // GELU Function
}
```


### 9. SoftPlus Activation Function

The **SoftPlus Activation Function** was developed and published by Dugas et al. in 2001. Put simply, the SoftPlus function can be written as below,

**f(x) = log( 1+exp(x) );**

According to their paper, the basic idea of the proposed class of functions is to replace the sigmoid of a sum by a product of SoftPlus or sigmoid functions over each of the dimensions (using the SoftPlus over the convex dimensions and the sigmoid over the others). They showed that new classes of functions similar to multi-layer neural networks have these properties.

Another study, “Deep Sparse Rectifier Neural Networks” by Xavier Glorot, Antoine Bordes, and Yoshua Bengio, notes that while logistic sigmoid neurons are more biologically plausible than hyperbolic tangent neurons, the latter work better for training multi-layer neural networks.

A SoftPlus activation function in C++ can be written as below:

```cpp
double phi(double sum)
{
    return std::log(1 + std::exp(sum)); // SoftPlus Function
}
```


### 10. Self Regularized Non-Monotonic (Mish) Activation Function

The **Self Regularized Non-Monotonic (Mish) Activation Function** is inspired by the Swish activation function. It is a smooth, continuous, self-regularized, non-monotonic activation function, published in “Mish: A Self Regularized Non-Monotonic Activation Function” by Diganta Misra in 2019.

According to this study, “Mish uses the Self-Gating property where the non-modulated input is multiplied with the output of a non-linear function of the input. Due to the preservation of a small amount of negative information, Mish eliminated by design the preconditions necessary for the Dying ReLU phenomenon. This property helps in better expressivity and information flow. Being unbounded above, Mish avoids saturation, which generally causes training to drastically slow down due to near-zero gradients. Being bounded below is also advantageous since it results in strong regularization effects. Unlike ReLU, Mish is continuously differentiable, a property that is preferable because it avoids singularities and, therefore, undesired side effects when performing gradient-based optimization.”

We explained the softplus() activation function above. The Mish activation function can be defined using softplus() as

**Mish(x) = x * tanh( softplus(x) )**

Hence, Mish can be defined mathematically as

**Mish(x) = x * tanh( ln( 1 + exp(x) ) )**

The author compared the outputs of the Mish, ReLU, SoftPlus, and Swish activation functions, and also compared the first and second derivatives of Mish and Swish.

Mish function can be coded in C++ as below,

```cpp
double phi(double sum)
{
    return sum * std::tanh(std::log(1 + std::exp(sum))); // Mish Function
}
```


### 11. Softmax Function

In neural networks, the SoftMax function is often used in the final layer of a neural-network-based classifier. Such networks are generally trained under a log loss or cross-entropy regime, giving a non-linear variant of multinomial logistic regression. The SoftMax function squashes the outputs into values between 0 and 1 that sum to 1, and it can be used as an activation function too.

For a vector (or array) x with n members, the SoftMax of each member can be written as

**softmax(x_i) = exp(x_i) / ( exp(x_1) + exp(x_2) + ... + exp(x_n) )**

This function may overflow, because the exponentials can produce infinite results. To avoid this, we can shift the x values by subtracting their maximum value m,

**softmax(x_i) = exp(x_i - m) / ( exp(x_1 - m) + exp(x_2 - m) + ... + exp(x_n - m) )**
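A hedged sketch of the numerically stable version (the softmax function below is our own implementation, not a library call):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// SoftMax over a vector: outputs lie in (0, 1) and sum to 1.
// Subtracting the maximum m keeps std::exp from overflowing.
std::vector<double> softmax(const std::vector<double>& x)
{
    double m = *std::max_element(x.begin(), x.end());
    std::vector<double> y(x.size());
    double total = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i)
    {
        y[i] = std::exp(x[i] - m);
        total += y[i];
    }
    for (double& v : y)
        v /= total; // normalize so the outputs sum to 1
    return y;
}
```

For example, softmax({1.0, 2.0, 3.0}) returns values around {0.09, 0.24, 0.67} that sum to 1, and even very large inputs such as {1000.0, 1000.0} stay finite.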


**C++ Builder is the easiest and fastest C and C++ compiler and IDE for building simple or professional applications on the Windows operating system. It is also easy for beginners to learn with its wide range of samples, tutorials, help files, and LSP support for code. RAD Studio’s C++ Builder version comes with the award-winning VCL framework for high-performance native Windows apps and the powerful FireMonkey (FMX) framework for UIs.**

**There is a free C++ Builder Community Edition for students, beginners, and startups; it can be downloaded from here. For professional developers, there are Professional, Architect, or Enterprise versions of C++ Builder and there is a trial version you can download from here.**