In this post, you’ll learn what a Gaussian Error Linear Unit (GELU) is, how the GELU function works in artificial neural networks (ANNs), and where GELU can be applied in AI technologies. Learning to implement Gaussian Error Linear Units in C++ will help you build C++ applications with the use of a C++ IDE.

## What is an activation function?

An **Activation Function** ( phi() ), also called a **transfer function** or **threshold function**, determines the activation value ( a = phi(sum) ) from the value (sum) produced by the **Net Input Function**. The **Net Input Function**, here, computes **the sum** of all input signals multiplied by their weights, and the activation function maps this sum to a new value through a given function or condition. In other words, the activation function transfers the sum of all weighted signals into a new activation value for that neuron. There are many activation functions; the Linear (Identity), bipolar, and logistic (sigmoid) functions are used most often. The activation function and its types are explained well here.

In C++ (as in most programming languages) you can create your own activation function. Note that sum is the result of the Net Input Function, which calculates the sum of all weighted signals. We will use sum as the result of the input function. The activation value of an artificial neuron (its output value) can then be written with the activation function as a = phi(sum).

## What is a Gaussian Error Linear Unit or GELU?

A **Gaussian Error Linear Unit** is an alternative to the ReLU and ELU functions, defined and published by Dan Hendrycks and Kevin Gimpel in 2016. It can be seen as a smooth alternative to the ReLU and ELU activations (the full paper can be found here).

The Gaussian Error Linear Unit (GELU) is a high-performing neural network activation function. The GELU activation function is xΦ(x), where Φ(x) is the standard Gaussian cumulative distribution function. The GELU nonlinearity weights inputs by their value, rather than gating inputs by their sign as ReLU does (x·1_{x>0}). In an empirical evaluation of the GELU nonlinearity against the ReLU and ELU activations, GELU showed performance improvements across all considered computer vision, natural language processing, and speech tasks.

The GELU function can be written as

GELU(x) = x Φ(x) = 0.5 x ( 1 + erf( x / √2 ) )

We can approximate the GELU with

GELU(x) ≈ 0.5 x ( 1 + tanh( √(2/π) ( x + 0.044715 x³ ) ) )

or, if greater feedforward speed is worth the cost of exactness, we can use the approximation below:

GELU(x) ≈ x σ( 1.702 x )

We can also use different CDFs here. For example, using the logistic function's cumulative distribution σ(x) to obtain the activation value gives the **Sigmoid Linear Unit** (SiLU), xσ(x).

From the second formula we can code our phi() activation function with GELU as below,

```cpp
double sqrt_2divPI = std::sqrt(2.0 / M_PI);

double phi(double sum)
{
    // GELU function (tanh approximation)
    return 0.5 * sum * (1 + std::tanh(sqrt_2divPI * (sum + 0.044715 * std::pow(sum, 3))));
}
```

From the third formula we can use a sigmoid function and code our phi() activation function as below,

```cpp
double sigmoid(double x)
{
    return 1 / (1 + std::exp(-1 * x));
}

double phi2(double sum)
{
    // GELU function (sigmoid approximation)
    return sum * sigmoid(1.702 * sum);
}
```

Both formulas can be tested in the example below,

```cpp
#include <iostream>
#include <cmath>
#include <cstdio>

double sqrt_2divPI = std::sqrt(2.0 / M_PI);

double phi(double sum)
{
    // GELU function (tanh approximation)
    return 0.5 * sum * (1 + std::tanh(sqrt_2divPI * (sum + 0.044715 * std::pow(sum, 3))));
}

double sigmoid(double x)
{
    return 1 / (1 + std::exp(-1 * x));
}

double phi2(double sum)
{
    // GELU function (sigmoid approximation)
    return sum * sigmoid(1.702 * sum);
}

int main()
{
    std::cout << phi(0.5) << '\n';
    std::cout << phi2(0.5) << '\n';
    getchar();
    return 0;
}
```

and the output for phi(0.5) and phi2(0.5) is:

```
0.345714
0.350388
```

## Is there a simple C++ ANN example with a GELU Activation Function?

```cpp
#include <iostream>
#include <cmath>
#include <cstdio>

#define NN 2 // number of neurons

double sqrt_2divPI = std::sqrt(2.0 / M_PI);

class Tneuron // neuron class
{
public:
    double a;         // activity of each neuron
    double w[NN + 1]; // weights of links between neurons

    Tneuron()
    {
        a = 0;
        for (int i = 0; i <= NN; i++) w[i] = -1; // if a weight is negative there is no link
    }

    // let's define an activation function (or threshold) for the output neuron
    double activation_function(double sum)
    {
        // GELU function (tanh approximation)
        return 0.5 * sum * (1 + std::tanh(sqrt_2divPI * (sum + 0.044715 * std::pow(sum, 3))));
    }
};

Tneuron ne[NN + 1]; // neuron objects

void fire(int nn)
{
    double sum = 0;
    for (int j = 0; j <= NN; j++)
    {
        if (ne[j].w[nn] >= 0) sum += ne[j].a * ne[j].w[nn];
    }
    ne[nn].a = ne[nn].activation_function(sum);
}

int main()
{
    // let's define the activity of two input neurons (a0, a1) and one output neuron (a2)
    ne[0].a = 0.0;
    ne[1].a = 1.0;
    ne[2].a = 0;

    // let's define the weights of the signals coming from the two input neurons
    // to the output neuron (0 to 2 and 1 to 2)
    ne[0].w[2] = 0.3;
    ne[1].w[2] = 0.2;

    // let's fire our artificial neuron activity; the output will be
    fire(2);
    printf("%10.6f\n", ne[2].a);
    getchar();
    return 0;
}
```