What You Need To Know About C++ Gaussian Error Linear Units

In this post, you'll learn what a Gaussian Error Linear Unit is, how the GELU function works in artificial neural networks (ANNs), and where GELU can be applied in AI technologies. Learning about Gaussian Error Linear Units in C++ will help you build C++ applications with the use of a C++ IDE.

What is an activation function?

An activation function ( phi() ), also called a transfer function or threshold function, determines the activation value ( a = phi(sum) ) from a given value (sum) produced by the net input function. Here, the net input function means that sum is the sum of the input signals multiplied by their weights, and the activation function maps this sum to a new value according to a given function or set of conditions. In other words, the activation function is a way to transform the sum of all weighted signals into a new activation value for that signal. There are many different activation functions; the linear (identity), bipolar, and logistic (sigmoid) functions are used most often. The activation function and its types are explained well here.

In C++ (and in most programming languages in general) you can create your own activation function. Note that sum is the result of the net input function, which calculates the sum of all weighted signals. We will use sum as the result of the net input function. Here, the activation value of an artificial neuron (the output value) can be written with the activation function as below.
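A minimal sketch of this in C++, using the identity function only as a placeholder activation (any of the functions discussed below can be substituted), could be:

#include <cstdio>

// Placeholder activation function: identity, a = phi(sum) = sum
double phi(double sum)
{
    return sum;
}

int main()
{
    double sum = 0.5;        // example result of the net input function
    double a = phi(sum);     // activation (output) value of the neuron
    std::printf("a = %f\n", a);
    return 0;
}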

What is a Gaussian Error Linear Unit or GELU?

A Gaussian Error Linear Unit is an alternative to the ReLU and ELU functions, defined and published by Dan Hendrycks and Kevin Gimpel in 2016. It provides a smooth alternative to the ReLU and ELU activations (the full paper can be found here).

The Gaussian Error Linear Unit (GELU) is a high-performing neural network activation function. The GELU activation function is xΦ(x), where Φ(x) is the standard Gaussian cumulative distribution function. The GELU nonlinearity weights inputs by their value, rather than gating inputs by their sign as ReLU does (x·1(x>0), where 1(x>0) is the indicator function). An empirical evaluation of the GELU nonlinearity against the ReLU and ELU activations showed performance improvements across all considered computer vision, natural language processing, and speech tasks.

The GELU function can be written as

GELU(x) = x·Φ(x) = x · 0.5 · (1 + erf(x / √2))

We can approximate the GELU with

GELU(x) ≈ 0.5 · x · (1 + tanh( √(2/π) · (x + 0.044715·x³) ))

or, if greater feedforward speed is worth the cost of exactness, we can use the approximation below,

GELU(x) ≈ x · σ(1.702·x)

We can use different CDFs, i.e. we can use the logistic function's cumulative distribution function σ(x) to obtain the activation value; this is called the Sigmoid Linear Unit (SiLU), xσ(x).

From the second formula we can code our phi() activation function with GELU as below,

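A minimal sketch of this phi() in C++, assuming double precision and the constants from the tanh approximation above, could be:

#include <cmath>

// GELU activation via the tanh approximation (second formula):
// phi(sum) ≈ 0.5 * sum * (1 + tanh( sqrt(2/pi) * (sum + 0.044715 * sum^3) ))
double phi(double sum)
{
    const double pi = 3.14159265358979323846;
    return 0.5 * sum * (1.0 + std::tanh(std::sqrt(2.0 / pi)
                        * (sum + 0.044715 * sum * sum * sum)));
}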

From the third formula, we can use a sigmoid function to code our phi() activation function as below,

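A sketch of this variant, with a small helper for the logistic sigmoid (the helper name sigmoid() is just illustrative), could be:

#include <cmath>

// Logistic (sigmoid) function, the CDF used in the third formula
double sigmoid(double x)
{
    return 1.0 / (1.0 + std::exp(-x));
}

// GELU activation via the sigmoid approximation (third formula):
// phi(sum) ≈ sum * sigmoid(1.702 * sum)
double phi(double sum)
{
    return sum * sigmoid(1.702 * sum);
}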

These formulas can both be tested in this example below,

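One way to test both in a single small program (the names phi_tanh and phi_sigmoid are used here only to keep the two versions side by side) could be:

#include <cstdio>
#include <cmath>

// GELU via the tanh approximation (second formula)
double phi_tanh(double sum)
{
    const double pi = 3.14159265358979323846;
    return 0.5 * sum * (1.0 + std::tanh(std::sqrt(2.0 / pi)
                        * (sum + 0.044715 * sum * sum * sum)));
}

// GELU via the sigmoid approximation (third formula)
double phi_sigmoid(double sum)
{
    return sum / (1.0 + std::exp(-1.702 * sum));
}

int main()
{
    double sum = 0.5;   // example net input value
    std::printf("phi_tanh(0.5)    = %f\n", phi_tanh(sum));
    std::printf("phi_sigmoid(0.5) = %f\n", phi_sigmoid(sum));
    return 0;
}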

and the output for phi(0.5) is about 0.3457 with the tanh approximation and about 0.3504 with the sigmoid approximation (for comparison, the exact value 0.5·Φ(0.5) is about 0.3457).

Is there a simple C++ ANN example with a GELU Activation Function?

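A minimal single-neuron sketch, with example inputs and weights chosen purely for illustration and the tanh approximation of GELU as the activation function, could be:

#include <cstdio>
#include <cmath>

const int NN = 4;                               // number of inputs (illustrative)

double inputs[NN]  = { 0.0, 0.5, 1.0, -1.0 };   // example input signals
double weights[NN] = { 0.2, 0.8, -0.5, 0.1 };   // example weights

// Net input function: sum of all weighted input signals
double net_input(const double in[], const double w[], int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum += in[i] * w[i];
    return sum;
}

// Activation function: GELU via the tanh approximation
double phi(double sum)
{
    const double pi = 3.14159265358979323846;
    return 0.5 * sum * (1.0 + std::tanh(std::sqrt(2.0 / pi)
                        * (sum + 0.044715 * sum * sum * sum)));
}

int main()
{
    double sum = net_input(inputs, weights, NN);   // net input of the neuron
    double output = phi(sum);                      // activation (output) value

    std::printf("sum    = %f\n", sum);
    std::printf("output = %f\n", output);
    return 0;
}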