Artificial Neural Networks (ANN)
Inside the human brain there is an intensely dense and complex interconnection of neurons, which I have already mentioned in the previous article: “The Idea of Deep Learning”.
This, I believe, couldn’t get any more fascinating. The brain cannot see, hear, touch, smell or taste the world directly; it only receives signals and transmits them, as if enclosed in a black box (the bones, flesh and skin). And yet it is responsible for all our senses, thoughts, recognition and consciousness, all through the individual functioning (transmission of electrical impulses) of neurons in an intricately interconnected network.
This is what a neuron actually looks like:
Dendrites are the receptors: they receive signals (electrical impulses) from other neurons. These signals get processed in the nucleus of the neuron, pass through the axon, and are transmitted to other neurons via neurotransmitters across the synapses.
An artificial neural network is, at its core, a model inspired by the human brain. Let me clarify, though, that the neural networks in our brain are far more complex than any artificial neural network.
The one shown above is a single neuron. Billions of such neurons (roughly 86 billion in a human brain) are interconnected in an intricately complex way. Although we can construct complex artificial networks, the neural architecture of our brain is far more complex and still largely unexplored.
The same concept of a network of neurons is used in machine learning algorithms. In this case, the neurons are created artificially on a computer. Connecting many such artificial neurons creates an artificial neural network. The working of an artificial neuron is similar to that of a neuron in our brain.
You must have seen images like the one above before, or for that matter in the previous article.
And, I am sure you must be wondering: how is this even supposed to make sense? How does a neuron actually work? What’s inside it? Well, this is where the subject gets even more fascinating.
To get an idea, let’s zoom in to a single neuron in a network.
The diagram shown above is a single artificial neuron (also called a perceptron).
The inputs fed to the neuron are like the dendrites, and the output z’ is like the axon of a biological neuron.
The entire functioning of a neuron is modelled mathematically, i.e., the actual biological neurons are simulated via mathematical functions. Simply put, the brain’s functioning can be thought of as a network of selectively triggered mathematical functions. How can it get any more fascinating? It blew my mind when I first realized this.
There are several categories of learning algorithms for the neurons to learn from. The most popular among these are:
- Supervised Learning: Supervised learning is a type of learning in which both the input data and the desired output data are provided. The data are labelled, providing a learning basis for classifying future data. Supervised learning systems give the learning algorithm known examples on which to base future judgments.
- Unsupervised Learning: Unsupervised learning uncovers previously unknown patterns or structure in unlabelled data, for example by grouping similar inputs into clusters. Since there are no labelled outcomes to compare against, evaluating the accuracy of the results is harder, which is why supervised learning is often more directly applicable to real-world prediction problems.
And, there are three generic types of outputs that can be produced by a neural network:
- Continuous: These are continuous numeric values, such as floating-point numbers.
- Binary: These are binary decision values, either 1 or 0.
- Categorical: These are outputs categorized into one of several classes.
- The inputs to the neuron are first multiplied by weights and summed up.
- Inside the neuron, this weighted sum is passed as a parameter into a mathematical function, called the activation function, which produces the neuron’s output.
- This output value z’ is then compared to the desired output value z, and the squared deviation between them is calculated using a cost function.
- This cost value is then fed back through the network and the weights are tweaked.
- This process continues until the cost value is minimized (or rather, optimized). This process of feeding values backward through the network is called backpropagation.
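The steps above can be sketched in a few lines of Python. This is a minimal, illustrative single-neuron example, not code from the article: it uses a sigmoid activation (discussed below), a squared-error cost, and gradient descent to tweak the weights. Names like `weights`, `bias` and `learning_rate` are my own.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, weights, bias):
    # Step 1: multiply the inputs by the weights and sum them up.
    weighted_sum = sum(i * w for i, w in zip(inputs, weights)) + bias
    # Step 2: pass the weighted sum through the activation function.
    return sigmoid(weighted_sum)

def train_step(inputs, target, weights, bias, learning_rate=0.5):
    # Step 3: compare the prediction z' to the desired output z (squared error).
    prediction = forward(inputs, weights, bias)
    error = prediction - target
    # Step 4: feed the error back and tweak each weight (gradient of the
    # squared cost passed through the sigmoid's derivative p * (1 - p)).
    grad = error * prediction * (1.0 - prediction)
    new_weights = [w - learning_rate * grad * i for w, i in zip(weights, inputs)]
    new_bias = bias - learning_rate * grad
    return new_weights, new_bias

# Step 5: repeating the update drives the cost down (backpropagation,
# collapsed here to a single neuron).
weights, bias = [0.0, 0.0], 0.0
for _ in range(1000):
    weights, bias = train_step([1.0, 0.0], 1.0, weights, bias)
print(round(forward([1.0, 0.0], weights, bias), 2))
```

After enough iterations the prediction approaches the target of 1.0, which is the cost minimization the list describes.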
The activation functions inside the neurons can be imagined as small knobs and regulators that can be turned and adjusted to tune the input-output mapping (the function generated). There are numerous activation functions to choose from, according to our requirements. However, I will be discussing the four most predominant ones in this article.
The threshold function is a binary output function. It is basically a yes/no type of function.
On the X-axis you have the weighted sum of the inputs, and on the Y-axis you have values from 0 to 1.
The threshold function is a very simple one: if the X-axis value is less than 0, the function outputs 0; if it is 0 or greater, it outputs 1.
This was a very straightforward function. Let’s look at a more complex one.
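As a quick sketch, the threshold (step) function can be written directly in Python; treating an input of exactly 0 as 1 is a common convention rather than something the article specifies:

```python
def threshold(x):
    # Step function: 0 for negative weighted sums, 1 otherwise.
    # (Mapping x == 0 to 1 is a convention, not a requirement.)
    return 1 if x >= 0 else 0

print(threshold(-2.5))  # 0
print(threshold(0.7))   # 1
```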
The sigmoid function is given by ϕ(x) = 1 / (1 + e^(−x))
Here x is the weighted sum of the inputs. What is good about this function is that it is smooth: unlike the threshold function, it has no kinks in its curve, so it progresses nicely and gradually. Large negative inputs are squashed towards 0, large positive inputs towards 1, and the output always lies between 0 and 1.
The sigmoid function is very useful in the output (final) layer for computing probabilistic outputs.
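The sigmoid formula translates into a one-line Python function; this is a minimal sketch, and the function name is my own:

```python
import math

def sigmoid(x):
    # phi(x) = 1 / (1 + e^(-x)); the output lies strictly between 0 and 1.
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0))             # 0.5, the midpoint of the curve
print(round(sigmoid(4), 3))   # large positive inputs approach 1
print(round(sigmoid(-4), 3))  # large negative inputs approach 0
```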
ReLU (Rectified Linear Unit)
The Rectified Linear Unit is the most commonly used activation function in deep learning models. The function returns 0 if it receives any negative input, but for any positive value x it returns that value back. So it can be written as f(x)=max(0,x).
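The definition f(x) = max(0, x) maps directly to code; a minimal sketch:

```python
def relu(x):
    # Returns 0 for any negative input; positive values pass through unchanged.
    return max(0.0, x)

print(relu(-3.0))  # 0.0
print(relu(2.5))   # 2.5
```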
Hyperbolic tangent function
The Hyperbolic tangent function is given by, ϕ(x) = (1 - e^(-2x))/(1 + e^(-2x))
The hyperbolic tangent function is very similar to the sigmoid function, but it goes below 0: its values range from 0 to approximately 1 on the positive X-axis and from 0 to approximately −1 on the negative X-axis.
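As a small sketch, the formula above can be implemented and checked against Python’s built-in math.tanh, since the two are algebraically identical (the function name is my own):

```python
import math

def tanh_activation(x):
    # phi(x) = (1 - e^(-2x)) / (1 + e^(-2x)), algebraically equal to tanh(x).
    return (1.0 - math.exp(-2.0 * x)) / (1.0 + math.exp(-2.0 * x))

print(tanh_activation(0.0))             # 0.0
print(round(tanh_activation(3.0), 3))   # approaches 1 for large positive x
print(round(tanh_activation(-3.0), 3))  # approaches -1 for large negative x
```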
Hi, I am Subham, a learning enthusiast, an active reader, and the author of this article. I hope you liked it.