“Deep Learning” has been a buzzword for quite some time now. Many of you must have heard people proclaim that with deep learning, computers can perform tasks like a human, without being sure how exactly this happens. Some of these people may also claim that deep learning is very complicated to grasp. Well, not really. In this article I will try to debunk that notion by breaking down the concept of deep learning without making things difficult. At times the mathematics may seem a bit overwhelming, but do not worry: if you treat the mathematics as just a tool for designing deep learning algorithms, and imagine the functioning as that of an actual human brain, it becomes intuitively easier to grasp. Besides, if you are a programmer interested only in the implementation, good news: you won’t even need to get your hands dirty with all that tedious mathematics. You can if you want to, and it will only make your life easier, but it is not absolutely necessary; understanding a few key mathematical ideas is sufficient. There are plenty of programming frameworks available out there (TensorFlow, Keras, PyTorch, and Chainer, just to name a few), and if you know Python, you are all set to roll, since these frameworks support Python among other languages. I won’t be getting into the mathematics and its workings in this article.
With the progress of technology, the need for automation became salient. And to perform complex tasks that require decision making, the automated system itself needs to be intelligent. Conventionally programmed systems turn out to be dumb at tasks that involve making complex decisions based on an assessment of the environment. Thus emerged the field of Artificial Intelligence, which I guess all of you have heard of and know about. For those of you who don’t: artificial intelligence is a branch of computer science which aims to create intelligent machines. One of its driving principles is that machines can act and react like humans only if they have abundant information about the world. However, classical AI programs were not considered intelligent enough to perform tasks efficiently when the task concerned pattern recognition, speech or object recognition, and the like. This was due to their lack of learning capabilities and their purely rule-based approaches. The fact that human brains are incredibly efficient at recognition, while computers are incredibly fast and accurate at calculations and computations, sparked the idea of Machine Learning, which later advanced to deep learning (deep learning is a subfield of machine learning, which in turn is a subfield of artificial intelligence). The whole idea of deep learning is to mimic the functioning of a human brain by empowering computer programs with learning capabilities. To get an intuition, consider a kid’s brain. A kid has a brain full of curiosity, wandering around and constantly absorbing knowledge (learning) from whatever he finds interesting or feels is important. Based on his knowledge the kid acts when similar circumstances arise, and based on his actions his parents either praise or reproach him. Thus, under his parents’ supervision, the kid realizes the consequences of his actions.
Thus the kid learns whether what he did was right or wrong, and takes the proper course of action in similar situations in the future. Similarly, a deep learning model takes inputs and outputs, maps them (generates a function), processes further inputs, assesses their correctness under supervision, and accordingly incorporates the corrected function into its program.
Within the brain, billions of neurons are connected in an intricately complex network, and these neurons fire with incredible speed and accuracy. The firing of each neuron triggers the firing of interconnected neurons in a chain; this helps us recognize patterns, text, images, and the world at large.
In computer science, a neural network is a programming model that simulates this structure (and function) of interconnected neurons in the human brain. More appropriately, these are called Artificial Neural Networks (ANNs).
Diagrammatically a neural network would look something like this:
Each of these circles simulates a neuron, and hence they are called the same (neurons, or nodes). The first layer is the input layer: its nodes represent the input features, and input is passed into this layer. In between are the hidden layers, responsible for the complicated mathematical computations and number crunching. A neural network normally has some number n of hidden layers, and together they can be thought of as a black box that takes in inputs and generates outputs. The last layer is the output layer, which gives the output: recognized speech, text, an image label, a predicted value, and so on.
To realize the working of neural networks, let's consider a Housing Price Prediction example.
Let's say you have a data set with six houses: you know the size of each house in square feet or square meters, you know its price, and you want to fit a function to predict the price of a house as a function of its size. If you are familiar with linear regression, you will probably fit a straight line to this data (the red line of best fit). But you might say, well, we know that prices can never be negative, right? So instead of the straight-line fit, which eventually goes negative, let's bend the curve at y = 0 (the point where the line first touches the x-axis). This thick red line ends up being your function for predicting the price of a house as a function of its size. You can think of the function you've just fit to the housing prices as a very simple neural network, almost the simplest possible one.
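The bent-line predictor described above can be sketched in a few lines of Python. The weight and bias values here are invented for illustration, not fitted to any real data:

```python
# A minimal sketch of the bent-line price predictor described above.
# w (price per square foot) and b (offset) are made-up illustrative values.
def predict_price(size_sqft, w=150.0, b=-20000.0):
    """Linear fit, clipped at zero so the predicted price is never negative."""
    return max(0.0, w * size_sqft + b)

predict_price(1000)  # a positive price from the straight-line part
predict_price(50)    # small house: clipped to 0.0 instead of going negative
```

Incidentally, this max(0, ·) bend has the same shape as the ReLU activation widely used inside real neural networks.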
In reality, the price of a house depends on not just one but various factors (or features, as they are called). Say the price depends on:
1. Size
2. No. of bedrooms
3. Zip code
4. Wealth
So we feed these four features as inputs x to a network of neurons. An incredible thing about neural networks is that when you implement one, you just need to give it the inputs and the corresponding outputs, and the NN will figure out the input -> output mapping by itself. Inside the neurons some serious number crunching takes place, and each neuron's output is conveyed to the next layer of neurons; this goes on until the final output is obtained. In the first node, both size and number of bedrooms go in as input, which takes the network one step closer to estimating the price. So we can say the first neuron (in the diagram above) estimates a new feature that depends on both the size of the house and the number of bedrooms in it: say, family size. Similarly, zip code goes into the second and third neurons and wealth goes into the third, so we can say the second neuron estimates the walkability of the area and the third estimates the quality of the schools in the locality. These new features then form the inputs to the next hidden layer, and this goes on until the output layer is reached.
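As a rough sketch, the hidden-layer story above might look like this in Python. Every weight here is an invented placeholder (in a real network they would be learned from data), and the feature combinations are just the ones described in the text:

```python
# Hypothetical forward pass for the housing example. All weights are
# invented for illustration; a trained network would learn its own.
def relu(x):
    # The "bend at zero" nonlinearity applied inside each neuron.
    return max(0.0, x)

def forward(size, bedrooms, zip_code, wealth):
    # Hidden layer: each neuron combines some of the raw inputs
    # into a higher-level feature.
    family_size    = relu(0.5 * size + 2.0 * bedrooms)
    walkability    = relu(1.0 * zip_code)
    school_quality = relu(0.7 * zip_code + 0.3 * wealth)
    # Output layer: combine the hidden features into a price estimate.
    return 10.0 * family_size + 5.0 * walkability + 8.0 * school_quality
```

The point is not the particular numbers but the shape: raw features go in, intermediate features are computed, and the output layer combines them.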
ANNs don’t learn by themselves; we train them. How? Roughly, by making an initial guess at a mathematical function and then, with suitable algorithms, reducing the deviation of our guess from the actual data. If that was a lot to take in, here’s the simpler version: training is the process by which a neural network model learns from data whose results are already known, so that it can make predictions on fresh data. Neural networks have been a great area of research; a lot has been done and a lot is being done, and researchers have developed multiple techniques for training them.
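The "guess, measure the deviation, nudge the guess" idea can be shown with the simplest possible model: one weight, fitted by gradient descent. The data and learning rate below are made up purely for illustration:

```python
# Toy illustration of training: guess a function, measure the error,
# nudge the parameter to shrink it. Data is invented: price = 2 * size.
sizes  = [1.0, 2.0, 3.0, 4.0]   # house sizes (arbitrary units)
prices = [2.0, 4.0, 6.0, 8.0]   # known prices for those sizes

w = 0.0      # initial guess for the weight
lr = 0.01    # learning rate: how big each nudge is
for _ in range(1000):
    # Gradient of the mean squared error with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in zip(sizes, prices)) / len(sizes)
    w -= lr * grad   # move w against the gradient, reducing the error

print(round(w, 3))   # converges toward 2.0, the true slope
```

Training a deep network works on the same principle, just with millions of weights and the gradients computed layer by layer (backpropagation).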
ANNs like the one above, with a limited number of layers and neurons, can only do so much. This kind of learning (training) may also be referred to as shallow learning. To represent more complex features, and to “learn” increasingly complex models for prediction and classification of information that depends on thousands or even millions of features, we need somewhat more complex ANNs than the one above. This is accomplished simply by increasing the number of hidden layers and/or the number of neurons per hidden layer. More layers and more neurons can represent increasingly complex models, but they also come at the cost of more time- and power-consuming computation. Neural networks that consist of more than three layers of neurons (including the input and output layers) are called Deep Neural Networks, and training them is called Deep Learning. Well, that was Deep Learning in a nutshell!
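To make "just add more hidden layers" concrete, here is a minimal sketch of a deeper forward pass: the same layer operation, stacked. The weight matrices are arbitrary placeholders, not trained values:

```python
# Sketch: "deeper" simply means more hidden layers stacked in sequence.
# All weights below are arbitrary placeholders, not trained values.
def relu(xs):
    return [max(0.0, v) for v in xs]

def dense(inputs, weights):
    # One fully connected layer: each output is a weighted sum of inputs.
    return [sum(w * v for w, v in zip(row, inputs)) for row in weights]

def deep_forward(x, layer_weights):
    for weights in layer_weights:
        x = relu(dense(x, weights))  # each layer feeds the next
    return x

layers = [
    [[0.1, 0.2], [0.3, 0.4]],   # hidden layer 1: 2 inputs -> 2 neurons
    [[0.5, 0.6], [0.7, 0.8]],   # hidden layer 2: 2 inputs -> 2 neurons
    [[1.0, 1.0]],               # output layer:   2 inputs -> 1 value
]
```

Adding a layer is one more entry in the list; that, scaled up to millions of weights, is what frameworks like TensorFlow and PyTorch manage for you.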