Machine learning techniques are becoming increasingly popular and now play a crucial role in various industries. If you are at least somewhat familiar with the idea of artificial intelligence, you probably know that it is not only about super-smart robots or self-driving cars. AI tools are used far more widely than that and can be found in almost every layer of daily life. From estimating the effectiveness of a new medicine on different groups of patients to recommender systems, the instruments offered by artificial intelligence have proved handy almost everywhere. Yet have you ever wondered what exactly machine learning looks like? Here you will find a simple explanation!
The meaning of machine learning
First of all, let’s check whether we really understand what hides behind the term machine learning. Put simply, machine learning is the way in which people teach computers to be intelligent. That sounds obvious, but aren’t machines already pretty intelligent, given the flawless calculations they offer their users? Well, this is not exactly the case.
The standard algorithms used in programming are sets of commands which tell a computer exactly what it has to do in order to achieve a particular result. In machine learning we create a very special type of algorithm which does not give the computer such step-by-step instructions. Instead, we feed it with information which helps the machine make intelligent decisions on its own.
It sounds somewhat abstract, doesn’t it? It is rather difficult to imagine how something like this can work in reality. Yet it is far simpler than you might think. Take a popular use of machine learning techniques: predicting whether a tumour cell is malignant or benign. Feeding a large dataset of photos of cells whose diagnosis is already known to a specially designed algorithm lets the computer learn what a malignant cell looks like. Later, when doctors analyse photos of tumours, they can use a programme based on this image recognition algorithm to support a more accurate diagnosis. This can be a valuable aid for doctors and patients alike.
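To make the contrast between hand-written commands and learning from data a little more concrete, here is a toy sketch in Python. Everything in it is an assumption made for illustration: the temperature readings are invented, and `is_hot_rule`, `learn_threshold` and `is_hot_learned` are hypothetical names. Real machine learning algorithms are far more sophisticated, but the idea is the same: in the first function a person chooses the rule, in the second the rule is derived from labelled examples.

```python
# Traditional programming: the programmer writes the rule by hand.
def is_hot_rule(temp_c: float) -> bool:
    return temp_c > 30.0  # threshold chosen by a person


# Toy "learning": the threshold is derived from labelled examples instead.
# Assumes the two groups do not overlap (every "hot" value is above every "not hot" one).
def learn_threshold(examples):
    hot = [value for value, label in examples if label]
    not_hot = [value for value, label in examples if not label]
    # Place the boundary halfway between the two groups.
    return (max(not_hot) + min(hot)) / 2


# Invented readings: (temperature in Celsius, did people call it hot?)
examples = [(12.0, False), (18.0, False), (24.0, False), (29.0, True), (33.0, True)]
threshold = learn_threshold(examples)


def is_hot_learned(temp_c: float) -> bool:
    return temp_c > threshold


print(threshold)             # boundary found from the data
print(is_hot_rule(28.0))     # decision made by the hand-written rule
print(is_hot_learned(28.0))  # decision made by the learned rule
```

Notice that the two functions can disagree about the same temperature: the learned rule reflects whatever the examples say, not what the programmer assumed.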
The types of information
So, now you know that machine learning takes a different approach from traditional programming, which simply tells the computer what to do. Next, you are probably wondering which algorithms artificial intelligence actually uses.
The most widely used machine learning algorithms are commonly divided into two categories: supervised and unsupervised. The difference between these two ways of teaching computers lies in the type of data used, which can be labelled or unlabelled. Depending on the form of the data we feed our computer with, we can solve different problems: in some cases supervised machine learning is more useful, in others unsupervised. Furthermore, in many situations the available data will allow a data scientist or a data engineer to use only one of the two categories.
Let’s look closer at the definitions of data.
Labelled data is a set of information in which we know both the predictors and the target variable. For example, suppose we have a dataset of engine parameters together with the average CO2 emission of each engine. If we want to predict the CO2 emission of a new type of engine, we can use this dataset for supervised machine learning. That is because we show the computer the relationship between the predictors, for example the size of the engine, and the real emission measured for each model. Based on this information, the computer can fit a machine learning model, for instance a linear regression model, and predict the outcome for an engine with new parameters.
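The engine example can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the engine sizes and CO2 values are invented (and deliberately lie on a straight line), `fit_line` and `predict_co2` are hypothetical helper names, and the model is a simple linear regression with a single predictor fitted by ordinary least squares.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented labelled data: predictor = engine size, target = measured CO2 emission.
engine_sizes = [1.0, 1.5, 2.0, 3.0]
co2_emissions = [100.0, 120.0, 140.0, 180.0]

slope, intercept = fit_line(engine_sizes, co2_emissions)


def predict_co2(engine_size):
    # Apply the fitted line to an engine the model has not seen before.
    return slope * engine_size + intercept


print(slope, intercept)
print(predict_co2(2.5))
```

In practice you would usually have several predictors and noisy measurements, and you would reach for a library rather than writing the formula by hand, but the workflow stays the same: fit on labelled data, then predict for new entries.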
Yet what if we have a set of data which seems useful for understanding a particular problem, but which has no target variable and so is not suitable for predicting a particular value? This is unlabelled data. What it can actually be used for is analysing the population and dividing it into clusters based on patterns which might be rather difficult for a human to distinguish. Once such clusters are found, new data entries can be assigned to them as well.
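Clustering can be applied, for example, to electronic mail: messages can be grouped by similarity without anyone labelling them first. Note, however, that spam filtering, which is often cited as an example here, is usually built on messages already marked as spam or not spam, so it is closer to classification than to clustering.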
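Here is a minimal sketch of clustering on unlabelled data. It assumes k-means as the clustering algorithm (one common choice; the text above does not name one), works on invented one-dimensional numbers, and uses the hypothetical helpers `kmeans_1d` and `assign_cluster`. Note that the data carries no labels at all: the groups are discovered from the values themselves.

```python
def kmeans_1d(points, k=2, iterations=10):
    """Tiny k-means for one-dimensional data (k must be at least 2)."""
    ordered = sorted(points)
    # Deterministic start: spread the initial centres across the sorted data.
    centres = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: abs(p - centres[c]))
            groups[nearest].append(p)
        # Move each centre to the mean of its group; keep it if the group is empty.
        centres = [sum(g) / len(g) if g else centres[i] for i, g in enumerate(groups)]
    return centres


def assign_cluster(point, centres):
    """Return the index of the closest cluster centre for a new data entry."""
    return min(range(len(centres)), key=lambda c: abs(point - centres[c]))


# Invented unlabelled measurements that happen to form two groups.
data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
centres = kmeans_1d(data)

print(centres)
print(assign_cluster(1.1, centres))  # a new entry near the first group
print(assign_cluster(4.9, centres))  # a new entry near the second group
```

The last two lines show the step mentioned above: once the clusters exist, a new data entry is simply assigned to the closest one.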
Do not mistake clustering for classification though. Classification is another popular way of using labelled data. It resembles the prediction described above, but its target is a category rather than a continuous value.
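To see how classification differs from predicting a number, consider the following small sketch. It is an illustration only: the data is invented, the categories "low" and "high" are hypothetical labels, and the method is a one-nearest-neighbour classifier, picked here because it fits in a few lines. The point is that the target is a category, and that the categories are known in advance, which is exactly what clustering lacks.

```python
# Invented labelled data: (engine size, emission category).
training = [(1.0, "low"), (1.5, "low"), (2.0, "high"), (3.0, "high")]


def predict_category(engine_size, training_rows):
    """One-nearest-neighbour: copy the label of the most similar known engine."""
    _, nearest_label = min(training_rows, key=lambda row: abs(row[0] - engine_size))
    return nearest_label


print(predict_category(1.2, training))  # closest known engine has size 1.0
print(predict_category(2.6, training))  # closest known engine has size 3.0
```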