What is Machine Learning? Why Machine Learning?
September 29, 2017
Motivation behind Machine Learning
Sometimes we encounter problems which are really hard to solve by writing a computer program. For example, let’s say we wanted to write a computer program to recognize handwritten digits:
Source: MNIST handwritten database
You could imagine trying to devise a set of rules to distinguish each individual digit. Zeros, for instance, are basically one closed loop. But what if the person didn’t perfectly close the loop? Or what if the right top of the loop closes below where the left top of the loop starts?
A zero that’s difficult to distinguish from a six
In this case, we have difficulty differentiating zeros from sixes. We could establish some sort of cutoff, but how would we decide the cutoff in the first place? Similarly, we’d need to come up with a list of rules for each digit. As you can see, it quickly becomes quite complicated to compile a list of heuristics (i.e., rules and guesses) that accurately classifies handwritten digits.
There are many more problems that fall into this category: recognizing objects, converting speech to text, checking if an email is spam, predicting the price of a house, and so on. Often, we don’t even know what program to write because we still don’t know how it’s done by our own brains. And even if we did have a good idea about how to do it, the program might be horrendously complicated.
Machine Learning offers a better solution. The ML approach to this problem is to collect a thousand examples of each of the handwritten digits. Then, instead of devising the rules ourselves, we write an algorithm that learns the patterns from those examples. Using this experience, the computer can then solve the same problem in new situations.
Essentially, our goal is to teach the computer to solve problems by example, very similar to how we might teach a young child to distinguish a cat from a dog.
What is Machine Learning? - Definition
Here’s a widely used definition of Machine Learning attributed to Arthur Samuel:
Machine Learning is the field of study that gives computers the ability to learn without being programmed explicitly.
You might ask, what does it mean to learn? Here’s another widely used definition which is more specific and technical:
A computer program is said to learn from experience E with respect to some task T and performance measure P if its performance on task T, as measured by P, improves with experience E. - Tom Mitchell
In our example of recognizing handwritten digits, the task T is recognizing handwritten digits, the experience E refers to the examples of handwritten digits and the performance measure P is the accuracy with which the computer can recognize a digit.
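To make the performance measure P concrete, here is a minimal sketch of how accuracy could be computed. The function name `accuracy` and the five example labels are made up for illustration; they are not from a real digit recognizer.

```python
def accuracy(predictions, true_labels):
    """Fraction of predictions that match the true labels."""
    if len(predictions) != len(true_labels):
        raise ValueError("predictions and true_labels must have the same length")
    if not true_labels:
        raise ValueError("need at least one example")
    correct = sum(p == t for p, t in zip(predictions, true_labels))
    return correct / len(true_labels)


# Hypothetical results on five digit images: the six was mistaken for a zero.
true_digits = [0, 6, 3, 0, 9]
predicted_digits = [0, 0, 3, 0, 9]
print(accuracy(predicted_digits, true_digits))  # 0.8
```

In Mitchell's terms, the program is learning if this number goes up as it sees more examples.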
Machine Learning uses computer science and statistics to do two things:
- Make predictions about the future based on data about the past. In our digit recognition problem, the model would look at thousands of examples of handwritten digits and form a general idea of what each digit looks like. It would then be able to look at a new image of a handwritten digit and recognize the digit.
- Discover patterns in data (also known as inference). For example, given many examples of handwritten digits, it could discover that digits are usually made of strokes of straight lines, curved lines, sharp angles, etc. These patterns are common to many different digits.
Difference between ML and AI
Machine Learning (ML) and Artificial Intelligence (AI) are highly interconnected fields, and there is no universally agreed upon distinction between the two.
However, in general, when people say artificial intelligence, they are usually referring to computers behaving intelligently. When people say machine learning, they are referring to making machines (computers) learn certain patterns and then make predictions using those learnt patterns.
That is, AI makes use of machine learning. Put another way, machine learning is a subset or component of AI.
What’s a statistical model?
Teaching a computer to make predictions involves feeding data into machine learning models. Models are representations of how the world supposedly works.
For example, let’s say we were training a machine learning model to predict the rent for a house. We know that, in general, the rent of a house increases with the number of rooms and restrooms.
Based on what we have seen, we may believe that a given house’s rent is, on average, equal to the number of rooms times 1200, plus the number of restrooms times 400. That is,
Rent = Rooms × $1200 + Restrooms × $400
So, if it has 2 rooms and 1 restroom, then I’ll guess that the rent is probably $2800 / month. If it has 3 rooms and 2 restrooms, I think the rent is $4400 / month.
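Written as code, the rent model is just a small function. This is a sketch of the formula above; the names `predict_rent`, `RENT_PER_ROOM` and `RENT_PER_RESTROOM` are chosen here for illustration.

```python
# The two numbers from the formula above (dollars per month).
RENT_PER_ROOM = 1200
RENT_PER_RESTROOM = 400


def predict_rent(rooms, restrooms):
    """Estimate monthly rent from the number of rooms and restrooms."""
    return rooms * RENT_PER_ROOM + restrooms * RENT_PER_RESTROOM


print(predict_rent(2, 1))  # 2800
print(predict_rent(3, 2))  # 4400
```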
Here’s the main point: Machine learning refers to a set of techniques for estimating functions (like the one involving rent) based on datasets (room count, restroom count and rent for many many houses). These functions, which are called models, can then be used for predictions of future data.
Here, room count and restroom count are the features or variables, and rent is the target. In this problem, we have 2 features, but in general, we may have many (for example, size of rooms, the year the house was constructed, and so on).
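Estimating the function from data could look roughly like the sketch below. It uses ordinary least squares with no intercept term, which is one common technique and an assumption on my part, since the article does not name a specific method. The function `fit_rent_model` and the four houses are made up, and the rents are noise-free so that the fit recovers exactly 1200 and 400. Real data would be noisier and the learnt numbers only approximate.

```python
def fit_rent_model(houses):
    """Least-squares fit of rent = w_rooms * rooms + w_restrooms * restrooms.

    `houses` is a list of (rooms, restrooms, rent) tuples.
    """
    # Sums that appear in the normal equations for two features.
    s_rr = sum(r * r for r, _, _ in houses)
    s_rb = sum(r * b for r, b, _ in houses)
    s_bb = sum(b * b for _, b, _ in houses)
    s_ry = sum(r * y for r, _, y in houses)
    s_by = sum(b * y for _, b, y in houses)

    # Solve the 2x2 linear system with Cramer's rule.
    det = s_rr * s_bb - s_rb * s_rb
    if det == 0:
        raise ValueError("room and restroom counts must not be proportional in every house")
    w_rooms = (s_ry * s_bb - s_rb * s_by) / det
    w_restrooms = (s_rr * s_by - s_rb * s_ry) / det
    return w_rooms, w_restrooms


# Hypothetical dataset: (rooms, restrooms, monthly rent in dollars).
houses = [(1, 1, 1600), (2, 1, 2800), (3, 2, 4400), (4, 2, 5600)]
w_rooms, w_restrooms = fit_rent_model(houses)
print(w_rooms, w_restrooms)  # 1200.0 400.0
```

The features go in, the target goes in, and the two numbers we previously guessed by hand come out.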
What exactly is being learnt?
To explain what is being learnt in machine learning, let’s take another example application — detecting spam emails. Given some emails, we want to use machine learning to predict which emails are spam and which ones are good.
After looking at some of the spam emails, we may be led to believe that if certain words appear in the email, it is more likely to be spam. For example, words such as ‘credit’, ‘offer’, ‘lottery’ and ‘password’ might indicate that the email is possibly spam.
Hence, one approach to writing a computer program that separates spam emails from non-spam emails is to maintain a list of words that appear more frequently in spam emails.
When a new email comes in, we split the email into individual words, and if the email has a substantial number of these spammy words, it should be classified as spam.
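A hand-written version of this rule might look like the sketch below. The word list comes from the example above, while the cutoff of 2 spammy words and the names `SPAM_THRESHOLD`, `count_spammy_words` and `is_spam` are assumptions made for illustration.

```python
import re

# Hand-maintained list of suspicious words (from the example above).
SPAMMY_WORDS = {"credit", "offer", "lottery", "password"}

# Hypothetical cutoff: this many spammy words marks an email as spam.
SPAM_THRESHOLD = 2


def count_spammy_words(email_text):
    """Split the email into lowercase words and count the spammy ones."""
    words = re.findall(r"[a-z0-9']+", email_text.lower())
    return sum(word in SPAMMY_WORDS for word in words)


def is_spam(email_text):
    return count_spammy_words(email_text) >= SPAM_THRESHOLD


print(is_spam("Claim your lottery offer now"))  # True
print(is_spam("Meeting moved to 3pm"))          # False
```

Both the word list and the threshold are chosen by hand here, and those are the two things a machine learning approach would try to learn instead.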
Although the strategy above might give fairly good results (say detect spam with an accuracy of 80%), the accuracy depends in large part on the list of words we maintain, and on the precise threshold we choose to classify an email as spam.
In machine learning, the strategy is to learn the list of words and the threshold from examples. In fact, in addition to which words are considered spammy, we could also learn how spammy each word is.
So, our machine learning model could look something like this:
- spam score = $\frac{\text{frequency of 'lottery'}}{\text{total words}} \times 5 + \frac{\text{frequency of 'credit'}}{\text{total words}} \times 2$
The higher the spam score, the higher the probability that the email is spam.
So in this case, the thing being learnt is a notion of how spammy each word is (the numbers 5 and 2 in the equation above).
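As a rough sketch, the spam score from the equation can be computed like this. The weights 5 and 2 are the ones from the equation; the names `WORD_WEIGHTS` and `spam_score` and the sample emails are invented for illustration. In a real system the weights would come out of training rather than being typed in.

```python
import re

# How spammy each word is. These are the numbers a model would learn.
WORD_WEIGHTS = {"lottery": 5, "credit": 2}


def spam_score(email_text):
    """Weighted count of spammy words, divided by the total word count."""
    words = re.findall(r"[a-z0-9']+", email_text.lower())
    if not words:
        return 0.0
    # Each occurrence of a word adds its weight; unknown words add 0.
    return sum(WORD_WEIGHTS.get(word, 0) for word in words) / len(words)


print(spam_score("You won the lottery"))  # 1.25
print(spam_score("Lunch at noon?"))       # 0.0
```

Summing a word's weight once per occurrence and dividing by the total word count gives the same result as multiplying each word's frequency by its weight, as in the equation.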
Note that this is not the only way to frame the problem. We framed the problem this way because we noticed a pattern that spam emails often contain specific words, and then we came up with a strategy that would analyze every possible word as a possible suspect.
Desirable properties of machine learning
You might notice that using machine learning to learn how spammy each word is has several advantages over maintaining this list manually.
- It reduces the amount of manual work involved in creating the list. Think about how long this list could get if you try to do this manually. Also, if you’re trying to maintain the list manually, how would you deal with hundreds of languages across the world? This task can easily become infeasible without machine learning.
- The same strategy works for other similar tasks. Say we wanted to classify whether a movie review is speaking positively or negatively about a movie. If we were creating lists of words manually, we would have to build a whole new list for this task. But if we learn it, the same algorithm would work, provided we already have some data (say, ratings and reviews left by users on IMDb).
- It updates automatically. Let’s say tomorrow the spammers become more advanced and start typing the word ‘password’ as ‘passw0rd’. Or they might try to sell you insurance, something we haven’t yet encountered. We can simply set the machine learning algorithm to be trained daily, and it will use the new data available and keep adapting over time to changing behavior.
Summary
- For many problems, writing an explicit program as the solution is extremely complicated.
- Machine Learning — when computers learn from data.
- It can be used to make predictions or discover patterns in data.
- Artificial Intelligence — a broader term referring to any ‘intelligent’ behavior from a computer.
- Machine Learning approaches are usually easier to update and extend.