Basic Concepts of Machine Learning - Building a Foundation


 "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E."  - Tom M. Mitchell


In simple terms, Machine Learning is a program, or say an algorithm, that learns from its past inputs, gains experience from them, and predicts answers without being explicitly programmed.

In the normal programming methodology, we have some inputs and we write a program that takes those inputs and gives us the outputs.

In Machine Learning, however, we feed in both the inputs and the outputs, and the machine learning algorithm figures out what the program must be to obtain the desired outputs from those inputs.

For example, suppose you have the following dataset:

X     Y
1     1
2     4
3     9
4     16
5     25
6     36
7     49
8     64
9     81
10    100

Now, if I ask you what the value of Y will be when X = 11, you can easily figure out that Y should be 121, because the relationship between X and Y is Y = X^2.

But can our computer find it without being explicitly programmed with the formula Y = X^2?

Yes, we can make that possible by using the same process our brain uses to find the answers.

Let's take a minute and think about how your brain figures out the relationship.

First, your brain looks at the dataset, observes the pattern between X and Y, and figures out the formula!

So basically your brain learns the formula from the given dataset, or, you could say, it learns from past experience!

Similarly, in machine learning, the computer learns from the past experience provided by the dataset and finds the relationship between inputs and outputs.
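To make this concrete, here is a minimal sketch of a computer "finding the formula" from the example dataset. It rests on one big assumption that is not part of the original explanation: we tell the program that the relationship has the form Y = X^p and let it estimate p from the data with a least-squares fit on the logarithms. The function name learn_exponent is made up for this illustration, and real machine learning algorithms are considerably more general than this.

```python
import math

# The example dataset. The code is never told that Y = X^2.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]


def learn_exponent(xs, ys):
    """Estimate p in Y = X**p (hypothetical helper for this sketch).

    Assumption: the data follows a pure power law, so log(Y) = p * log(X).
    We fit p with a least-squares line through the origin in log space.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    numerator = sum(lx * ly for lx, ly in zip(log_x, log_y))
    denominator = sum(lx * lx for lx in log_x)
    return numerator / denominator


p = learn_exponent(xs, ys)   # the "experience" gained from the dataset
prediction = 11 ** p         # use the learned formula on an unseen input

print("learned exponent:", round(p, 6))
print("prediction for X = 11:", round(prediction, 3))
```

The learned exponent comes out very close to 2, so the prediction for X = 11 is very close to 121, even though the number 2 appears nowhere in the code.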

Some Professional Terms In Machine Learning.

Now that you are aware of the process, let's introduce some basic Machine Learning terms.

The first important thing in Machine Learning is the dataset. It is the dataset from which the machine learns, or gains experience.

Let's have a look at our example dataset again.


X     Y
1     1
2     4
3     9
4     16
5     25
6     36
7     49
8     64
9     81
10    100

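We have 2 columns and 10 rows, but in Machine Learning we say we have 2 features and 10 instances. (Strictly speaking, the column we want to predict, here Y, is usually called the target or label, but we will keep things simple and call both columns features.)

Columns  =  Features
Rows     =  Instances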
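As a small illustration of this vocabulary, the sketch below stores the example dataset as a list of rows and counts its instances and features. The variable names are my own choices for the example, not standard terms.

```python
# Each tuple is one row (instance); each position in a tuple is one column (feature).
dataset = [
    (1, 1), (2, 4), (3, 9), (4, 16), (5, 25),
    (6, 36), (7, 49), (8, 64), (9, 81), (10, 100),
]
feature_names = ["X", "Y"]

num_instances = len(dataset)      # number of rows
num_features = len(dataset[0])    # number of columns

print(num_instances, "instances,", num_features, "features:", feature_names)
```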

Okay, let's move on.

We usually do not use the entire dataset for learning. We generally divide our dataset into two sub-datasets, which we call the Training Set and the Test Set.

For example, the above dataset can be divided into two sub-datasets as follows:

Training Set

X     Y
1     1
2     4
3     9
4     16
5     25
6     36
7     49

Test Set

X     Y
8     64
9     81
10    100


The computer learns the pattern from the Training Set and comes up with the formula, whereas we use the Test Set to check whether the formula generated by the computer is correct or not.
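The split described above can be sketched in a few lines. The 7/3 cut point mirrors the tables in this post. The function candidate_model is a hypothetical stand-in for whatever formula the learning step produced, so treat this as an illustration of the idea rather than a full workflow.

```python
dataset = [
    (1, 1), (2, 4), (3, 9), (4, 16), (5, 25),
    (6, 36), (7, 49), (8, 64), (9, 81), (10, 100),
]

# First 7 rows for learning, last 3 rows held back for checking.
training_set = dataset[:7]
test_set = dataset[7:]


def candidate_model(x):
    """Hypothetical formula assumed to have been learned from the training set."""
    return x ** 2


# Check the formula only on rows the learning step never saw.
all_correct = all(candidate_model(x) == y for x, y in test_set)

print("training rows:", len(training_set))
print("test rows:", len(test_set))
print("formula correct on test set:", all_correct)
```

In practice the rows are often shuffled before splitting so that the Test Set is not just the tail end of the data. Here the order is kept so the result matches the tables above.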

Data-Points

Often in machine learning, we treat the features as the basis vectors or the axes of a coordinate system.

For example, in the above dataset, we have two features, X and Y. For mathematical calculations that involve a linear-algebra approach, we consider these features as the dimensions or axes of the coordinate system.

With the features as the axes of a coordinate system, each instance becomes a point in that coordinate system, so the number of instances is the number of points.

For example, the row which contains X = 8 and Y = 64 is now the point (8, 64) in a coordinate system with two axes, X and Y.

Hence, all the instances in this approach are called data-points. Representing our dataset in a coordinate system helps us visualize its trend or behavior: we can plot all the data-points and estimate the model as the function represented by the shape they form.

The Model

The model is the formula generated by our Machine Learning algorithm. For example, for the above dataset, the model will be the formula Y = X^2.

So, that's it. This is what you should know before starting to learn Machine Learning.

For fun, do remember...
A good dataset means good experience or good learning, and good learning means good behaviour. A bad dataset means you have created a Terminator😉

The Model Evaluation

After estimating the model, that is, the formula generated by the machine learning algorithm, we check its accuracy. We check whether the model generated by the algorithm is correct or not.

There are many different ways to evaluate our model. In general, we find the error generated by that formula.

Error is calculated as: actual value of the output - predicted value of the output.
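Here is a small worked example of that subtraction on the three test rows. The straight-line formula in imperfect_model is invented purely so that there is some error to look at. It was not produced by any learning algorithm.

```python
test_set = [(8, 64), (9, 81), (10, 100)]


def imperfect_model(x):
    """Hypothetical, deliberately wrong formula: a straight line instead of X^2."""
    return 10 * x - 20


# Error = actual value - predicted value, computed row by row.
errors = [actual - imperfect_model(x) for x, actual in test_set]

for (x, actual), error in zip(test_set, errors):
    print("X =", x, "actual =", actual, "predicted =", imperfect_model(x), "error =", error)
```

For X = 8 the actual value is 64 and the prediction is 60, so the error is 4. The errors for X = 9 and X = 10 are 11 and 20.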

Every model will generate some error. We evaluate our model by the amount of error generated by the formula.

If the error is large, the model is said to be under-fitting, or poor.
If the error is 0 on the data used for learning, the model may be over-fit.
If the error is small, the model is good.

Confused? If your error is 0, the model may be correct only for the dataset that was used to gain the experience. If we feed the model some other value that is not present in that dataset, the model may generate the wrong output.
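An extreme, made-up example shows how this can happen. The "model" below does not learn a formula at all. It simply memorizes the training rows and, for any X it has never seen, returns the answer of the nearest X it remembers. The name memorizing_model and the nearest-neighbour fallback are my own assumptions for this sketch.

```python
training_set = [(1, 1), (2, 4), (3, 9), (4, 16), (5, 25), (6, 36), (7, 49)]
test_set = [(8, 64), (9, 81), (10, 100)]

# "Learning" here is pure memorization: a lookup table of the training answers.
memory = dict(training_set)


def memorizing_model(x):
    """Return the memorized Y for a known X, else the Y of the nearest known X."""
    if x in memory:
        return memory[x]
    nearest_x = min(memory, key=lambda known_x: abs(known_x - x))
    return memory[nearest_x]


# Error = actual - predicted, on seen data and on unseen data.
training_errors = [y - memorizing_model(x) for x, y in training_set]
test_errors = [y - memorizing_model(x) for x, y in test_set]

print("training errors:", training_errors)
print("test errors:", test_errors)
```

The training errors are all 0, yet every test prediction is stuck at 49, so the test errors are 15, 32 and 51. A perfect score on the data used for learning says little about how the model behaves on new inputs.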

So, we always look for the minimum error on data the model has not seen, such as the Test Set. The model which generates the minimum error there is the ideal model.

We can measure the error as a mean squared error.

Mean Squared Error = mean((actual output - predicted output)**2)

In words: the mean of the squares of the (actual - predicted) values.
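The formula translates almost word for word into code. The sketch below compares two hypothetical models on the three test rows: the formula Y = X^2 and an invented straight line. Both model functions are illustrative assumptions, not the output of a real algorithm.

```python
test_set = [(8, 64), (9, 81), (10, 100)]


def mean_squared_error(actual_outputs, predicted_outputs):
    """Mean of the squared (actual - predicted) differences."""
    squared_errors = [
        (actual - predicted) ** 2
        for actual, predicted in zip(actual_outputs, predicted_outputs)
    ]
    return sum(squared_errors) / len(squared_errors)


def square_model(x):
    """Hypothetical learned formula Y = X^2."""
    return x ** 2


def line_model(x):
    """Hypothetical, deliberately poor formula Y = 10X - 20."""
    return 10 * x - 20


actual = [y for _, y in test_set]
mse_square = mean_squared_error(actual, [square_model(x) for x, _ in test_set])
mse_line = mean_squared_error(actual, [line_model(x) for x, _ in test_set])

print("MSE of Y = X^2:", mse_square)
print("MSE of Y = 10X - 20:", mse_line)
```

The straight line has squared errors of 16, 121 and 400, giving a mean of 179.0, while Y = X^2 scores 0.0 on these unseen rows. Squaring keeps positive and negative errors from cancelling each other out and penalizes large misses more heavily.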

Next, we will see what the different types of machine learning algorithms are.