Welcome to the world’s best blog on machine learning. Here we will talk about the difference between machine learning and traditional programming.
Whenever you search for something on Google, it displays many links related to your query, and most of the time your question is answered by the first or second link. But wait: there are thousands of pages available for the same query, so how does the right page end up in the first or second position? The answer is machine learning. Google uses ML algorithms to rank pages.
Every time you open Gmail, it has already separated the spam from the genuine mail using an ML algorithm. Each time you check the weather before going on a trip, the forecast you see relies on ML as well. But always remember that no algorithm can predict the result with 100% accuracy. That’s the reason why you sometimes have to look in the spam folder for a genuine email, or why you might face heavy rainfall on your trip.
There are many other examples of machine learning, like the Netflix movie recommendation system, the way YouTube suggests videos, and the way Amazon recommends products. All of this happens because of machine learning. How? Let’s figure it out.
How is machine learning different from traditional programming?
In traditional programming, you have to explicitly write the rules that turn every possible input into an output. But what if you are writing a program for speech-to-text conversion? How would you do it using traditional programming? Digital audio is typically sampled at about 44 kHz, which means roughly 44,000 data points for 1 second of speech. So even for 1 second we have to take 44K points into consideration, and writing rules by hand for that is not possible for any person in this world. That is the reason we want machines to learn by themselves without being explicitly programmed.
That’s why ML came into the picture. Now we will look at:
- How do machine learning algorithms work?
- How can we write our own machine learning algorithm?
- What are the various algorithms available?
- What is a neural network and how does it work?
Before moving into the concepts of machine learning, let us first look at a real machine learning model that I have created using Python. Here I have used NumPy and pandas for data manipulation and scikit-learn for the machine learning algorithms.
Using the model, we will predict the type of a fruit from feature values like its width, height, and mass. Let us first import the required libraries, NumPy and pandas, and then explore our dataset. In this dataset, we have labels (the fruit name) and feature values (mass, height, width).
This dataset has 67 rows, but we are only showing the first five to save space. One more thing to notice here is that we have an additional label column named fruit_label. This is because our model can only predict numeric values, so if the output is 1 it means the fruit is “apple”, and if the output is 3 it means the fruit is “orange”.
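To make the idea of a numeric label concrete, here is a minimal sketch that builds a tiny table by hand. The five rows and their values are invented for illustration and are not the real 67-row dataset. The column names other than fruit_label are assumptions based on the description above, and only the two labels mentioned in the text (1 for apple, 3 for orange) are used.

```python
import pandas as pd

# Illustrative rows only: these values are invented for the sketch,
# not taken from the real dataset.
fruits = pd.DataFrame(
    {
        "fruit_label": [1, 1, 3, 3, 1],
        "fruit_name": ["apple", "apple", "orange", "orange", "apple"],
        "mass": [192, 180, 160, 154, 176],
        "width": [8.4, 8.0, 7.1, 7.0, 7.4],
        "height": [7.3, 6.8, 7.6, 7.1, 7.2],
    }
)

# Show the first five rows, the same way you would peek at the full dataset.
print(fruits.head())

# Build a lookup from the numeric label back to the readable fruit name.
label_to_name = dict(zip(fruits["fruit_label"], fruits["fruit_name"]))
print(label_to_name[1])  # apple
print(label_to_name[3])  # orange
```

The lookup is handy later on: the model returns a number, and the dictionary turns it back into a name a person can read.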
Now, let’s split the dataset into training and test sets. Why? Because we want to check how well our model does on unseen data. The training set is used to train the model, and the test set is used to check how well it does on data it has not seen. We’ll talk about the training set and test set in detail later. To split the data into 2 parts we use the train_test_split function from sklearn.
Here we have 2 variables, x and y: x contains all the feature values and y contains all the labels. After splitting the data, we have 50 rows and 3 columns in x_train (i.e. the training feature values), 17 rows and 3 columns in x_test (the test feature values), and so on.
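The split can be sketched as follows. The data here is random placeholder data with the same shape as described (67 rows, 3 feature columns), not the real fruit measurements, and the random_state value is an arbitrary choice to make the split repeatable. With no test_size given, train_test_split holds out 25% of the rows, which for 67 rows gives the 50/17 split mentioned above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 67 rows with 3 feature columns (mass, width, height).
rng = np.random.default_rng(0)
x = rng.random((67, 3))
# Placeholder labels: 1 for "apple", 3 for "orange".
y = rng.choice([1, 3], size=67)

# With no test_size given, scikit-learn holds out 25% of the rows.
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

print(x_train.shape)  # (50, 3)
print(x_test.shape)   # (17, 3)
print(y_train.shape)  # (50,)
print(y_test.shape)   # (17,)
```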
The training set usually has more examples than the test set, and the reason is clear: the model needs more data to learn from than we need to test it.
Now let’s use a linear regression algorithm to train our model. But wait! How does this linear regression algorithm work? We’ll see how it works and what is going on inside it, but first, let’s see a real example of an implementation.
So that’s it: we have created our first model to predict the type of fruit. But our accuracy is just 43.92, which is poor. It means that using a linear regression algorithm for this particular problem is not a great idea. So how can we choose the best algorithm for our problem?
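A rough sketch of this step with scikit-learn might look like the code below. It runs on random placeholder data, so the score it prints will not match the 43.92 reported above. One thing worth knowing: for a regressor, the score method returns the R² value, which is not the same as the share of correctly predicted labels.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the fruit data (values are not real measurements).
rng = np.random.default_rng(0)
x = rng.random((67, 3))            # columns: mass, width, height
y = rng.choice([1, 3], size=67)    # 1 = apple, 3 = orange

x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

# Fit a linear model to the numeric labels.
model = LinearRegression()
model.fit(x_train, y_train)

# For a regressor, score() returns R^2, not the share of correct labels.
print(model.score(x_test, y_test))

# Predictions are continuous numbers, so they rarely land exactly on 1 or 3.
predictions = model.predict(x_test)
print(predictions[:5])
```

Notice that the predictions are decimal numbers rather than clean labels like 1 or 3, which is already a hint that something about this choice of algorithm is off.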
Note: Before applying any algorithm, we first need to decide whether the problem calls for supervised learning or unsupervised learning.
What is the difference between supervised learning and unsupervised learning?
Supervised learning is a type of problem in which we have some feature values and we want to predict a label. In other words, if our dataset has both features (like height, width, and mass in the above example) and a label (like the fruit name), we can use supervised learning. If we only have features and we want to cluster them into groups, we can use unsupervised learning. We will see some real examples of clustering, or unsupervised learning, later on.
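The difference shows up directly in the code: a supervised model is fitted on features and labels, while an unsupervised model is fitted on features alone. The sketch below uses two synthetic, well-separated groups of points. The choice of KNeighborsClassifier and KMeans is mine, purely for illustration, and is not taken from the fruit example above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

# Two well-separated synthetic groups of points with 3 features each.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=0.5, size=(20, 3))
group_b = rng.normal(loc=5.0, scale=0.5, size=(20, 3))
features = np.vstack([group_a, group_b])
labels = np.array([1] * 20 + [3] * 20)  # 1 = apple, 3 = orange

# Supervised: the model is given features AND labels.
classifier = KNeighborsClassifier(n_neighbors=3)
classifier.fit(features, labels)
print(classifier.predict([[0.1, 0.0, -0.2]]))  # [1]

# Unsupervised: the model is given features only and groups them itself.
clusterer = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = clusterer.fit_predict(features)
print(cluster_ids)
```

The cluster ids that KMeans returns are arbitrary group numbers. They tell you which points belong together, but not which group is the apples.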
In the above example, did we use supervised learning or unsupervised learning?
In the above example, we used a supervised learning method, which is what we should do. So where did we go wrong?
Before solving any machine learning problem, one more thing we have to keep in mind is whether the problem we are solving is a regression-type or a classification-type problem. Now let’s talk about this.
What is the difference between a classification-type problem and a regression-type problem?
- In the above example, is it worth using a regression algorithm?
- When should we use a classification algorithm?
- When should we use a regression algorithm?
A regression algorithm is used whenever we want to predict continuous values. For example, let’s assume we want to find out how many runs India will score in today’s match: it could be 270, 271, 300, 305, and so on. That’s an example of a regression problem. In a classification problem, on the other hand, we have to predict discrete values, for example whether a given fruit is an orange, an apple, or a mango.
Now we know that in the above example we should have used a classification algorithm, as we are predicting only discrete values. We used a regression algorithm instead, and that’s why everything went wrong. So always remember: the success of your machine learning project depends on choosing the right algorithm.
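Here is a small sketch of the contrast on synthetic data. The measurements are invented, and KNeighborsClassifier is only one possible classifier, picked here as an assumption for illustration. The regressor returns a number somewhere between the labels, while the classifier always returns one of the labels it was trained on.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for two fruit types (values are invented):
# columns are mass, width, height.
rng = np.random.default_rng(0)
apples = rng.normal(loc=[180.0, 8.0, 7.0], scale=0.3, size=(25, 3))
oranges = rng.normal(loc=[150.0, 7.0, 7.5], scale=0.3, size=(25, 3))
x = np.vstack([apples, oranges])
y = np.array([1] * 25 + [3] * 25)  # 1 = apple, 3 = orange

new_fruit = [[165.0, 7.5, 7.2]]  # a fruit halfway between the two groups

# A regressor may return any number, including ones that are not valid labels.
regressor = LinearRegression().fit(x, y)
regression_output = regressor.predict(new_fruit)
print(regression_output)

# A classifier always returns one of the labels it was trained on.
classifier = KNeighborsClassifier(n_neighbors=5).fit(x, y)
classification_output = classifier.predict(new_fruit)
print(classification_output)

# For a classifier, score() is the fraction of correctly labelled rows.
print(classifier.score(x, y))
```

A regression output of about 2 does not correspond to any fruit in this setup, which is exactly the kind of trouble we ran into above.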
- Always think first about the problem you want to solve: is it a classification-type or a regression-type problem?
- Do you need supervised learning or unsupervised learning, given the dataset and the problem statement?