
Machine Learning - Simple Linear Regression



    In this post, we will see how to build a Simple Linear Regression model in Machine Learning.

     A simple linear regression model is a model that follows the equation:
                                                  Y = aX + b
       where Y is the dependent variable we are going to predict, X is the independent variable (the feature available to us), a is the coefficient (slope) of X, and b is the intercept, that is, the value of Y when X = 0.
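
To make the roles of a and b concrete, here is a minimal sketch in plain Python. The function name predict and the values chosen for a and b are made up for illustration; they are not taken from the salary data set.

```python
def predict(x, a, b):
    """Return Y = a*X + b for a single value of X."""
    return a * x + b

# Hypothetical coefficients, chosen only for illustration
a = 2.0   # slope: how much Y changes when X grows by 1
b = 1.0   # intercept: the value of Y when X = 0

at_zero = predict(0.0, a, b)                      # equals b
step = predict(4.0, a, b) - predict(3.0, a, b)    # equals a
print(at_zero, step)
```

Evaluating the line at X = 0 gives back b, and moving X by one unit changes Y by a.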

     We will see how to implement this model in Python with scikit-learn and use it to predict the dependent variable.


  • Data Preprocessing
In our last post, we already discussed data preprocessing, so here we will go straight to the code.
Click here to download the data set.

# Machine Learning series: Simple_linear_regression
# created by @the ai datascience
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# import the dataset
dataset = pd.read_csv('Salary_Data.csv')
x = dataset.iloc[:, :-1].values  # feature matrix: every column except the last
y = dataset.iloc[:, 1].values    # target vector: the column at index 1

# split the dataset into training and test data
# (train_test_split lives in sklearn.model_selection; the old
# sklearn.cross_validation module was removed in scikit-learn 0.20)
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=1/3, random_state=0)


Here we hold out one third of our data set (test_size = 1/3, roughly 33%) for testing and use the remaining two thirds for training.
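
To see what such a split does to the row counts, here is a simplified sketch in plain Python. It only illustrates the idea and is not scikit-learn's implementation; the helper name simple_split and the 30-row example are assumptions made for this illustration.

```python
import random

def simple_split(rows, test_fraction, seed=0):
    """Shuffle the rows and hold out a fraction of them for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)        # seeded so the split is repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

rows = list(range(30))                           # hypothetical data set with 30 rows
train_rows, test_rows = simple_split(rows, test_fraction=1/3)
print(len(train_rows), len(test_rows))
```

With 30 rows and a test fraction of 1/3, 20 rows end up in the training set and 10 in the test set, and no row appears in both.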


  • Fitting the Dataset and Prediction
Here we fit a simple linear regression model to the training data using a regressor object, which trains the model. Then we predict the dependent variable for the test data.

# fit simple linear regression to the training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(x_train, y_train)

# predict the test data
y_pred = regressor.predict(x_test)
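
For a single feature, the least-squares line can also be computed by hand, which shows what fitting means here: a and b are chosen so that the sum of squared differences between the real and the predicted values is as small as possible. The sketch below uses plain Python and made-up numbers. The function name fit_line is hypothetical, and the code is meant as an illustration of ordinary least squares, not as a description of how LinearRegression works internally.

```python
def fit_line(xs, ys):
    """Return slope a and intercept b of the least-squares line Y = a*X + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum of co-deviations of X and Y / sum of squared deviations of X
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x    # the line passes through the point of means
    return a, b

# Made-up training points that lie exactly on Y = 2X + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
a, b = fit_line(xs, ys)

# Predict Y for X values the fit has not seen, analogous to predicting on x_test
y_hat = [a * x + b for x in [5.0, 6.0]]
print(a, b, y_hat)
```

Because the made-up points lie exactly on a line, the fit recovers a = 2 and b = 1. Real data such as salaries will scatter around the fitted line instead.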



  • Visualize the Model using training Data
Here we plot the training data as a scatter plot. Then we draw the regression line, that is, the model's predictions for the training data, using the matplotlib library.

#visualizing training data
plt.scatter(x_train , y_train , color = 'red' )
plt.plot(x_train , regressor.predict(x_train) , color = 'blue')
plt.title('Salary vs Experience (Training set)')
plt.xlabel('years of experience')
plt.ylabel('salary')
plt.show()


Here is the output:
From this diagram we can see that not all of the training points lie on the prediction line, so the linear model does not fit the data perfectly. Still, the real values are close to the predicted values.

  • Visualize the Model using test Data
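Here we plot the test data as a scatter plot. Then we draw the same regression line as before using the matplotlib library. The line was fitted on the training data and does not change, so plotting the predictions for x_train gives the line against which the test points are compared.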

#visualizing test set data
plt.scatter(x_test , y_test , color = 'red' )
plt.plot(x_train , regressor.predict(x_train) , color = 'blue')
plt.title('Salary vs Experience (Test set)')
plt.xlabel('years of experience')
plt.ylabel('salary')
plt.show()

Here is the output:
Now we can say that our model works well on the test data, as most of the real data points lie on or close to our predicted regression line.
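
A plot gives only a visual impression. To put a number on how close the predictions are, one option is to compute the mean absolute error and the R² score on the test set. The sketch below does this in plain Python with hand-written helpers and made-up values standing in for y_test and y_pred; none of the numbers come from the salary data set.

```python
def mean_absolute_error(y_true, y_hat):
    """Average size of the prediction errors, in the units of Y."""
    return sum(abs(t - p) for t, p in zip(y_true, y_hat)) / len(y_true)

def r2_score(y_true, y_hat):
    """1.0 means a perfect fit; 0.0 means no better than predicting the mean."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_hat))  # unexplained variation
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)            # total variation
    return 1.0 - ss_res / ss_tot

# Made-up values standing in for y_test and y_pred
y_true = [10.0, 20.0, 30.0, 40.0]
y_hat = [12.0, 18.0, 33.0, 39.0]

mae = mean_absolute_error(y_true, y_hat)
r2 = r2_score(y_true, y_hat)
print(mae, r2)
```

scikit-learn provides functions with the same names in sklearn.metrics, so in practice they could be called on y_test and y_pred directly.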


That's all for Simple Linear Regression. In the next post we will look at code snippets for Multiple Linear Regression.
Stay Tuned :)

