Regression Explained [OLS Maths, Evaluation Metrics, Assumptions… Full In-Depth Explanation]
Read this blog once before going for a Regression ML model.

So, let's get started.
#Index
-Regression: [Linear Regression, when to use, formula]
-Assumptions of Linear Regression: [Linear Relationship, Multicollinearity, Homoscedasticity, Autocorrelation, Normal Distribution of error terms]
-Mathematics of OLS: [y = mx + c]
-Model Evaluation Metrics: [MAE, RMSE, R-squared, Adjusted R-squared]
Regression: [Linear Regression, when to use, formula]
Linear regression is a good fit when the scatter plot of our data looks like the image below, with the points roughly following a straight line.

Linear Regression: Linear regression is a machine learning algorithm based on supervised learning; it performs a regression task. Linear regression predicts a dependent variable value (y) based on a given independent variable (x). If there is more than one feature, it is called multiple linear regression.
When to use: It is used when we want to predict the value of one variable (the dependent variable) based on the value of another variable (the independent variable). The variable we want to predict is called the dependent variable (or sometimes, the outcome variable).
Formula: y = a + bx, where y = the dependent column value and x = the independent column value.
b: ‘b’ in the formula is the slope of the line. We may see different names for the slope, like m, b, β1, or θ1. An easy way to remember it: whatever letter is multiplied by x in the formula is the slope, because you will find the regression formula written in different patterns:
y = mx + c, y = θ0 + θ1x1, y = β0 + β1x, and so on
a: ‘a’ in the formula is the intercept on the y-axis. When we fit a line to the data, the point where the line crosses the y-axis is the intercept, as you can see in the image below.
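To make the formula concrete, here is a minimal sketch (assuming NumPy and scikit-learn are installed) that fits a straight line to the same toy room-size/room-price numbers used in the OLS example later in this post and prints the slope and intercept:

```python
# A minimal sketch: fit y = a + b*x with scikit-learn on toy data (illustrative numbers only)
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([10, 16, 12, 21, 27]).reshape(-1, 1)  # independent variable (room size)
y = np.array([180, 288, 216, 378, 486])            # dependent variable (room price)

model = LinearRegression().fit(x, y)
print("slope (b):", model.coef_[0])        # the number multiplied by x
print("intercept (a):", model.intercept_)  # where the fitted line crosses the y-axis
```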

Assumptions of Linear Regression: [Linear Relationship, Multicollinearity, Homoscedasticity, Autocorrelation, Normal Distribution of error terms]
Linear relationship: According to this assumption, the relationship between the response (dependent variable) and the feature variables (independent variables) should be linear.

Multicollinearity: Multicollinearity occurs when the independent variables in a regression model are correlated with each other. Remove highly correlated independent variables; a little multicollinearity is OK, but try to remove as much of it from the data as you can.
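As a rough sketch of how you might check this (the DataFrame and its numbers below are made up for illustration, and pandas/statsmodels are assumed to be installed), you can look at the correlation matrix and the variance inflation factor (VIF) of each feature:

```python
# A minimal sketch: spotting multicollinearity with a correlation matrix and VIF (made-up data)
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.DataFrame({
    "rooms":   [2, 3, 3, 4, 5, 5, 6, 7],
    "size_m2": [40, 62, 65, 85, 110, 118, 140, 165],  # strongly correlated with "rooms"
    "age":     [30, 5, 12, 7, 3, 20, 15, 8],
})

print(df.corr())  # pairwise correlations between the independent variables

# VIF per feature (the added constant is skipped); a common rule of thumb: VIF above ~5-10 is a red flag
X = sm.add_constant(df)
vif = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(1, X.shape[1])],
    index=df.columns,
)
print(vif)
```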
Homoscedasticity: Homoscedasticity describes a situation in which the error term (that is, the “noise” or random disturbance in the relationship between the independent variables and the dependent variable) is the same across all values of the independent variables.
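One common way to eyeball this assumption is a residuals-vs-fitted plot. The sketch below uses synthetic data generated on the spot (NumPy and matplotlib assumed installed); with homoscedastic errors the vertical spread stays roughly constant from left to right:

```python
# A minimal sketch: residuals-vs-fitted plot on synthetic, roughly linear data
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.uniform(10, 30, size=50)           # synthetic room sizes
y = 18 * x + rng.normal(0, 10, size=50)    # true line plus constant-variance noise

slope, intercept = np.polyfit(x, y, deg=1)  # simple straight-line fit
fitted = intercept + slope * x
residuals = y - fitted

plt.scatter(fitted, residuals)
plt.axhline(0, linestyle="--")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.title("Homoscedasticity check: the spread should stay roughly constant")
plt.show()
```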

Autocorrelation: There should be little or no autocorrelation in the data. Autocorrelation occurs when the residual errors are not independent of each other, i.e. when row 1 of a column is correlated with row 2, row 2 with row 3, and so on (as often happens in time-series data).
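A quick numeric check for this is the Durbin-Watson statistic on the residuals (statsmodels assumed installed; the data below is synthetic, as in the previous sketch):

```python
# A minimal sketch: Durbin-Watson statistic on the residuals of a straight-line fit (synthetic data)
import numpy as np
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(0)
x = rng.uniform(10, 30, size=50)
y = 18 * x + rng.normal(0, 10, size=50)
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)

# Values near 2 suggest little autocorrelation; values toward 0 or 4 suggest positive/negative autocorrelation.
print("Durbin-Watson:", durbin_watson(residuals))
```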

Normal Distribution of error terms: A common misconception about linear regression is that it assumes that the outcome Y is normally distributed. Actually, linear regression assumes normality for the residual errors ε, which represent variation in Y not explained by the predictors.
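A quick way to check this is a Q-Q plot of the residuals (not of Y itself). The sketch below uses the same kind of synthetic data as above and assumes SciPy and matplotlib are installed:

```python
# A minimal sketch: Q-Q plot of the residuals to check their normality (synthetic data)
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
x = rng.uniform(10, 30, size=50)
y = 18 * x + rng.normal(0, 10, size=50)
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)

stats.probplot(residuals, dist="norm", plot=plt)  # points should hug the diagonal if residuals are normal
plt.show()
```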
Why should you care about these assumptions?
In a nutshell, your linear model should produce residuals that have constant variance and are normally distributed, and the features should not be correlated with each other. If these assumptions hold true, the OLS procedure (discussed in the next section) creates the best possible estimates for the coefficients of linear regression.
Another benefit of satisfying these assumptions is that as the sample size increases to infinity, the coefficient estimates converge on the actual population parameters.
Mathematics of OLS: [y = mx + c]
We want to find the line that minimizes the overall distance between the data points and the line (OLS minimizes the sum of the squared vertical distances). The method of OLS provides minimum-variance, mean-unbiased estimation when the errors have finite variances.
MATHS: y = mx + c
Let's do this with an example.

Suppose we have data like this: a Room size column as the feature x (we have one feature column here, but in general we may have k feature columns) and a Room price column as the target y.
Room size (x): 10, 16, 12, 21, 27
Room price (y): 180, 288, 216, 378, 486
Note: the numeric example below is only meant to explain the math, not to build the best possible model.
Let us first calculate the mean of x and the mean of y:
x̄ = (10 + 16 + 12 + 21 + 27) / 5 = 86 / 5 = 17.2
ȳ = (180 + 288 + 216 + 378 + 486) / 5 = 1548 / 5 = 309.6
We now need to calculate ∑ y ∗ x
∑ y ∗ x=(180*10)+(288*16)+(216*12)+(378*21)+(486*27)=30060
Next, we need to calculate ∑ x²
∑ x ²=10² +16² +12² +21² +27²=1670
Now we will calculate the sum of cross deviations and the sum of squared deviations
sum of cross deviations:
SSxy = ∑(xᵢ*yᵢ) − n*x̄*ȳ, where n = 5, the number of rows in the table
SSxy = 30060 − 5*17.2*309.6 = 30060 − 26625.6 = 3434.4
sum of squared deviations:
SSxx = ∑(xᵢ − x̄)² = ∑xᵢ² − n*(x̄)², where n = 5
SSxx = 1670 − 5*(17.2)² = 1670 − 1479.2 = 190.8
Now that we have all the values let us calculate the slope and intercept
slope = m = θ1 = SSxy / SSxx = 3434.4 / 190.8 = 18
intercept = θ0 = ȳ − θ1*x̄ = 309.6 − 18*17.2 = 0
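As a sanity check, here is a minimal NumPy sketch that reproduces the hand calculation above (same five data points, same formulas):

```python
# A minimal sketch: reproducing the OLS hand calculation with NumPy
import numpy as np

x = np.array([10, 16, 12, 21, 27])
y = np.array([180, 288, 216, 378, 486])
n = len(x)

x_bar, y_bar = x.mean(), y.mean()          # 17.2 and 309.6
ss_xy = np.sum(x * y) - n * x_bar * y_bar  # 30060 - 26625.6 = 3434.4
ss_xx = np.sum(x ** 2) - n * x_bar ** 2    # 1670 - 1479.2 = 190.8

slope = ss_xy / ss_xx                      # 18.0
intercept = y_bar - slope * x_bar          # 0.0
print("slope:", slope, "intercept:", intercept)
```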
So now, if we want to predict a room price using the linear regression model fit on these 5 data points, we can use the following equation.

Prediction time
y = mx + c
m = slope = 18
c = intercept = 0
Room Price = 0 + 18*24 = 432
For the test data point, y = 432 and y_predict = 432, so the prediction is exact here. That only happens because this tiny toy dataset lies perfectly on a straight line; on real, noisy data the prediction will not match exactly, and the difference y − y_predict is the error, or loss.
Residual: The difference between the actual value of the target and the predicted value is called the residual. You should always select the line with the smallest overall residuals (OLS picks the line that minimizes the sum of squared residuals).
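Putting the pieces together, a short sketch of the prediction step and the residual for the test point x = 24 (the same point used in the text) might look like this:

```python
# A minimal sketch: predict an unseen point with the fitted line and compute its residual
import numpy as np

x = np.array([10, 16, 12, 21, 27])
y = np.array([180, 288, 216, 378, 486])
n = len(x)
slope = (np.sum(x * y) - n * x.mean() * y.mean()) / (np.sum(x ** 2) - n * x.mean() ** 2)  # 18.0
intercept = y.mean() - slope * x.mean()                                                   # 0.0

x_test, y_test = 24, 432                 # test point from the text
y_pred = intercept + slope * x_test      # 432.0, since this toy data lies exactly on a line
print("prediction:", y_pred, "residual:", y_test - y_pred)  # residual is 0 here; non-zero on real, noisy data
```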
Model Evaluation Metrics: [MAE, RMSE, R-squared, Adjusted R-squared]
Mean Absolute Error[MAE]: The mean absolute error (MAE) is the simplest regression error metric to understand. We’ll calculate the residual for every data point, taking only the absolute value of each so that negative and positive residuals do not cancel out. Mean absolute error is nothing but the average of absolute values of these residuals.
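A minimal sketch of MAE (the actual/predicted numbers below are made up for illustration; scikit-learn assumed installed):

```python
# A minimal sketch: MAE is the average of the absolute residuals (made-up values)
import numpy as np
from sklearn.metrics import mean_absolute_error

y_true = np.array([180, 288, 216, 378, 486])
y_pred = np.array([190, 280, 220, 370, 480])   # hypothetical model predictions

print(np.mean(np.abs(y_true - y_pred)))        # MAE by hand: 7.2
print(mean_absolute_error(y_true, y_pred))     # same value via scikit-learn
```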
Root Mean Squared Error [RMSE]: Root Mean Square Error (RMSE) is nothing but the square root of the mean/average of the squares of all the errors. RMSE is the standard deviation of the residuals (prediction errors). Residuals are a measure of how far from the regression line the data points are; RMSE is a measure of how spread out these residuals are. In other words, it tells you how concentrated the data is around the line of best fit.
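Continuing with the same made-up numbers, RMSE could be sketched like this:

```python
# A minimal sketch: RMSE is the square root of the mean squared residual (made-up values)
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([180, 288, 216, 378, 486])
y_pred = np.array([190, 280, 220, 370, 480])

print(np.sqrt(np.mean((y_true - y_pred) ** 2)))     # RMSE by hand (~7.48)
print(np.sqrt(mean_squared_error(y_true, y_pred)))  # same value via scikit-learn
```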
R-squared: The most common interpretation of R-squared is how well the regression model fits the observed data. For example, an R-squared of 60% means that 60% of the variance in the dependent variable is explained by the model. Generally, a higher R-squared indicates a better fit.
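R-squared can be computed as 1 minus the ratio of unexplained variation to total variation; a minimal sketch with the same made-up numbers:

```python
# A minimal sketch: R-squared = 1 - SS_res / SS_tot (made-up values)
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([180, 288, 216, 378, 486])
y_pred = np.array([190, 280, 220, 370, 480])

ss_res = np.sum((y_true - y_pred) ** 2)         # variation the model fails to explain
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
print(1 - ss_res / ss_tot)                      # R-squared by hand
print(r2_score(y_true, y_pred))                 # same value via scikit-learn
```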
Adjusted R-squared: So how is R-squared different from Adjusted R-squared? R-squared tells you how well your model fits the data points, whereas Adjusted R-squared also accounts for the number of features: it only increases when a newly added feature actually improves the model, so it helps you judge whether a particular feature is worth keeping.
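A minimal sketch of the adjustment, with n observations and p features (here p = 1, since the toy example has a single feature):

```python
# A minimal sketch: Adjusted R-squared penalizes R-squared for the number of features (made-up values)
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([180, 288, 216, 378, 486])
y_pred = np.array([190, 280, 220, 370, 480])

n, p = len(y_true), 1                            # n observations, p features
r2 = r2_score(y_true, y_pred)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print("R-squared:", r2, "Adjusted R-squared:", adj_r2)
```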
