Regression analysis is a predictive modelling technique that establishes the relationship between one or more independent variables and a dependent variable. It is used for forecasting and for exploring the cause-and-effect relationship between variables. The sign of a regression coefficient tells us whether there is a positive or negative correlation between each independent variable and the dependent variable: a positive coefficient indicates that as the value of the independent variable increases, the mean of the dependent variable also tends to increase.
There are several benefits of using regression analysis.
- It indicates significant relationships between the dependent variable and the independent variable(s).
- It indicates the strength of the impact of multiple independent variables on the dependent variable.
There are various kinds of regression techniques available for making predictions or for understanding cause-and-effect relationships.
Types of Regression Techniques
- Linear Regression: In statistics, linear regression is a linear approach to modelling the relationship between a scalar response and one or more explanatory variables. The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression. It is used extensively and is perhaps the first topic to learn when studying predictive modelling techniques. In linear regression the dependent variable is continuous, while the independent variables can be continuous or discrete. The regression line is linear and is considered the line of best fit, chosen to minimize the error or residuals. The line is represented by the equation y = b0 + b1x1 + b2x2 + … + bnxn + e, where y is the dependent variable, x1, x2, …, xn are the independent variables, e is the error term (epsilon), b0 is the intercept, and b1, b2, …, bn are the coefficients for the respective independent variables (sketch below).
- Logistic Regression: In statistics, the logistic model is used to model the probability of a certain class or event existing, such as pass/fail, win/lose, alive/dead or healthy/sick. This can be extended to model several classes of events. Here the dependent variable is categorical and the independent variables can be categorical or continuous. The model predicts the probability of occurrence of an event by fitting the data to the logistic (sigmoid) function p = e^y / (1 + e^y), where p is the probability and y = b0 + b1x1 + b2x2 + … + bnxn; equivalently, the logit log(p / (1 − p)) is modelled as this linear combination. The logistic regression curve takes the shape of a sigmoid: at the lower end it asymptotically approaches 0 and at the upper end it asymptotically approaches 1, but it never goes below 0 or above 1 (sketch below).
- Polynomial Regression: Polynomial regression is a form of regression analysis in which the relationship between the independent variable x and the dependent variable y is modelled as an nth-degree polynomial in x. A regression equation is a polynomial regression equation if the power of the independent variable is greater than 1. The equation takes the form y = b0 + b1x + b2x^2 + … + bnx^n. In polynomial regression the best-fit line is a curve fitted to the data points (sketch below).
- Stepwise Regression: This form of regression is used when we deal with multiple independent variables. The aim of this model is to maximize predictive power using the minimum number of independent variables. In this technique, the selection of independent variables is done with the help of an automatic procedure. Some commonly used stepwise regression methods are standard stepwise regression, forward selection and backward elimination (sketch below).
- Ridge Regression: Ridge regression is a model tuning method used to analyze data that suffers from multicollinearity. This method performs L2 regularization. When multicollinearity occurs, the least-squares estimates remain unbiased, but their variances are large, so the predicted values can be far from the actual values. Ridge regression addresses the multicollinearity problem through the shrinkage parameter λ (lambda). The model shrinks the coefficient values but never drives them exactly to zero (sketch below).
- Lasso Regression: Lasso (Least Absolute Shrinkage and Selection Operator) is a regression analysis method that performs both variable selection and regularization in order to enhance the prediction accuracy and interpretability of the resulting statistical model. Like ridge regression, Lasso penalizes the size of the regression coefficients, and it can reduce the variability and improve the accuracy of linear regression models. Lasso differs from ridge regression in that it uses absolute values in the penalty function instead of squares. This penalty causes some of the parameter estimates to turn out exactly zero: the larger the penalty applied, the further the estimates shrink towards zero. The result is variable selection out of the given n variables (sketch below).
- ElasticNet Regression: Elastic Net is an extension of linear regression that adds both regularization penalties to the loss function during training; it is a hybrid of the Lasso and Ridge regression techniques. Elastic Net is useful when there are multiple features that are correlated with one another: Lasso is likely to pick one of them at random, while Elastic Net is likely to pick both (sketch below).
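For linear regression, here is a minimal sketch using scikit-learn; the synthetic data and the true coefficient values (b0 = 3.0, b1 = 2.0, b2 = −1.5) are assumptions made for the example.

```python
# Minimal sketch of multiple linear regression: y = b0 + b1*x1 + b2*x2 + e.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                 # independent variables x1, x2
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=100)  # noise e

model = LinearRegression().fit(X, y)
print(model.intercept_)  # estimate of the intercept b0
print(model.coef_)       # estimates of the coefficients b1, b2
```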
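For logistic regression, a minimal sketch on assumed synthetic binary labels:

```python
# Minimal sketch of binary logistic regression; the labelling rule is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
# Assumed data-generating rule: class 1 when a noisy linear score is positive.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

clf = LogisticRegression().fit(X, y)
print(clf.predict_proba(X[:5]))  # probabilities from the sigmoid p = e^y / (1 + e^y)
```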
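For polynomial regression, one common approach (assumed here) is to expand the feature into polynomial terms and fit ordinary least squares on the expansion; the quadratic ground truth is invented for the example.

```python
# Minimal sketch of degree-2 polynomial regression on a single variable x.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(100, 1))
y = 1.0 - 2.0 * x[:, 0] + 0.5 * x[:, 0] ** 2 + rng.normal(scale=0.3, size=100)

# Expand x into [1, x, x^2], then fit y = b0 + b1*x + b2*x^2.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(x, y)
print(model.predict([[1.5]]))  # prediction from the fitted curve
```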
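For stepwise regression, scikit-learn's SequentialFeatureSelector offers a greedy forward/backward procedure; note that it selects features by cross-validated score rather than the classical p-value rules, so treat this as one illustrative variant on synthetic data.

```python
# Minimal sketch of forward selection over 10 candidate predictors,
# of which only 3 are informative (synthetic data).
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=1.0, random_state=0)

# Greedily add the feature that most improves the cross-validated score;
# direction="backward" would give backward elimination instead.
selector = SequentialFeatureSelector(LinearRegression(),
                                     n_features_to_select=3,
                                     direction="forward")
selector.fit(X, y)
print(selector.get_support())  # boolean mask of the selected independent variables
```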
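For ridge regression, a sketch contrasting ordinary least squares with the L2-penalized fit on nearly collinear data; the collinearity is manufactured for the example, and scikit-learn exposes the shrinkage parameter λ as `alpha`.

```python
# Minimal sketch: ridge regression stabilizes coefficients under multicollinearity.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.01, size=100)  # x2 is nearly collinear with x1
X = np.column_stack([x1, x2])
y = x1 + rng.normal(scale=0.1, size=100)

print(LinearRegression().fit(X, y).coef_)  # unstable, large-magnitude coefficients
print(Ridge(alpha=1.0).fit(X, y).coef_)    # shrunk toward zero, but not exactly zero
```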
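For Lasso, a sketch showing the exact zeros the L1 penalty produces; the data are synthetic and the `alpha` value is an assumption.

```python
# Minimal sketch: Lasso drives coefficients of uninformative features to exactly zero.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=1.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
print(lasso.coef_)                  # most coefficients are exactly zero
print(np.flatnonzero(lasso.coef_))  # indices of the variables Lasso selected
```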
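For Elastic Net, a sketch blending the two penalties; `l1_ratio` (an assumed value here) controls the mix between the L1 and L2 terms.

```python
# Minimal sketch of Elastic Net: l1_ratio=1.0 is pure Lasso (L1), 0.0 is pure Ridge (L2).
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet

X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=1.0, random_state=0)

enet = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X, y)
print(enet.coef_)  # correlated informative features tend to be kept together
```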