Regression Family Members

Balaji Subudhi

Jan 8
10 min

Linear and logistic regression were the first algorithms I learnt in predictive modelling, as they usually are for most people. I used to think they were the only regression models available, perhaps because of their wide popularity and acceptance.

The truth is that there are many forms of regression that can be performed. Each form has its own importance and specific conditions where it is best suited. Here, I try to introduce a few of them.

Linear regression: This is the father of all regressions. In statistics, linear regression is an approach for modelling the relationship between a scalar dependent variable y and one or more explanatory variables (or independent variables) denoted X. The case of one explanatory variable is called simple linear regression. For more than one explanatory variable, the process is called multiple linear regression. (This term should be distinguished from multivariate linear regression, where multiple correlated dependent variables are predicted, rather than a single scalar variable.)
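
To make this concrete, here is a minimal sketch of a simple linear regression in Python using scikit-learn; the data and the true coefficients (intercept 3, slope 2) are made up purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data: one explanatory variable x and a noisy linear response y
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100).reshape(-1, 1)    # predictor matrix X (n x 1)
y = 3.0 + 2.0 * x.ravel() + rng.normal(0, 1, 100)  # y = 3 + 2x + noise

model = LinearRegression().fit(x, y)
print("intercept:", model.intercept_)  # should come out close to 3
print("slope:", model.coef_[0])        # should come out close to 2
```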

Logistic regression: In statistics, logistic regression, or logit regression, or the logit model is a regression model where the dependent variable is categorical. This technique covers the case of a binary dependent variable, that is, one that can take only two values, such as pass/fail, win/lose, alive/dead or healthy/diseased. Cases with more than two categories are referred to as multinomial logistic regression, or, if the multiple categories are ordered, as ordinal logistic regression.
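
A minimal sketch with scikit-learn's LogisticRegression on a made-up pass/fail example; the hours-studied variable and the underlying probabilities are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative data: hours studied vs. a binary pass/fail outcome
rng = np.random.default_rng(0)
hours = rng.uniform(0, 10, size=200).reshape(-1, 1)
# Assumed data-generating process: probability of passing rises with hours studied
prob = 1 / (1 + np.exp(-(hours.ravel() - 5)))
passed = rng.binomial(1, prob)

clf = LogisticRegression().fit(hours, passed)
# Predicted probabilities [P(fail), P(pass)] for 2 and 8 hours of study
print(clf.predict_proba([[2.0], [8.0]]))
```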

Ridge regression: Ridge regression is a technique specialised for analysing multiple regression data that suffer from multicollinearity. Ridge regression performs L2 regularisation: a penalty equivalent to the square of the magnitude of the coefficients is added to the objective. The minimisation objective is as follows.

Taking a response vector y ∈ R^n and a predictor matrix X ∈ R^(n×p), the ridge regression coefficients are defined as

β̂(ridge) = argmin over β of ( ‖y − Xβ‖² + λ ‖β‖² )

Here λ is the tuning parameter that controls the strength of the penalty term.

If λ = 0, the objective reduces to the ordinary linear regression objective, so we get the same coefficients as plain linear regression.

If λ = ∞, the coefficients will be zero, because the infinite weight on the square of the coefficients means that any non-zero coefficient makes the objective infinite.

If 0 < λ < ∞, the magnitude of λ decides the weightage given to the different parts of the objective.

In simple terms, minimisation objective = LS Obj + λ × (sum of the squares of the coefficients)

where LS Obj is the least squares objective, i.e. the linear regression objective without regularisation.

As ridge regression shrinks the coefficients towards zero, it introduces some bias. But it can reduce the variance to a great extent, which can result in a better mean-squared error. The amount of shrinkage is controlled by λ, which multiplies the ridge penalty. Since a larger λ means more shrinkage, we get different coefficient estimates for different values of λ.
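
A minimal sketch of this shrinkage effect with scikit-learn's Ridge, whose alpha argument plays the role of λ; the two nearly identical (multicollinear) predictors below are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Illustrative data with two highly correlated (multicollinear) predictors
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.01, size=100)  # nearly identical to x1
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(size=100)

# Coefficients shrink towards zero as lambda (alpha) grows
for lam in [0.01, 1.0, 10.0, 100.0]:
    coefs = Ridge(alpha=lam).fit(X, y).coef_
    print(f"lambda={lam:<7} coefficients={np.round(coefs, 3)}")
```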

Lasso regression: Lasso regression analysis is a shrinkage and variable selection method for linear regression models. The goal of lasso regression is to obtain the subset of predictors that minimises prediction error for a quantitative response variable.

The lasso does this by imposing a constraint on the model parameters that causes regression coefficients for some variables to shrink toward zero. Variables with a regression coefficient equal to zero after the shrinkage process are excluded from the model.

Variables with non-zero regression coefficients are the ones most strongly associated with the response variable. Explanatory variables can be quantitative, categorical, or both.

To test a lasso regression model, identify a quantitative response variable from your data set if you haven't already done so, and choose a few additional quantitative and categorical predictor (i.e. explanatory) variables to build a larger pool of candidates. A larger pool of predictors gives the lasso more scope for variable selection and gives you more practice with the technique.
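
A minimal sketch with scikit-learn's Lasso on made-up data where only two of ten predictors actually matter, showing coefficients being driven exactly to zero; alpha is the penalty strength and its value here is arbitrary.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Illustrative data: 10 predictors, but only the first two actually drive y
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 4 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=200)

lasso = Lasso(alpha=0.5).fit(X, y)
print("coefficients:", np.round(lasso.coef_, 2))
# Predictors with a coefficient of exactly 0 are effectively dropped from the model
print("selected predictors:", np.nonzero(lasso.coef_)[0])
```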

Ecological regression: This is a statistical technique used especially in political science and history to estimate group voting behavior from aggregate data. For example, if counties have a known Democratic vote (in percentage) D, and a known percentage of Catholics, C, then run the linear regression of dependent variable D against independent variable C. This gives D = a + bC. When C = 1 (100% Catholic) this gives the estimated Democratic vote as a+b. When C = 0 (0% Catholic), this gives the estimated non-Catholic vote as a. For example, if the regression gives D = .22 + .45C, then the estimated Catholic vote is 67% Democratic and the non-Catholic vote is 22% Democratic.
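
As a sketch of that worked example, here is made-up county-level data generated roughly along D = .22 + .45C and fitted the same way; all the numbers are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative county-level data: C = share Catholic, D = share voting Democratic
rng = np.random.default_rng(0)
C = rng.uniform(0, 1, size=50).reshape(-1, 1)
D = 0.22 + 0.45 * C.ravel() + rng.normal(0, 0.02, 50)  # roughly D = .22 + .45C

fit = LinearRegression().fit(C, D)
a, b = fit.intercept_, fit.coef_[0]
print("estimated non-Catholic Democratic share (C=0):", round(a, 2))      # ~0.22
print("estimated Catholic Democratic share (C=1):", round(a + b, 2))      # ~0.67
```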

Bayesian regression: In statistics, Bayesian linear regression is an approach to linear regression in which the statistical analysis is undertaken within the context of Bayesian inference. When the regression model has errors that have a normal distribution, and if a particular form of prior distribution is assumed, explicit results are available for the posterior probability distributions of the model's parameters.
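
One concrete way to try this in Python is scikit-learn's BayesianRidge, a Bayesian linear regression with a particular choice of priors; the data below are made up, and the point is simply that the model returns a predictive standard deviation alongside each prediction.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

# Illustrative data: a noisy linear relationship
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 1.5 + 0.8 * X.ravel() + rng.normal(0, 1, 100)

model = BayesianRidge().fit(X, y)
# Posterior predictive mean and standard deviation at a few new points
mean, std = model.predict([[2.0], [5.0], [9.0]], return_std=True)
print("predictive means:", np.round(mean, 2))
print("predictive stds: ", np.round(std, 2))
```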

Quantile regression: This is a type of regression analysis used in statistics and econometrics. Whereas the method of least squares estimates the conditional mean of the response variable given certain values of the predictor variables, quantile regression aims at estimating the conditional median or other quantiles of the response variable. Essentially, quantile regression is an extension of linear regression, used when the assumptions of linear regression are not met.

Quantile regression is desired if conditional quantile functions are of interest. One advantage of quantile regression relative to ordinary least squares regression is that the quantile regression estimates are more robust against outliers in the response measurements. However, the main attraction of quantile regression goes beyond that: different measures of central tendency and statistical dispersion can be useful for obtaining a more comprehensive analysis of the relationship between variables.
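
A minimal sketch using statsmodels' quantile regression (smf.quantreg) on made-up heteroscedastic data; the formula and the quantiles 0.5 and 0.9 are chosen arbitrarily for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative data whose noise grows with x (heteroscedastic)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=300)
y = 2 + 0.5 * x + rng.normal(0, 0.2 + 0.2 * x, size=300)
df = pd.DataFrame({"x": x, "y": y})

# Fit the conditional median (q=0.5) and an upper quantile (q=0.9)
for q in (0.5, 0.9):
    res = smf.quantreg("y ~ x", df).fit(q=q)
    print(f"q={q}:", res.params.to_dict())
```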

LAD regression: Least absolute deviations (LAD), sum of absolute deviations, or the L1 norm condition, is a statistical optimality criterion and the statistical optimisation technique that relies on it. Similar to the popular least squares technique, it attempts to find a function which closely approximates a set of data. In the simple case of a set of (x,y) data, the approximation function is a simple "trend line" in two-dimensional Cartesian coordinates. The method minimises the sum of absolute errors (SAE) (the sum of the absolute values of the vertical "residuals" between points generated by the function and corresponding points in the data). The least absolute deviations estimate also arises as the maximum likelihood estimate if the errors have a Laplace distribution.
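
Since the LAD objective is just the sum of absolute residuals, one simple (if not the most efficient) sketch is to hand it to a generic optimiser; the data, the injected outliers and the choice of Nelder-Mead are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative data with a few large outliers in y
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 1.0 + 2.0 * x + rng.normal(0, 1, 100)
y[:5] += 30  # outliers that would strongly pull a least-squares fit

def sae(params):
    """Sum of absolute errors for a trend line y = a + b*x."""
    a, b = params
    return np.sum(np.abs(y - (a + b * x)))

# Nelder-Mead is a reasonable choice since the objective is not differentiable everywhere
result = minimize(sae, x0=[0.0, 0.0], method="Nelder-Mead")
print("LAD intercept and slope:", np.round(result.x, 2))  # close to (1, 2) despite outliers
```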

Jackknife regression: In statistics, the jackknife is a resampling technique especially useful for variance and bias estimation. The jackknife predates other common resampling methods such as the bootstrap. The jackknife estimator of a parameter is found by systematically leaving out each observation from a dataset and calculating the estimate and then finding the average of these calculations. Given a sample of size n, the jackknife estimate is found by aggregating the estimates of each (n-1)-sized sub-sample.
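
A minimal sketch of the jackknife applied to a regression slope, leaving out one observation at a time; the data and the choice of a simple linear fit are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data for a simple linear fit
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=30).reshape(-1, 1)
y = 1.0 + 2.0 * x.ravel() + rng.normal(0, 1, 30)

n = len(y)
slopes = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i  # leave observation i out
    slopes[i] = LinearRegression().fit(x[keep], y[keep]).coef_[0]

full_slope = LinearRegression().fit(x, y).coef_[0]
jack_mean = slopes.mean()
# Standard jackknife estimates of the bias and variance of the slope estimate
bias = (n - 1) * (jack_mean - full_slope)
var = (n - 1) / n * np.sum((slopes - jack_mean) ** 2)
print("slope:", round(full_slope, 3),
      "jackknife bias:", round(bias, 4),
      "jackknife s.e.:", round(np.sqrt(var), 4))
```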