L1 Regularization | one minute summary

Have you wrangled with the concept of LASSO Regression?

Jeffrey Boschman
One Minute Machine Learning

--

L1 Regularization / LASSO Regression encourages sparsity

L1 Regularization (also called LASSO regression) is used less often than L2 Regularization, but it has key advantages in certain situations. The 80–20 Rule (a.k.a. the Pareto Principle), “80% of the consequences come from 20% of the causes”, comes to mind: often a small fraction of a model’s weights carry most of the predictive power, and L1 Regularization zeroes out the rest.

Prerequisite Info: Regularization, L2 Regularization

  1. Why? For any given model, some weights are more important than others. However, random noise in the training data will also give some of the less important weights influence. One way to prevent a model from overfitting to that noise (and to make it easier to see which features actually matter) is to eliminate the weights that contribute least.
  2. What? L1 Regularization is a technique that reduces model complexity by zeroing out the less important weights (i.e. it encourages sparsity), which also makes the model easier to interpret.
  3. How? L1 Regularization adds the sum of the absolute values of the weights, multiplied by a lambda hyperparameter, as a penalty term to the loss function. During gradient descent this penalty pulls every weight toward zero by a constant amount at each step, so only the weights that training examples consistently push in one direction (i.e. the ones that are genuinely predictive, not just fit to random noise) survive; the rest go to 0 (see the sketches after this list).
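
To make the “How” concrete, here is a minimal NumPy sketch (my own illustration, not code from this series). It fits a linear model with an L1 penalty via proximal gradient descent: a plain gradient step on the squared error, followed by a soft-thresholding step that applies the lambda * |w| penalty and clips small weights to exactly 0. The toy data and the lam and lr values are assumptions made up for the demo.

    import numpy as np

    # Toy data (assumed for this demo): y depends only on the first
    # feature; the other four features are pure noise, so their
    # ideal weights are 0.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=200)

    lam = 0.1  # the lambda hyperparameter: strength of the L1 penalty
    lr = 0.01  # learning rate
    w = np.zeros(5)

    for _ in range(2000):
        # Gradient step on the unpenalized mean squared error
        w -= lr * (2.0 / len(y)) * (X.T @ (X @ w - y))
        # Soft-thresholding: the update contributed by the lam * |w|
        # penalty. Every weight shrinks toward 0 by lr * lam, and any
        # weight smaller than that is clipped to exactly 0.
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)

    print(np.round(w, 2))  # first weight close to 3; the noise weights are exactly 0

The soft-thresholding step is where the sparsity comes from: an L2 penalty shrinks weights proportionally and never reaches exact zeros, while the L1 penalty subtracts a fixed amount each step, so weights supported only by noise get clipped to 0.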

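In practice you would rarely hand-roll this: scikit-learn’s Lasso solves the same objective, with its alpha parameter playing the role of lambda. A quick sanity check on the same toy data (again, an illustration of mine, not from this series):

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=200)

    # alpha plays the role of the lambda hyperparameter
    model = Lasso(alpha=0.1).fit(X, y)
    print(model.coef_)  # only the first coefficient is non-zero

The zeroed entries of coef_ are exactly what makes the fitted model easier to interpret: the surviving features are the ones the model actually uses.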
--
