L1 Regularization | one minute summary

Have you wrangled with the concept of LASSO Regression?

Jeffrey Boschman
One Minute Machine Learning

--

L1 Regularization / LASSO Regression encourages sparsity

L1 Regularization (also called LASSO regression) is used less often than L2 Regularization, but it has key advantages in certain situations. The 80–20 Rule (a.k.a. the Pareto Principle), “80% of the consequences come from 20% of the causes”, comes to mind: often a small fraction of a model’s weights carry most of the predictive power, and L1 Regularization zeroes out the rest.

Prerequisite Info: Regularization, L2 Regularization

  1. Why? For any given model, some weights are more important than others. However, random noise in the training data will also give some of the less important weights influence. One way to prevent a model from overfitting to that noise (and to make it easier to see which features actually matter) is to eliminate the weights that contribute least.
  2. What? L1 Regularization is a technique that reduces model complexity by zeroing out the less important weights (i.e. it encourages sparsity), which also makes the model easier to interpret.
  3. How? L1 Regularization adds the sum of the absolute values of the weights, multiplied by a lambda hyperparameter, as a penalty term to the loss function. During gradient descent this penalty pulls every weight toward zero by a constant amount at each step, so only the weights that training examples consistently push in one direction (i.e. the ones that are genuinely predictive, not just fit to random noise) survive; the rest go to 0 (see the sketches after this list).
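
To make the “How” concrete, here is a minimal NumPy sketch (my own illustration, not code from this series). It fits a linear model with an L1 penalty via proximal gradient descent: a plain gradient step on the squared error, followed by a soft-thresholding step that applies the lambda * |w| penalty and clips small weights to exactly 0. The toy data and the lam and lr values are assumptions made up for the demo.

    import numpy as np

    # Toy data (assumed for this demo): y depends only on the first
    # feature; the other four features are pure noise, so their
    # ideal weights are 0.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=200)

    lam = 0.1  # the lambda hyperparameter: strength of the L1 penalty
    lr = 0.01  # learning rate
    w = np.zeros(5)

    for _ in range(2000):
        # Gradient step on the unpenalized mean squared error
        w -= lr * (2.0 / len(y)) * (X.T @ (X @ w - y))
        # Soft-thresholding: the update contributed by the lam * |w|
        # penalty. Every weight shrinks toward 0 by lr * lam, and any
        # weight smaller than that is clipped to exactly 0.
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)

    print(np.round(w, 2))  # first weight close to 3; the noise weights are exactly 0

The soft-thresholding step is where the sparsity comes from: an L2 penalty shrinks weights proportionally and never reaches exact zeros, while the L1 penalty subtracts a fixed amount each step, so weights supported only by noise get clipped to 0.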

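In practice you would rarely hand-roll this: scikit-learn’s Lasso solves the same objective, with its alpha parameter playing the role of lambda. A quick sanity check on the same toy data (again, an illustration of mine, not from this series):

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=200)

    # alpha plays the role of the lambda hyperparameter
    model = Lasso(alpha=0.1).fit(X, y)
    print(model.coef_)  # only the first coefficient is non-zero

The zeroed entries of coef_ are exactly what makes the fitted model easier to interpret: the surviving features are the ones the model actually uses.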
--
