Given a figure comparing some mean values with error bars, the author is likely trying to do one of two things: 1. show the variability of observations in that sample (descriptive statistics), or 2. show how well the sample mean represents the population mean (inferential statistics; where the term “statistical significance” comes into play).
Unfortunately, readers often forget this distinction and misinterpret error bars. A common mistake is assuming that a gap between the bars of two sample means implies a statistically significant difference (i.e. statistical significance, given a standard P value of 0.05, …
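To make the two readings concrete, here is a minimal sketch (made-up data, illustrative names) that plots the same two sample means twice: once with standard-deviation error bars (descriptive) and once with approximate 95% confidence-interval error bars (inferential):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# made-up samples from two groups
a = rng.normal(loc=10.0, scale=2.0, size=30)
b = rng.normal(loc=11.0, scale=2.0, size=30)

means = [a.mean(), b.mean()]
sds = [a.std(ddof=1), b.std(ddof=1)]                      # descriptive: spread of the observations
sems = [sd / np.sqrt(len(x)) for sd, x in zip(sds, [a, b])]
cis = [1.96 * sem for sem in sems]                        # inferential: ~95% CI for the mean

fig, (ax1, ax2) = plt.subplots(1, 2, sharey=True)
ax1.bar([0, 1], means, yerr=sds, capsize=5)
ax1.set_title("Error bars = SD (descriptive)")
ax2.bar([0, 1], means, yerr=cis, capsize=5)
ax2.set_title("Error bars = ~95% CI (inferential)")
plt.show()
```

Whether the bars appear to overlap can differ between the two panels, which is exactly why the chart alone does not settle statistical significance.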
Although Recurrent Neural Networks (RNNs) are relatively old (they date from the 1980s), they still underlie many speech, text, audio, and financial data applications. This article is an introduction to the basic idea of RNNs.
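As a preview of that basic idea, a vanilla RNN simply reuses one set of weights to update a hidden state at every time step. Here is a minimal numpy sketch (the function name, dimensions, and weights are illustrative, not taken from any particular library):

```python
import numpy as np

def rnn_forward(x_seq, W_xh, W_hh, b_h):
    """Run a vanilla RNN over a sequence of shape (T, input_dim).

    Returns the hidden state at every time step, shape (T, hidden_dim).
    """
    hidden_dim = W_hh.shape[0]
    h = np.zeros(hidden_dim)            # initial hidden state
    hs = []
    for x_t in x_seq:                   # the same weights are reused at every step
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        hs.append(h)
    return np.stack(hs)

# toy example: 5 time steps, 3 input features, 4 hidden units
rng = np.random.default_rng(0)
out = rnn_forward(rng.normal(size=(5, 3)),
                  rng.normal(size=(4, 3)) * 0.1,
                  rng.normal(size=(4, 4)) * 0.1,
                  np.zeros(4))
print(out.shape)  # (5, 4)
```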
L1 Regularization (also called LASSO regression) is used less often than L2 Regularization, but it has key advantages in certain situations; the 80–20 Rule (a.k.a. the Pareto Principle), “80% of the consequences come from 20% of the causes”, comes to mind.
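To make the difference tangible: the only change L1 regularization makes to a loss function is an added penalty on the absolute values of the weights. A minimal sketch, with illustrative names and a made-up lambda:

```python
import numpy as np

def l1_regularized_loss(y_true, y_pred, weights, lam=0.01):
    """Mean squared error plus an L1 (LASSO) penalty on the weights."""
    mse = np.mean((y_true - y_pred) ** 2)
    l1_penalty = lam * np.sum(np.abs(weights))   # sum of absolute weights (the L1 norm)
    return mse + l1_penalty
```

Because the gradient of |w| does not shrink as w approaches zero, this penalty keeps pushing small weights all the way to zero, which is where the sparsity (and the Pareto-style “a small fraction of the features does most of the work” intuition) comes from.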
One of the most common techniques to prevent overfitting is L2 Regularization (so called because it uses the L2 norm). It is also known as Ridge Regression (a name from the original 1970 paper) or Weight Decay (the name used in deep learning frameworks, because that is essentially what it does).
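Concretely, L2 regularization adds a penalty on the squared weights to the loss; seen from the gradient-descent side, the same penalty shrinks every weight a little at each update, which is why frameworks call it weight decay. A minimal sketch (function names and the lambda value are illustrative):

```python
import numpy as np

def l2_regularized_loss(y_true, y_pred, weights, lam=0.01):
    """Mean squared error plus an L2 (ridge) penalty on the weights."""
    mse = np.mean((y_true - y_pred) ** 2)
    l2_penalty = lam * np.sum(weights ** 2)      # squared L2 norm of the weights
    return mse + l2_penalty

def sgd_step_with_weight_decay(weights, grad, lr=0.1, lam=0.01):
    """The equivalent 'weight decay' view: every update also pulls the weights toward zero."""
    return weights - lr * (grad + 2.0 * lam * weights)
```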
Prerequisite Info: Regularization
The purpose of dropout is alluded to in the title of its 2014 paper by Srivastava et al: “Dropout: A Simple Way to Prevent Neural Networks from Overfitting”. It has since become one of the most popular regularization techniques.
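The mechanism itself is simple: during training each unit is zeroed out with some probability p and the surviving activations are rescaled so their expected value is unchanged (the “inverted dropout” formulation), while at test time nothing is dropped. A minimal numpy sketch, with illustrative names:

```python
import numpy as np

def dropout(activations, p=0.5, training=True, rng=None):
    """Inverted dropout: zero each unit with probability p, during training only."""
    if not training or p == 0.0:
        return activations                       # nothing is dropped at test time
    rng = rng if rng is not None else np.random.default_rng()
    mask = rng.random(activations.shape) >= p    # keep each unit with probability 1 - p
    return activations * mask / (1.0 - p)        # rescale so the expected activation is unchanged
```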
The 2020 paper: “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale” by Dosovitskiy, A., et al (Google) introduced the Vision Transformer, which at first seemed like just a neat extension of NLP Transformers but has since proved to be very effective for computer vision tasks.
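The “16x16 words” of the title refers to cutting the image into fixed-size patches and treating each flattened patch as a token for the Transformer. A rough sketch of that patchify step (shapes and the function name are illustrative, not the paper's code):

```python
import numpy as np

def image_to_patches(image, patch_size=16):
    """Split an (H, W, C) image into flattened, non-overlapping patches ('visual words')."""
    H, W, C = image.shape
    assert H % patch_size == 0 and W % patch_size == 0
    return (image
            .reshape(H // patch_size, patch_size, W // patch_size, patch_size, C)
            .transpose(0, 2, 1, 3, 4)            # bring the two patch-grid axes together
            .reshape(-1, patch_size * patch_size * C))

tokens = image_to_patches(np.zeros((224, 224, 3)))
print(tokens.shape)  # (196, 768): a 14x14 grid of patches, each flattened to 768 values
```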
BERT was introduced in the 2018 paper: “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”, by Devlin, J., et al (Google) to take on the specific task of creating a good language representation model.
This technique was introduced by the 2015 paper “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift” by Ioffe and Szegedy (Google) and has become a staple regularization method for many models ever since.
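At its core, batch normalization standardizes each feature using the mean and variance of the current mini-batch and then applies a learned scale (gamma) and shift (beta). A minimal training-time sketch (illustrative only; real implementations also track running statistics for use at inference):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch-normalize a mini-batch x of shape (batch_size, num_features)."""
    mean = x.mean(axis=0)                        # per-feature mean over the batch
    var = x.var(axis=0)                          # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)      # standardize each feature
    return gamma * x_hat + beta                  # learned scale and shift
```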
The 2014 paper, “Very Deep Convolutional Networks for Large-Scale Image Recognition”, from Oxford’s Visual Geometry Group (VGG) introduced what has become known as VGG16, a well-known model that placed second behind Inception-v1 (GoogLeNet) at ILSVRC-2014.
The 2014 paper: “Going deeper with convolutions” from Google introduced the Inception module architecture, which has come to be known as Inception-v1 or GoogLeNet (the team name under which they won ILSVRC 2014).
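Roughly, an Inception module runs several convolutions of different kernel sizes (plus a pooling branch) in parallel on the same input and concatenates their outputs along the channel dimension, with 1x1 convolutions used as cheap bottlenecks. A rough PyTorch sketch (ReLUs omitted for brevity; the channel counts are the module's hyperparameters):

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Sketch of an Inception-v1 style module: parallel branches, concatenated by channel."""
    def __init__(self, in_ch, c1, c3_red, c3, c5_red, c5, pool_proj):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, c1, kernel_size=1)
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, c3_red, kernel_size=1),           # 1x1 bottleneck
            nn.Conv2d(c3_red, c3, kernel_size=3, padding=1))
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, c5_red, kernel_size=1),           # 1x1 bottleneck
            nn.Conv2d(c5_red, c5, kernel_size=5, padding=2))
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, kernel_size=1))

    def forward(self, x):
        # every branch preserves the spatial size, so the outputs can be stacked channel-wise
        return torch.cat([self.branch1(x), self.branch3(x),
                          self.branch5(x), self.branch_pool(x)], dim=1)
```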