“The meaning of error bars is often misinterpreted, as is the statistical significance of their overlap.”

Krzywinski, M., Altman, N. Error bars. Nat Methods 10, 921–922 (2013). https://doi.org/10.1038/nmeth.2659


This is a recurring concept that you should make sure you understand

Image idea from: https://blog.floydhub.com/a-beginners-guide-on-recurrent-neural-networks-with-pytorch/
  1. Why? Data like images can often simply be processed one piece at a time by a feed-forward network. However, sometimes data points are not independent of each other, and you actually need to: 1. input multiple pieces of data together because they rely on each other (e.g. words in a sentence), or 2. account for the influence of past inputs on the current one (e.g. …
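That "memory" of past inputs can be sketched with a toy, single-unit recurrent step (the weights and inputs here are made-up illustrative values, not from any trained model):

```python
import math

def rnn_step(x_t, h_prev, w_x=0.5, w_h=0.8):
    """One recurrent step: the new hidden state mixes the current
    input with the previous hidden state (the network's 'memory')."""
    return math.tanh(w_x * x_t + w_h * h_prev)

# Feed a sequence where only the first element is non-zero.
h = 0.0
states = []
for x in [1.0, 0.0, 0.0]:
    h = rnn_step(x, h)
    states.append(h)

# A feed-forward layer would output 0 for the zero inputs, but the
# recurrent hidden state still carries a fading trace of the first input.
print(states)
```

Each later state is non-zero purely because of what came before it, which is exactly the dependence on past inputs that a feed-forward network cannot capture.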


Have you wrestled with the concept of LASSO Regression?

L1 Regularization / LASSO Regression encourages sparsity
  1. Why? For any given model, some weights will be more important than others. However, random noise during training will cause some of the less important weights to take on small but non-zero values, giving them influence they should not have. …
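One way to see why the L1 penalty produces exact zeros (sparsity) is the soft-thresholding operator that appears in coordinate-descent LASSO solvers (a minimal sketch; `lam` is the regularization strength):

```python
def soft_threshold(z, lam):
    """LASSO's closed-form coordinate update: shrink z toward zero
    by lam, and snap it to exactly 0 if it was already within lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

# Small (noise-driven) weights are zeroed out; large ones survive, shrunk.
print(soft_threshold(0.05, 0.1))  # -> 0.0
print(soft_threshold(1.00, 0.1))  # -> 0.9
```

Weights whose magnitude never exceeds the threshold are clamped to exactly zero, which is the sparsity L1 regularization is known for.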


You too can understand L2

Math to really understand how L2 Regularization / Weight Decay works
  1. Why? Large weights in a neural network are often a sign of an overly complex network that has overfit the training data. Therefore, one way to prevent a model from becoming too complex is to stop weights from becoming too large.
  2. What? L2 Regularization is a technique to reduce model complexity by lowering…
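The shrinking effect can be sketched as a plain gradient-descent update with the decay term added (made-up constants, and the data-loss gradient is omitted to isolate the penalty's effect):

```python
def decay_step(w, lr=0.1, lam=0.5):
    """The gradient of the L2 penalty (lam/2)*w**2 is lam*w, so the
    update multiplies the weight by a constant factor just below 1."""
    return w - lr * lam * w  # == w * (1 - lr * lam)

w = 10.0
for _ in range(50):
    w = decay_step(w)

# The weight decays geometrically toward (but never exactly to) zero --
# contrast with L1, which can zero weights out completely.
print(w)
```

This multiplicative shrinkage toward zero, rather than subtraction past zero, is why L2 keeps weights small but dense while L1 yields sparsity.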


A regularization strategy motivated by a theory of the role of sex in evolution

Image modified from Srivastava, Nitish, et al. “Dropout: a simple way to prevent neural networks from overfitting.” The Journal of Machine Learning Research 15.1 (2014): 1929–1958.
  1. Why? Theoretically, a great way to make predictions with a given model is to train many versions of it separately and then average their outputs. However, training many models is computationally very expensive.
  2. What? Dropout is a regularization technique that prevents overfitting by approximating training multiple different neural network architectures in parallel.
  3. How? During training, some hidden units are…
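The training-time step can be sketched as inverted dropout, the common formulation in which kept activations are rescaled during training so nothing needs to change at test time (a sketch, not the paper's exact pseudocode):

```python
import random

def dropout(activations, p=0.5):
    """Inverted dropout: zero each unit with probability p and scale
    the survivors by 1/(1-p) so the expected activation is unchanged."""
    return [0.0 if random.random() < p else a / (1.0 - p)
            for a in activations]

random.seed(0)
out = dropout([1.0, 2.0, 3.0, 4.0], p=0.5)
# Every unit is either dropped (0.0) or doubled (scaled by 1/(1 - 0.5)).
print(out)
```

Because a different random subset of units survives each pass, each mini-batch effectively trains a different "thinned" sub-network, approximating the ensemble described above.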


New idiom: “An Image is Worth 16x16 Words”

Image modified from Dosovitskiy, Alexey, et al. “An image is worth 16x16 words: Transformers for image recognition at scale.” arXiv preprint arXiv:2010.11929 (2020).
  1. Why? For computer vision tasks, the best models have typically been ConvNets (e.g. ResNet, VGG, Inception). But with Transformers being so successful for NLP tasks, can they be used for computer vision as well?
  2. What? The Vision Transformer (ViT) is a modified NLP Transformer (encoder only) for image…
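The first ViT step, cutting the image into fixed-size patches that play the role of "words", can be sketched in plain Python for a single-channel image (toy dimensions, not the paper's code):

```python
def to_patches(image, patch=16):
    """Split an H x W image (a list of rows) into non-overlapping
    patch x patch tiles, each flattened into one 'token' vector."""
    h, w = len(image), len(image[0])
    tokens = []
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            tokens.append([image[r][c]
                           for r in range(i, i + patch)
                           for c in range(j, j + patch)])
    return tokens

# A 32x32 image becomes a sequence of 4 tokens of length 256 each,
# which ViT then linearly embeds and feeds to a standard Transformer.
img = [[r * 32 + c for c in range(32)] for r in range(32)]
tokens = to_patches(img)
print(len(tokens), len(tokens[0]))  # -> 4 256
```

Once the image is a sequence of flattened patches, the rest of the model is essentially the standard NLP Transformer encoder operating on that sequence.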


It took the Transformer and transformed it to make it even more useful

Example idea from https://www.analyticsvidhya.com/blog/2019/09/demystifying-bert-groundbreaking-nlp-framework/
  1. Why? For computer vision tasks, one can often take a pre-trained model like Inception or ResNet and then fine-tune it for a specific task. But for NLP, there had been no comparable generic pre-trained model. One reason is that, before Transformers, language models had largely been unidirectional (i.e. …
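BERT's bidirectional pre-training objective, masked language modelling, can be sketched as input preparation in plain Python (a simplification: BERT masks about 15% of tokens at random, and of those replaces 80% with [MASK], 10% with a random token, and keeps 10% unchanged; fixed positions are used here for clarity):

```python
def mask_tokens(tokens, positions, mask_token="[MASK]"):
    """Replace tokens at the given positions; the model must predict
    the originals using context from BOTH sides of each mask."""
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = masked[pos]
        masked[pos] = mask_token
    return masked, targets

tokens = ["the", "cat", "sat", "on", "the", "mat"]
masked, targets = mask_tokens(tokens, positions=[1, 5])
print(masked)   # -> ['the', '[MASK]', 'sat', 'on', 'the', '[MASK]']
print(targets)  # -> {1: 'cat', 5: 'mat'}
```

Because the model sees the words both before and after each mask, it learns bidirectional context, which is exactly what earlier left-to-right language models could not do.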


Batch norm has become the norm

Image idea from https://deepai.org/machine-learning-glossary-and-terms/batch-normalization
  1. Why? For each training pass (e.g. each mini-batch), as the parameters of the preceding layers change, the distribution of inputs to the current layer changes accordingly, such that the current layer needs to continuously readjust to new input distributions. This problem is called internal covariate shift.
  2. What? Batch normalization is a regularization technique that standardizes the inputs to each layer, supposedly reducing internal covariate…
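The standardization step can be sketched for a single feature across a mini-batch (a sketch of the training-time computation only; the full layer adds the learnable scale `gamma` and shift `beta` shown here, plus running statistics used at inference):

```python
def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Standardize one feature over the batch, then scale and shift
    with the learnable parameters gamma and beta."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [gamma * (x - mean) / (var + eps) ** 0.5 + beta for x in xs]

batch = [1.0, 2.0, 3.0, 4.0]
out = batch_norm(batch)
# The normalized activations have (approximately) zero mean and unit
# variance no matter how the previous layer's output distribution shifted.
print(out)
```

Whatever the preceding layers do to the input distribution, the layer after batch norm always sees inputs on roughly the same scale.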


The original super deep ConvNet

Modified from https://neurohive.io/en/popular-networks/vgg16/
  1. Why? Previous ConvNets (like AlexNet) had typically used fairly large convolution filters, but this limited how deep the networks could practically be.
  2. What? VGG16 is an otherwise typical ConvNet architecture, but one that uses a small convolution filter size and spends the freed-up parameter budget on making the network really deep.
  3. How? VGG16 has 16 weight layers: 13 convolutional layers with 3x3 filters (the smallest…
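Why stacking small filters works can be checked with quick receptive-field and parameter arithmetic (pure bookkeeping, no framework needed; `c` is an illustrative channel count):

```python
def receptive_field(num_convs, kernel=3):
    """Each stacked stride-1 conv grows the receptive field by (kernel - 1)."""
    rf = 1
    for _ in range(num_convs):
        rf += kernel - 1
    return rf

def conv_params(kernel, channels):
    """Weight count for a conv layer mapping `channels` -> `channels` maps."""
    return kernel * kernel * channels * channels

c = 64
# Three stacked 3x3 convs see as far as one 7x7 conv...
print(receptive_field(3))  # -> 7
# ...but with roughly 45% fewer weights, and two extra nonlinearities.
print(3 * conv_params(3, c), conv_params(7, c))  # -> 110592 200704
```

The saved parameters (and the extra nonlinearities between the small convs) are what let VGG16 go so deep at a practical cost.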


Machine learning inspired by the “we need to go deeper” meme

  1. Why? The most straightforward way to increase CNN performance is to increase their size, but bigger sized models are prone to overfitting and require more computational resources.
  2. What? The Inception-v1 model is an efficient architecture for computer vision that introduced a few fancy techniques to achieve a “deeper” model while keeping the number of parameters (i.e. the computational cost) reasonable.
  3. How? Inception-v1 uses a combination of: 1. repeated…
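One of those techniques, the 1x1 “bottleneck” convolution, is easy to motivate with a parameter count (the channel numbers below are illustrative, in the spirit of the paper's inception modules):

```python
def conv_params(kernel, c_in, c_out):
    """Weight count for a conv layer (bias terms ignored)."""
    return kernel * kernel * c_in * c_out

# A 5x5 conv straight from 192 input channels to 32 output channels...
direct = conv_params(5, 192, 32)
# ...versus first squeezing to 16 channels with a 1x1 conv, then the 5x5.
bottleneck = conv_params(1, 192, 16) + conv_params(5, 16, 32)
print(direct, bottleneck)  # -> 153600 15872
```

Nearly a tenfold reduction in weights for that branch, which is how the module stays cheap enough to repeat many times and make the network “deeper”.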

Jeffrey Boschman

An endlessly curious grad student trying to build and share knowledge.
