Explaining an application of machine learning in medicine, line-by-line

This article, published in Nature Communications in May 2021, has a lot of biological jargon that makes it hard to read for the average machine learning practitioner. …

“The meaning of error bars is often misinterpreted, as is the statistical significance of their overlap.”

Given a figure comparing some mean values with error bars, the author is likely trying to do one of two things: 1. show the variability of observations in that sample (descriptive statistics), or 2. …
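The descriptive-vs-inferential distinction can be made concrete with a quick computation. This is a minimal sketch, assuming an illustrative random sample: standard-deviation error bars describe the spread of the observations, while standard-error bars describe the uncertainty of the mean and shrink as the sample grows.

```python
import numpy as np

# Hypothetical sample of measurements (illustrative data, not from the paper)
rng = np.random.default_rng(0)
sample = rng.normal(loc=10.0, scale=2.0, size=50)

mean = sample.mean()
sd = sample.std(ddof=1)            # descriptive: spread of the observations
sem = sd / np.sqrt(len(sample))   # inferential: uncertainty of the mean

# SD bars and SEM bars drawn around the same mean tell very different
# visual stories: SEM bars are sqrt(n) times narrower here.
print(mean, sd, sem)
```

Overlapping SEM bars do not by themselves prove "no significant difference", which is exactly the misinterpretation the quote warns about.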

This is a recurring concept that you should make sure you understand

Although Recurrent Neural Networks (RNNs) are relatively old (dating to the 1980s), they are still the basic model underlying many speech, text, audio, and financial data applications. This article is an introduction to the basic idea of RNNs.

  1. Why? Data like images can often simply be processed one at…
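The core recurrence can be sketched in a few lines. This is a minimal vanilla RNN cell, with the input, hidden, and sequence sizes chosen arbitrarily for illustration:

```python
import numpy as np

# Minimal vanilla RNN cell (sizes are illustrative assumptions)
rng = np.random.default_rng(0)
input_size, hidden_size, seq_len = 4, 8, 5

W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

x_seq = rng.normal(size=(seq_len, input_size))  # one sequence of 5 steps
h = np.zeros(hidden_size)                       # initial hidden state

# The same weights are reused at every time step; the hidden state
# carries information forward, which is what lets RNNs model order.
for x_t in x_seq:
    h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)

print(h.shape)  # (8,)
```

The weight sharing across time steps is the key difference from a feed-forward network applied to each step independently.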

Have you wrangled with the concept of LASSO Regression?

L1 Regularization (also called LASSO regression) is used less often than L2 Regularization, but it has key advantages in certain situations; the 80–20 Rule (a.k.a. the Pareto Principle), “80% of the consequences come from 20% of the causes”, comes to mind.

Prerequisite Info: Regularization, L2 Regularization

  1. Why? For any given…
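L1's sparsity-inducing behavior, which is what connects it to the Pareto idea of keeping only the few weights that matter, can be seen in the soft-thresholding operator used as the proximal step in LASSO solvers such as ISTA. The weights and penalty strength below are illustrative assumptions:

```python
import numpy as np

def soft_threshold(w, lam):
    # Shrinks every weight toward zero and sets small ones exactly to zero.
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

w = np.array([0.9, -0.05, 0.3, 0.02, -0.6])
w_l1 = soft_threshold(w, lam=0.1)

# L2 shrinkage, by contrast, only rescales weights; none become exactly zero.
w_l2 = w / (1 + 0.1)

print(w_l1)  # the two smallest weights are now exactly 0 -- a sparse model
```

Exact zeros are why LASSO doubles as a feature-selection method, something L2 cannot do.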

You too can understand L2

One of the most common techniques to prevent overfitting is called L2 Regularization (because it uses the L2 norm). It is also known as Ridge Regression (from the original 1970 paper) or Weight Decay (in deep learning frameworks, because that’s essentially what it does).

Prerequisite Info: Regularization

  1. Why? Large weights
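The shrinking effect on large weights shows up directly in ridge regression's closed form, where the penalty adds a multiple of the identity matrix before inversion. The data below is an illustrative assumption:

```python
import numpy as np

# Ridge regression's closed form: (X^T X + lambda I)^-1 X^T y
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=30)

lam = 5.0
I = np.eye(3)
w_ols = np.linalg.solve(X.T @ X, X.T @ y)              # no penalty
w_ridge = np.linalg.solve(X.T @ X + lam * I, X.T @ y)  # L2 penalty

# The penalty pulls every weight toward zero (but not exactly to zero).
print(np.linalg.norm(w_ridge), "<", np.linalg.norm(w_ols))
```

Larger `lam` means stronger shrinkage; at `lam = 0` this reduces to ordinary least squares.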

A regularization strategy motivated by a theory of the role of sex in evolution

The purpose of dropout is alluded to in the title of its 2014 paper by Srivastava et al.: “Dropout: A Simple Way to Prevent Neural Networks from Overfitting”. It has since become one of the most popular regularization techniques.

Prerequisites: Regularization

  1. Why? Theoretically, a great way to make predictions with…
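The training-time mechanism is short enough to sketch directly. This is "inverted" dropout as commonly implemented, with the drop rate and activation sizes chosen arbitrarily for illustration:

```python
import numpy as np

def dropout(activations, p, rng):
    # Zero each unit with probability p, and scale survivors by 1/(1-p)
    # so the expected activation is unchanged and test time needs no rescaling.
    mask = rng.random(activations.shape) >= p
    return activations * mask / (1.0 - p)

rng = np.random.default_rng(0)
a = np.ones((4, 10))            # a batch of activations
a_train = dropout(a, p=0.5, rng=rng)
a_test = a                      # at test time, dropout is a no-op

print(a_train)  # entries are either 0.0 or 2.0
```

Sampling a fresh mask each pass effectively trains an exponential ensemble of thinned sub-networks that share weights, which is the connection to the evolutionary-sex motivation in the title above.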

New idiom: “An Image is Worth 16x16 Words”

The 2020 paper “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale” by Dosovitskiy, A., et al. (Google) introduced the Vision Transformer, which at first seemed like just a neat extension of NLP Transformers but has since proved very effective for computer vision tasks.
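The title's "words" are image patches: the model splits the image into a grid of fixed-size patches and flattens each into a token vector. A minimal sketch of that patchification step, assuming the standard 224x224 input size:

```python
import numpy as np

# Split an image into 16x16 patches and flatten each into a token vector
# (the 224x224x3 input size is the common choice, assumed here).
image = np.zeros((224, 224, 3))      # H x W x C
patch = 16

h_patches = image.shape[0] // patch  # 14
w_patches = image.shape[1] // patch  # 14
patches = image.reshape(h_patches, patch, w_patches, patch, 3)
patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * 3)

# 196 tokens of dimension 768 -- the "words" fed to the Transformer.
print(patches.shape)  # (196, 768)
```

From there, each flattened patch is linearly projected and given a position embedding, and the rest is a standard Transformer encoder.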

It took the Transformer, and transformed it to make it even more useful

BERT was introduced in the 2018 paper “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”, by Devlin, J., et al. (Google) to take on the specific task of creating a good language representation model.

Prerequisite knowledge: Transformers, transfer learning, encoder-decoder

  1. Why? For computer vision tasks, one can often take…
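BERT's pre-training centers on the masked-language-model objective: hide roughly 15% of input tokens and train the model to recover them from both left and right context. A minimal sketch of the masking step, where the token ids, mask-token id, and ignore index are illustrative assumptions rather than BERT's actual vocabulary:

```python
import numpy as np

# Sketch of masked-language-model input preparation (toy token ids)
rng = np.random.default_rng(0)
MASK_ID = 103                      # placeholder id standing in for [MASK]
tokens = np.arange(1, 21)          # a toy 20-token input sequence

mask = rng.random(tokens.shape) < 0.15      # hide ~15% of positions
masked_input = np.where(mask, MASK_ID, tokens)
targets = np.where(mask, tokens, -100)      # -100 = ignored by the loss

print(masked_input)
```

The full recipe also sometimes keeps or randomizes a masked token instead of replacing it, and adds a next-sentence-prediction objective; the sketch above shows only the core idea.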

Batch norm has become the norm

This technique was introduced by the 2015 paper “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift” by Ioffe and Szegedy (Google) and has become a staple regularization method for many models ever since.

  1. Why? For each training pass (e.g. each mini-batch), as the parameters of the preceding…
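The normalization itself is a few lines per mini-batch: standardize each feature using the batch statistics, then apply a learned scale and shift. The batch and feature sizes below are illustrative assumptions:

```python
import numpy as np

# Batch normalization over one mini-batch of activations
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(32, 4))  # batch of 32, 4 features

eps = 1e-5
mean = x.mean(axis=0)
var = x.var(axis=0)
x_hat = (x - mean) / np.sqrt(var + eps)   # zero mean, unit variance per feature

gamma, beta = np.ones(4), np.zeros(4)     # learnable scale/shift (init values)
out = gamma * x_hat + beta

# Each feature is re-centered regardless of how the preceding layers shifted it.
print(out.mean(axis=0).round(6))
```

At inference time, frameworks swap in running averages of the batch statistics collected during training, since a single test example has no meaningful batch mean.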

The original super deep ConvNet

The 2014 paper, “Very Deep Convolutional Networks for Large-Scale Image Recognition”, from Oxford’s Visual Geometry Group (VGG) introduced what has become known as VGG16, a well-known model that placed second behind Inception-v1 (GoogLeNet) at ILSVRC-2014.

  1. Why? Previous ConvNets (like AlexNet) had typically used pretty large convolution filters, but this…
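VGG's case for small filters is simple arithmetic: two stacked 3x3 convolutions cover the same 5x5 receptive field as one 5x5 convolution but with fewer parameters. A quick check, assuming an illustrative channel count:

```python
# Parameter counts for equal receptive fields (C channels in and out,
# biases omitted; C = 64 is an illustrative assumption)
C = 64

one_5x5 = 5 * 5 * C * C          # a single 5x5 conv layer
two_3x3 = 2 * (3 * 3 * C * C)    # two stacked 3x3 conv layers

# Both see a 5x5 region of the input, but the stack is cheaper and
# inserts an extra non-linearity between the two layers.
print(one_5x5, two_3x3)  # 102400 73728
```

The same argument extends to three 3x3 layers replacing a 7x7 layer, which is how VGG gets its depth.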

Jeffrey Boschman

An endlessly curious grad student trying to build and share knowledge.
