YOLOv1 network — the main points/most important concepts to remember

I wish I had started making flashcards a long time ago

Jeffrey Boschman
Jun 24, 2022

This post covers the flashcards I made to remember the most important points about YOLOv1, the object detection algorithm originally developed by Joseph Redmon and colleagues in 2015.

Eventually I will write a post describing why I make flashcards, the software I use (Anki), and exactly how I make and use them. This post, however, just shows what my flashcards look like, in case anyone else wants to replicate them. The following gif shows an example:

This post will not go into detail describing the YOLOv1 algorithm — there are many other resources for that, including:

What the main flashcards look like

Card 1

If you are using Anki, this is the Cloze deletion text:

The YOLO object detection algorithm gets its name because {{c1::YOLOv1 was one of the first single-stage object detectors, which means that it only has to look at the image once (YOLO stands for You Only Look Once)}}

Card 2

If you are using Anki, this is the Cloze deletion text:

A single-stage object detector performs {{c1::object localization (i.e., predicting bounding boxes)}} and {{c1::classification}} in a single end-to-end differentiable network.
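Because both tasks come out of the same network, YOLOv1 combines them directly at test time: the paper multiplies each box's confidence, Pr(Object) * IOU, by the cell's conditional class probabilities, Pr(Class_i | Object), to get a class-specific score Pr(Class_i) * IOU. The sketch below shows that multiplication for one grid cell; the function name `class_specific_scores` and the toy numbers are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence

def class_specific_scores(box_conf: Sequence[float],
                          class_probs: Sequence[float]) -> List[List[float]]:
    """B x C table of class-specific confidence scores for one grid cell.

    box_conf[b]     ~ Pr(Object) * IOU for box b (the box confidence)
    class_probs[c]  ~ Pr(Class_c | Object), shared by all boxes in the cell
    product         ~ Pr(Class_c) * IOU
    """
    return [[conf * p for p in class_probs] for conf in box_conf]

# Toy cell: B = 2 boxes, C = 3 classes (YOLOv1 on PASCAL VOC uses B = 2, C = 20).
scores = class_specific_scores([0.8, 0.1], [0.7, 0.2, 0.1])
for b, row in enumerate(scores):
    print(b, [round(s, 2) for s in row])
# 0 [0.56, 0.16, 0.08]
# 1 [0.07, 0.02, 0.01]
```

A high score needs both a confident, well-fitting box and a likely class, which is why the low-confidence second box scores low for every class.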

Card 3

If you are using Anki, this is the Cloze deletion text:

The YOLOv1 model frames object detection as a single {{c1::regression}} problem, straight from image pixels to bounding box coordinates and class probabilities.
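In the paper, the regressed box values are normalized: (x, y) is the box centre as an offset within its grid cell, and (w, h) are fractions of the whole image, so all four lie between 0 and 1. A minimal sketch of turning one cell's regression output back into pixel corners, assuming a 448×448 input (the paper's input size) and a hypothetical helper named `decode_box`:

```python
from typing import Tuple

def decode_box(row: int, col: int, x: float, y: float, w: float, h: float,
               img_w: float, img_h: float, S: int = 7) -> Tuple[float, float, float, float]:
    """Convert one cell's regressed (x, y, w, h) to pixel (x_min, y_min, x_max, y_max)."""
    cell_w, cell_h = img_w / S, img_h / S
    cx = (col + x) * cell_w   # centre = cell's top-left corner + offset inside the cell
    cy = (row + y) * cell_h
    bw = w * img_w            # width and height are relative to the whole image
    bh = h * img_h
    return (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2)

print(decode_box(row=3, col=3, x=0.5, y=0.5, w=0.25, h=0.5, img_w=448, img_h=448))
# (168.0, 112.0, 280.0, 336.0)
```

Keeping every target in [0, 1] is what lets a plain squared-error regression loss train the box outputs.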

Card 4

If you are using Anki, this is the Cloze deletion text:

The YOLOv1 model architecture is 24 convolutional layers followed by 2 fully connected layers, with alternating {{c1::1×1 convolutional}} layers reducing the feature space from preceding layers. On PASCAL VOC the output is a 7×7×30 tensor, which can be thought of as {{c1::dividing the input image}} into a 7×7 grid, with 30 values per grid cell that encode {{c1::2 predicted bounding boxes (x, y, w, h, and a confidence score for each) and 20 conditional class probabilities.}}
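The depth of 30 is simple arithmetic on the paper's PASCAL VOC settings (S = 7 grid, B = 2 boxes per cell, C = 20 classes): each box needs 5 numbers, plus one set of class probabilities per cell. The paper also makes the grid cell containing an object's centre responsible for detecting it. The sketch below checks the numbers and finds that cell; `responsible_cell` is a hypothetical helper name.

```python
from typing import Tuple

S, B, C = 7, 2, 20          # grid size, boxes per cell, classes (PASCAL VOC setup)
depth = B * 5 + C           # each box: x, y, w, h, confidence -> 2 * 5 + 20 = 30
flat_size = S * S * depth   # 1470 numbers, viewed as a 7 x 7 x 30 tensor
num_boxes = S * S * B       # 98 candidate boxes from a single forward pass

def responsible_cell(cx: float, cy: float, img_w: float, img_h: float,
                     S: int = 7) -> Tuple[int, int]:
    """Return (row, col) of the grid cell that contains an object's centre."""
    col = min(int(cx / img_w * S), S - 1)  # clamp so a centre on the right edge stays in range
    row = min(int(cy / img_h * S), S - 1)  # same for the bottom edge
    return row, col

print(depth, flat_size, num_boxes)           # 30 1470 98
print(responsible_cell(224, 100, 448, 448))  # (1, 3)
```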

Card 5

If you are using Anki, this is the Cloze deletion text:

The YOLOv1 loss function is {{c1::a modified sum-squared error loss.}}

The Extras

All the flashcards above have a section called “Extra” that only appears after I reveal the answer on the back of the card (see the gif at the beginning). This extra information is useful when I struggled with a card: after seeing the answer, I can look at text or images that explain some of the algorithm’s details or give more context.

The first three figures are from the original paper, You Only Look Once: Unified, Real-Time Object Detection.

The next two images are screenshots from this blog post by Sik-Ho Tsang.

There you go! I hope this helps others who, like me, are trying to be lifelong learners!

Originally published at http://fiveminutemachinelearning.wordpress.com on June 24, 2022.


Jeffrey Boschman

An endlessly curious grad student trying to build and share knowledge.