What is YOLO

YOLO (You Only Look Once) is CNN-based Object Detector written by Joseph Redmon (retired from CV back in 2018).

The original work of YOLO was the first object detection network to combine the problem of drawing bounding boxes and identifying class labels.

It used a custom framework called Darknet for real time object detectors. YOLO predicts object bounding boxes together with class labels from images.

YOLO adds a grid system to an existing image, where each grid detects objects separately, but later the grid detections are combined.


YOLO and YOLO2 used a nn.Linear layer at the end to detect objects. These models could predict a number of bounding boxes.

To predict multiple bounding boxes, we’ll need to create a grid system with a number of anchor boxes (grid cells).

Starting from YOLO3 nn.Conv2D is used for the final detection based on the SSD (Single Shot Detection) paper.

YOLOv2 made a number of iterative improvements on top of YOLO including BatchNorm, higher resolution, and anchor boxes.

It was YOLO3 when YOLO architecture switched to using SSD and since then kept this method.


The key part of the YOLO4 model was data augmentation.

YOLO4 use the focal Loss designed to address an extreme imbalance between positive and negative samples (1:1000) and to improve negative sample confidence.

Previously Cross Entropy Loss was used to train the model.


YOLO5 is nearly 90 percent smaller than YOLO4. It was released by Glenn Jocher founder of Mosaic Augmentation.

YOLO format (darknet)

This format contains one text file per image (containing the annotations and a numeric representation of the label) and a label map which maps the numeric IDs to human readable strings. The annotations are normalized to lie within the range [0, 1] which makes them easier to work with even after scaling or stretching images. It has become quite popular as it has followed the Darknet framework’s implementations of the various YOLO models.

Roboflow can read and write YOLO Darknet files so you can easily convert them to or from any other object detection annotation format. Once you’re ready, use your converted annotations with our training YOLO v4 with a custom dataset tutorial.


The initial release of YOLOv5 is very fast, performant, and easy to use. While YOLOv5 has yet to introduce novel model architecture improvements to the family of YOLO models, it introduces a new PyTorch training and deployment framework that improves the state of the art for object detectors.