Data Augmentation

Computer vision models can get too comfortable with your training data. This leads to overfitting: they become great at recognizing the specific images shown in the training set but stumble on new ones. Data augmentation is one of our secret weapons to combat this! Data augmentation takes the images in the training set and makes altered, or transformed, copies to add variation to the training data. Put simply, augmentations are like showing your model the same object from different angles, under different lighting, at different levels of zoom, and so on. This forces the model to learn the core features that matter rather than memorize specific details of the training images. Which transformations are available depends on the framework you use; below, we cover some of the most common options found across frameworks.

Transforming Images

Here are examples of common image transformations:

  • Rotations: Turning images clockwise or counterclockwise.
  • Flipping: Vertical and horizontal flipping of an image.
  • Cropping: Zooming in on different parts of the image.
  • Shearing: Slanting the image along one axis to simulate viewing the object from a different angle.
  • Color Jitter: Randomly changing brightness, contrast, saturation, or hue so the model doesn’t rely too heavily on exact colors or specific lighting conditions.
  • Noise Addition: Adding random pixel noise to simulate real-world imperfections such as camera sensor noise or grain.
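
For a concrete picture of what each transformation in the list above does to the pixels, here is a minimal sketch written directly in NumPy. It assumes images are float arrays of shape (height, width, channels) with values in [0, 1], and the function names (`rotate90`, `flip`, `random_crop`, `shear_x`, `color_jitter`, `add_noise`) are hypothetical helpers, not any framework's API. Rotation is limited to multiples of 90 degrees and shearing uses nearest-neighbour sampling to keep the code short; real libraries such as torchvision or Keras preprocessing layers handle arbitrary angles, interpolation, and border filling for you.

```python
import numpy as np

# Assumption: images are float arrays of shape (H, W, C) with values in [0, 1].
# All function names below are hypothetical helpers for illustration.

def rotate90(img, k=1):
    """Rotate counterclockwise by k * 90 degrees (arbitrary angles need interpolation)."""
    return np.rot90(img, k=k, axes=(0, 1))

def flip(img, horizontal=True):
    """Mirror left-right (horizontal) or top-bottom (vertical)."""
    return img[:, ::-1] if horizontal else img[::-1, :]

def random_crop(img, crop_h, crop_w, rng):
    """Cut out a random crop_h x crop_w window; resizing it back up gives the 'zoom' effect."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    return img[top:top + crop_h, left:left + crop_w]

def shear_x(img, factor):
    """Horizontal shear: output pixel (y, x) samples input pixel (y, x - factor * y).

    Uses nearest-neighbour sampling; pixels that map outside the image become 0.
    """
    h, w = img.shape[:2]
    ys, xs = np.indices((h, w))
    src_x = np.rint(xs - factor * ys).astype(int)
    valid = (src_x >= 0) & (src_x < w)
    out = np.zeros_like(img)
    out[ys[valid], xs[valid]] = img[ys[valid], src_x[valid]]
    return out

def color_jitter(img, brightness, contrast):
    """Scale contrast around the mean, then shift brightness; clip back to [0, 1]."""
    mean = img.mean()
    return np.clip((img - mean) * contrast + mean + brightness, 0.0, 1.0)

def add_noise(img, std, rng):
    """Add zero-mean Gaussian noise to simulate sensor imperfections."""
    return np.clip(img + rng.normal(0.0, std, size=img.shape), 0.0, 1.0)

# Usage: chain several transformations on a random test image.
rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
augmented = add_noise(color_jitter(flip(rotate90(image)), 0.1, 1.2), 0.02, rng)
print(augmented.shape)  # (32, 32, 3)
```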

These are typically used in combination, with the goal of augmenting the images in ways the model is likely to encounter in future input data. For example, a model meant to recognize apples and oranges would probably benefit from rotation, since fruit can appear at any orientation, but large changes to color could cause problems, because color is one of the main cues that separates the two classes.
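
One way to picture "some combination" is a pipeline that applies each transformation independently with its own probability, so every pass over the data produces a different mix. The sketch below is a hypothetical, framework-free version (the names `random_pipeline` and `fruit_augment` are illustrative, not a library API), configured in the spirit of the apples-and-oranges example: rotations and flips are used freely, while brightness shifts are kept small and hue is never touched. Frameworks such as torchvision express the same idea with classes like `transforms.Compose` and `transforms.RandomApply`.

```python
import numpy as np

# Assumption: images are float arrays of shape (H, W, C) in [0, 1].
# Names are hypothetical; this is not any library's API.

def random_pipeline(steps):
    """Build an augmenter from (probability, transform) pairs.

    Each transform takes (img, rng) and is applied independently
    with its own probability, so every call yields a different mix.
    """
    def augment(img, rng):
        for prob, transform in steps:
            if rng.random() < prob:
                img = transform(img, rng)
        return img
    return augment

# Apples vs. oranges: orientation is irrelevant, but color is a key cue,
# so only a small brightness shift is allowed and hue is left alone.
fruit_augment = random_pipeline([
    (0.5, lambda img, rng: np.rot90(img, k=rng.integers(1, 4))),
    (0.5, lambda img, rng: img[:, ::-1]),
    (0.3, lambda img, rng: np.clip(img + rng.uniform(-0.05, 0.05), 0.0, 1.0)),
])

rng = np.random.default_rng(42)
apple = rng.random((64, 64, 3))
views = [fruit_augment(apple, rng) for _ in range(4)]  # four random variants
```

Tuning the probabilities and magnitudes for your own data is largely a matter of trial and error.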

Some things to keep in mind while using data augmentation:

  • Augmentations are applied randomly during training. Each time the model sees an image from the training set, it’s slightly different!
  • Not all computer vision tasks are equally augmentable or use the same augmentation functions. When using a tool to augment your data, make sure that the transformations work with the constraints of your task. For example, geometric transformations (flips, rotations, crops) on an object detection dataset must also update the labels; otherwise, the bounding boxes for each image would no longer enclose the objects they were meant to.
  • Augmentation usually comes as part of a Data Generator function. To save space and memory, these functions create the transformed images on the fly, one batch at a time, just before feeding them into the model, instead of storing every augmented copy. While this has obvious upsides, it does mean that you must be more careful when coding. Many generators loop over the data indefinitely, so you must define how many batches the model should process per epoch, or else the Data Generator will keep creating images… forever!
  • Consider which aspects of the objects you are trying to classify are important for the model to learn. If color is important in identifying a class, carefully consider whether color augmentations will help or hurt the model. Similarly, flipping images of signs can actively hurt: a horizontally flipped “no left turn” sign looks much like a “no right turn” sign, so the augmented image would carry the wrong label.
  • Experimentation is key. Start with a reasonable set of augmentations, observe how they affect your model’s performance on validation data, and add, remove, or tune them accordingly.
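
Several of the points above fit together in one small sketch: a hypothetical infinite generator that yields randomly augmented batches on the fly, flips each image's bounding boxes along with the image, and is consumed for a fixed number of steps per epoch. It assumes boxes are stored as (x_min, y_min, x_max, y_max), with edges measured in pixels from the top-left corner; the names `augmented_batches` and `hflip_with_boxes` are illustrative, not a framework API. In Keras, for instance, the batch limit is passed to `model.fit` as `steps_per_epoch`.

```python
import itertools
import numpy as np

# Assumptions: images are (H, W, C) float arrays; each image has an (N, 4)
# array of boxes as (x_min, y_min, x_max, y_max) measured in pixels.

def hflip_with_boxes(img, boxes):
    """Mirror an image left-right and move its boxes to match."""
    width = img.shape[1]
    flipped = img[:, ::-1]
    new_boxes = boxes.copy()
    # x_min and x_max swap roles after mirroring: new_x_min = W - old_x_max.
    new_boxes[:, 0] = width - boxes[:, 2]
    new_boxes[:, 2] = width - boxes[:, 0]
    return flipped, new_boxes

def augmented_batches(images, boxes, batch_size, rng):
    """Yield (images, boxes) batches forever, augmenting each sample on the fly."""
    while True:  # never stops on its own; the caller decides when an epoch ends
        idx = rng.choice(len(images), size=batch_size, replace=False)
        batch_imgs, batch_boxes = [], []
        for i in idx:
            img, bxs = images[i], boxes[i]
            if rng.random() < 0.5:  # a fresh coin flip every time the image is seen
                img, bxs = hflip_with_boxes(img, bxs)
            batch_imgs.append(img)
            batch_boxes.append(bxs)
        yield np.stack(batch_imgs), batch_boxes

rng = np.random.default_rng(0)
images = rng.random((10, 32, 48, 3))
boxes = [np.array([[4.0, 2.0, 20.0, 30.0]]) for _ in range(10)]

batch_size = 4
steps_per_epoch = len(images) // batch_size  # without a limit, this loop would never end
gen = augmented_batches(images, boxes, batch_size, rng)
for epoch in range(2):
    for batch_imgs, batch_boxes in itertools.islice(gen, steps_per_epoch):
        pass  # a real training step would go here
```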
