Understanding the strengths of the various deep learning APIs will help in the development of your model. As we’ve mentioned previously, this space is rapidly evolving. With that in mind, we won’t go too in-depth, as the information is likely to become outdated quickly.
Key Python Frameworks
In our Deep Learning Foundations course, part of the PracticumAI Beginner Series, we touched on the two most popular AI frameworks: Google's TensorFlow and Meta's PyTorch (originally developed at Facebook AI Research). Beyond those two, there is another important framework for computer vision tasks: OpenCV. While not a dedicated deep learning framework, OpenCV is an essential toolkit for computer vision, optimized for real-time image and video processing.
OpenCV’s strengths include:
- Extensive collection of image processing algorithms.
- Excellent for rapid prototyping and real-time computer vision applications.
The notebooks in this course will use PyTorch for most tasks, but will also draw on OpenCV, YOLO, and other tools.
API Considerations for Computer Vision
Ease of Use: Keras (within TensorFlow) offers a particularly beginner-friendly approach, especially for those without extensive deep learning experience, though its popularity has declined in recent years. As a result, most Practicum AI courses use, or are transitioning to, PyTorch.
Pre-trained Models: Both TensorFlow and PyTorch offer wide selections of pre-trained CNN and ViT models to speed up development and leverage state-of-the-art research. In this course, we will use PyTorch and PyTorch Lightning.
Community and Support: Both frameworks have large, active communities, extensive documentation, and abundant tutorials and forum answers to draw on when you get stuck.
Additional Libraries/Frameworks
Although we focus on TensorFlow and PyTorch, there are many other frameworks that can also be useful for computer vision tasks. Here are two that you might want to investigate if you have time and interest:
- FastAI: A high-level library built upon PyTorch, simplifying model development and training.
- Kornia: A PyTorch-based library specifically focused on computer vision tasks.
Installing Frameworks
Depending on your compute situation, you may have to manage your own virtual environment. Managing virtual environments, installing frameworks and other supporting software, and navigating between repos and large directories are some of the trickier elements in model development. A few paragraphs would not do the topic justice. Be sure to check out our forthcoming Power Tools for Compute course for more information on these topics. If you are using managed compute resources, such as in most HPC scenarios, you will need to do some research into what software is already available and what the local process is for getting new software installed.
For this course, instructions are provided for running the notebooks on Google Colab (where packages are installed with pip each time you run the notebook) or on HiPerGator (where we have pre-staged an environment, and the 00_kernel_setup.ipynb notebook will help you get the Jupyter kernel set up).
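On Colab, the setup cell at the top of a notebook typically looks something like the following sketch (the exact package list here is hypothetical and varies by notebook):

```shell
# Install the course's main frameworks into the Colab runtime.
# Colab resets between sessions, so this runs each time the notebook starts.
pip install torch torchvision opencv-python ultralytics
```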
YOLO!
When we started developing the Practicum AI courses, we focused on TensorFlow for its relative simplicity, and ideally we would continue using it for all our notebooks. As you may be figuring out, though, keeping up in the rapidly changing field of AI requires adapting to new frameworks. When we developed the object detection and segmentation notebooks for this course, it became clear that the best models and most-supported tools for these tasks are the YOLO (You Only Look Once) models, which are PyTorch-based. As such, we opted to use those and added some supporting material in the notebooks to help you apply your understanding of deep learning and TensorFlow to PyTorch.
Return to Module 2 or Continue to Data Manipulation for Computer Vision Models


