Let’s look at how we’ve reached today’s incredible level of speed and accuracy with computer vision tasks.
1950s-1970s - The Dawn of Computer Vision: Hubel and Wiesel’s Pioneering Work
Computer vision’s conceptual roots were laid by David H. Hubel and Torsten Wiesel in the late 1950s and 1960s. Their groundbreaking neurophysiological research on the brain’s visual cortex became the cornerstone for understanding how vision works. Their discovery of feature detectors in the cat’s visual cortex, cells that respond selectively to edges, lines, and movement, revealed fundamental mechanisms of visual processing. This work earned them a Nobel Prize in 1981 and paved the way for later developments in computer vision.
Also in the 1960s, Larry Roberts published the paper “Machine Perception of Three-Dimensional Solids”. While Roberts is usually credited as one of the founders of the Internet (he led the team behind ARPANET, the technological precursor of the modern Internet), his paper on “machine perception” kick-started research into using computers to analyze objects in images!
1970s-1990s - The Rise of Digital Image Processing
In the 1970s and 1980s, the field of computer vision began to take shape more formally, with digital image processing emerging as a key area. This era saw the development of basic techniques for image enhancement, restoration, and transformation. The advent of digital cameras and personal computers provided the necessary tools for researchers to experiment and develop algorithms that could interpret visual data. The focus during this period was primarily on understanding and processing static images, laying the groundwork for more complex interpretation of visual data.
Building on Hubel and Wiesel’s work, David Marr proposed a computational framework in the late 1970s for modeling the neurological processes of sight, describing vision as a staged pipeline that builds from a primal sketch of edges and blobs up to a full three-dimensional model of the scene.
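To make the era’s basic techniques concrete, here is a minimal sketch of classic Sobel edge detection, the kind of low-level operation behind both digital image processing and the edge-based primal sketch in Marr’s framework. The toy image is illustrative, and the example assumes NumPy and SciPy are installed:

```python
import numpy as np
from scipy import ndimage

# A toy grayscale "image": a bright square on a dark background.
image = np.zeros((8, 8), dtype=float)
image[2:6, 2:6] = 1.0

# The classic 3x3 Sobel kernels approximate horizontal and vertical
# intensity gradients, which are large at edges.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
sobel_y = sobel_x.T

gx = ndimage.convolve(image, sobel_x)
gy = ndimage.convolve(image, sobel_y)

# Gradient magnitude: high values mark the square's edges.
edges = np.hypot(gx, gy)
print(np.round(edges, 1))
```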
1990s-2010s - The Era of Feature Detection and Machine Learning
The 1990s and early 2000s marked a significant shift towards machine learning techniques in computer vision. Researchers focused on feature detection and extraction, developing algorithms to identify and track distinctive features within images. This era witnessed the development of algorithms for facial recognition, object detection, and optical character recognition (OCR), which became crucial in applications ranging from security systems to data entry automation.
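As a flavor of this era’s methods, here is a minimal sketch of Harris corner detection (Harris & Stephens, 1988), a staple feature detector of the period, using OpenCV. The synthetic image and the response threshold are illustrative choices:

```python
import cv2
import numpy as np

# Synthetic grayscale image: a filled square gives four clear corners.
img = np.zeros((200, 200), dtype=np.uint8)
cv2.rectangle(img, (50, 50), (150, 150), 255, -1)

# Harris corner detection; blockSize, ksize, and k are the usual
# tuning parameters (neighborhood size, Sobel aperture, sensitivity).
response = cv2.cornerHarris(np.float32(img), blockSize=2, ksize=3, k=0.04)

# Pixels whose corner response is a sizable fraction of the maximum
# are treated as corners.
corners = np.argwhere(response > 0.01 * response.max())
print(f"Found {len(corners)} corner pixels")
```

The same detect-then-describe pattern, with richer descriptors such as SIFT or SURF, powered much of the object recognition and tracking work of the period.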
2010-2020 - Deep Learning Revolutionizes Computer Vision
The 2010s brought a revolutionary change with the advent of deep learning. The use of deep neural networks, especially convolutional neural networks (CNNs), transformed computer vision. These networks, inspired by the human brain’s neural structure, could learn hierarchical representations of visual data, leading to breakthroughs in accuracy and performance in tasks like image classification, object detection, and segmentation. In 2012, the image classification model AlexNet won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) and cemented CNNs as the primary architecture for image tasks. This era saw the practical application of computer vision in autonomous vehicles, augmented reality, medical image analysis, and numerous other fields.
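To see that hierarchical structure in code, here is a minimal sketch of a CNN image classifier in PyTorch. The layer sizes are illustrative and far smaller than AlexNet’s, but the pattern of stacked convolution and pooling layers feeding a classifier head is the same:

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # low-level edges and colors
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 32x32 -> 16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # mid-level patterns
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 16x16 -> 8x8
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(x.flatten(1))

# One random 32x32 RGB image in, one vector of class scores out.
model = TinyCNN()
logits = model(torch.randn(1, 3, 32, 32))
print(logits.shape)  # torch.Size([1, 10])
```

Crucially, the convolution kernels here are learned from data rather than hand-designed like the Sobel kernels above, which is what allowed CNNs to overtake hand-engineered features.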
2020-Today - Present and Future: Expanding Horizons
Today, computer vision continues to grow, integrating with artificial intelligence and other technological domains to create increasingly sophisticated applications. Advancements in real-time processing, 3D image reconstruction, edge computing, and multi-modal models are opening new frontiers. Integrating computer vision with the Internet of Things (IoT) and robotics is shaping the future in areas like smart cities, advanced manufacturing, and environmental monitoring. Recently, vision transformers (ViTs) have made strides toward even more accurate and efficient visual understanding. See optional module 2 for more information on ViTs. The journey from understanding the brain’s visual processing to enabling machines to ‘see’ and interpret the world autonomously marks a remarkable chapter in the history of technology.
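As a quick taste of how accessible these models have become, here is a minimal sketch of image classification with a pretrained ViT checkpoint via the Hugging Face `transformers` pipeline. It assumes the `transformers` library is installed and the model weights can be downloaded:

```python
from transformers import pipeline

# Image-classification pipeline backed by a pretrained Vision
# Transformer checkpoint.
classifier = pipeline("image-classification", model="google/vit-base-patch16-224")

# The pipeline accepts a local path, a PIL image, or a URL.
predictions = classifier("http://images.cocodataset.org/val2017/000000039769.jpg")
for pred in predictions:
    print(f"{pred['label']}: {pred['score']:.3f}")
```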
How is Computer Vision different from Machine Vision?
Machine vision is more application-focused and is often considered a subset of industrial automation. It pertains to the use of computer vision in industrial environments, primarily for inspection and process control. Machine vision systems are designed to perform specific, repetitive tasks such as quality assurance in manufacturing processes, where they might inspect products for defects, guide assembly robots, or track items through production lines. These systems typically involve a specific combination of hardware and software, including cameras, lighting, data acquisition, and processing units. The emphasis here is less on mimicking human vision and more on achieving reliable, accurate measurement and decision-making in a controlled environment. Machine vision is characterized by its high speed, reliability, and precision in constrained scenarios versus computer vision’s emphasis on flexibility and understanding.