At this point, you might be thinking, “Neat! But uh… What would I use neural networks for?” Below is a list of some of the more popular tasks for which neural networks are used. We have provided a small description of the underlying network types that make the task possible for each task. Beneath the list, there is a high-level description of the network types mentioned. This list is not intended to be exhaustive; you can do many more things with neural networks, with the limit mainly being your imagination and access to large, clean datasets. Network architectures are constantly changing as technology improves, so don’t be confused if some of these are outdated by the time you take this course!
Computer visions tasks
Computer vision is like giving eyes to a computer. It’s the science of making machines “see” and interpret visual information from the world, like how humans use their eyesight.
Image classification, object detection, and segmentation are some sub-tasks within computer vision.
Image Classification
- What is it? Classifying the primary object in an image into one of a predetermined set of categories. For example, the ImageNet dataset has 1,000 categories. Image classification algorithms predict the probability that an image belongs in each of those 1,000 categories. The highest probabilities are used to classify the image.
- Underlying Neural Network Technology: Traditional image classification techniques used top-down, rules-based algorithms, but now, deep learning models, especially convolutional neural networks (CNNs), dominate the scene. Initially designed for text, the transformer architecture is also making its way into computer vision, offering more flexibility and often better performance in certain tasks.
Object Detection
- What is it? Object detection involves identifying individual objects within an image, which is more complex than classifying an image as a whole. Multiple objects can be identified with an image, and each can be classified. Object detection typically involves placing a rectangular bounding box around individual objects within an image. Multiple objects can be identified with an image, and each can be classified. Object detection typically involves placing a rectangular bounding box around individual objects within an image.
- Underlying Neural Network Technology: As with image classification, transformers are becoming increasingly popular in object detection. They’re especially useful in tasks that require understanding the context and relationships between different parts of an image.
Image Segmentation
- What is it? Going a step beyond object detection, image segmentation identifies the pixels of objects in an image. As with all these methods, there are sub-categories with slightly different tasks and challenges. At the simplest level, the segmentation goes a step beyond object detection by labeling the pixels (vs a bounding box) of the detected objects (e.g., labeling the pixels of an image that are cat pixels and dog pixels). Image segmentation can be extended to instance segmentation, which labels each object, even of the same category, as distinct (e.g., dog1, dog2, etc.).
Image Processing
- What is it? Image processing involves manipulating or altering images to achieve a desired result. This can range from simple tasks like adjusting brightness and contrast to more complex ones like removing unwanted objects from a photo.
- Underlying Neural Network Technology: Traditional image processing techniques used top-down, rules-based algorithms, but now, deep learning models, especially CNNs, dominate the scene. Initially designed for text, the transformer architecture is also making its way into image processing, offering more flexibility and often better performance in certain tasks.
Natural Language Processing (NLP)
Natural Language Processing is the magic that allows computers to understand, interpret, and generate (!!!) human language. It’s why chatbots can chat and translation apps can translate.
Speech recognition, translation, and text generation are everyday tasks.
Speech Recognition
- What is it? Speech recognition is about converting spoken words into written text. It’s the tech behind how voice assistants like Siri or Alexa “understand” what you’re saying.
- Underlying Neural Network Technology: Transformers’ ability to capture long-range dependencies in data makes them well-suited for understanding the nuances and rhythms of spoken language.
Translation
- What is it? Translation involves converting text or speech from one language to another. Text translation used to use standard computer algorithms but has moved to deep learning models because they’re more flexible and can “sense” context better than rule-based applications. It’s like having a digital interpreter to translate sentences between languages instantly.
- Underlying Neural Network Technology: While earlier models used architectures like Long Short-Term Memory (LSTMs), the current state-of-the-art for translation is the transformer Transformers have largely overshadowed other models due to their ability to handle long-range text dependencies and parallel processing capabilities. They’ve revolutionized machine translation, making it more accurate and efficient.
Text Generation
- What is it? Tools like ChatGPT have recently become the face of AI in many ways. While they remain one small part of what AI is and how it is applied, the idea of a computer writing sentences, poems, computer code, and more has captured the imagination.
- Underlying Neural Network Technology: The Transformer architecture has revolutionized NLP. Models like Bidirectional Encoder Representations from Transformers (BERT), Generative Pre-trained Transformer (GPT), and their successors are all based on the transformer and have set new performance benchmarks in various language tasks. These models are trained on vast amounts of text, enabling them to understand context, nuances, and even sarcasm to some extent.
Text to Speech
- What is it? From simply using prebuilt voice models to voice cloning technologies, text-to-speech tools convert text to spoken audio. You encounter this regularly with your smart assistants (Seri, Alexa, Google, etc.). Voice cloning and voice customization technologies are improving, and it is possible to train a model to generate audio that sounds like someone.
- Underlying Neural Network Technology: As with much of NLP, transformers are responsible for huge advances in the field. Generative Adversarial Networks (GANs—see below) are used in voice cloning. Common methods convert input audio to an image, known as a spectrogram. The GAN is trained to generate spectrograms that look like the original. A different network, a Vocoder network, is used to convert the generated spectrograms to audio.
Time Series Analysis
- What is it? A time series is like a chronological diary of numbers. Imagine noting your heartbeat rate every minute during a workout; that list forms a time series. Stock tickers are also time series data. Time series analysis detects patterns in the series to highlight anomalies or forecast trends.
- Underlying Neural Network Technology: While Recurrent Neural Networks (RNNs) and LSTMs were traditionally used for time series, transformers are now being adapted for this purpose. Their ability to capture patterns over long sequences makes them a promising tool for predicting future values in time series data.
Generative methods
Image Generation
- What is it? This is about crafting new images from scratch. Tools that design pictures of fantasy creatures that have never been seen before are examples of Image Generators.
- Underlying Neural Network Technology: While Generative Adversarial Networks (GANs) were the go-to for image generation, newer methods like Stable Diffusion have taken center stage. Stable Diffusion is a process that evolves an image over time, starting from noise and gradually refining it to produce a clear picture. It’s like watching a foggy scene slowly come into focus.
Image Translation
- What is it? Image translation is the digital art of morphing one image into another. Think of turning a pencil sketch into a vibrant digital art piece.
- Underlying Neural Network Technology: While Conditional Generative Adversarial Networks (cGANs) were popular for image translation, the combination of Transformers and Stable Diffusion techniques is now gaining prominence. This combo allows for more detailed and nuanced translations of images, capturing intricate details and styles.