Foundations of Computer Vision


- Overview

Computer vision (CV) is a field of artificial intelligence (AI) that enables computers and systems to derive meaningful information from digital images, videos, and other visual inputs, and to take actions or make recommendations based on that information.

CV systems use techniques like deep learning, neural networks, and image processing to extract information from visual data.

If AI enables computers to think, computer vision enables them to see, observe, and understand. CV aims to replicate human perception and the associated brain functions in order to acquire, analyze, process, and understand an image, and then act on it.

As a subfield of AI, CV studies how computers acquire, process, and analyze digital images and videos to gain a high-level understanding of their content. Replicating human vision is extremely challenging, in part because designers find it difficult to determine which hardware and software will best meet the requirements of a given application.

After years of work, companies that combine deep learning with dedicated computer vision hardware and software have achieved success in identifying objects.

Computer vision generally works in three basic steps:

  • Acquiring the image/video from a camera
  • Processing the image
  • Understanding the image
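
The three steps above can be sketched with a toy example. This is only an illustration under stated assumptions: the "camera" is a hard-coded 5x5 grayscale grid, processing is a fixed threshold, and "understanding" is a hand-written rule rather than a trained model. The function names acquire_image, process_image, and understand_image are hypothetical.

```python
def acquire_image():
    """Step 1 (assumed stand-in for a camera): return a 5x5 grayscale image, values 0-255."""
    return [
        [10, 10, 10, 10, 10],
        [10, 200, 210, 10, 10],
        [10, 220, 230, 10, 10],
        [10, 10, 10, 10, 10],
        [10, 10, 10, 10, 10],
    ]


def process_image(image, threshold=128):
    """Step 2: binarize the image (1 = bright pixel, 0 = dark pixel)."""
    return [[1 if pixel > threshold else 0 for pixel in row] for row in image]


def understand_image(mask):
    """Step 3: a hand-written rule that reports whether a bright object is present."""
    area = sum(sum(row) for row in mask)
    return {"object_present": area > 0, "area": area}


image = acquire_image()
mask = process_image(image)
result = understand_image(mask)
print(result)  # {'object_present': True, 'area': 4}
```

In a real system each step would be far richer (a camera driver, filtering and normalization, a learned model), but the data flow from raw pixels to a decision has the same shape.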


Some common computer vision problems include: 

  • Image classification
  • Object localization and detection
  • Image segmentation
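
The three problems differ mainly in what they output: a single label, a bounding box, or a per-pixel mask. The sketch below shows those output shapes on a toy image. It uses simple thresholding as a stand-in for a learned model, and the names IMAGE, segment, localize, and classify are hypothetical.

```python
# Toy grayscale image: 0 is background, 9 is a bright object.
IMAGE = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 9, 9, 0, 0],
    [0, 0, 9, 9, 9, 0],
    [0, 0, 0, 9, 0, 0],
    [0, 0, 0, 0, 0, 0],
]


def segment(image, threshold=5):
    """Segmentation: one label per pixel (1 = object, 0 = background)."""
    return [[1 if pixel > threshold else 0 for pixel in row] for row in image]


def localize(mask):
    """Localization: inclusive bounding box (top, left, bottom, right), or None."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return (min(rows), min(cols), max(rows), max(cols))


def classify(mask):
    """Classification: one label for the whole image."""
    return "object" if any(any(row) for row in mask) else "background"


mask = segment(IMAGE)
box = localize(mask)
label = classify(mask)
print(label)  # object
print(box)    # (1, 2, 3, 4)
```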

 

 

- Convolutional Neural Networks and Computer Vision 

Convolutional Neural Networks (CNNs) have revolutionized computer vision by enabling automatic feature extraction through hierarchical, brain-inspired layers (convolution, pooling, fully connected). 
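
To make the three layer types concrete, here is a minimal forward pass written in plain Python. It is a sketch, not a trainable network: the kernel and the fully connected weights are fixed by hand (a real CNN learns them), the input is an assumed 6x6 image with a vertical edge, and all function names are hypothetical.

```python
def conv2d(image, kernel):
    """Convolution layer: slide the kernel over the image (no padding, stride 1).
    As in most deep learning libraries, this is technically cross-correlation."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Non-linearity: clamp negative responses to zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool2d(feature_map, size=2):
    """Pooling layer: keep the maximum of each non-overlapping size x size window."""
    return [
        [
            max(feature_map[i + di][j + dj]
                for di in range(size) for dj in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


def fully_connected(feature_map, weights, bias):
    """Fully connected layer: flatten, then take a weighted sum plus bias."""
    flat = [v for row in feature_map for v in row]
    return sum(x * w for x, w in zip(flat, weights)) + bias


# Assumed input: dark left half, bright right half (a vertical edge).
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
# Hand-picked vertical-edge kernel; a trained CNN would learn such filters.
kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

features = relu(conv2d(image, kernel))   # 4x4 map, strongest response at the edge
pooled = max_pool2d(features)            # 2x2 map
score = fully_connected(pooled, [0.25, 0.25, 0.25, 0.25], 0.0)
print(features[0])  # [0, 3, 3, 0]
print(pooled)       # [[3, 3], [3, 3]]
print(score)        # 3.0
```

The convolution responds only where the edge is, pooling shrinks the map while keeping that response, and the fully connected layer reduces it to a single score. This is the hierarchy of feature extraction described above, in miniature.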

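CNNs trace their origins to neurophysiology studies of the visual cortex (Hubel and Wiesel, 1959). They evolved from Fukushima's Neocognitron (1980) to LeCun's 1989 convolutional network for handwritten digit recognition, which later led to LeNet-5 (1998).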

Adoption of CNNs accelerated sharply in the 2010s, driven by large labeled datasets such as ImageNet and by GPU computing. AlexNet's win in the 2012 ImageNet challenge marked the turning point.

These developments allow CNNs to remain foundational in tasks like medical image analysis and object recognition.

Key advancements include:

  • Deep Networks & Stability: ResNet introduced residual (skip) connections, which ease gradient flow and make it practical to train very deep models.
  • Efficiency: EfficientNet balanced accuracy against parameter count by scaling network depth, width, and input resolution together, while MobileNet and Xception used depthwise separable convolutions to reduce computational cost.
  • Explainability: Tools like Grad-CAM highlight the image regions that most influence a prediction, supporting model transparency in critical fields.
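
Two of the ideas above reduce to short formulas: a residual block computes x + F(x), and a depthwise separable convolution replaces one k x k x C_in x C_out filter bank with a k x k depthwise step plus a 1 x 1 pointwise step. The sketch below illustrates both under simplifying assumptions: biases are ignored in the parameter counts, the layer sizes (3x3 kernel, 64 input and 128 output channels) are arbitrary, and the function names are hypothetical.

```python
def residual_block(x, transform):
    """Residual connection: output = input + transform(input)."""
    return [xi + fi for xi, fi in zip(x, transform(x))]


def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise (k x k per channel) plus pointwise (1 x 1) pair."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# If the learned transform outputs zeros, the block passes its input through
# unchanged, so adding more blocks does not have to make the network worse.
x = [1.0, -2.0, 3.0]
identity_out = residual_block(x, lambda v: [0.0 for _ in v])
print(identity_out)  # [1.0, -2.0, 3.0]

standard = standard_conv_params(3, 64, 128)
separable = depthwise_separable_params(3, 64, 128)
print(standard)   # 73728
print(separable)  # 8768
```

For these assumed sizes the separable version needs roughly one eighth of the weights, which is the kind of saving that makes such layers attractive on mobile hardware.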

 

[More to come ...] 

 