
How Does Computer Vision Work

[Upper limit: a US Navy F/A-18 travelling near the speed of sound in air. The white halo comprises water droplets that have condensed from the air because of the sudden drop in pressure behind the shock cone around the aircraft - John Gay/US Navy]

 

- Computer Vision vs. Human Vision

Human vision requires the coordination of the eyes and the brain to function. Computer vision (CV) uses machine learning techniques and algorithms to identify, differentiate and classify objects by size or color, and to discover and interpret patterns in visual data such as photos and videos.

Computer vision is a field of artificial intelligence (AI) that enables computers and systems to derive meaningful information from digital images, videos, and other visual inputs, and to take action or make recommendations based on that information. If artificial intelligence allows computers to think, computer vision enables them to see, observe and understand. 

Computer vision works much like human vision, except that human vision has a head start. Human sight has the advantage of a lifetime of context in which to learn how to tell objects apart, how far away they are, whether they are moving, and whether there is something wrong in an image.

Computer vision trains machines to perform these functions, but it must do so in a fraction of the time, using cameras, data, and algorithms rather than retinas, optic nerves, and a visual cortex. Because a system trained to inspect products or monitor production assets can analyze thousands of products or processes a minute, noticing hard-to-detect flaws or problems, it can quickly surpass human capabilities. 

Without machines that can see, it is difficult to teach machines to think. The difficulty is that computers see only digital representations of images. Humans can understand the semantic meaning of an image, but machines rarely do: they detect pixels. 

This semantic gap is the main challenge in computer vision technology. The human brain – a natural neural network – distinguishes between the components of an image and analyzes those components in a certain sequence, with each neuron responsible for a particular element. 

That is why building an artificial solution as capable as the human brain took decades of research and prototyping, and why artificial neural networks became the greatest breakthrough in machine learning. 

Computer Vision gives us the ability to teach a computer to make meaning of the physical world through vision. These tools allow us to develop applications that can make meaning from the input of cameras, photos, and videos to mind-bending degrees.

As a scientific discipline, computer vision is concerned with the theory behind artificial systems that extract information from images. The image data can take many forms, such as video sequences, views from multiple cameras, or multi-dimensional data from a medical scanner. As a technological discipline, computer vision seeks to apply its theories and models for the construction of computer vision systems. Sub-domains of computer vision include scene reconstruction, event detection, video tracking, object recognition, object pose estimation, learning, indexing, and image restoration. 

 

- The Tasks of Computer Vision 

Computer vision is the field of computer science that focuses on creating digital systems that can process, analyze, and make sense of visual data (images or videos) in the same way that humans do. The concept of computer vision is based on teaching computers to process an image at a pixel level and understand it. Technically, machines attempt to retrieve visual information, handle it, and interpret results through special software algorithms.

A fundamental task in computer vision has always been image classification. Thanks to the use of deep learning in image recognition and classification, computers can automatically generate and learn features -- distinctive characteristics and properties. Based on these features, machines predict what is in the image and report the level of probability.
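As a rough sketch of the idea, the simplest classifiers score an image's pixels against one weight vector per class and turn the scores into probabilities with a softmax. The classes, images, and weights below are made up for illustration; a real system would learn the weights from labeled data.

```python
import numpy as np

def softmax(scores):
    # Convert raw class scores into probabilities that sum to 1.
    e = np.exp(scores - scores.max())
    return e / e.sum()

def classify(image, class_weights, class_names):
    # Flatten the image to a pixel vector and score it against each class.
    x = image.reshape(-1).astype(float)
    probs = softmax(class_weights @ x)
    best = int(np.argmax(probs))
    return class_names[best], probs[best]

# Hypothetical 4x4 grayscale "image" and two made-up classes.
rng = np.random.default_rng(0)
weights = rng.normal(size=(2, 16))          # one weight vector per class
label, p = classify(rng.random((4, 4)), weights, ["cat", "dog"])
print(label, float(p))
```

The prediction comes with a probability attached, which is exactly the "level of probability" a deep classifier reports, only computed with learned convolutional features instead of random weights.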

Here are a few common tasks that computer vision systems can be used for:

  • Object classification. The system parses visual content and classifies the object on a photo/video to the defined category. For example, the system can find a dog among all objects in the image.
  • Object identification. The system parses visual content and identifies a particular object on a photo/video. For example, the system can find a specific dog among the dogs in the image.
  • Object tracking. The system processes a video, finds the object (or objects) that match the search criteria, and tracks their movement.
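The tracking task above can be sketched minimally in NumPy with a toy one-pixel "object": locate the object's centroid in each frame and string the positions into a path. Real trackers (such as those in OpenCV) handle appearance changes and multiple objects, which this sketch ignores.

```python
import numpy as np

def centroid(mask):
    # Centre of mass of the foreground pixels in a binary mask.
    ys, xs = np.nonzero(mask)
    return float(ys.mean()), float(xs.mean())

def track(frames):
    # Locate the object in each frame and record its path over time.
    return [centroid(f) for f in frames]

# A one-pixel "object" drifting right across three 5x5 frames.
frames = []
for col in (1, 2, 3):
    f = np.zeros((5, 5), dtype=int)
    f[2, col] = 1
    frames.append(f)

path = track(frames)
print(path)   # centroids move from column 1 to column 3
```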

There are plenty of other technology-related tasks, and they work well in combinations, like the following computer vision solutions:

  • Semantic segmentation
  • Instance segmentation
  • Object detection
  • Action recognition
  • Image enhancement
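Semantic segmentation, listed above, assigns a class label to every pixel. The sketch below uses a simple brightness threshold in place of a trained model; real systems use deep networks, but the output shape (one label per pixel) is the same.

```python
import numpy as np

def segment(image, threshold=0.5):
    # Per-pixel labelling: class 1 ("object") where intensity exceeds
    # the threshold, class 0 ("background") everywhere else.
    return (image > threshold).astype(int)

img = np.array([[0.9, 0.1],
                [0.2, 0.8]])
labels = segment(img)
print(labels)   # [[1 0]
                #  [0 1]]
```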

 

- Computer Vision and Big Data

One of the driving factors behind the growth of computer vision is the amount of data we generate today, which is then used to train computer vision systems and make them better. As the field has grown with new hardware and algorithms, so have the accuracy rates for object identification. Today, a lot of things have changed for the good of computer vision: 

  • Mobile technology with HD cameras has made a huge collection of images and videos available to the world. 
  • Computing power has increased and has become easily accessible and more affordable. 
  • Specific hardware and tools designed for computer vision are more widely available. Some of these tools are discussed later in this article. 

These advancements have been beneficial for computer vision. Accuracy rates for object identification and classification have gone from 50% to 99% in a decade, and today's computers are quicker and more accurate than humans at responding to visual inputs.


- Computer Vision and AI

Computer vision technology tends to mimic the way the human brain works. But how does our brain solve visual object recognition? One popular hypothesis states that our brains rely on patterns to decode individual objects, and this concept is used to create computer vision systems.

Computer vision is the field of computer science that focuses on replicating parts of the complexity of the human vision system and enabling computers to identify and process objects in images and videos in the same way that humans do. Until recently, computer vision only worked in limited capacity. Thanks to advances in AI and innovations in deep learning and neural networks, the field has been able to take great leaps in recent years and has been able to surpass humans in some tasks related to detecting and labeling objects.  

Much of what we know today about visual perception comes from neurophysiological research conducted on cats in the 1950s and 1960s.  By studying how neurons react to various stimuli, two scientists observed that human vision is hierarchical. Neurons detect simple features like edges, then feed into more complex features like shapes, and then eventually feed into more complex visual representations. Armed with this knowledge, computer scientists have focused on recreating human neurological structures in digital form. Like their biological counterparts, computer vision systems take a hierarchical approach to perceiving and analyzing visual stimuli. 
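The "simple features like edges" stage at the bottom of this hierarchy can be illustrated with a single convolution: sliding a small kernel over the image produces a strong response wherever intensity changes. The image and kernel below are toy examples; convolutional networks learn many such kernels automatically.

```python
import numpy as np

def edge_response(image, kernel):
    # Valid 2D cross-correlation: slide the kernel over the image
    # and record the weighted sum at each position.
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A dark-to-bright vertical step; a [-1, 1] kernel fires at the edge.
image = np.array([[0., 0., 1., 1.],
                  [0., 0., 1., 1.]])
kernel = np.array([[-1., 1.]])
print(edge_response(image, kernel))   # strong response in the middle column
```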

 

- Machine Perception  

[Robot Vision's Family Tree]

Machine Perception gives a machine the ability to explain, in a human manner, why it is making its decisions, to warn when it is about to fail, and to provide an understandable characterization of its failures. 
 
Computer Vision builds machines that can see the world like humans do, and involves designing algorithms that can answer questions about a photograph or a video.

  • [The Ohio State University]: The goal of computer vision is to make useful decisions about real physical objects and scenes based on sensed images and video. It is the process of discovering from images “what” is present in the world, “where” it is, and “what” it is doing, with the overall aim of constructing scene descriptions from the imagery. Algorithms require representations of shape, motion, color, context, etc. to perform the task.
  • [The British Machine Vision Association]: "Humans use their eyes and their brains to see and visually sense the world around them. Computer vision is the science that aims to give a similar, if not better, capability to a machine or computer. Computer vision is concerned with the automatic extraction, analysis and understanding of useful information from a single image or a sequence of images. It involves the development of a theoretical and algorithmic basis to achieve automatic visual understanding." 
  • [SUNY-Buffalo]: "Computer vision is an interdisciplinary field drawing on concepts from signal processing, artificial intelligence, neurophysiology, and perceptual psychology. The primary goal of computer vision research is to endow artificial systems with the capacity to see and understand visual imagery at a level rivaling or exceeding human vision."

 

- Computer Vision in Deep Learning

Computer Vision refers to the entire process of emulating human vision in a non-biological apparatus. This includes the initial capturing of images, the detection and identification of objects, recognizing the temporal context between scenes, and developing a high-level understanding of what is happening for the relevant time period. 

This technology has long been commonplace in science fiction, and as such, is often taken for granted. In reality, a system to provide reliable, accurate, and real-time computer vision is a challenging problem that has yet to be fully developed. 

As these systems mature, there will be countless applications that rely on computer vision as a key component. Examples of this are self-driving cars, autonomous robots, unmanned aerial vehicles, intelligent medical imaging devices that assist with surgery, and surgical implants that restore human sight.

Computer vision algorithms that we use today are based on pattern recognition. We train computers on a massive amount of visual data—computers process images, label objects on them, and find patterns in those objects. For example, if we send a million images of flowers, the computer will analyze them, identify patterns that are similar to all flowers and, at the end of this process, will create a model “flower.” As a result, the computer will be able to accurately detect whether a particular image is a flower every time we send them pictures.
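The "model flower" idea can be sketched as template averaging: average the training images into a prototype pattern and call a new image a match when it lies close enough to the prototype. The 3x3 "flower" pattern and the tolerance below are invented for illustration; real systems learn far richer models than a single average.

```python
import numpy as np

def build_model(examples):
    # Average the training images into a single prototype pattern.
    return np.mean(examples, axis=0)

def is_match(model, image, tolerance=0.5):
    # An image "is a flower" if it lies close enough to the prototype.
    return float(np.abs(model - image).mean()) < tolerance

# Hypothetical 3x3 "flower" images: bright centre, dark border.
flower = np.zeros((3, 3))
flower[1, 1] = 1.0
rng = np.random.default_rng(1)
examples = [flower + rng.normal(scale=0.05, size=(3, 3)) for _ in range(100)]

model = build_model(examples)
print(is_match(model, flower))             # matches the pattern it learned
print(is_match(model, np.ones((3, 3))))    # rejects a uniform image
```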

 

- Computer Vision and Neural Networks

In many ways, the story of computer vision is a story about artificial intelligence (AI). Both disciplines imitate biological processes based on an understanding of how the brain works and each has been advanced by the emergence of artificial neural networks, better computing resources, and big data.


One of the critical components of realizing the full capabilities of AI is giving machines the power of vision. To emulate human sight, machines need to acquire, process, analyze, and understand images. 

Tremendous progress toward this milestone has been made thanks to the iterative learning process that neural networks make possible. It starts with a curated dataset containing information that helps the machine learn a specific topic. If the goal is to identify videos of cats, as it was for Google in 2012, the dataset used by the neural networks needs to include images and videos with cats as well as examples without cats. Each image needs to be tagged with metadata that indicates the correct answer. 

When a neural network runs through data and signals that it has found an image with a cat, the feedback it receives about whether it was correct helps it improve. Neural networks use pattern recognition to distinguish many different parts of an image. Instead of a programmer defining the attributes that make a cat, such as a tail and whiskers, the machine learns from the millions of images uploaded.
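The feedback loop described above can be sketched as a perceptron, the simplest trainable neuron: predict, compare with the tagged answer, and nudge the weights whenever the prediction was wrong. The two-pixel "images" and labels below are toy data standing in for a real labeled dataset.

```python
import numpy as np

def train(samples, labels, epochs=20, lr=0.1):
    # Perceptron-style loop: predict, compare with the tagged answer,
    # and adjust the weights whenever the feedback says "wrong".
    w = np.zeros(samples.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            pred = 1 if x @ w + b > 0 else 0
            error = y - pred          # the feedback signal
            w += lr * error * x
            b += lr * error
    return w, b

# Toy "cat detector": label 1 when the two pixel values are bright.
X = np.array([[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]])
y = np.array([1, 1, 0, 0])
w, b = train(X, y)
preds = [1 if x @ w + b > 0 else 0 for x in X]
print(preds)   # learned to reproduce the labels: [1, 1, 0, 0]
```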

 

- OpenCV Library and Computer Vision Algorithms

Computer vision is behind some of the most interesting recent advances in technology. From algorithms that can identify skin cancer to cars that drive themselves, it’s computer vision algorithms that are behind these advances.

Computer algorithms are what make computer vision possible, and the best algorithm for many tasks is currently the convolutional neural network. This is a form of deep learning that attempts to mimic how the brain understands objects in images. 

OpenCV (Open Source Computer Vision Library) is the most popular free and open-source solution for computer vision. OpenCV algorithms range from pixelating faces in images, to smartly cropping images automatically, to finding objects in images.

OpenCV is written in C++ and its primary interface is in C++, but it still retains an older, less comprehensive C interface. All new developments and algorithms appear in the C++ interface. There are bindings for Python, Java, and MATLAB/Octave.
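As one concrete example, the face-pixelation trick mentioned above amounts to replacing each small tile of a region with its average value (in OpenCV itself you would typically downscale and re-upscale with cv2.resize). A plain-NumPy sketch of the idea, with a made-up 4x4 "image":

```python
import numpy as np

def pixelate(region, block=2):
    # Replace each block x block tile with its average value,
    # which is how pixelation filters hide detail (e.g. faces).
    h, w = region.shape
    out = region.astype(float).copy()
    for i in range(0, h, block):
        for j in range(0, w, block):
            tile = out[i:i+block, j:j+block]
            tile[:] = tile.mean()
    return out

img = np.array([[0., 1., 0., 1.],
                [1., 0., 1., 0.],
                [0., 0., 1., 1.],
                [0., 0., 1., 1.]])
print(pixelate(img))   # each 2x2 tile collapses to its mean
```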

 

 

[More to come ...]

 

 
