
Computer Vision Technologies

[Rice University - Brandon Martin]


- Overview

In 1995 everyone in tech wanted a slice of the dot-com boom; today, fields like artificial intelligence (AI), machine learning (ML) and big data drive the world's technology venture capital (VC) firms to dig into their pockets. Computer vision sits at the intersection of these data-driven innovations. While its uses are well known within the tech world, the term is still virtually unknown to the general public, even though many people already benefit from it. 

Computer vision is a form of artificial intelligence in which computers "see" the world, analyze visual data, and then make decisions or understand environments and situations from it. One of the drivers behind the growth of computer vision is the amount of data we generate today, which is then used to train and improve computer vision. 

Our world is filled with countless images and videos from the built-in cameras of mobile devices. And while visual data usually means photos and videos, it can also include data from thermal or infrared sensors and other sources. In addition to the vast amount of visual data (more than 3 billion images are shared online every day), the computing power required to analyze it is now accessible and more affordable. 

With the development of new hardware and algorithms in the field of computer vision, the accuracy of object recognition has also improved. In less than a decade, today's systems have gone from 50 percent accuracy to 99 percent, making them more accurate than humans at responding quickly to visual input.


- Computer Vision Technologies

Computer vision tasks include methods for acquiring, processing, analyzing and understanding digital images, and extraction of high-dimensional data from the real world in order to produce numerical or symbolic information, e.g., in the forms of decisions. Understanding in this context means the transformation of visual images (the input of the retina) into descriptions of the world that can interface with other thought processes and elicit appropriate action. This image understanding can be seen as the disentangling of symbolic information from image data using models constructed with the aid of geometry, physics, statistics, and learning theory. 

Two basic techniques are used to accomplish this: deep learning, a type of machine learning, and convolutional neural networks (CNNs). 

Machine learning uses algorithmic models that enable computers to teach themselves the context of visual data. If enough data is fed through the model, the computer will "look at" the data and teach itself to distinguish one image from another. Algorithms enable machines to learn on their own, rather than being programmed by humans to recognize images. 

CNNs help machine learning and deep learning models "see" by breaking images down into pixels that are given tags, or labels. The network performs convolutions (a convolution is a mathematical operation on two functions that produces a third function) using the labeled pixels and predicts what it "sees". It runs convolutions and checks the accuracy of its predictions over a series of iterations until the predictions become reliable. At that point the network recognizes images in a way similar to humans. 
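The convolution step described above can be sketched in plain Python. This is a minimal illustration, not a production implementation; the 4x4 "image" and the 2x2 averaging kernel are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image and, at each position, sum the
    elementwise products (valid padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for y in range(ih - kh + 1):
        row = []
        for x in range(iw - kw + 1):
            total = 0.0
            for dy in range(kh):
                for dx in range(kw):
                    total += image[y + dy][x + dx] * kernel[dy][dx]
            row.append(total)
        out.append(row)
    return out

# A tiny 4x4 "image" and a 2x2 averaging kernel (hypothetical values).
image = [[1.0, 2.0, 3.0, 4.0],
         [5.0, 6.0, 7.0, 8.0],
         [9.0, 10.0, 11.0, 12.0],
         [13.0, 14.0, 15.0, 16.0]]
kernel = [[0.25, 0.25],
          [0.25, 0.25]]

print(convolve2d(image, kernel))
```

Each entry of the 3x3 output is the average of a 2x2 patch of the input; in a real CNN, kernel values are learned during training rather than fixed by hand.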

Just as humans make out an image at a distance, a CNN first discerns hard edges and simple shapes, then fills in details as it runs its prediction iterations. CNNs are used to understand a single image. Recurrent neural networks (RNNs) are used in video applications in a similar way, helping computers understand how pictures in a series of frames relate to each other.
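The "hard edges first" stage can be illustrated with a classic Sobel filter, a hand-crafted precursor of the edge detectors CNNs learn on their own. This is a minimal sketch in pure Python; the 5x5 image values are hypothetical (a dark left half and a bright right half, producing one vertical edge).

```python
def sobel_edges(image):
    """Approximate horizontal and vertical intensity gradients with the
    3x3 Sobel kernels and return the gradient magnitude per pixel."""
    gx_k = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # responds to vertical edges
    gy_k = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # responds to horizontal edges
    h, w = len(image), len(image[0])
    out = []
    for y in range(h - 2):
        row = []
        for x in range(w - 2):
            gx = sum(image[y + dy][x + dx] * gx_k[dy][dx]
                     for dy in range(3) for dx in range(3))
            gy = sum(image[y + dy][x + dx] * gy_k[dy][dx]
                     for dy in range(3) for dx in range(3))
            row.append((gx * gx + gy * gy) ** 0.5)
        out.append(row)
    return out

# Toy 5x5 grayscale image: dark left half, bright right half.
img = [[0, 0, 255, 255, 255]] * 5
edges = sobel_edges(img)
print(edges[0])
```

The output is large where the dark-to-bright transition occurs and zero in the uniform bright region, which is exactly the "hard edge" signal an early CNN layer picks up.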


- Artificial Neural Networks in Machine Learning: Computer Vision & Neural Networks

In the simplest terms, an artificial neural network (ANN) is a computer system designed for machine learning that mimics the way the human brain (a natural neural network) works. ANNs, often simply called "neural networks," learn by modeling the human learning process: they take in data, look for patterns, and absorb those patterns to create logical rules for processing data or recognizing things.
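The building block of every ANN is a single artificial neuron: a weighted sum of inputs plus a bias, squashed through an activation function. The sketch below uses hypothetical weights and inputs purely for illustration.

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs plus a bias,
    passed through a sigmoid activation into the range (0, 1)."""
    z = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights: the neuron "fires" (output near 1) when the
# weighted evidence is strongly positive.
print(neuron([1.0, 0.5], [2.0, -1.0], 0.0))
```

A full network stacks many such neurons in layers; "learning" means nudging the weights and biases until the network's outputs match the training labels.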

For example, you can provide a neural network with two datasets:

  • Dataset one is called "Cats" and contains pictures of cats.
  • Dataset two is called "No Cat" and consists of images without cats.

Based on these datasets, an ANN can learn to recognize pictures of cats without being told what a cat is. It creates rules to understand what each image set has in common across all images and what differs between image sets.

Of course, such complex systems and processes are not well described in the simplest terms. In fact, artificial neural networks are very complex systems, and scientists still have a lot of work to do in understanding and utilizing machine learning. Like human learning and memory, neural networks perform tasks that we cannot directly program them to do, or that are impractical to program directly, such as:

  • Forecast stock market development
  • Determine authorship of manuscripts based on word choice and style

Although ANNs are complex and idiosyncratic, and we still have much to learn about them and the machine learning that goes with them, the technique can already be found in many practical applications. One of the most common settings for ANNs is the field of computer vision.


[Beverly Hills, California - Civil Engineering Discoveries]

- Research of Computer Vision

There is a lot of research being done in the computer vision field, but it’s not just research. Real-world applications demonstrate how important computer vision is to business, entertainment, transportation, healthcare and everyday life. A key driver for the growth of these applications is the flood of visual information flowing from smartphones, security systems, traffic cameras and other visually instrumented devices. This data could play a major role in operations across industries, but much of it goes unused today. It creates a test bed for training computer vision applications and a launchpad for them to become part of a range of human activities. 

Computer vision is used in industries ranging from energy and utilities to manufacturing and automotive. With deep learning (DL), many new applications of computer vision technologies have been introduced. For example, computer vision technologies can be used to process medical images. These technologies help doctors detect malignant changes such as tumors and hardening of the arteries, and provide highly accurate measurements of organs and blood flow. 

Some medical startups claim they’ll soon be able to use computers to read X-rays, MRIs, and CT scans more rapidly and accurately than radiologists, to diagnose cancer earlier and less invasively, and to accelerate the search for life-saving pharmaceuticals. Hospitals and imaging centers could then interpret images faster and more accurately while relying on fewer radiologists. 

Business enterprises are developing computer vision capabilities embedded into deep learning systems hosted at the edge of the Internet of Things (IoT), running in on-board systems, or performing inference analysis in the cloud.


- Research Topics in Computer Vision

  • Optical character recognition (OCR)
  • Computer vision and speech understanding 
  • Vision and Language in Computer Vision
  • Computer vision applications (autonomous navigation, visual surveillance, or content-based image and video indexing)
  • Fusing 3D scene reconstruction
  • 3D perception
  • Vision, language, and cognition 
  • Probabilistic graphical models
  • Computational photography
  • Medical vision
  • Computer vision and machine learning
  • Video understanding
  • Visual recognition and visual search
  • Large-scale image/video retrieval
  • Unsupervised visual discovery
  • Image and video segmentation
  • Vision and language
  • Video summarization
  • Learning-based visual reconstruction 
  • Understanding video contents 
  • Visual sensing for ecology and conservation
  • Smart graphics
  • Graphics, human computer interaction, & user experience 
  • People counting tool
  • Color detection
  • Object tracking in a video
  • Pedestrian detection
  • Hand gesture recognition
  • Human emotion recognition
  • Road lane detection
  • Business card scanner
  • License plate recognition
  • Handwritten digit recognition
  • Iris flower classification
  • Family photo face detection
  • LEGO Brick Finder
  • PPE Detection
  • Face mask detection
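In their most basic form, several of the topics above (color detection, for instance) reduce to simple per-pixel tests. The sketch below is a deliberately crude illustration with a hypothetical four-pixel "image"; real systems would work in a perceptual color space and on millions of pixels.

```python
def detect_red_pixels(pixels, threshold=50):
    """Flag pixels whose red channel dominates green and blue by at
    least `threshold` -- a crude form of color detection."""
    return [(r, g, b) for (r, g, b) in pixels
            if r - max(g, b) >= threshold]

# A hypothetical 4-pixel "image" as (R, G, B) tuples.
image = [(200, 30, 40), (90, 85, 80), (255, 0, 0), (10, 200, 10)]
print(len(detect_red_pixels(image)))
```

Two of the four pixels are red-dominant; tasks like road lane detection or PPE detection build on exactly this kind of low-level filtering before any learning is applied.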



[More to come ...]


