Personal tools

Classification

DukeUniversity_IMG252
(Duke University - Cheng-Yu Chen)


- Overview

In statistics, classification is the process of determining which category an observation belongs to. For example, classifying an email as spam or not spam, or assigning a diagnosis to a patient based on their characteristics. 

Statistical classification is a supervised learning method that trains a program to categorize new information based on its relevance to known data. It's a machine learning method that classifies data into groups based on similarities. This is done by training a classifier on a dataset, which is then used to predict the class of new data. 

Statistical classification deals with rules of case assignment to categories or classes. The classification, or decision rule, is expressed in terms of a set of random variables. 
 

The most common methods for classification are based on Euclidean spaces. One widely used method is support vector machines(SVM). SVM is based on optimizing the gap in feature space between the training cases in the two classes. 

Please refer to the following for more information: 

 

- Classification Trees

A classification tree is a type of decision tree that is used in machine learning (ML) to classify data as part of a known object class. Classification trees are a type of supervised ML algorithm that uses conditional statements to divide training data into subsets. Each split adds complexity to the model, which can be used to make predictions. 

Classification trees are used to predict categorical data, such as whether an email is spam or not. They determine whether an event happened or didn't happen, usually involving a “yes” or “no” outcome. 

Classification trees are complex and can lead to overfitting. One way to avoid this is to set a minimum number of training inputs to use on each leaf. For example, you can use a minimum of 10 passengers to reach a decision, and ignore any leaf that takes less than 10 passengers.

In ML, a decision tree is an algorithm that can create both classification and regression models. The decision tree is so named because it starts at the root, like an upside-down tree, and branches off to demonstrate various outcomes.
 
 

- Pattern Recognition and Classification

Pattern recognition (PR) and classification is the process of taking raw data and using features to take an action. It's a type of data classification that uses machine learning (ML) algorithms to automatically recognize patterns and regularities in data.

Here are some steps in PR: 
  • Identify common elements
  • Identify and interpret common differences
  • Identify individual elements
  • Describe patterns
  • Make predictions based on patterns

 

Feature extraction is a crucial step in pattern recognition. It involves selecting and representing the most relevant information or attributes from the raw data. 

Hybrid pattern recognition combines two or more methods to identify patterns. For example, a system may use both statistical and structural methods to identify patterns in medical images. 

 

 

[More to come ...]



Document Actions