Decision Trees

[Image: The University of British Columbia]

- Overview

Decision trees are a type of supervised machine learning algorithm that uses a collection of questions to create a model for classification and regression. The questions are organized hierarchically in the shape of a tree, with each non-leaf node containing a condition and each leaf node containing a prediction. 
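For illustration, here is a minimal sketch of training and querying such a tree with scikit-learn (an assumed library choice; the iris dataset and the max_depth=3 setting are arbitrary):

    # A minimal sketch, assuming scikit-learn is installed.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Each non-leaf node of the fitted tree holds a condition such as
    # "petal length <= 2.45"; each leaf holds a class prediction.
    clf = DecisionTreeClassifier(max_depth=3, random_state=0)
    clf.fit(X_train, y_train)
    print(clf.predict(X_test[:5]))    # class predictions for five samples
    print(clf.score(X_test, y_test))  # mean accuracy on held-out data

The same API supports regression via DecisionTreeRegressor, where leaves hold numeric predictions instead of class labels.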

Decision trees are used to: 

  • Make decisions: Decision trees lay out the problem and all the possible outcomes, allowing developers to analyze the possible consequences of a decision.
  • Solve classification problems: Decision trees are used to categorize or classify an object.
  • Predict outcomes: As an algorithm accesses more data, it can predict outcomes for future data.


Decision trees provide an effective method for decision-making in machine learning because they lay out a problem and all of its possible outcomes, letting developers trace the consequences of each choice; and as the algorithm is trained on more data, its predictions for future data improve.


- The Advantages of Decision Trees

Decision trees are popular in machine learning because they are a simple way to structure a model. The tree-like structure makes it easy to understand the decision-making process of the model.

Here are some other advantages of decision trees: 

  • Flexible: Decision trees come in many forms and fit most business decision-making applications.
  • Use heterogeneous data: Decision trees can use both numeric and text data.
  • Tolerate dirty data: Decision trees can work with data that has not been heavily cleaned or standardized.
  • Handle missing values: Decision trees can accommodate missing data values in their training process.
  • Work for numerical or categorical data: Decision trees handle both numerical and categorical variables (see the sketch after this list).
  • Model problems with multiple outputs: Decision trees can predict several target variables at once.
  • Require less data cleaning: Decision trees require less data preparation than many other modeling techniques.
  • Easy to explain: Decision trees are easy to explain to those without an analytical background.
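As a rough illustration of the mixed-data points above, the sketch below integer-encodes a hypothetical categorical column before fitting a scikit-learn tree; scikit-learn's trees expect numeric arrays, while some other implementations split on categories directly. The column names and values are invented for this example.

    # A minimal sketch, assuming pandas and scikit-learn are installed.
    import pandas as pd
    from sklearn.preprocessing import OrdinalEncoder
    from sklearn.tree import DecisionTreeClassifier

    # Hypothetical data: one numeric and one categorical feature.
    df = pd.DataFrame({
        "income": [42_000, 58_000, 31_000, 77_000],
        "region": ["north", "south", "south", "north"],
        "default": [0, 1, 1, 0],
    })

    # Integer-encode the categorical column so the tree can split on it.
    df["region"] = OrdinalEncoder().fit_transform(df[["region"]]).ravel()

    clf = DecisionTreeClassifier(random_state=0)
    clf.fit(df[["income", "region"]], df["default"])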


A decision tree can reach 100% training accuracy on any data set in which no two samples have the same feature values but different labels. However, decision trees tend to overfit, especially when there are many features or when categorical features have many possible values, as the sketch below illustrates.
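The following is a small, hedged demonstration of that tendency on a synthetic dataset (the dataset, sizes, and depth limit are arbitrary choices): the unconstrained tree memorizes the training set, while capping max_depth typically narrows the gap between training and test accuracy.

    # A minimal sketch, assuming scikit-learn is installed.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    for depth in (None, 3):  # unconstrained vs. depth-limited
        clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
        clf.fit(X_tr, y_tr)
        print(depth, clf.score(X_tr, y_tr), clf.score(X_te, y_te))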


- Classification Tree Analysis

A classification tree is a predictive algorithm that describes how to predict a target variable's value from input values. It is a structural mapping of binary decisions that leads to a decision about an object's class.

Classification trees are often referred to simply as decision trees, but more precisely they are the type of decision tree that leads to categorical decisions.

Classification trees are built through a process called binary recursive partitioning: an iterative process of splitting the data into partitions and then splitting each partition further along its branches. A simplified sketch of this process follows.
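The implementation below is an illustrative, simplified sketch of binary recursive partitioning, not any particular library's algorithm. It greedily chooses the split that minimizes Gini impurity (the criterion used by CART-style trees; ID3 and C5.0 use entropy instead) and recurses on the two resulting partitions. All function names are invented for this example.

    # A minimal sketch of binary recursive partitioning in pure Python.
    from collections import Counter

    def gini(labels):
        """Gini impurity of a list of class labels."""
        n = len(labels)
        return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

    def best_split(rows, labels):
        """Return (impurity, feature, threshold) for the best binary split."""
        best = None
        for f in range(len(rows[0])):
            for t in {row[f] for row in rows}:
                left = [l for row, l in zip(rows, labels) if row[f] <= t]
                right = [l for row, l in zip(rows, labels) if row[f] > t]
                if not left or not right:
                    continue  # not a real partition
                score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
                if best is None or score < best[0]:
                    best = (score, f, t)
        return best

    def build(rows, labels):
        """Recursively partition until a node is pure or unsplittable."""
        split = best_split(rows, labels)
        if gini(labels) == 0.0 or split is None:
            return Counter(labels).most_common(1)[0][0]  # leaf: majority class
        _, f, t = split
        left = [(r, l) for r, l in zip(rows, labels) if r[f] <= t]
        right = [(r, l) for r, l in zip(rows, labels) if r[f] > t]
        return {"feature": f, "threshold": t,
                "left": build(*map(list, zip(*left))),
                "right": build(*map(list, zip(*right)))}

Calling build([[2.0], [1.0], [3.5], [4.0]], ["a", "a", "b", "b"]) returns a nested dictionary whose inner nodes hold a feature/threshold condition and whose leaves hold class labels, mirroring the structure described below.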

In classification tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels. 

Classification trees are used in machine learning to generate future predictions based on previous values. 

Some limitations of decision trees include:

  • They can be prone to overfitting and underfitting.
  • They can have difficulty handling continuous and nonlinear features.
  • They can be biased by the order and frequency of the features.
  • They can be influenced by random factors and noise.

Some popular decision tree algorithms include:

  • ID3: This algorithm uses entropy and information gain to evaluate candidate splits (see the sketch after this list).
  • Hunt's algorithm: This algorithm was developed in the 1960s to model human learning in psychology.
  • C5.0: This algorithm uses entropy and information gain to measure the disorder in a collection of examples and how effectively an attribute reduces it.
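The sketch below illustrates the entropy and information-gain measures mentioned above on a toy, invented weather-style dataset; it is not the full ID3 algorithm, only its split-scoring step.

    # A minimal sketch of entropy and information gain in pure Python.
    import math
    from collections import Counter

    def entropy(labels):
        """Shannon entropy (in bits) of a list of class labels."""
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def information_gain(values, labels):
        """Entropy reduction from splitting on a categorical attribute."""
        gain = entropy(labels)
        n = len(labels)
        for v in set(values):
            subset = [l for x, l in zip(values, labels) if x == v]
            gain -= len(subset) / n * entropy(subset)
        return gain

    outlook = ["sunny", "sunny", "overcast", "rain", "rain"]
    play    = ["no",    "no",    "yes",      "yes",  "no"]
    print(information_gain(outlook, play))  # higher gain = better split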
 
 

[More to come ...]

