
Probability, Statistics and AI



- Overview

Artificial intelligence (AI) and probability theory are closely related. Statistics and probability theory provide the mathematical foundations for many AI techniques. For example, Bayesian statistics is used to model uncertainty and make predictions in AI systems. 

Here are some ways probability and statistics are used in AI:

  • Probabilistic reasoning: A form of knowledge representation that uses the concept of probability to indicate the degree of uncertainty in knowledge.
  • Probabilistic models: Used to analyze data with statistical methods that account for randomness and uncertainty.
  • Machine learning: Heavily utilizes statistics, which is built upon probability theory.
  • Statistical techniques: Help identify the most informative features and discard redundant or irrelevant ones.
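Probabilistic reasoning, the first item above, can be illustrated with a minimal sketch of a Bayesian update: a prior belief is revised in light of new evidence via Bayes' rule. The diagnostic-test numbers below are illustrative, not taken from any real system.

```python
# Minimal sketch of probabilistic reasoning: Bayes' rule updates a prior
# belief about a hypothesis H given new evidence E. Numbers are illustrative.

def bayes_update(prior, likelihood, likelihood_given_not):
    """Return P(H | E) from P(H), P(E | H), and P(E | not H)."""
    evidence = likelihood * prior + likelihood_given_not * (1 - prior)
    return likelihood * prior / evidence

# A diagnostic test: 1% base rate, 95% sensitivity, 5% false-positive rate.
posterior = bayes_update(prior=0.01, likelihood=0.95, likelihood_given_not=0.05)
print(round(posterior, 3))  # ~0.161: even a positive test leaves much uncertainty
```

Note how the posterior stays low despite a seemingly accurate test; this is exactly the kind of uncertainty an AI system must represent explicitly.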


Probability enables us to reason about uncertainty, while statistics quantifies and explains it.

Statistics are important for understanding and improving AI systems. Statistical models enable AI algorithms to learn from data, adapt to new information, and make informed decisions. Statistical inference is also important for evaluating the performance and reliability of AI systems. 

Statistical techniques are essential for validating and refining ML models. For example, techniques like hypothesis testing, cross-validation, and bootstrapping help quantify the performance of models and avoid problems like over-fitting. 
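One of the validation techniques named above, bootstrapping, can be sketched in a few lines: resample the observed data with replacement to quantify the uncertainty of an estimate. The data here are synthetic and the helper name `bootstrap_ci` is an illustrative choice, not a library function.

```python
import random
import statistics

# Sketch of the bootstrap: resample the data with replacement to estimate
# the sampling variability of the mean. Data are synthetic.
random.seed(0)
data = [random.gauss(10.0, 2.0) for _ in range(100)]

def bootstrap_ci(sample, n_resamples=2000, alpha=0.05):
    """Percentile bootstrap confidence interval for the sample mean."""
    means = sorted(
        statistics.mean(random.choices(sample, k=len(sample)))
        for _ in range(n_resamples)
    )
    lo = means[int(n_resamples * alpha / 2)]
    hi = means[int(n_resamples * (1 - alpha / 2))]
    return lo, hi

lo, hi = bootstrap_ci(data)
print(f"mean={statistics.mean(data):.2f}, 95% CI=({lo:.2f}, {hi:.2f})")
```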

Many performance metrics used in ML algorithms, such as accuracy, precision, recall, F-score, and root mean squared error, are grounded in statistics.
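These metrics follow directly from their standard definitions. As a sketch, all of them can be computed from raw counts and predictions; the confusion-matrix counts and regression values below are made-up numbers for illustration.

```python
import math

# Classification metrics from confusion-matrix counts (illustrative numbers).
tp, fp, fn, tn = 40, 10, 5, 45

accuracy  = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f_score   = 2 * precision * recall / (precision + recall)

# Root mean squared error for a small regression example (illustrative).
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]
rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

print(accuracy, precision, round(recall, 3), round(f_score, 3), round(rmse, 3))
```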

A basic understanding of probability and statistical concepts and their application to solving real-world problems is necessary. This prerequisite provides a solid background in applications of probability and statistics that will serve as a foundation for AI and advanced techniques, including statistical concepts, probability theory, random and multivariate variables, data and sampling distributions, descriptive statistics, and hypothesis testing.


- Modern Statistics

Statistics is both a body of theory and a set of methods of analysis. The subject matter of statistics covers a wide range, extending from the planning of experiments and other studies that generate data to the collection, analysis, presentation, and interpretation of those data. Numerical data constitute the raw material of the subject.

The essence of modern statistics, however, is the theory and methodology of drawing inferences that extend beyond the particular set of data examined, and of making decisions based on appropriate analysis of such data.


- Types of Statistics


Statistics can be classified into two categories: descriptive statistics and inferential statistics.

Descriptive statistics describe the data at hand, whereas inferential statistics help you make predictions from it. In inferential statistics, data are drawn from a sample and used to generalize about the population.

To make an inference means to draw a conclusion from evidence. Statistical inference, then, means drawing conclusions about a population from sample data, using various statistical analysis techniques.
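The two categories can be contrasted on the same sample: descriptive statistics summarize exactly what was observed, while inferential statistics generalize to the population the sample came from. The sample values below are made up, and the normal-approximation interval is a simplification for illustration.

```python
import statistics

# The same sample viewed two ways (illustrative values).
sample = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0, 5.1, 4.9]

# Descriptive: summarize the observed data.
mean = statistics.mean(sample)
sd = statistics.stdev(sample)

# Inferential: a ~95% confidence interval for the *population* mean
# (normal approximation, a simplification for small samples).
se = sd / len(sample) ** 0.5
ci = (mean - 1.96 * se, mean + 1.96 * se)
print(f"mean={mean:.2f}, sd={sd:.2f}, 95% CI=({ci[0]:.2f}, {ci[1]:.2f})")
```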


- Probability vs. Statistics

Probability and statistics are related areas of mathematics which concern themselves with analyzing the relative frequency of events. Still, there are fundamental differences in the way they see the world:  

  • Probability deals with predicting the likelihood of future events, while statistics involves the analysis of the frequency of past events.  
  • Probability is primarily a theoretical branch of mathematics, which studies the consequences of mathematical definitions. Statistics is primarily an applied branch of mathematics, which tries to make sense of observations in the real world.

 

Both subjects are important, relevant, and useful. But they are different, and understanding the distinction is crucial in properly interpreting the relevance of mathematical evidence. Many a gambler has gone to a cold and lonely grave for failing to make the proper distinction between probability and statistics.

This distinction will perhaps become clearer if we trace the thought process of a mathematician encountering her first craps game:

  • If this mathematician were a probabilist, she would see the dice and think, "Six-sided dice? Presumably each face of the dice is equally likely to land face up. Now, assuming that each face comes up with probability 1/6, I can figure out what my chances of crapping out are."
  • If instead a statistician wandered by, she would see the dice and think, "Those dice may look OK, but how do I know that they are not loaded? I'll watch a while, and keep track of how often each number comes up. Then I can decide if my observations are consistent with the assumption of equal-probability faces. Once I'm confident enough that the dice are fair, I'll call a probabilist to tell me how to play."
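The probabilist's side of this story can be sketched directly: assume two fair six-sided dice and enumerate all 36 outcomes to get the come-out probabilities. (Under standard craps rules, a come-out total of 2, 3, or 12 loses and 7 or 11 wins immediately.)

```python
from itertools import product
from collections import Counter

# The probabilist's calculation: assume two fair six-sided dice and
# enumerate every outcome of the come-out roll.
totals = Counter(a + b for a, b in product(range(1, 7), repeat=2))

p_craps = (totals[2] + totals[3] + totals[12]) / 36   # "crapping out": 4/36
p_natural = (totals[7] + totals[11]) / 36             # immediate win: 8/36
print(p_craps, p_natural)
```

The statistician's side would instead collect observed rolls and test whether their frequencies are consistent with these assumed probabilities.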

 

In summary, probability theory enables us to find the consequences of a given ideal world, while statistical theory enables us to measure the extent to which our world is ideal.


- AI and Probability

Probability is the likelihood of an event occurring. In artificial intelligence (AI), probability is used to model and reason about uncertain situations. For example, AI can calculate the probability that a person with a certain height and weight will be obese. 

Probabilistic reasoning is a form of knowledge representation that uses probability to indicate the degree of uncertainty in knowledge. In AI, probabilistic models are used to analyze data with statistical methods. Probabilistic reasoning was among the earliest approaches to machine learning.

AI can be used to predict outcomes, scenarios, and actions based on simulations, models, and optimization. This can help test hypotheses, explore options, and make informed decisions. For example, in industrial settings, AI can facilitate predictive maintenance by monitoring machinery and equipment data.



- Machine Learning, Probability and Statistics

Machine learning (ML) heavily relies on probability and statistics. Probability theory provides the tools to model uncertainty and randomness inherent in data, while statistics provides methods for analyzing and interpreting data to build effective ML models. 

Probability and statistics provide the foundation for understanding, building, and evaluating ML models. They are not separate fields but rather complementary tools that work together to enable intelligent systems to learn from data.

In essence, probability helps quantify uncertainty, and statistics helps us make sense of data, both crucial for ML.

Here's a breakdown:

  • Probability: ML algorithms often need to predict outcomes or classify data, which inherently involves uncertainty. Probability theory provides the mathematical framework to quantify this uncertainty, allowing models to make predictions with associated probabilities. 
  • Statistics: Statistics provides the tools to summarize, analyze, and interpret data. This includes techniques like descriptive statistics (mean, median, variance), hypothesis testing, and regression analysis, which are essential for building, evaluating, and understanding machine learning models.
  • Interplay: Probability and statistics are deeply intertwined in machine learning. Probabilistic models, like Bayesian networks, utilize probability distributions to represent knowledge and make predictions. Statistical methods are then used to estimate parameters of these models from data.
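The interplay described above can be sketched with the simplest possible case: a probabilistic model (a Gaussian) whose parameters are estimated from data by maximum likelihood. For a Gaussian, the MLE of the mean is the sample mean and the MLE of the variance is the biased sample variance (dividing by n, not n - 1); the data here are synthetic.

```python
import random
import statistics

# Probability supplies the model (a Gaussian); statistics supplies the
# estimation procedure (maximum likelihood). Synthetic data, true
# parameters mu = 3.0 and sigma = 1.5.
random.seed(1)
data = [random.gauss(3.0, 1.5) for _ in range(5000)]

mu_hat = statistics.mean(data)                           # MLE of the mean
var_hat = sum((x - mu_hat) ** 2 for x in data) / len(data)  # MLE of the variance

print(f"mu_hat={mu_hat:.2f}, sigma_hat={var_hat ** 0.5:.2f}")  # near 3.0 and 1.5
```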

 

Specific Applications

  • Model Selection and Evaluation: Statistical techniques like hypothesis testing and cross-validation are used to compare different models and choose the best one for a given task.
  • Parameter Estimation: Statistical methods like Maximum Likelihood Estimation (MLE) and Bayesian estimation are used to find the optimal parameters for machine learning models.
  • Uncertainty Quantification: Probabilistic models, like those using Bayes' theorem, allow for reasoning about uncertainty and making predictions with associated probabilities.
  • Data Analysis and Feature Engineering: Statistical methods are used to explore, clean, and transform data, preparing it for use in machine learning models.
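The first application above, model selection by cross-validation, can be sketched without any ML library: split the data into k folds, fit each candidate on k-1 folds, and score it on the held-out fold. The two "models" here are deliberately trivial constant predictors (train-set mean vs. train-set median), and the data are synthetic; `cv_error` is an illustrative helper, not a library function.

```python
import random
import statistics

# Sketch of k-fold cross-validation for model selection, pure stdlib.
random.seed(2)
data = [random.gauss(0.0, 1.0) for _ in range(100)]

def cv_error(data, predictor, k=5):
    """Mean held-out squared error of a constant predictor over k folds."""
    fold = len(data) // k
    errs = []
    for i in range(k):
        test = data[i * fold:(i + 1) * fold]
        train = data[:i * fold] + data[(i + 1) * fold:]
        pred = predictor(train)  # fit = compute one constant from the train fold
        errs.append(statistics.mean((y - pred) ** 2 for y in test))
    return statistics.mean(errs)

# Compare two candidate "models" by held-out error; pick the lower one.
print(cv_error(data, statistics.mean), cv_error(data, statistics.median))
```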


[More to come ...]


