
Text Mining and Data Mining

(Angel Island State Park, San Francisco/Bay Area, U.S.A. - Jeffrey M. Wang)


Text mining and data mining are becoming increasingly widespread as companies try to extract business value from their unstructured information, or big data. While the goal is often the same, exploiting information for knowledge discovery, these techniques vary significantly in data complexity, deployment time, and application.


Data Mining


Machine learning allows computers to learn and discern patterns without being explicitly programmed. Combined, statistical techniques and machine learning are a powerful tool for analysing various kinds of data in many computer science and engineering areas, including image processing, speech processing, natural language processing, and robot control, as well as in fundamental sciences such as biology, medicine, astronomy, physics, and materials science.

Data mining is concerned with the application of statistical machine learning to exploratory analysis and predictive modeling of large data sets. Causal discovery, by contrast, is concerned with algorithms for eliciting the underlying causal (as opposed to the merely predictive) relationships from observational and experimental data.


  • [UCLA]: "Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information - information that can be used to increase revenue, cuts costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases."
  • [Oracle]: Data mining is the practice of automatically searching large stores of data to discover patterns and trends that go beyond simple analysis. Data mining uses sophisticated mathematical algorithms to segment the data and evaluate the probability of future events. The key properties of data mining are: automatic discovery of patterns, prediction of likely outcomes, creation of actionable information, focus on large data sets and databases. Data mining can answer questions that cannot be addressed through simple query and reporting techniques.
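
As a rough illustration of "finding correlations or patterns among dozens of fields", the sketch below scans every pair of numeric fields in a small table and reports the strongly correlated ones. It uses only the Python standard library. The field names, the toy records, and the 0.8 threshold are assumptions made up for this example; real data mining tools use far richer methods than pairwise Pearson correlation.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant field; this sketch reports 0.
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlated_fields(records, threshold=0.8):
    """Return (field_a, field_b, r) for each field pair with |r| >= threshold."""
    fields = sorted(records[0])
    columns = {f: [row[f] for row in records] for f in fields}
    found = []
    for a, b in combinations(fields, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            found.append((a, b, round(r, 3)))
    return found


# Hypothetical toy data, invented purely for illustration.
records = [
    {"ad_spend": 10, "visits": 100, "returns": 7},
    {"ad_spend": 20, "visits": 210, "returns": 3},
    {"ad_spend": 30, "visits": 290, "returns": 8},
    {"ad_spend": 40, "visits": 405, "returns": 4},
]

# Only the ad_spend/visits pair clears the threshold for this data.
print(correlated_fields(records))
```

A strong correlation found this way is only a predictive pattern; as noted above, establishing a causal relationship is a separate problem.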


Text Mining


Text mining, also referred to as text data mining and roughly equivalent to Text Analytics (Unlocking the Value of Unstructured Data), is the process of deriving high-quality information from text. High-quality information is typically derived by identifying patterns and trends through means such as statistical pattern learning.

The purpose of Text Mining is to process unstructured (textual) information, extract meaningful numeric indices from the text, and, thus, make the information contained in the text accessible to the various data mining (statistical and machine learning) algorithms.
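
One simple way to "extract meaningful numeric indices from the text" is a bag-of-words term count: each document becomes a row of word frequencies that ordinary data mining algorithms can consume. The following is a minimal sketch under that assumption, using only the Python standard library. The helper names `tokenize` and `term_document_matrix` and the sample sentences are hypothetical, and practical systems usually add steps such as stop-word removal, stemming, or TF-IDF weighting.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_document_matrix(documents):
    """Turn raw documents into (vocabulary, rows of per-term counts)."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    # Sorted vocabulary gives every document the same column order.
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


# Hypothetical sample documents, invented for illustration.
documents = [
    "Data mining finds patterns in data.",
    "Text mining turns text into numbers.",
]

vocabulary, matrix = term_document_matrix(documents)
print(vocabulary)
for row in matrix:
    print(row)
```

Each row is now a fixed-length numeric vector, so the two documents can be compared, clustered, or fed to a classifier like any other structured data.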

Text analytics software created for data mining is evolving to include artificial intelligence and machine learning. This new generation of text analytics software unifies structured data and unstructured text, provides contextual analysis, and helps businesses make data-driven decisions. Data mining and text analytics platforms can unify huge volumes of data in minutes, giving businesses near real-time insight from their text.


[More to come ...]





