Text Mining and Data Mining

(Angel Island State Park, San Francisco/Bay Area, U.S.A. - Jeffrey M. Wang)



Data Becomes The New Language For Innovation

Data have become a torrent flowing into every area of the global economy. Companies churn out a burgeoning volume of transactional data, capturing trillions of bytes of information about their customers, suppliers, and operations. Millions of networked sensors are being embedded in the physical world in devices such as mobile phones, smart energy meters, automobiles, and industrial machines that sense, create, and communicate data in the age of the Internet of Things.

Indeed, as companies and organizations go about their business and interact with individuals, they are generating a tremendous amount of digital “exhaust data,” i.e., data that are created as a by-product of other activities. Social media sites, smartphones, and other consumer devices including PCs and laptops have allowed billions of individuals around the world to contribute to the amount of big data available. The growing volume of multimedia content has also played a major role in the exponential growth in the amount of big data. Each second of high-definition video, for example, generates more than 2,000 times as many bytes as are required to store a single page of text. In a digitized world, consumers going about their day (communicating, browsing, buying, sharing, searching) create their own enormous trails of data.

Exploitation of this vast data and information resource can generate significant economic benefits, including enhancements in productivity and competitiveness, as well as generating additional value for consumers. Techniques such as text and data mining and analytics are required to exploit this potential.

Text mining and data mining are becoming increasingly widespread as companies try to tackle their unstructured information, or big data, for business value. While the goal is often the same - exploiting information for knowledge discovery - these techniques vary significantly when it comes to data complexity, deployment time and application.


- The Key Properties of Data Mining

Data mining is the practice of automatically searching large stores of data to discover patterns and trends that go beyond simple analysis. Data mining uses sophisticated mathematical algorithms to segment the data and evaluate the probability of future events. The key properties of data mining are: automatic discovery of patterns, prediction of likely outcomes, creation of actionable information, and a focus on large data sets and databases. Data mining can answer questions that cannot be addressed through simple query and reporting techniques.
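As a minimal sketch of "segmenting the data and evaluating the probability of future events", the Python snippet below groups historical records by segment and estimates an event rate per segment. The records, the segment names, and the function `churn_rate_by_segment` are hypothetical and chosen only for illustration; real data mining tools use far richer algorithms.

```python
from collections import defaultdict

# Hypothetical historical records: (customer segment, did the customer churn?).
# The values are made up for illustration.
history = [
    ("new", True), ("new", False), ("new", True), ("new", True),
    ("loyal", False), ("loyal", False), ("loyal", True), ("loyal", False),
]

def churn_rate_by_segment(rows):
    """Estimate the probability of churn in each segment from past frequencies."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for segment, did_churn in rows:
        totals[segment] += 1
        churned[segment] += int(did_churn)
    return {segment: churned[segment] / totals[segment] for segment in totals}

rates = churn_rate_by_segment(history)
for segment, rate in sorted(rates.items()):
    print(f"{segment}: estimated churn probability {rate:.2f}")
```

A plain query could count churned customers; the point of the sketch is that the per-segment rate is then used as an estimate for customers who have not churned yet.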


- The Process of Data Mining

Data Mining is an analytic process designed to explore data (usually large amounts of data - typically business or market related - also known as "big data") in search of consistent patterns and/or systematic relationships between variables, and then to validate the findings by applying the detected patterns to new subsets of data. The ultimate goal of data mining is prediction - and predictive data mining is the most common type of data mining and one that has the most direct business applications. The process of data mining consists of three stages: (1) the initial exploration, (2) model building or pattern identification with validation/verification, and (3) deployment (i.e., the application of the model to new data in order to generate predictions). 
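The three stages above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data set (advertising spend versus units sold) is invented, the model is a simple least-squares line, and the names `fit_line` and `predict` are hypothetical helpers, not part of any particular data mining product.

```python
from statistics import mean

# Hypothetical toy data: (advertising spend, units sold). Values are made up.
records = [(1, 12), (2, 14), (3, 16), (4, 18), (5, 20), (6, 22), (7, 24), (8, 26)]

# Stage 1: initial exploration (basic summary statistics).
summary = {
    "n": len(records),
    "mean_spend": mean(x for x, _ in records),
    "mean_sales": mean(y for _, y in records),
}

# Stage 2: model building, with validation on a held-out subset.
train, holdout = records[:6], records[6:]

def fit_line(points):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar = mean(x for x, _ in points)
    y_bar = mean(y for _, y in points)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in points)
             / sum((x - x_bar) ** 2 for x, _ in points))
    return slope, y_bar - slope * x_bar

slope, intercept = fit_line(train)

def predict(x):
    """Apply the fitted line to a spend value."""
    return slope * x + intercept

# Mean absolute error on records the model has not seen during fitting.
holdout_error = mean(abs(predict(x) - y) for x, y in holdout)

# Stage 3: deployment (apply the validated model to new data).
forecast = predict(10)
print(summary, holdout_error, forecast)
```

Holding back the last two records mirrors the validation step described above: the pattern found in one subset is checked against another before it is used for prediction.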

Data mining is a process that organizations use to turn raw data into useful information. By using software to find patterns in large data sets, organizations can learn more about their customers in order to develop more efficient business strategies, boost sales, and reduce costs. Data mining depends on effective data collection, storage, and processing.


- Data Mining Tool To Train Machine Learning Models

Data mining methods are used to develop machine learning models. Machine learning allows computers to learn and discern patterns without being explicitly programmed. Combined, statistical techniques and machine learning are a powerful tool for analysing various kinds of data in many areas of computer science and engineering, including image processing, speech processing, natural language processing, and robot control, as well as in fundamental sciences such as biology, medicine, astronomy, physics, and materials science.

Data mining is concerned with the applications of statistical machine learning for exploratory analysis and predictive modeling from large data sets. Causal discovery is concerned with algorithms for eliciting the underlying causal (as opposed to the merely predictive) relationships from observational and experimental data.


- Text Mining

Text mining (also referred to as text analytics) is an artificial intelligence (AI) technology that uses natural language processing (NLP) to transform the free (unstructured) text in documents and databases into normalized, structured data suitable for analysis or to drive machine learning (ML) algorithms. Text mining is one of the most important tools currently used by business professionals and established companies. 

Text mining, also referred to as text data mining and roughly equivalent to text analytics, is the process of deriving high-quality information from text, in effect unlocking the value of unstructured data. High-quality information is typically derived by identifying patterns and trends through means such as statistical pattern learning. The purpose of text mining is to process unstructured (textual) information, extract meaningful numeric indices from the text, and thus make the information contained in the text accessible to the various data mining (statistical and machine learning) algorithms.
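One common way to "extract meaningful numeric indices from the text" is a term-weighting scheme such as TF-IDF. The sketch below is a minimal, standard-library illustration of that idea; the three sample documents, the simple tokenizer, and the helper names `tokenize` and `tfidf_vector` are assumptions made for the example and do not represent any specific text mining platform.

```python
import math
import re
from collections import Counter

# Hypothetical sample documents, made up for illustration.
documents = [
    "The pump failed after the bearing overheated.",
    "Customer praised the fast delivery.",
    "Bearing wear caused the pump to vibrate.",
]

def tokenize(text):
    """Lowercase the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())

# Term counts per document, and the shared vocabulary across all documents.
term_counts = [Counter(tokenize(doc)) for doc in documents]
vocabulary = sorted(set().union(*term_counts))

# Document frequency: in how many documents each term occurs.
doc_freq = {term: sum(term in counts for counts in term_counts)
            for term in vocabulary}

def tfidf_vector(counts):
    """Turn one document's term counts into a numeric TF-IDF vector."""
    total = sum(counts.values())
    return [(counts[term] / total) * math.log(len(documents) / doc_freq[term])
            for term in vocabulary]

matrix = [tfidf_vector(counts) for counts in term_counts]

# Each row is now a numeric vector that a data mining algorithm could consume.
for doc, row in zip(documents, matrix):
    top_weight, top_term = max(zip(row, vocabulary))
    print(f"{doc!r}: highest-weighted term is {top_term!r}")
```

Words that occur in every document (here "the") receive a weight of zero, while words that distinguish one document from the others receive higher weights.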

Text analytics software created for data mining is evolving to include artificial intelligence and machine learning. This new generation of text analytics software is unifying structured and unstructured textual data, providing contextual analysis, and helping businesses make data-driven decisions. Data mining and text analytics platforms can unify huge volumes of data in minutes to provide near real-time insight for a business.


- The Benefits of Data Mining

As data mining works on the structured data within the organization, it is particularly suited to deliver a wide range of operational and business benefits. For example, it can organize and analyze data from IoT systems to enable the predictive maintenance of factory equipment or it can combine historical sales data with customer behaviors to predict future sales and patterns of demand.

The knowledge or information acquired through the data mining process can be used in any of the following applications:

  • Market Analysis
  • Production Control
  • Customer Retention
  • Science Exploration
  • Fraud Detection
  • Sports
  • Astronomy
  • Internet Web Surf-Aid

- The Benefits of Text Mining

Businesses use data and text mining to analyse customer and competitor data to improve competitiveness; the pharmaceutical industry mines patents and research articles to improve drug discovery; within academic research, mining and analytics of large datasets are delivering efficiencies and new knowledge in areas as diverse as biological science, particle physics and media and communications.

Text mining can take this a stage further by synthesizing vast amounts of content into easily understood information, allowing organizations to understand what people are actually saying about them. Sentiment analysis has become a major business use case of text mining, as it uncovers the opinions and concerns of customers and partners by tracking and analyzing social content.
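To make the idea of sentiment analysis concrete, here is a deliberately simple lexicon-based sketch in Python. The word lists `POSITIVE` and `NEGATIVE`, the sample posts, and the functions `sentiment_score` and `label` are all hypothetical; production systems typically rely on trained models rather than a fixed word list.

```python
import re

# Hypothetical sentiment lexicons, chosen only for this illustration.
POSITIVE = {"great", "love", "fast", "helpful"}
NEGATIVE = {"slow", "broken", "refund", "terrible"}

def sentiment_score(text):
    """Count positive words minus negative words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def label(text):
    """Map the score to a coarse sentiment label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Hypothetical social media posts about a product.
posts = [
    "Love the fast shipping, support was helpful!",
    "Arrived broken and the refund process is terrible.",
    "Package arrived on Tuesday.",
]

for post in posts:
    print(f"{label(post):>8}: {post}")
```

Aggregating such labels over many posts gives a rough picture of customer opinion, which is the business use case described above.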

The main benefits of text mining:

  • Efficiency. A key benefit of text mining is that it enables much more efficient analysis of extant knowledge. ...
  • Unlocking 'hidden' information and developing new knowledge. ...
  • Exploring new horizons. ...
  • Improved research and evidence base. ...
  • Improving research process and quality. ...
  • Broader benefits.


[More to come ...]





