
Data Cleansing

 

- Data Cleansing

Data cleansing, also known as data cleaning or data scrubbing, is the process of fixing incorrect, incomplete, duplicate, or otherwise erroneous data in a dataset. It involves identifying data errors and then changing, updating, or deleting the data to correct them. Data cleansing improves data quality and helps provide more accurate, consistent, and reliable information for an organization's decision-making.
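The identify-then-correct cycle described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a standard implementation: the field names ("name", "email"), the validity rules, and the helper names find_errors and cleanse are all assumptions made for the example, and values are assumed to be strings or None.

```python
# Hypothetical required fields; a real dataset would define its own rules.
REQUIRED = ("name", "email")

def find_errors(record):
    """Identify problems in one record and return them as a list of strings."""
    problems = []
    for field in REQUIRED:
        if not (record.get(field) or "").strip():
            problems.append(f"missing {field}")
    email = (record.get("email") or "").strip()
    if email and "@" not in email:
        problems.append("invalid email")
    return problems

def cleanse(records):
    """Update what can be fixed, set aside what cannot, and report why."""
    cleaned, rejected = [], []
    for record in records:
        # Change/update step: trim whitespace and lowercase email addresses.
        fixed = {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}
        if isinstance(fixed.get("email"), str):
            fixed["email"] = fixed["email"].lower()
        # Identify step: anything still invalid is removed from the clean set.
        problems = find_errors(fixed)
        if problems:
            rejected.append((record, problems))
        else:
            cleaned.append(fixed)
    return cleaned, rejected
```

Returning the rejected records together with the reasons, rather than silently deleting them, keeps the cleansing step auditable.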

Data cleansing is a critical part of the overall data management process and one of the core components of data preparation, which readies datasets for business intelligence (BI) and data science applications. It is usually done by data quality analysts, data engineers, or other data management professionals. However, data scientists, BI analysts, and business users can also cleanse data for their own applications or take part in the data cleansing process.

Many sources use the terms "data cleansing," "data cleaning," and "data scrubbing" interchangeably, although some draw a distinction between them.

 

- Data Cleansing vs. Data Cleaning vs. Data Scrubbing

Data cleansing, data cleaning, and data scrubbing are often used interchangeably. For the most part, they are considered the same thing. In some cases, however, data scrubbing is treated as one element of data cleansing: the part that specifically involves removing duplicate, bad, unwanted, or old data from a dataset.

Data cleansing is the process of editing, correcting, and structuring data in a dataset so that it is consistent and ready for analysis. This includes removing corrupt or irrelevant data and putting what remains into a format that analysis software can read.

Think of data scrubbing as a subset of data cleansing. Data scrubbing relies on dedicated software tools to do "deeper cleaning," rather than on users poring over database tables and spreadsheets and making corrections by hand.
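As a rough illustration of tool-driven scrubbing, the sketch below removes duplicate and outdated records automatically. It is not any particular product's method: the function name scrub, the "updated" field, and the rule of keeping the newest record per key are assumptions made for the example.

```python
from datetime import date

def scrub(records, key, cutoff):
    """Keep the newest record per key and drop records last updated before cutoff.

    Assumes every record has a string value under `key` and a date under "updated".
    """
    newest = {}
    for record in records:
        if record["updated"] < cutoff:
            continue  # old data: drop it
        # Normalize the key so near-identical values count as duplicates.
        k = record[key].strip().lower()
        if k not in newest or record["updated"] > newest[k]["updated"]:
            newest[k] = record
    return list(newest.values())
```

For example, calling scrub(rows, "email", date(2020, 1, 1)) on a list of contact records would keep at most one record per email address and discard anything not updated since the start of 2020.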

Data scrubbing also has a different meaning in the context of data storage. There, it is an automatic feature that checks disk drives and storage systems to confirm that the data they contain can still be read and to identify any bad sectors or blocks.
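The storage sense of scrubbing can be modeled in a simplified way with checksums. The sketch below is an in-memory illustration only, not how any specific storage system works: it assumes a checksum is recorded for each block at write time, and the names BLOCK_SIZE, write_blocks, and scrub_storage are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, chosen only for illustration

def checksum(block: bytes) -> str:
    """Return a SHA-256 digest of one block."""
    return hashlib.sha256(block).hexdigest()

def write_blocks(data: bytes):
    """Split data into blocks and record a checksum for each block."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    return blocks, [checksum(b) for b in blocks]

def scrub_storage(blocks, checksums):
    """Re-read every block and return the indexes of blocks that no longer match."""
    return [
        i for i, (block, expected) in enumerate(zip(blocks, checksums))
        if checksum(block) != expected
    ]
```

A scrub pass that returns an empty list means every block was readable and unchanged; any index it returns marks a block that would need repair from a redundant copy.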

 

 

[More to come ...]

 

 

 