
Data Science Life Cycle

[RWTH Aachen University, Germany - Martin Braun]


- Overview

The data science life cycle describes how data is handled from its creation to its destruction. It involves many stages, including problem definition, data collection, preprocessing, exploratory analysis, model building, and deployment.

Other stages of a data science project's life cycle include:

  • Business problem understanding
  • Data cleaning and processing
  • Model communication
  • Model evaluation and monitoring


The time required to complete a data science project varies from project to project and depends on the data set. It can take months or even years for a model to start showing results.

The data processing phase is usually the longest and most important phase of a data science project. This is because the quality of the input data determines the quality of the output.
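As a small illustration of how input quality drives output quality, the sketch below computes an average from a handful of readings, once with an unprocessed failure marker left in and once without it. The readings and the -999.0 sentinel are invented for this example and do not come from any real data set.

```python
from statistics import mean

# Hypothetical temperature readings in degrees Celsius.
# ASSUMPTION: -999.0 is a made-up sentinel meaning "sensor failed".
SENTINEL = -999.0
raw_readings = [21.5, 22.0, SENTINEL, 21.0, 22.5]

# Using the raw input as-is lets one bad record dominate the result.
naive_mean = mean(raw_readings)

# Dropping the sentinel before averaging gives a plausible value.
clean_readings = [r for r in raw_readings if r != SENTINEL]
clean_mean = mean(clean_readings)

print(f"mean with bad record:    {naive_mean:.2f}")
print(f"mean without bad record: {clean_mean:.2f}")
```

The computation is identical in both cases; only the input differs, which is why time spent on data processing tends to pay off in every later stage.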

Data preparation is the process of preparing raw data for further processing and analysis. It involves:

  • Collecting data from various sources
  • Cleaning and labeling data
  • Handling missing data
  • Exploring and visualizing data
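The preparation steps listed above can be sketched in a few lines of Python. This is a minimal sketch using only the standard library, not a definitive pipeline: the two sources, the field names (`id`, `age`, `label`), the helper function names, and the choice of median imputation are all assumptions made for illustration, and the summary statistics stand in for real visualizations.

```python
from statistics import median

# Hypothetical raw records from two sources; every name and value is invented.
source_a = [
    {"id": 1, "age": "34", "label": " Yes "},
    {"id": 2, "age": "", "label": "no"},      # age is missing here
]
source_b = [
    {"id": 3, "age": "29", "label": "YES"},
    {"id": 4, "age": "41", "label": "No"},
]

def collect(*sources):
    """Collect: merge records from several sources into one list."""
    return [dict(record) for source in sources for record in source]

def clean(records):
    """Clean and label: parse ages and normalise the label spelling."""
    cleaned = []
    for r in records:
        age_text = r["age"].strip()
        cleaned.append({
            "id": r["id"],
            "age": int(age_text) if age_text else None,  # None marks a missing value
            "label": r["label"].strip().lower(),
        })
    return cleaned

def impute_missing_age(records):
    """Handle missing data: fill missing ages with the median of the known ages."""
    fill = median(r["age"] for r in records if r["age"] is not None)
    return [{**r, "age": fill if r["age"] is None else r["age"]} for r in records]

def summarize(records):
    """Explore: basic summary statistics as a stand-in for plots."""
    ages = [r["age"] for r in records]
    label_counts = {}
    for r in records:
        label_counts[r["label"]] = label_counts.get(r["label"], 0) + 1
    return {
        "n": len(records),
        "min_age": min(ages),
        "max_age": max(ages),
        "label_counts": label_counts,
    }

prepared = impute_missing_age(clean(collect(source_a, source_b)))
summary = summarize(prepared)
print(summary)
```

Median imputation is only one of several ways to handle missing values; dropping incomplete records or using a model-based estimate may suit a given data set better.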


[More to come ...]



