
Big Data Architectures

[Big Data Pipeline - Microsoft]


- Overview

Big data refers to data volumes so large and rapidly expanding that no typical data management system can efficiently store or analyze them.

Big data architectures address some of these issues by providing a scalable and efficient approach to data storage and processing. Some workloads involve batch data that arrives at specific times, so jobs must be scheduled to run against each batch; streaming workloads instead require real-time pipelines that process data continuously as it arrives. A big data architecture accommodates both kinds of processing.
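The batch-versus-streaming distinction above can be sketched in a few lines. This is a minimal illustration, not a production pipeline: the records are hypothetical, and a real system would read batches from files and streams from a message broker.

```python
from typing import Iterable, Iterator

# Hypothetical sample records; a real pipeline would read these
# from files (batch) or from a message broker (streaming).
RECORDS = [
    {"user": "a", "bytes": 120},
    {"user": "b", "bytes": 300},
    {"user": "a", "bytes": 80},
]

def batch_totals(records: list) -> dict:
    """Batch style: the full dataset is available before the job runs."""
    totals: dict = {}
    for r in records:
        totals[r["user"]] = totals.get(r["user"], 0) + r["bytes"]
    return totals

def streaming_totals(stream: Iterable) -> Iterator:
    """Streaming style: emit an updated result after every record arrives."""
    totals: dict = {}
    for r in stream:
        totals[r["user"]] = totals.get(r["user"], 0) + r["bytes"]
        yield dict(totals)

print(batch_totals(RECORDS))            # one result for the whole batch
for snapshot in streaming_totals(iter(RECORDS)):
    print(snapshot)                     # incremental result per record
```

The batch function runs once over a complete, static dataset; the streaming function yields a fresh result after each record, which is the behavior a real-time pipeline must provide.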

Big data is the collection of structured, semi-structured, and unstructured data gathered by businesses that can be mined for information and used in advanced analytics applications such as predictive modeling and machine learning.

Along with the technologies that support big data analytics, systems that process and store big data have become a regular part of business data management infrastructure. Understanding what big data can do and how to use it requires a solid understanding of its properties.


- Big Data Platforms

A big data platform acts as an organized storage medium for large amounts of data. Big data platforms utilize a combination of data management hardware and software tools to store aggregated data sets, usually in the cloud.

Due to the constant influx of data from numerous sources that will only become more intense, many sophisticated and highly scalable cloud data platforms are emerging to store and parse ever-expanding amounts of information. These types of platforms have become known as big data platforms.

Big data platforms strive to process this volume of information, store it in an organized and understandable manner, and extract useful insights from it.


- Big Data Architectures

Big data architectures are designed to handle the ingestion, processing, and analysis of data that is too large or complex for traditional database systems.

A big data architecture is a framework that defines the components, processes, and technologies required to capture, store, process, and analyze big data. Big data architectures typically include four layers: data collection and ingestion, data processing and analysis, data visualization and reporting, and data governance and security. Each layer has its own set of technologies, tools, and processes.

Big data solutions typically involve one or more of the following types of workloads:

  • Batch processing for static big data sources.
  • Real-time processing of dynamic big data.
  • Interactive exploration of big data.
  • Predictive analytics and machine learning.
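As a concrete illustration of the last workload type, predictive analytics often starts with something as simple as fitting a line to historical data. The sketch below fits a one-variable linear model with ordinary least squares using only the standard library; the data points are hypothetical examples, not from the text.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a one-variable linear model y = slope*x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    return slope, mean_y - slope * mean_x

# Hypothetical historical observations (e.g., day number vs. sales).
xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 0.0
```

At big data scale the same idea is delegated to distributed libraries, but the principle, learning parameters from historical data to predict future values, is unchanged.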


- Big Data Architecture Layers

There are four main big data architecture layers:

  • Data Ingestion: This layer is responsible for collecting data from various sources. In big data, data ingestion is the process of extracting data from various sources and loading it into a data repository. Data ingestion is a key component of a big data architecture, as it determines how data will be ingested, transformed, and stored.
  • Data Processing: Data processing is the second layer and is responsible for collecting, cleaning, and preparing data for analysis. This layer is critical to ensuring that data is of high quality and ready for future use.
  • Data Storage: Data storage is the third layer and is responsible for storing data in a format that is easy to access and analyze. This layer is critical to ensuring that data is accessible and usable by the other layers.
  • Data Visualization: Data visualization is the fourth layer and is responsible for creating visualizations of the data that humans can easily understand. This layer is important for making data accessible.
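The four layers above can be sketched end to end as a toy pipeline. This is a minimal, assumption-laden illustration: the CSV sample, the function names, and the in-memory "storage" dict are all hypothetical stand-ins for real sources, processing engines, stores, and dashboards.

```python
import csv
import io
import json

# Hypothetical raw source data; a real system would ingest from
# files, APIs, or message queues.
RAW_CSV = "city,temp\nOslo,4\nCairo,31\nOslo,6\n"

def ingest(raw: str) -> list:
    """Data ingestion layer: extract records from a source format."""
    return list(csv.DictReader(io.StringIO(raw)))

def process(rows: list) -> dict:
    """Data processing layer: clean values and aggregate per city."""
    totals, counts = {}, {}
    for row in rows:
        city = row["city"].strip()
        totals[city] = totals.get(city, 0.0) + float(row["temp"])
        counts[city] = counts.get(city, 0) + 1
    return {city: totals[city] / counts[city] for city in totals}

def store(result: dict, sink: dict) -> None:
    """Data storage layer: persist results in an easy-to-query form
    (here just an in-memory dict standing in for a database)."""
    sink["avg_temp"] = json.dumps(result, sort_keys=True)

def visualize(result: dict) -> str:
    """Data visualization layer: render a simple text bar chart."""
    return "\n".join(f"{city:<6}{'#' * int(v)}" for city, v in sorted(result.items()))

sink: dict = {}
averages = process(ingest(RAW_CSV))
store(averages, sink)
print(visualize(averages))
```

Each function corresponds to one layer, and data flows through them in order; in a production architecture each stage would be a separate, independently scalable component.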



[More to come ...]



