
Big Data Ecosystems

[Figure: Data Scientist Skillset]


Big Data, Big Opportunities


- Big Data Systems

Big data is everywhere. A fundamental goal across many modern businesses and sciences is to use as many machines as possible to consume as much information as possible, as fast as possible. The central challenge is how to turn data into useful knowledge. This is a moving target, because both the underlying hardware and our ability to collect data keep evolving.

Traditional databases cannot handle unstructured data or high volumes of real-time data. Much of big data consists of diverse, unstructured datasets, and it is laborious to store, manage, process, analyze, and visualize these datasets, and to extract useful insights from them, using traditional database approaches. Refining large, heterogeneous datasets in the big data era therefore involves many technical aspects.

A big data system consists of the mandatory features Data, Data Storage, Information Management, Data Analysis, Data Processing, Interface and Visualization, and the optional feature, System Orchestrator. Check out the Feature Model of Big Data Systems for more details.
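
The split between mandatory and optional features can be sketched as a simple check. This is only an illustration: the feature names come from the paragraph above, while the function `check_system` and its return format are hypothetical and not part of the Feature Model itself.

```python
# Feature names taken from the text; everything else is a hypothetical sketch.
MANDATORY = frozenset({
    "Data",
    "Data Storage",
    "Information Management",
    "Data Analysis",
    "Data Processing",
    "Interface and Visualization",
})
OPTIONAL = frozenset({"System Orchestrator"})


def check_system(features):
    """Report which mandatory features are missing and which names are unknown."""
    features = set(features)
    missing = MANDATORY - features
    unknown = features - MANDATORY - OPTIONAL
    return {
        "valid": not missing and not unknown,
        "missing": sorted(missing),
        "unknown": sorted(unknown),
    }


# A system description that lacks a visualization layer (made-up example).
report = check_system([
    "Data", "Data Storage", "Information Management",
    "Data Analysis", "Data Processing", "System Orchestrator",
])
print(report)
```

A system that lists all six mandatory features passes the check whether or not it includes the System Orchestrator.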

Key data-driven areas include relational systems, distributed systems, graph systems, NoSQL, NewSQL, machine learning, and neural networks. Specific topics include cluster architecture; big data stacks such as Hadoop and Spark; scheduling and resource management; batch and stream analytics; graph processing; and serverless platforms.


- The Role of Cloud Computing

Big data and cloud computing go hand in hand, with many public cloud services performing big data analytics. As Software as a Service (SaaS) becomes increasingly popular, it is crucial to keep up to date with cloud infrastructure best practices and with the types of data that can be stored in large quantities.

Cloud computing is the delivery of computing services such as servers, storage, and more over the Internet. The companies that offer these services are called cloud providers, and they charge for them based on usage.

Cloud computing is usually classified by location or by the service that the cloud offers. By location, a cloud can be classified as public, private, hybrid, or community. By service, clouds are classified as IaaS (Infrastructure-as-a-Service), PaaS (Platform-as-a-Service), or SaaS (Software-as-a-Service), or more finely as Storage-, Database-, Information-, Process-, Application-, Integration-, Security-, Management-, or Testing-as-a-Service.

Although you may not realize it, you are probably using cloud computing right now. Most of us use online services to send email, edit documents, watch movies, and so on, and it is likely that cloud computing is making it all possible behind the scenes.


- Big Data Integration

Big data integration describes the connection between data management operations and the big data processing patterns needed to utilize them in large-scale analytical applications.

Data integration is now common practice across organizations. Data needs to be protected, governed, transformed, usable, and agile. Data supports everything that we do personally and supports organizations’ ability to deliver products and services to us. Whatever your big data application is, and whatever types of big data you are using, the real value will come from integrating different types of data sources and analyzing them at scale.

Data integration means bringing together data from diverse sources and turning it into coherent and more useful information (or knowledge). The main objective is taming, or more technically managing, data and turning it into something you can make use of programmatically. A data integration process involves many parts. It starts with discovering, accessing, and monitoring data and continues with modeling and transforming data from a variety of sources. Integrating diverse datasets also significantly reduces overall data complexity: the data becomes more available for use and unified as a system of its own. Such a streamlined, integrated data system can increase collaboration between the different parts of your data systems. Each part can now clearly see how its data is integrated into the overall system, including the user scenarios and the security and privacy processes around it.
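
As a rough illustration of the accessing, transforming, and unifying steps described above, the sketch below joins two small sources on a shared customer id. The data, field names, and function names are all made up for this example, and a real pipeline would add discovery, monitoring, and governance around it.

```python
import csv
import io
import json

# Hypothetical source 1: a CSV export from a CRM system.
CRM_CSV = """customer_id,name,email
1,Ada Lopez,ADA@EXAMPLE.COM
2,Ben Okafor,ben@example.com
"""

# Hypothetical source 2: JSON order events from a web shop.
ORDERS_JSON = """[
  {"cust": 1, "amount": 30.0},
  {"cust": 1, "amount": 12.5},
  {"cust": 3, "amount": 8.0}
]"""


def load_customers(text):
    """Access and transform: parse the CSV and normalize field formats."""
    customers = {}
    for row in csv.DictReader(io.StringIO(text)):
        cid = int(row["customer_id"])
        customers[cid] = {"name": row["name"], "email": row["email"].lower()}
    return customers


def load_order_totals(text):
    """Access and transform: sum order amounts per customer id."""
    totals = {}
    for order in json.loads(text):
        totals[order["cust"]] = totals.get(order["cust"], 0.0) + order["amount"]
    return totals


def integrate(customers, totals):
    """Unify: join both sources on customer id, keeping ids seen in either one."""
    unified = []
    for cid in sorted(customers.keys() | totals.keys()):
        profile = customers.get(cid, {})
        unified.append({
            "customer_id": cid,
            "name": profile.get("name"),
            "email": profile.get("email"),
            "total_spent": totals.get(cid, 0.0),
        })
    return unified


unified = integrate(load_customers(CRM_CSV), load_order_totals(ORDERS_JSON))
for record in unified:
    print(record)
```

Keeping ids that appear in only one source (an outer join) is a design choice made here so that gaps between the sources stay visible instead of being silently dropped.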


- Big Data Tools and Techniques

Nowadays, large volumes of data are generated in the form of text, voice, video, images, and sound. Handling and processing these different types of data is very challenging, and analyzing big data with traditional data processing applications is laborious. Because the data is scattered across huge file systems, big data analysis is a difficult task, so a number of tools and techniques are required. Data mining techniques such as clustering, prediction, classification, and decision trees are used to analyze big data. Tools used to handle big data include Apache Hadoop, Apache Spark, Apache Storm, MongoDB and other NoSQL databases, and HPCC.
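
To make the clustering technique mentioned above concrete, here is a minimal k-means sketch in plain Python. The sample points and starting centroids are invented for illustration, and at real scale this kind of analysis would normally run on one of the distributed tools listed above, not in a single process.

```python
def kmeans(points, centroids, iterations=10):
    """Basic k-means on 2-D points, starting from the given centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: (p[0] - centroids[i][0]) ** 2
                + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)

        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                mean_x = sum(x for x, _ in members) / len(members)
                mean_y = sum(y for _, y in members) / len(members)
                new_centroids.append((mean_x, mean_y))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid

        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, clusters


# Made-up sample data: two visibly separate groups of points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(1.0, 1.0), (8.0, 8.0)])
print(centroids)
print(clusters)
```

The starting centroids are fixed here so the run is repeatable; practical implementations usually pick them randomly or with a seeding scheme.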


[More to come ...]


