
Big Data Science and Analytics

[Data Science Lifecycle - Microsoft]

 

New Data Economy: Turning Big Data into Smart Data

 

 

- Data Science Overview 

Data science is one of the most talked-about terms in today’s tech world, and it offers exceptional opportunities for newcomers. If you are planning to pursue a career in this field, familiarity with the fundamental concepts is essential.

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data, and applies that knowledge and those actionable insights across a broad range of application domains. Data science is related to data mining, machine learning and big data.

Data science has become a helpful tool for solving problems in almost any field. In economics, it can be used to assess risks or forecast trends. In health care, the information generated is processed to construct case studies of particular diseases, while medical device manufacturers implement artificial intelligence to help hospital administrators improve efficiency and clinicians’ productivity.

The computer processing power available today, combined with the explosion in the amount of data available to us in a digital world, means smart, self-teaching machines are now commonplace, although they are often hidden away behind services or web interfaces where we may not even notice them unless we know what we’re looking for. Behind the scenes at Google, Facebook, Netflix or any of the hundreds of organisations which have deployed this technology, vast data warehouses and fast processing units crunch through huge volumes of information to make this a reality.

 

- The Future of Data Science and Analytics

The future of industry is intelligent and powered by data. Big data refers to extremely large datasets that are difficult to analyze with traditional tools. It is often boiled down to a few varieties of data generated by machines, people, and organizations. The term applies when the needs for data collection, processing, management, use, and analysis go beyond the capacity and capability of available methods and software systems. These constraints are often defined by volume, variety, velocity, veracity, and similar attributes. Big data can enable efficient solutions to challenging problems in health, security, government and more, and usher in a new era of analytics and decisions.

Big data is being generated by everything around us at all times. Every digital process and social media exchange produces it. Systems, sensors and mobile devices transmit it. Big data can be structured, semi-structured, or unstructured. IDC estimates that 90 percent of big data is unstructured data, and many of the tools designed to analyze big data can handle it. Unstructured data usually refers to information that doesn't reside in a traditional row-column database. It is the opposite of structured data, which is stored in fields in a database.
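To make the three forms concrete, here is a minimal sketch using only the Python standard library. The sample records, field names, and the temperature pattern are illustrative assumptions, not taken from any real dataset.

```python
import csv
import io
import json
import re

# Structured: fixed columns, as in a row-column database table (illustrative data).
structured_csv = "patient_id,age\n1,54\n2,61\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: self-describing, but the fields can vary from record to record.
semi_structured = '{"patient_id": 1, "notes": {"allergies": ["penicillin"]}}'
record = json.loads(semi_structured)

# Unstructured: free text with no predefined fields; any structure must be extracted.
unstructured = "Patient reports a headache for 3 days and a fever of 38.5 C."
temperatures = [float(t) for t in re.findall(r"(\d+(?:\.\d+)?) C", unstructured)]

print(rows[0]["age"], record["notes"]["allergies"], temperatures)
```

The structured and semi-structured records can be queried by field name directly, while the free text needs a hand-written extraction rule before anything can be computed from it.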

Big data is arriving from multiple sources at an alarming velocity, volume and variety. To extract meaningful value from big data, you need optimal processing power, analytics capabilities and skills. In most business use cases, any single source of data on its own is not useful. Real value often comes from combining these streams of big data with each other and analyzing them to generate new insights. The organization that can quickly extract insight from its data and act on it gains an advantage.
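As a small illustration of combining sources, the sketch below joins two hypothetical streams (in-store sales and an external weather feed) on a shared date field. All names and figures are invented for the example; a real system would typically use a database or a dataframe library for this.

```python
from collections import defaultdict

# Hypothetical streams: neither says much on its own.
sales = [
    {"date": "2021-06-01", "units": 120},
    {"date": "2021-06-02", "units": 310},
    {"date": "2021-06-03", "units": 95},
]
weather = [
    {"date": "2021-06-01", "rain": True},
    {"date": "2021-06-02", "rain": False},
    {"date": "2021-06-03", "rain": True},
]

def join_on_date(left, right):
    """Inner-join two lists of records on their 'date' field."""
    right_by_date = {r["date"]: r for r in right}
    return [{**l, **right_by_date[l["date"]]} for l in left if l["date"] in right_by_date]

def mean_units_by_rain(joined):
    """Average units sold, grouped by whether it rained that day."""
    groups = defaultdict(list)
    for row in joined:
        groups[row["rain"]].append(row["units"])
    return {rain: sum(units) / len(units) for rain, units in groups.items()}

joined = join_on_date(sales, weather)
summary = mean_units_by_rain(joined)
print(summary)
```

Only after the join does a question such as "do we sell less on rainy days?" become answerable.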

Analyzing large data sets, so-called big data, will become a key basis of competition, underpinning new waves of productivity growth, innovation, and consumer surplus. Leaders in every sector will have to grapple with the implications of big data, not just a few data-oriented managers. The increasing volume and detail of information captured by enterprises, the rise of multimedia, social media, and the Internet of Things will fuel exponential growth in data for the foreseeable future.

  

- Four Stages for The Big Data Life Cycle

Big data is an emerging term referring to the process of managing huge amounts of data from different sources, such as DBMSs, log files, social media posts, and sensor data. Big data (text, numbers, images, etc.) can be divided into different forms: structured, semi-structured, and unstructured. It can be further described by attributes such as velocity, volume, variety, value, and complexity. Emerging big data technologies also raise many security concerns and challenges.

Big data must pass through a series of steps before it generates value, namely data access, storage, cleaning, and analysis. One approach is to run each stage as a separate layer, use the tools available to fit the problem at hand, and scale analytical solutions to big data.
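One way to picture the layered approach is as a chain of small functions, each standing in for one layer. This is only a sketch under simplifying assumptions: the function names are hypothetical, a plain list plays the role of storage, and the input is a handful of JSON lines.

```python
import json
import statistics

def access(raw_lines):
    """Access layer: parse raw JSON lines, skipping lines that fail to parse."""
    for line in raw_lines:
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue

def store(records, storage):
    """Storage layer: append records to a store (a plain list stands in here)."""
    storage.extend(records)
    return storage

def clean(storage):
    """Cleaning layer: keep only records with a numeric 'value' field."""
    return [r for r in storage if isinstance(r.get("value"), (int, float))]

def analyze(records):
    """Analysis layer: a single summary statistic."""
    return statistics.mean(r["value"] for r in records)

raw = ['{"value": 10}', 'not json', '{"value": null}', '{"value": 20}']
result = analyze(clean(store(access(raw), [])))
print(result)
```

Because each layer only depends on the output of the one before it, any layer can be swapped for a more scalable tool without rewriting the others.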

The big data life cycle consists of four stages, namely: Data Acquisition, Data Awareness, Data Analytics and Data Governance.

 

[Murren, Switzerland - Civil Engineering Discoveries]

- Data Acquisition

Data acquisition has been understood as the process of gathering, filtering, and cleaning data before the data is put in a data warehouse or any other storage solution. The acquisition of big data is most commonly governed by four of the Vs: volume, velocity, variety, and value. Most data acquisition scenarios assume high-volume, high-velocity, high-variety, but low-value data, making it important to have adaptable and time-efficient gathering, filtering, and cleaning algorithms that ensure that only the high-value fragments of the data are actually processed by the data-warehouse analysis. 
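The gather, clean, and filter steps can be sketched as a chain of Python generators, so that low-value readings are discarded before they ever reach storage. The sensor records, the field names, and the "normal" temperature range used to decide what counts as high-value are all assumptions made for illustration.

```python
def gather(*sources):
    """Gather: merge readings from several sources into one stream."""
    for source in sources:
        yield from source

def clean(readings):
    """Clean: drop readings with no temperature and normalize the rest."""
    for r in readings:
        if r.get("temp_c") is None:
            continue
        yield {"sensor": str(r["sensor"]).strip().lower(), "temp_c": float(r["temp_c"])}

def keep_high_value(readings, low=15.0, high=30.0):
    """Filter: keep only readings outside the assumed normal range."""
    for r in readings:
        if not (low <= r["temp_c"] <= high):
            yield r

# Hypothetical raw feeds with inconsistent formatting and a missing value.
sensor_a = [{"sensor": " A1 ", "temp_c": "21.5"}, {"sensor": "A1", "temp_c": "42.0"}]
sensor_b = [{"sensor": "B7", "temp_c": None}, {"sensor": "b7", "temp_c": 3}]

to_warehouse = list(keep_high_value(clean(gather(sensor_a, sensor_b))))
print(to_warehouse)
```

Generators process one reading at a time, which suits high-velocity input: nothing is held in memory beyond the fragments that pass the filter.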

 

- Data Awareness

Data Awareness is the task of creating a scheme of relationships within a set of data, to allow different users of the data to determine a fluid yet valid context and utilise it for their desired tasks. It is a relatively new field, in which most of the work is currently being done on semantic structures to allow data to gain context in an interoperable format, in contrast to the current system where data is given context using unique, model-specific constructs (such as XML Schemas).

Prior to the Big Data revolution, organizations were inward-looking in terms of data. During this time, data-centric environments like data warehouses dealt only with data created within the enterprise. But with the advent of data science and predictive analytics, many organizations have come to the realization that enterprise data must be fused with external data to enable and scale a digital business transformation. 

This means that processes for identifying, sourcing, understanding, assessing and ingesting such data must be developed.

 

- Data Processing and Analytics

Data processing has three primary goals: (a) determine whether the data collected is internally consistent; (b) make the data meaningful to other systems or users through metaphors or analogies they can understand; and (c), what many consider most important, provide predictions about future events and behaviours based on past data and trends. Because this is a vast field with rapidly changing technologies, this section concentrates on the most commonly used technologies in data analytics. Data analytics requires four primary conditions to be met for effective processing: fast data loading, fast query processing, efficient utilisation of storage, and adaptivity to dynamic workload patterns. The analytical model most commonly associated with meeting these criteria, and with big data in general, is MapReduce.
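The MapReduce model can be illustrated with the classic word-count example. The sketch below runs in a single Python process and only mimics the three phases (map, shuffle, reduce); a real MapReduce framework distributes them across many machines. The function names and the sample documents are assumptions for the example.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

documents = ["big data big value", "data drives value"]
mapped = chain.from_iterable(map_phase(d) for d in documents)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Each call to `map_phase` and `reduce_phase` is independent of the others, which is what allows the work to be spread over many nodes.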

 

- Data Governance

Data governance is a requirement in today’s fast-moving and highly competitive enterprise environment. Now that organizations have the opportunity to capture massive amounts of diverse internal and external data, they need a discipline to maximize their value, manage risks, and reduce cost. 

Data governance is a collection of processes, roles, policies, standards, and metrics that ensure the effective and efficient use of information in enabling an organization to achieve its goals. It establishes the processes and responsibilities that ensure the quality and security of the data used across a business or organization. Data governance defines who can take what action, upon what data, in what situations, using what methods. 
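The idea of "who can take what action, upon what data, in what situations" can be sketched as a small allow-list of rules. The roles, dataset names, and purposes below are hypothetical, and the "using what methods" dimension is left out to keep the example short; real governance tooling is considerably richer.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    role: str      # who
    action: str    # what action
    dataset: str   # upon what data
    purpose: str   # in what situation

# Hypothetical policy: anything not listed here is denied.
POLICY = {
    Rule("clinician", "read", "patient_records", "treatment"),
    Rule("analyst", "read", "deidentified_records", "research"),
}

def is_allowed(role, action, dataset, purpose):
    """Return True only if an explicit rule permits the request."""
    return Rule(role, action, dataset, purpose) in POLICY

print(is_allowed("clinician", "read", "patient_records", "treatment"))
print(is_allowed("analyst", "read", "patient_records", "research"))
```

Denying by default means that a new dataset or role stays inaccessible until someone responsible for it adds an explicit rule.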

A well-crafted data governance strategy is fundamental for any organization that works with big data, and will explain how your business benefits from consistent, common processes and responsibilities. Business drivers highlight what data needs to be carefully controlled in your data governance strategy and the benefits expected from this effort. This strategy will be the basis of your data governance framework. 

Data governance is the act of managing raw big data, as well as the processed information that arises from it, in order to meet legal, regulatory and business-imposed requirements. While there is no standardized format for data governance, there have been increasing calls within various sectors (especially healthcare) to create such a format to ensure reliable, secure and consistent big data utilisation across the board.

For example, if a business driver for your data governance strategy is to ensure the privacy of healthcare-related data, patient data will need to be managed securely as it flows through your business. Retention requirements (e.g. history of who changed what information and when) will be defined to ensure compliance with relevant government requirements, such as the GDP

 
 

[More to come ...]

 

 



 
