
Big Data Ecosystems

[University of Oxford]


- Overview

Big data is everywhere. Data science is being applied across multiple industries with transformative results, and significant improvements in operational efficiency have translated into higher revenue margins.

The explosion of information presents exciting opportunities for industries to grow through data science. Healthcare, finance, energy, media, and several other industries are using data science to uncover insights from big data, helping businesses make strategic decisions and optimize outcomes. 

A fundamental goal across numerous modern businesses and sciences is to be able to utilize as many machines as possible, to consume as much information as possible and as fast as possible. The big challenge is "how to turn data into useful knowledge". This is a moving target as both the underlying hardware and our ability to collect data evolve. 


- Data Convergence: HPC, Big Data and Cloud Technologies

The convergence of HPC and big data and the impact of the cloud are playing a major role in the democratization of HPC. 

The growing demand for computing power for data analytics has added new areas of focus for HPC facilities, but it has also created new issues, such as interoperability and ease of use with the cloud. 

These infrastructures are now required to handle more complex workflows combining machine learning, big data, and HPC, in addition to typical HPC applications. 

This creates challenges at the resource management, scheduling, and environment deployment layers. Therefore, enhancements are needed to allow multiple frameworks to be deployed under a common system management, while providing the right abstractions to facilitate adoption.


- HPC, Big Data and Cloud Computing: the way forward to the future of mankind  

Progress for both science and mankind is going to depend more and more on “supercomputer brains” that can process large amounts of data in real time, giving that data meaning and subsequently turning it into actionable knowledge.

The Internet of Things and the convergence of HPC, big data and cloud computing technologies are enabling the emergence of a wide range of innovations. Building industrial large-scale application test-beds that integrate such technologies and that make best use of currently available HPC and data infrastructures will accelerate the pace of digitization and the innovation potential in key industry sectors (for example, healthcare, manufacturing, energy, finance & insurance, agri-food, space and security).  


- High Performance and Super Computing

 In the Age of Internet Computing, billions of people use the Internet every day. As a result, supercomputer sites and large data centers must provide high-performance computing services to huge numbers of Internet users concurrently. We have to upgrade data centers using fast servers, storage systems, and high-bandwidth networks. The purpose is to advance network-based computing and web services with the emerging new technologies. 

The general computing trend is to leverage shared web resources and massive amounts of data over the Internet. The evolution is towards parallel, distributed, and cloud computing built on clusters, massively parallel processors (MPPs), peer-to-peer (P2P) networks, grids, clouds, web services, and the Internet of Things.

"Supercomputer" is a general term for computing systems capable of sustaining high-performance computing applications that require a large number of processors, shared or distributed memory, and multiple disks. Supercomputers are primarily designed for enterprises and organizations that require massive computing power. A supercomputer incorporates architectural and operational principles from parallel and grid processing, where a process is either executed simultaneously on thousands of processors or distributed among them.

Supercomputer performance is measured in floating-point operations per second (FLOPS) rather than millions of instructions per second (MIPS). At the time of writing, the fastest supercomputers could perform nearly a hundred quadrillion FLOPS, that is, close to 100 petaFLOPS (PFLOPS), and all of the world's 500 fastest supercomputers ran Linux-based operating systems.
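To make the units concrete, a hundred quadrillion FLOPS is 10^17 FLOPS, or 100 petaFLOPS. The following is a minimal Python sketch of that conversion; the helper name `format_flops` and the prefix table are illustrative choices, not part of any standard library.

```python
# SI prefixes commonly used for FLOPS figures, largest first.
PREFIXES = [
    ("exa", 10**18),
    ("peta", 10**15),
    ("tera", 10**12),
    ("giga", 10**9),
    ("mega", 10**6),
]


def format_flops(flops: float) -> str:
    """Express a raw FLOPS figure using the largest prefix that fits."""
    for name, factor in PREFIXES:
        if flops >= factor:
            return f"{flops / factor:g} {name}FLOPS"
    return f"{flops:g} FLOPS"


# A hundred quadrillion floating-point operations per second.
print(format_flops(1e17))  # 100 petaFLOPS
```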


- Turning Big Data into Smart Data

Big data refers to extremely large datasets that are difficult to analyze with traditional tools. It is often boiled down to a few varieties of data generated by machines, people, and organizations. Big data is being generated by everything around us at all times. Every digital process and social media exchange produces it. Systems, sensors and mobile devices transmit it. Big data can be structured, semi-structured, or unstructured. IDC estimates that 90 percent of big data is unstructured.

 Big data is arriving from multiple sources at an alarming velocity, volume and variety. To extract meaningful value from big data, you need optimal processing power, analytics capabilities and skills. In most business use cases, any single source of data on its own is not useful. Real value often comes from combining these streams of big data sources with each other and analyzing them to generate new insights.

Analyzing large data sets, so-called big data, will become a key basis of competition, underpinning new waves of productivity growth, innovation, and consumer surplus. Big data must pass through a series of steps before it generates value: data access, storage, cleaning, and analysis.
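The four steps named above (access, storage, cleaning, analysis) can be sketched at toy scale with the Python standard library. This is only an illustration under stated assumptions: the sensor readings are made-up sample data, the function names are hypothetical, and a real big data pipeline would use distributed storage and processing rather than an in-memory SQLite database.

```python
import csv
import io
import sqlite3
from statistics import mean

# Hypothetical raw input: sensor readings, some of them missing or malformed.
RAW_CSV = """sensor,reading
a,20.5
b,
a,21.5
c,not-a-number
b,19.0
"""


def access(raw):
    """Step 1, access: read records from a source (here an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(raw)))


def store(rows, conn):
    """Step 2, storage: persist the raw records unchanged."""
    conn.execute("CREATE TABLE raw_readings (sensor TEXT, reading TEXT)")
    conn.executemany(
        "INSERT INTO raw_readings VALUES (?, ?)",
        [(row["sensor"], row["reading"]) for row in rows],
    )


def clean(conn):
    """Step 3, cleaning: keep only rows whose reading parses as a number."""
    cleaned = []
    for sensor, reading in conn.execute("SELECT sensor, reading FROM raw_readings"):
        try:
            cleaned.append((sensor, float(reading)))
        except ValueError:
            continue  # drop empty or malformed readings
    return cleaned


def analyze(cleaned):
    """Step 4, analysis: compute the mean reading per sensor."""
    by_sensor = {}
    for sensor, value in cleaned:
        by_sensor.setdefault(sensor, []).append(value)
    return {sensor: mean(values) for sensor, values in by_sensor.items()}


def run_pipeline(raw):
    conn = sqlite3.connect(":memory:")
    try:
        store(access(raw), conn)
        return analyze(clean(conn))
    finally:
        conn.close()


print(run_pipeline(RAW_CSV))
```

Sensor `c` disappears from the result because its only reading fails the cleaning step, which shows why cleaning has to come before analysis.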


[Big Data Ecosystem - SelectHub]

- Future Cloud and Edge Computing

Cloud computing is the delivery of computing services (servers, storage, databases, networking, software, analytics, and more) over the Internet (“the cloud”). Companies offering these computing services are called cloud providers and typically charge for cloud computing services based on usage, similar to how you’re billed for water or electricity at home.

Most cloud computing services fall into three broad categories: infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS). These are sometimes called the cloud computing stack, because they build on top of one another. There are three different ways to deploy cloud computing resources: public cloud, private cloud, and hybrid cloud. Knowing what they are and how they’re different makes it easier to accomplish your business goals.

Cloud computing provides a simple way to access servers, storage, databases and a broad set of application services over the Internet. A cloud services platform such as Amazon Web Services owns and maintains the network-connected hardware required for these application services, while you provision and use what you need via a web application.


- A Health Data Revolution

The bio (big) data revolution is already beginning, with the emergence of wearable, constantly connected technology that collects data about our health. There is a widespread belief that this could be a great thing for society: reams of data that can be applied to create better health care practices.

We'll have a lot more information about how people really eat, exercise, and conduct their daily lives, which will allow doctors and researchers to better tailor programs to serve our needs and help us become healthier.


[More to come ...]
