
HPC Infrastructure

(Summit Supercomputer - Oak Ridge National Lab, U.S.A.)

- HPC Infrastructure

High-performance computing (HPC) arose decades ago as an affordable and scalable method for tackling difficult math problems. Today, many organizations turn to HPC to approach complex computing tasks such as financial risk modeling, government resource tracking, spacecraft flight analysis and many other "big data" projects.

In today’s world, it’s vital to have access to HPC resources that can tackle whatever you throw at them. Simulation, data storage and analysis, artificial intelligence (AI), and machine learning (ML) technology all demand robust, scalable compute power. 

HPC infrastructures are complex by nature. With hundreds or even thousands of nodes running in parallel, architectural and management complexities abound. As technologies such as artificial intelligence and the Internet of Things become the norm, organizations across all industries are turning to HPC solutions to gain or maintain their competitive advantage.

When building an HPC infrastructure, there are nearly endless options for the required compute, networking, and storage components. Typical capabilities include large-scale HPC systems, high-speed networking, and multi-petabyte archival mass storage systems.

The challenges that many organizations struggle with are threefold: 

  • How do you choose the components that best meet your business and budget requirements?
  • How do you integrate the components into a solution that works seamlessly?
  • How do you make sure that your solution will be able to accommodate changes in workloads and storage requirements over time?


- Distributed HPC Architecture

HPC solutions have three main components: compute, network, and storage. To build an HPC architecture, compute servers are networked together into a cluster, which can consist of hundreds or thousands of servers. Each server is called a node. The nodes in a cluster work in parallel with each other, boosting processing speed to deliver high-performance computing.
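The idea of nodes working in parallel on one job can be sketched in a few lines. This is only a minimal illustration, not how a production cluster is programmed: worker threads stand in for nodes, and the names `split_into_chunks`, `node_task`, and `run_on_cluster` are hypothetical. A real cluster would typically rely on a job scheduler or a message-passing library instead.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, num_nodes):
    """Divide items into num_nodes nearly equal contiguous chunks."""
    base, extra = divmod(len(items), num_nodes)
    chunks, start = [], 0
    for i in range(num_nodes):
        # The first `extra` chunks take one additional item each.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def node_task(chunk):
    """Work done by one simulated node: a sum of squares over its chunk."""
    return sum(x * x for x in chunk)


def run_on_cluster(items, num_nodes):
    """Scatter the chunks to the workers, then gather and combine the results."""
    chunks = split_into_chunks(items, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(node_task, chunks))
    return sum(partials)


if __name__ == "__main__":
    print(run_on_cluster(list(range(10)), num_nodes=4))
```

Each simulated node only sees its own chunk, and the final answer comes from combining the partial results, which is the same scatter-and-gather pattern a cluster applies at much larger scale.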

Software programs and algorithms run simultaneously on the servers in the cluster. The cluster is networked to the data storage, which captures the output. Together, these components operate seamlessly to complete a diverse set of tasks.

To operate at maximum performance, each component must keep pace with the others. For example, the storage component must be able to feed and ingest data to and from the compute servers as quickly as it is processed. Likewise, the networking components must be able to support the high-speed transportation of data between compute servers and the data storage. If one component cannot keep up with the rest, the performance of the entire HPC infrastructure suffers. 
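The point that the slowest component limits the whole system can be expressed as a small calculation. The sketch below assumes a deliberately simplified steady-state model in which data flows through storage, network, and compute in sequence; the function name `effective_throughput` and the sample figures are hypothetical.

```python
def effective_throughput(compute_gbps, network_gbps, storage_gbps):
    """Return the limiting component and the sustained rate it allows.

    Simplified assumption: all data passes through every component,
    so the slowest one caps the end-to-end rate.
    """
    rates = {
        "compute": compute_gbps,
        "network": network_gbps,
        "storage": storage_gbps,
    }
    bottleneck = min(rates, key=rates.get)
    return bottleneck, rates[bottleneck]


if __name__ == "__main__":
    # Hypothetical figures in GB/s, chosen only for illustration.
    name, rate = effective_throughput(compute_gbps=40.0,
                                      network_gbps=25.0,
                                      storage_gbps=10.0)
    print(f"bottleneck: {name}, sustained rate: {rate} GB/s")
```

With these made-up numbers, upgrading compute or networking would not help until storage can keep up.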


- HPC and Data Center

HPC combines hardware, software, systems management and data center facilities to support a large array of interconnected computers working cooperatively to perform a shared task too complex for a single computer to complete alone. Some businesses might seek to lease or purchase their HPC, and other businesses might opt to build an HPC infrastructure within their own data centers. 

Distributed HPC architecture poses some tradeoffs for organizations. The most direct benefits include scalability and cost management. Frameworks like Hadoop can function on just a single server, but an organization can also scale them out to thousands of servers. This enables businesses to build an HPC infrastructure that meets their current and future needs using readily available and less expensive off-the-shelf computers. Hadoop is also fault-tolerant: it can detect failed systems, separate them from the cluster, and redirect the affected jobs to available systems.
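The redirect-on-failure behavior can be illustrated with a toy scheduler. This is not Hadoop's API or its actual scheduling algorithm; it is a minimal sketch that assumes round-robin placement, and `assign_with_failover` is a hypothetical name.

```python
def assign_with_failover(tasks, nodes, failed_nodes):
    """Assign tasks round-robin, redirecting any that land on a failed node."""
    healthy = [n for n in nodes if n not in failed_nodes]
    if not healthy:
        raise RuntimeError("no healthy nodes available")

    assignment = {}
    for i, task in enumerate(tasks):
        preferred = nodes[i % len(nodes)]
        if preferred in failed_nodes:
            # Redirect the task to one of the remaining healthy nodes.
            assignment[task] = healthy[i % len(healthy)]
        else:
            assignment[task] = preferred
    return assignment


if __name__ == "__main__":
    plan = assign_with_failover(
        tasks=["t0", "t1", "t2", "t3"],
        nodes=["n0", "n1", "n2"],
        failed_nodes={"n1"},
    )
    print(plan)
```

In this example the task that would have run on the failed node `n1` is placed on `n2` instead, while the other tasks keep their original nodes.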

Building an HPC cluster is technically straightforward, but HPC deployments can present business challenges. Even with the ability to manage, scale and add nodes over time, the cost of procuring, deploying, operating and maintaining dozens, hundreds or even thousands of servers, along with the networking infrastructure to support them, can become a substantial financial investment. Many businesses also have limited HPC needs and can struggle to keep an HPC cluster busy, yet the money and training invested in HPC pay off only if the deployment is kept working on business tasks.


[More to come ...]


