Transforming High Performance Computing with AWS
Introduction
This case study covers a customer's transformation of their High Performance Computing (HPC) environment, which was held back by limited scalability and automation. The objective was to create an elastic HPC cluster, automate job submission and enable remote visualization using AWS services. The resulting solution delivered significant performance improvements and increased user productivity, showcasing the advantages of a cloud-native HPC approach.
The Challenges

The customer faced multiple challenges while operating HPC workloads, particularly regarding scalability, cost efficiency and visualization:
- Limited automation for on-demand provisioning of the HPC platform
- Difficulty in scaling the HPC platform to meet growing demand
- A non-elastic HPC cluster that lacked dynamic provisioning of compute resources
- Lack of integration with pre- and post-processing tools, which restricted end-to-end cloud workflows
- No available visualization solution for job monitoring, submission and interaction
- The need to host simulation solvers (Fluent, Mechanical, CST, Altair Flux) on AWS Cloud
The Objective

The primary objectives were to modernize the HPC infrastructure by adopting automation, elastic compute scaling and visualization, while ensuring ease of cost tracking and improving user productivity.
- Build an elastic HPC cluster that can scale up/down dynamically
- Automate job submission and monitoring with centralized dashboards
- Host solvers and post-processing tools fully on AWS Cloud
- Enable remote visualization for pre/post-processing and solver tasks
- Integrate Ansys Minerva for simulation process and data management (SPDM)
 
The Solution

The implemented solution leveraged AWS's HPC capabilities and SOCA (Scale-Out Computing on AWS) to build a flexible, automated and cost-efficient environment:
Assessment
- Evaluated existing HPC infrastructure and identified bottlenecks
- Mapped application landscape and performance requirements
- Chose appropriate instance types (Hpc6a, C6i, C7i, R7i) based on benchmarking
Build
- Deployed SOCA for application-specific job templates and workload orchestration
- Set up remote visualization using NICE DCV
- Configured job submission, monitoring dashboards and workflow orchestration via REST API
- Enabled SPDM (Ansys Minerva) integration with SOCA
- Set up license servers and installed the applications
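As an illustration of what an application-specific job template might look like, here is a minimal sketch that renders a parameterized batch script. Everything in it is an assumption made for illustration: the `JobRequest` fields, the `render_job_script` helper, the PBS-style directives, the queue name and the solver command line are hypothetical and do not represent the customer's actual templates or the SOCA API.

```python
from dataclasses import dataclass
from string import Template

# Hypothetical job template with PBS-style directives (an assumption;
# the real templates used in this project are not described in the source).
JOB_TEMPLATE = Template("""#!/bin/bash
#PBS -N $job_name
#PBS -q $queue
#PBS -l select=$nodes:ncpus=$cpus_per_node
#PBS -l instance_type=$instance_type
cd $$PBS_O_WORKDIR
$command
""")


@dataclass(frozen=True)
class JobRequest:
    """User-supplied parameters for one solver run (illustrative fields)."""
    job_name: str
    queue: str
    nodes: int
    cpus_per_node: int
    instance_type: str
    command: str


def render_job_script(req: JobRequest) -> str:
    """Fill the template with the request's values and return the script text."""
    if req.nodes < 1 or req.cpus_per_node < 1:
        raise ValueError("nodes and cpus_per_node must be positive")
    return JOB_TEMPLATE.substitute(
        job_name=req.job_name,
        queue=req.queue,
        nodes=req.nodes,
        cpus_per_node=req.cpus_per_node,
        instance_type=req.instance_type,
        command=req.command,
    )


if __name__ == "__main__":
    # Example values only; instance size and solver arguments are placeholders.
    request = JobRequest(
        job_name="fluent_case01",
        queue="normal",
        nodes=2,
        cpus_per_node=64,
        instance_type="c6i.32xlarge",
        command="fluent 3ddp -g -t128 -i run.jou",
    )
    print(render_job_script(request))
```

A template of this kind keeps solver-specific details in one place, so users supply only a handful of parameters and the same request could be submitted from a web form or through a REST call.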
Operate
- End-to-end HPC lifecycle on cloud: pre-processing, solver execution and post-processing
- Elastic compute provisioning for cost efficiency
- Continuous monitoring and operations management
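The idea behind elastic compute provisioning can be sketched in a few lines: the number of compute nodes follows the job queue and drops to zero when the queue is empty. The code below is a simplified illustration, not the scheduler logic used in this deployment. The `QueuedJob` type, the function names, the whole-node-per-job policy and the `max_nodes` cap are all assumptions.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class QueuedJob:
    """A pending or running job and the number of cores it requests."""
    cores: int


def nodes_required(jobs: Sequence[QueuedJob], cores_per_node: int, max_nodes: int) -> int:
    """Return how many nodes the current jobs need.

    Assumed policy: each job gets whole nodes (no node sharing), and the
    total is capped at max_nodes. An empty queue needs zero nodes.
    """
    if cores_per_node < 1:
        raise ValueError("cores_per_node must be positive")
    wanted = sum(math.ceil(job.cores / cores_per_node) for job in jobs)
    return min(wanted, max_nodes)


def scaling_delta(jobs: Sequence[QueuedJob], running_nodes: int,
                  cores_per_node: int, max_nodes: int) -> int:
    """Positive result: nodes to launch. Negative result: nodes to terminate."""
    return nodes_required(jobs, cores_per_node, max_nodes) - running_nodes


if __name__ == "__main__":
    # 96 cores per node is an example value for illustration.
    queue = [QueuedJob(cores=192), QueuedJob(cores=100)]
    print(scaling_delta(queue, running_nodes=1, cores_per_node=96, max_nodes=10))
    print(scaling_delta([], running_nodes=4, cores_per_node=96, max_nodes=10))
```

With an empty queue the target is zero nodes, so every running node is released. This is the behavior that keeps compute cost at zero while no jobs are active.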
The Benefits

The customer realized significant technical and business benefits from the new cloud-native HPC environment:
- No cost for compute nodes when there are no active jobs running
- Comprehensive lifecycle support for HPC jobs, with pre-processing, post-processing and solvers fully accessible in the cloud
- On-demand remote visualization for job interaction through DCV (Desktop Cloud Visualization)
- Performance improvements of 30–40% compared to on-premises infrastructure
- Enhanced user productivity by eliminating wait times and enabling remote work capabilities
AWS Services
- Scale-Out Computing on AWS (SOCA)
- Amazon EC2 with Auto Scaling
- Amazon Elastic File System (EFS)
- Amazon S3
- Amazon FSx for NetApp ONTAP
- Amazon DCV (formerly NICE DCV)
- Amazon OpenSearch Service
- AWS Secrets Manager
- Amazon CloudWatch
