Transforming high-performance computing for a Swiss pharmaceutical corporation


The client faced significant challenges in managing a complex high-performance computing (HPC) landscape. With over 50 nodes, including more than 10 GPU nodes, the client struggled with scalability, operational inefficiencies and a lack of infrastructure visibility. These issues resulted in frequent downtime and restricted performance, hindering the client's ability to respond swiftly to business demands. The solution was a migration to an automated, AWS-based HPC environment, powered by AMD EPYC™ processor-based EC2 instances and backed by HCLTech's cloud modernization expertise. It streamlined operations, enhanced scalability and reduced downtime, enabling faster innovation and improved overall performance for the client.

The Challenge

Our client is a Swiss multinational pharmaceutical corporation. The customer has a 50+ node HPC landscape with 10+ GPU nodes and faced challenges around the operation and management of the cluster. A lack of scalability restricted performance, and infrastructure visibility was limited.

  • Complex HPC landscape and lack of streamlined operations: The customer struggled with day-to-day cluster administration and other HPC-related operations, which led to downtime.
  • Lack of elastic and scalable infrastructure: Infrastructure and services could not be provisioned quickly in response to business needs, which created bottlenecks and restricted performance.
  • Lack of flexibility and visibility: The absence of flexible pay-as-you-go consumption models and limited infrastructure visibility hampered transparency.
  • Absence of on-demand automation: The HPC platform lacked automation capabilities, resulting in time-consuming manual processes.

The Objective

The objective was to give the customer a reliable, scalable HPC environment with better overall performance, and to take over end-to-end management of the landscape while improving its visibility and increasing uptime.

  • Automate infrastructure deployment: Use infrastructure as code (IaC) to build HPC resources on-demand, ensuring scalability and efficiency.
  • Enhance elasticity: Implement a more flexible, responsive HPC cluster to support on-demand scaling with Amazon EC2 instances.

The Solution

The solution centers on enhanced performance and built-in automation, providing a seamless user experience and a smooth migration of data from on-premises systems to AWS. The elasticity of AWS delivered the required scalability and TCO optimization. HCLTech designed a multi-phase modernization roadmap combining AWS-native automation with AMD EPYC™ powered Amazon EC2 instances. This approach improved workload throughput, reduced cloud spend and strengthened operational agility for computationally heavy tasks.

Assessment

  • Conducted a thorough assessment of the current HPC landscape and identified specific bottlenecks.
  • Identified the specific requirements and constraints of the client's HPC environment.
  • Evaluated workload profiles to prioritize non-processor-intensive and compute-optimized use cases ideal for AMD-based EC2 instances.

Build

  • Automated provisioning – Used Terraform pipelines to automate the deployment of HPC resources.
  • Auto scaling – Set up elastic infrastructure able to meet high resource requirements and configured Auto Scaling of HPC nodes driven by the EDA simulation workload.
  • Streamlined job management – Configured job scheduling and management, with REST API integration to automate job submissions based on user personas.
  • Software for monitoring – Implemented software that monitors the HPC cluster and alerts administrators to problems before they impact users.
  • Integrated AMD EPYC™ based EC2 instances into the HPC landscape for higher compute density and optimized per-core pricing, ensuring maximum output with minimal idle cost.
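
The auto scaling step above can be illustrated with a minimal sketch. It assumes a hypothetical queue-driven policy in which the number of compute nodes is derived from the scheduler's pending and running job counts. The function name, the `jobs_per_node` ratio and the node limits are illustrative assumptions, not details of the client's environment.

```python
import math


def desired_node_count(pending_jobs: int, running_jobs: int,
                       jobs_per_node: int = 4,
                       min_nodes: int = 0, max_nodes: int = 50) -> int:
    """Return the node count a simple queue-driven policy would request.

    All parameters are hypothetical; a real cluster would read the job
    counts from its scheduler and the limits from its scaling configuration.
    """
    if jobs_per_node <= 0:
        raise ValueError("jobs_per_node must be positive")
    if pending_jobs < 0 or running_jobs < 0:
        raise ValueError("job counts must be non-negative")
    # Enough nodes to host every queued and running job, rounded up.
    needed = math.ceil((pending_jobs + running_jobs) / jobs_per_node)
    # Clamp to the configured floor and ceiling of the cluster.
    return max(min_nodes, min(max_nodes, needed))


if __name__ == "__main__":
    print(desired_node_count(pending_jobs=10, running_jobs=2))
```

In an AWS setup, a value like this could be passed as the desired capacity of an EC2 Auto Scaling group. Scaling to zero when the queue is empty is what avoids paying for idle nodes.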

Operate

  • Delivered end-to-end operations management, ensuring seamless, efficient management of day-to-day HPC tasks.
  • Set up continuous monitoring of the HPC environment to ensure optimized performance and cost efficiency.
  • Introduced software-based monitoring across all resources to identify and respond to anomalies faster.
  • Implemented ongoing performance tuning leveraging nodes powered by AMD to balance performance and budget efficiency.
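
As a rough illustration of the anomaly detection mentioned above, the sketch below flags metric samples that deviate sharply from a trailing baseline. The rolling-window rule, the function name and the default thresholds are assumptions made for this example. The source does not describe how the monitoring software detects anomalies.

```python
from statistics import mean, pstdev


def find_anomalies(samples, window=5, threshold=3.0):
    """Return indices of samples far from the mean of the preceding window.

    A sample is flagged when it lies more than `threshold` standard
    deviations from the mean of the previous `window` samples.
    """
    anomalies = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        baseline = mean(history)
        spread = pstdev(history)
        if spread == 0:
            # Perfectly flat history: any change at all is worth flagging.
            if samples[i] != baseline:
                anomalies.append(i)
            continue
        if abs(samples[i] - baseline) / spread > threshold:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    cpu_percent = [50, 51, 49, 50, 50, 95, 50]  # hypothetical node metric
    print(find_anomalies(cpu_percent))
```

A trailing window keeps the baseline local, so a gradual change in load does not trigger alerts while a sudden spike does.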

The Impact

The solution helped the customer achieve their HPC transformation goals, with faster time-to-value and innovation.

  • Streamlined operations – HCLTech managed end-to-end HPC platform support, performed optimization and enhanced HPC operations by introducing automation and a single point of management with Bright Cluster Manager.
  • Reduced downtime – HCLTech's transformation and management of the HPC landscape significantly reduced downtime and improved time to value.
  • Scalability and elasticity – The AWS-based HPC environment now scales dynamically with demand, so the client can meet peak workloads without paying for idle resources. Simulations run with zero waiting time.
  • Cost efficiency – Delivered substantial cost savings by migrating appropriate workloads to AMD EPYC™ powered EC2 instances, improving both performance efficiency and overall ROI.
  • The migration enabled use of the latest-generation EC2 instances, improving end-of-life support.

AWS Services and AMD Product

  • Amazon EC2 with auto scaling
  • Amazon Elastic File System (EFS)
  • Amazon FSx for NetApp ONTAP
  • Elastic Load Balancing (ELB)
  • AMD EPYC™ powered Amazon EC2 instances