Senior Support Lead - IT Operations
Job Description
Job Summary
Location: California
Project role: Senior Support Lead - IT Operations
Skills: L3
Secondary Skills: IT Operations
No. of positions: 1
Pay Range Minimum: $83,000
Pay Range Maximum: $128,000
Job description:
About HCLTech
HCLTech is a global technology company, spread across 60 countries, delivering industry-leading capabilities centered around digital, engineering, cloud and AI, powered by a broad portfolio of technology services and products. We work with clients across all major verticals, providing industry solutions for Financial Services, Manufacturing, Life Sciences and Healthcare, Technology and Services, Telecom and Media, Retail and CPG, and Public Services. We're powered by our people: global, diverse, multi-generational talent representing 161 nationalities, whose unique spark, perspective and boundless passion drive our culture of proactive value creation and problem-solving.
Our purpose is to bring together the best of technology and our people to supercharge progress for everyone, everywhere: our clients, partners, their stakeholders, communities and the planet. As a company, we are deeply focused on accelerating our ESG agenda. We are also creating technology-enabled sustainable solutions with and for our clients and partners. We embed ESG imperatives into every aspect of our business and ensure that the progress we supercharge is responsible, inclusive and beneficial to all our stakeholders in the long term. We have committed to achieving net zero by 2040.
To learn more about how we can supercharge progress for you, visit www.hcltech.com.
Title: Linux Admin
Location: Remote.
Key Responsibilities
● Infrastructure Management: Provision, deploy, and maintain scalable, secure, and
high-availability infrastructure on cloud platforms to support AI workloads.
● System Management: Administer and maintain Linux-based servers and clusters
optimized for GPU compute workloads, ensuring high availability and performance.
● GPU Infrastructure: Configure, monitor, and troubleshoot GPU hardware (e.g., NVIDIA
GPUs) and related software stacks (e.g., CUDA, cuDNN) for optimal performance in
AI/ML and HPC applications.
● Troubleshooting: Diagnose and resolve hardware and software issues on GPU
compute nodes, as well as performance issues in GPU clusters.
● High-Speed Interconnects: Implement and manage high-speed networking
technologies like RDMA over Converged Ethernet (RoCE) to support low-latency,
high-bandwidth communication for GPU workloads.
● CI/CD Pipelines: Build and optimize continuous integration and deployment (CI/CD)
pipelines for testing GPU-based servers and managing deployments using tools like
GitHub Actions.
● Monitoring & Performance: Set up and maintain monitoring, logging, and alerting
systems (e.g., Prometheus, VictoriaMetrics, Grafana) to track system performance,
GPU utilization, resource bottlenecks, and the uptime of GPU resources.
● Security and Compliance: Implement network security measures, including firewalls,
VLANs, VPNs, and intrusion detection systems, to protect the GPU compute
environment and comply with standards like SOC 2 or ISO 27001.
Required Qualifications
● Experience: 3+ years of experience in DevOps, Site Reliability Engineering (SRE), or
cloud infrastructure management, with at least 1 year working on GPU-based compute
environments in the cloud.
● Linux Administration: Strong knowledge of Linux system administration for managing
network services and tools in a GPU compute environment.
● High-Speed Interconnects: Experience with high-performance networking technologies
such as RoCE or 100GbE in compute-intensive environments.
● GPU-Specific Networking: Proficiency with NVIDIA GPU networking technologies,
such as Mellanox ConnectX adapters, and configuring Netplan to support their drivers
and firmware.
● Cloud Platforms: Hands-on experience with at least one major cloud provider (AWS,
Azure, GCP).
● Networking & Security: Knowledge of networking concepts (VPC, subnets) and
security best practices (IAM, encryption, firewall configurations).
Compensation and Benefits
A candidate’s pay within the range will depend on their work location, skills, experience, education, and other factors permitted by law. This role may also be eligible for performance-based bonuses subject to company policies. In addition, this role is eligible for the following benefits subject to company policies: medical, dental, vision, pharmacy, life, accidental death & dismemberment, and disability insurance; employee assistance program; 401(k) retirement plan; 10 days of paid time off per year (some positions are eligible for need-based leave with no designated number of leave days per year); and 10 paid holidays per year.
Disclaimer
HCLTech is an equal opportunity employer, committed to providing equal employment opportunities to all applicants and employees regardless of race, religion, sex, color, age, national origin, pregnancy, sexual orientation, physical disability or genetic information, military or veteran status, or any other protected classification, in accordance with federal, state, and/or local law. Should any applicant have concerns about discrimination in the hiring process, they should provide a detailed report of those concerns to secure@hcltech.com for investigation.