Cooling, cleansing, and powering AI systems sustainably | HCLTech

Cooling, cleansing and powering AI systems sustainably

Explore how the Open Compute Project is publishing roadmaps and collaborating on sustainable power, cooling and supply chain management of data centers.
 
5 minutes read

Author

Brian Lake
Global Technology Director

Co-author

Ayan Hajra Chowdhury
Solution Director

Businesses are rapidly adopting AI to stay competitive, driving transformative sustainability innovations. As companies deploy new computing and AI platforms across Edge Data Centers (EDCs), Data Centers (DCs) and Cloud Service Providers (CSPs), sustainability considerations are becoming essential.

System-level optimizations, combined with next-generation platforms, enhance computing capability, improve energy and cooling efficiency and reduce long-term operating costs. Fast-evolving AI hardware and software present businesses with unique challenges and tradeoffs.

Solution

CSPs are driving rapid innovation, and the Open Compute Project (OCP) fosters open-source collaboration to advance sustainability alongside AI growth. Through transparent engineering and sustainability leadership, OCP provides a marketplace of solutions today and clear community roadmaps for future advancements. These technical improvements benefit both operational efficiency and business outcomes.

This blog highlights key sustainability advancements presented at the 2024 OCP Global Summit. Although many presentations included video and slides, this summary references slide links only.

DCs:

CSPs are collaborating on the most innovative build-outs of new DCs, integrating green concrete and renewable energy with advancements in energy efficiency, cooling and composable computing infrastructures.

OCP’s 2018 summary of 400V DC power distribution at the data center level highlights a shift, driven by telecom providers, to address the inefficiencies and operating costs of both 48VDC and AC power distribution. These design improvements have increased typical power efficiency from 77% to 90% by reducing intermediate power conversion stages. Servers utilizing HVDC power supplies benefit from lower power consumption, reduced heat generation and improved power feed reliability, leading to cost savings and reduced energy consumption.
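To make the efficiency figures concrete, here is a minimal sketch of the arithmetic. It relies on one general fact: conversion stages in series multiply their efficiencies, so removing stages raises the end-to-end figure. The 77% and 90% endpoints come from the text above; the 1,000 kW IT load and the per-stage values are hypothetical and chosen only for illustration.

```python
from math import prod


def chain_efficiency(stage_efficiencies):
    """End-to-end efficiency of conversion stages connected in series."""
    return prod(stage_efficiencies)


def input_power_kw(it_load_kw, efficiency):
    """Input power needed to deliver it_load_kw at the given efficiency."""
    return it_load_kw / efficiency


# Endpoint figures quoted in the text above.
BEFORE = 0.77
AFTER = 0.90

# Hypothetical IT load, chosen only to make the arithmetic concrete.
IT_LOAD_KW = 1000.0

before_kw = input_power_kw(IT_LOAD_KW, BEFORE)
after_kw = input_power_kw(IT_LOAD_KW, AFTER)
saved_kw = before_kw - after_kw

# Hypothetical per-stage values showing how series losses compound.
four_stages = chain_efficiency([0.94, 0.94, 0.94, 0.94])
two_stages = chain_efficiency([0.95, 0.95])

print(f"Input power at 77%: {before_kw:.0f} kW")
print(f"Input power at 90%: {after_kw:.0f} kW")
print(f"Difference:         {saved_kw:.0f} kW")
print(f"Four stages at 94%: {four_stages:.1%}")
print(f"Two stages at 95%:  {two_stages:.1%}")
```

Under these assumptions, the same IT load draws roughly 14% less input power at 90% efficiency than at 77%, before counting any reduction in cooling load from the heat that is no longer generated.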

Power and Racks:

In 2022, OCP specified a standardized rack architecture that is now a cornerstone of power efficiency improvements. The associated marketplace is vibrant and growing, offering businesses a sustainable path forward. If your company is purchasing new hardware that doesn’t fit this rack standard, it’s essential to understand why and what sustainability tradeoffs are involved. The rack underpins multiple physical plant improvements that reduce the energy overhead of AC/DC power conversions and utilization inefficiencies.

Traditional racks that feed AC into servers with dual-redundant AC/DC power supplies often operate inefficiently, with power supplies rarely exceeding 50% utilization. Adopting N+1 power architectures at the rack level instead can significantly improve efficiency while, ideally, minimizing power conversions.
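The 50% ceiling follows from simple load sharing, as the sketch below shows. It assumes the load is split evenly across all installed supplies and that the supplies are sized so the non-redundant ones can carry the full load on their own. The 5+1 shelf is a hypothetical configuration, not one taken from an OCP specification.

```python
def max_psu_utilization(needed, redundant):
    """Highest per-supply utilization with even load sharing.

    needed:    supplies required to carry the full load (the N in N+1)
    redundant: extra supplies installed for failover (the +1 in N+1)
    """
    return needed / (needed + redundant)


# Dual-redundant (1+1) supplies inside each server.
per_server = max_psu_utilization(needed=1, redundant=1)

# Hypothetical rack-level power shelf with 5+1 supplies.
rack_shelf = max_psu_utilization(needed=5, redundant=1)

print(f"1+1 per server: at most {per_server:.0%} per supply")
print(f"5+1 per rack:   at most {rack_shelf:.0%} per supply")
```

These are upper bounds at full load. Real utilization is lower whenever servers run below peak, which pushes 1+1 supplies even further from the load range where they tend to be most efficient.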

OCP also communicates future direction. The 48V bus bars, originally introduced by Google in 2017, set the foundation for efficient rack architecture, and upgrading them to +/-400Vdc is an upcoming rack-level effort. The upgrade offers comparable efficiency and reliability benefits and can draw on the component ecosystem built for electric vehicles (EVs). This matters for machine learning (ML) deployments, because next-generation racks won’t have space to accommodate power and battery components.
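One reason a higher distribution voltage helps is basic circuit arithmetic: for the same power, current falls in proportion to voltage, and resistive loss in a conductor falls with the square of the current. The sketch below illustrates this. The 100 kW rack load and the 0.1 milliohm bus resistance are hypothetical values, and 400 V is used for simplicity because the text does not specify how loads connect to a +/-400Vdc bus.

```python
def bus_current_a(power_w, voltage_v):
    """Current drawn at a given power and distribution voltage (I = P / V)."""
    return power_w / voltage_v


def conduction_loss_w(power_w, voltage_v, resistance_ohm):
    """Resistive loss in the distribution path (P_loss = I^2 * R)."""
    current = bus_current_a(power_w, voltage_v)
    return current * current * resistance_ohm


# Hypothetical values, chosen only for illustration.
RACK_POWER_W = 100_000.0
BUS_RESISTANCE_OHM = 0.0001

loss_48v = conduction_loss_w(RACK_POWER_W, 48.0, BUS_RESISTANCE_OHM)
loss_400v = conduction_loss_w(RACK_POWER_W, 400.0, BUS_RESISTANCE_OHM)

print(f"48 V:  {bus_current_a(RACK_POWER_W, 48.0):.0f} A, loss {loss_48v:.1f} W")
print(f"400 V: {bus_current_a(RACK_POWER_W, 400.0):.0f} A, loss {loss_400v:.2f} W")
print(f"Loss ratio: {loss_48v / loss_400v:.1f}x")
```

The loss ratio depends only on the two voltages, not on the assumed load or resistance. Lower current also allows smaller conductors, which is relevant when rack space is scarce.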

Cooling:

OCP provides Cooling Projects reports and certified cooling components, offering insights into the latest advancements. Liquid cooling outperforms air cooling by handling higher temperatures and absorbing and transferring heat more efficiently. Newer heatsinks may incorporate liquid cooling, and in some cases entire motherboards are immersed in liquid for optimal thermal management.

Liquid cooling can also extend the lifespan of existing DCs by enabling them to support higher heat/power technology. As efficiency, cost and noise concerns grow, air cooling’s role in data centers is diminishing in favor of more effective cooling solutions.
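A rough way to see the difference in heat transport is to compare the coolant flow needed to carry away the same heat. The sketch below uses the relation heat = density x specific heat x volumetric flow x temperature rise, with approximate room-temperature textbook properties. Water stands in for the liquid as an assumption; actual coolants such as glycol mixtures or dielectric immersion fluids have different properties. The 100 kW heat load and 10 K temperature rise are hypothetical.

```python
# Approximate room-temperature properties (textbook values).
WATER = {"density_kg_m3": 997.0, "cp_j_per_kg_k": 4186.0}
AIR = {"density_kg_m3": 1.2, "cp_j_per_kg_k": 1005.0}


def flow_needed_m3_s(heat_w, fluid, delta_t_k):
    """Volumetric flow needed to remove heat_w with a coolant temperature rise of delta_t_k."""
    return heat_w / (fluid["density_kg_m3"] * fluid["cp_j_per_kg_k"] * delta_t_k)


# Hypothetical values, chosen only for illustration.
HEAT_W = 100_000.0
DELTA_T_K = 10.0

water_flow = flow_needed_m3_s(HEAT_W, WATER, DELTA_T_K)
air_flow = flow_needed_m3_s(HEAT_W, AIR, DELTA_T_K)

print(f"Water: {water_flow * 1000:.2f} L/s")
print(f"Air:   {air_flow:.2f} m^3/s")
print(f"Air needs about {air_flow / water_flow:.0f}x the volume of water")
```

This comparison covers heat transport only. Pumping power, heat exchanger design and facility water temperature all affect the real-world result.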

Sustainability:

The OCP Sustainability Project is summarized here. It encompasses carbon modeling, data center efficiency metrics, lifecycle workstreams and more. For an overall summary of the current status, see Dcf-Sustainability.

OCP provides the Embodied Carbon Disclosure Form for DC operators. See "Implementing Effective Carbon Reduction Strategies" for an introduction and a summary of the complexities of working with different supply chains.

The sustainability risk analysis of the growth of edge computing shows exceptional foresight. The HEATWISE project uses digital twins to model and validate the entire energy ecosystem spanning buildings and districts, which represents leading thinking on how edge facilities can be combined with a city’s infrastructure. See https://sovereignedge.eu/ for a European open-source next-generation edge cloud stack.

OCP panels addressed digital twins for design and operations, as well as DOE funding for next-generation DC power efficiency and sustainability.

Regulatory compliance with new and upcoming DC building and operations standards is critical to planning, and OCP contributors participate in iMasons Climate Accord working groups. There is an opportunity to participate with these leaders on sustainability topics concerning equipment, materials and power.

Servers, networking and systems:

The pace of server innovation is accelerating, with open-source collaboration driving advancements across CSPs, before propagating to Telco and Enterprise DC and EDC locations.

Improvements in performance, security, networking, provisioning and composable systems are innovation tracks led by OCP.

As AI processing grows in complexity and scale, combined software/hardware system engineering is becoming more prevalent, with niche configurations tailored to specific workloads.

For 2024, OCP ran tracks specifically on this optimization – see the Open Systems for AI Strategic Initiative.

Conclusion

Innovations from multiple vendors are suitable for customer evaluation and adoption. OCP fosters open discussions and substantial contributions across various technical domains. Businesses working with leading engineering firms and subject matter experts should leverage OCP insights to assess both current and future computing needs.

An important item to consider in existing and new planning for infrastructure capex and opex spending is ongoing monitoring. Ensure a power audit is performed and that it considers the energy efficiency and sustainability regulations and guidelines that are in effect or upcoming.

Seek partnerships with engineering experts in modern servers, storage devices, network hardware, cooling systems, power systems, and rack- and system-level verification for on-prem and hyperscale business partners.

Ensure all business stakeholders' perspectives and views are accommodated when pursuing new AI machinery, whether in EDC, DC or CSP environments.

References

  1. https://www.opencompute.org/events/past-events/2024-ocp-global-summit#sustainability
  2. https://climateaccord.org/news/greener-concrete-for-data-centers-an-open-letter/
  3. https://drive.google.com/file/d/1Tzz4WQgXvbVqtHyfYW6BVod4Y5qFYorR/view
  4. https://www.opencompute.org/files/OCP18-400VDC-Efficiency-02.pdf
  5. https://www.opencompute.org/documents/open-rack-base-specification-version-3-pdf
  6. https://www.opencompute.org/contributions?contributions%5BrefinementList%5D%5Bfamily%5D%5B0%5D=OpenRack%20v3
  7. https://drive.google.com/file/d/1BCiJ5XJg1bbhF7l_BKvnSzjoUClUdGk2/view
  8. https://drive.google.com/file/d/1vdRSMsmjsO_4uMrDlBSfakt20hzIUiVC/view
  9. https://www.opencompute.org/contributions?contributions%5BrefinementList%5D%5Bproject%5D%5B0%5D=Cooling%20Environments
  10. https://www.opencompute.org/products-chiplets?cloud_products%5BrefinementList%5D%5Bhardware.categories.Liquid%20Cooling%20%26%20Thermal%20Management%5D%5B0%5D=Liquid%20Cooling%20%26%20Thermal%20Management&cloud_products%5BrefinementList%5D%5Bhardware.categories.Liquid%20Cooling%20%26%20Thermal%20Management%5D%5B1%5D=Blind-Mate%20Quick%20Connectors&cloud_products%5BrefinementList%5D%5Bhardware.categories.Liquid%20Cooling%20%26%20Thermal%20Management%5D%5B2%5D=Universal%20Quick%20Disconnect%20%28UQD%29&cloud_products%5BrefinementList%5D%5Bhardware.categories.Liquid%20Cooling%20%26%20Thermal%20Management%5D%5B3%5D=Large%20Quick%20Connector&cloud_products%5BrefinementList%5D%5Bhardware.categories.Liquid%20Cooling%20%26%20Thermal%20Management%5D%5B4%5D=Heat%20Transfer%20Fluids&cloud_products%5BrefinementList%5D%5Bhardware.categories.Liquid%20Cooling%20%26%20Thermal%20Management%5D%5B5%5D=Immersion%20Fluids
  11. https://drive.google.com/file/d/1A6ElINY2VGSUGsXdfmtWu_CwRRNp4PaB/view
  12. https://drive.google.com/file/d/1CM6sLeA0MulunQ1-KWfyM1h5Ll8JoOEF/view
  13. https://drive.google.com/file/d/1iIJXIZpR5NJnCYFPN2TRzFfHXfbYvybe/view
  14. https://sovereignedge.eu/
  15. https://drive.google.com/file/d/1EabbaW8JP-YpguIUcxEv5OqmvvHgAvA9/view

 
