Overview
Deploying liquid-cooled infrastructure for AI-enabling compute isn’t for the faint of heart. Power and cooling pose some of the greatest risks to overall data center reliability, so understanding component-level reliability in a liquid cooling system is foundational to robust rack infrastructure and efficient data center operation.
To address this need, CPC experts Mac Liu and Dylan Osiecki present a reliability roadmap covering the principles of reliability engineering, the parameters that matter within a quick disconnect, and tools for initiating risk assessment and reliability analysis of a liquid cooling loop. The discussion of quality performance explores predictive techniques that use artificial aging with thermal acceleration factors. To complement these quantitative predictions, the team also examines a physics-of-failure approach, highlighting specific failure mechanisms within a cooling loop. Together, these give the audience a broadly applicable technique for system-specific analysis and illustrate how the methods apply to crucial variables within a quick disconnect.
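
As a rough illustration of what a thermal acceleration factor means in artificial aging, the sketch below uses the Arrhenius model, a common choice in reliability engineering. The overview does not say which model the presenters use, so the model itself, the function names, and every number here (activation energy, temperatures, test duration) are assumptions chosen only for illustration.

```python
import math

# Boltzmann constant in eV/K
BOLTZMANN_EV_PER_K = 8.617333262e-5


def arrhenius_acceleration_factor(activation_energy_ev, use_temp_c, stress_temp_c):
    """Ratio of the aging rate at the stress temperature to the rate at the use temperature.

    Assumes a single thermally activated failure mechanism (Arrhenius behavior).
    """
    use_temp_k = use_temp_c + 273.15
    stress_temp_k = stress_temp_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / use_temp_k - 1.0 / stress_temp_k
    )
    return math.exp(exponent)


def equivalent_field_hours(test_hours, acceleration_factor):
    """Field exposure time represented by an accelerated test of the given length."""
    return test_hours * acceleration_factor


# Hypothetical inputs: 0.7 eV activation energy, 40 C coolant in service,
# 85 C oven aging, 1000-hour test. None of these come from the presentation.
af = arrhenius_acceleration_factor(0.7, use_temp_c=40.0, stress_temp_c=85.0)
field_hours = equivalent_field_hours(1000.0, af)
print(f"Acceleration factor: {af:.1f}")
print(f"Equivalent field exposure: {field_hours:.0f} hours")
```

The result is only as good as the assumed activation energy, which depends on the material and the failure mechanism. That dependence is one reason a quantitative prediction like this is usually paired with a physics-of-failure review of the mechanisms actually at work in the loop.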