How can we achieve lasting sustainable change in HPC?
This is part three of a six-part series on what sustainability means for the High Performance Computing community. Want to read the first two pieces? Click here for Part One and here for Part Two.
When the University of York launched their Viking HPC Cluster in 2018, their goal was to create inclusive access to supercomputing for all faculty, students, and staff. Heading into their 2023 refresh, the Research Computing Team decided to take a step towards environmental responsibility by providing HPC resources on 100% renewable energy.
We were delighted to have Dr. Emma Barnes from the Research Computing Team walk us through the Viking HPC cluster at our Sustainable Reality event at the National Space Centre, showcasing its history, its future, and the lessons learned along the way. The experiences of Emma and her team now serve as guiding principles for the change management strategy behind the full system refresh. Here’s what Emma taught us.
Collaboration is key to building an openly accessible HPC cluster
With over 1,300 registered users, 350 projects, and more than 190 research outputs produced, looking after a system that serves such a wide range of people and skill sets requires collaborative engagement. In setting out to establish the Viking HPC cluster, the Research Computing Team knew early on that managing the day-to-day operation, as well as the physical hardware, would require specialist help. This is why the initial installation used Alces Flight as managed service provider and the AQL facility in Leeds to house the physical hardware.
Emma and her team then focused on developing a robust engagement model with the departments, teams, and students using the system. This level of direct interaction gave them a better understanding of how the system was being used and helped them develop plans for how Viking would evolve in its next iteration.
Outside factors driving change management in HPC
Over time, usage of Viking has only gone up, with demand for more powerful, higher-performing CPUs and GPUs. This drives four requirements: more space, more flexibility, more power, and a drive to net zero. Looking at each in turn:
- Data centre space is currently at a high premium in the UK. Compute and storage demands are increasing as research and private institutions gather and process data at an ever-greater pace, and this competition has made it difficult for any institution to increase its footprint without incurring significant cost.
- Flexible HPC, such as public cloud, has gone some way towards easing expansion and resource demand, and is especially valuable for short-term projects that need large amounts of resource deployed rapidly. While cloud is an important tool, York realised that maintaining an on-premises system was key to its open-access commitment.
- The sharp increase in power and cooling costs served as one of the most significant factors in the system refresh. Balancing the needs of research against the cost of energy meant the team needed to consider where and how the system was powered.
- Add to this a university-wide goal for the entire institution to reach net zero by 2030, and the team realised their iterative plan needed to be approached holistically.
Harnessing the strengths of the Research Computing Team: software sustainability
With overall systems management handled through partnership, the team looked first at what they could achieve in transforming Viking into its next iteration. Their Research Software Engineers reviewed existing methodologies around data efficiency and workflows, optimisation of code and applications, and how data was acquired and stored.
They then took this into their infrastructure review with Alces Flight, noting the efficiencies needed in scheduling as well as in reducing idle compute (see the sketch below). This knowledge helped guide them towards their next decision: where Viking 2.0 should reside.
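Before moving on to the residence question, a brief illustration of the idle-compute point. The talk does not name Viking’s scheduler, so purely as an assumption this sketch targets a Slurm-based cluster, where a natural first step in reducing idle compute is simply measuring it. Slurm’s `sinfo` command reports node states, which a short script can summarise:

```python
#!/usr/bin/env python3
"""Summarise how much of a Slurm cluster is sitting idle.

A minimal sketch, assuming a Slurm-based scheduler (the source does
not say which scheduler Viking uses).
"""
import subprocess
from collections import Counter

def node_states() -> Counter:
    # "-h" suppresses the header; "%T" prints each node's state (idle,
    # mixed, allocated, ...) and "%D" the number of nodes in that state.
    out = subprocess.run(
        ["sinfo", "-h", "-o", "%T %D"],
        capture_output=True, text=True, check=True,
    ).stdout
    counts = Counter()
    for line in out.splitlines():
        state, count = line.split()
        # Strip Slurm's state-flag suffixes, e.g. "~" for powered-down
        # nodes or "*" for non-responding ones.
        counts[state.rstrip("*~#%")] += int(count)
    return counts

if __name__ == "__main__":
    counts = node_states()
    total = sum(counts.values())
    idle = counts.get("idle", 0)
    share = idle / total if total else 0.0
    print(f"{idle}/{total} nodes idle ({share:.0%})")
```

Slurm can also act on such measurements automatically: its power-saving options in slurm.conf (SuspendTime, SuspendProgram, ResumeProgram) power idle nodes down and bring them back on demand.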
With software sustainability already being planned for the next iteration, the talk turned to power and cooling. Realising that ecological impact needed to carry equal weight in hardware and software choices, the team began searching the UK and Europe for renewable-energy data centres. As they had already determined that latency would not be significantly impacted, they could look beyond UK borders. They reviewed several sites, eventually landing on EcoDataCentre in Sweden.
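That latency check deserves a pause. The talk does not say how York measured it, but as a rough, hypothetical illustration, timing TCP connection setup to a candidate site’s login endpoint gives a first-order view of the round-trip cost (the hostname below is a placeholder, not a real endpoint):

```python
#!/usr/bin/env python3
"""Time TCP connection setup to a candidate remote data centre.

A rough sketch of one way to sanity-check latency before moving HPC
off-site; the host below is hypothetical.
"""
import socket
import statistics
import time

HOST, PORT = "login.example-datacentre.se", 22  # placeholder SSH endpoint
SAMPLES = 10

def connect_time_ms(host: str, port: int, timeout: float = 5.0) -> float:
    # Time only the TCP handshake; close the socket immediately after.
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000

if __name__ == "__main__":
    times = [connect_time_ms(HOST, PORT) for _ in range(SAMPLES)]
    print(f"median {statistics.median(times):.1f} ms, "
          f"max {max(times):.1f} ms over {SAMPLES} connects")
```

Connection-setup time is only a proxy: for HPC workflows the figures that matter most are bulk data-transfer throughput and the responsiveness of interactive sessions, which would need their own tests.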
This fully renewable site was chosen not only for its low carbon footprint but also for its reuse of waste energy, having been built next to a combined heat and power plant. Its innovative approach appealed to York, as did the site’s commitment to total transparency in energy reporting. This clarity allows the Research Computing Team to continue its open-access policy at a new level, with hopes of bringing the lessons learned from this shift to renewable-energy HPC back to other UK institutions.
Get the full details.
We are delighted to present Emma’s complete presentation:
Next time…
In the next part of our sustainability series we delve into the most important (and often overlooked) aspect of HPC: the people who work on and support the projects, services, and systems our field provides.