Collaborating to Close the HPC Skills Gap

This September the Alces Flight Crew has been talking less about the technical and more about the practical side of delivering exceptional HPC within the Research Software Engineering (RSE) and Computational Biomedicine (CompBioMed) communities.

This September we spent time with not one, but two of the fastest growing disciplines in the field of HPC: Research Software Engineering (RSE) and Computational Biomedicine (CompBioMed). Both have grown out of the need to take the huge amount of scientific data being produced and turn it into better process and, in the case of the CompBioMed group, better treatment of the individual.

Much like what we are seeing in cloud HPC, both communities are growing and changing at a rapid clip. For RSEs this means working as an intermediary between researcher and platform. Their aim is to take a range of ideas, projects, and older workloads and create better code, better process, and complete workload lifecycles for their user base. For those in CompBioMed this means finding fast, ethical ways to explore treatments, ranging from patient heart simulations through to risk models for severe depression, using both traditional methods and those being discovered as HPC becomes more prevalent in non-intrusive medical decisions.

With both these groups, skills are in high demand… as is the drive to learn more and to experiment with technology in order to push their fields forward. But how do you achieve this without potentially asking too much of your team? We spent our time at the RSE UK and CompBioMed Conferences in deep discussion, teasing out practical goals that allow for everything from business as usual, to experimentation, to the prospect of transitioning to cloud.

There’s no wrong way.

The beauty of HPC is that there is more than one way to approach a problem, but it’s also the bane of anyone working to evolve the field. With so many options available, it’s easy to get lost in the detail, putting projects at risk and killing off each group’s motivation to actively engage.

Our discussions weren’t about the latest instance type or debating which cloud or hardware platform is best. Instead, we worked on developing non-technical questions that not only yield pathways to better technical solutions but also open the door to understanding where collaboration fits in with the system. As projects grow in size, as datasets get larger, and as more fields begin to see the benefits of HPC, there comes a point where both the RSE and the CompBioMed researcher need to pull in the right tools and collaborators to get the job done. So where do you start?

Step One: Understand

Part of our research over the summer involved looking at what currently constitutes an HPC workload and teasing out the elements that a researcher might see as prospective gaps standing in the way of a successful project. Taking time to analyse the parts of a workload from a complete and practical perspective can lead to stronger questions when researchers look to test or evolve their current work.

Understanding non-technical character aspects of HPC workloads. Mayers, Wil and Norledge, Steve — Alces Flight Limited, 2019. All rights reserved.

Step Two: Listen

Over our three years working in cloud and hybrid HPC we’ve developed a laundry list of questions, which we have slowly honed down to the basics. When we spoke to both communities about our list they asked us very simply: What question matters the most? In our experience it’s been this one:

“What calls you to work in cloud (or hardware/specific platform)?”

This question, which we call the ‘listening question,’ is our opportunity to get firsthand details about the history and motivations for the project. The answers can be simple and straightforward or complex (and sometimes very detailed). By asking it, however, we often get greater insight than we would by just asking what type of network connection or storage they are after.

Step Three: Value your money AND your time

A step we often see missed is asking the often painful question of budget… and not just monetary budget… time budget. While certain elements of cloud have progressed in terms of ease and automation, the current state of cloud and HPC is one where time investment must be taken into account. Getting a sense of priority can lead to better organised time, but it can also open a window into what level of outside collaboration researchers should engage in. Our work has shown that projects can range from experimental to needing commercial wrappers, and within each a range of prospective bespoke elements has to be taken into account. This is why at Alces Flight we’ve got everything from open-source, to pre-packaged clusters, to managed services available, all of which work on AWS, Microsoft Azure, and Google for cloud as well as on bare metal hardware.

Valuing the RSE and CompBioMed Communities

The core of both Research Software Engineering and CompBioMed is engagement with the users and patients. Time and time again at these conferences we saw people keen to work with their communities to grow projects and expand access to HPC and Big Compute. We are very pleased to be working with these fields and listening to their desires, goals, and ideas in order to develop collaboration across the board.

If you are interested in joining the RSE UK Community, they have now opened applications for membership. To find out more about what an RSE career can be, visit their website: rse.as.uk.

If you would like to find out more about CompBioMed and get involved in their long-term aim of personalised medicine and healthcare then check out their website: compbiomed.eu.
