What Are We Optimising For?

High-Performance Computing has always been a field built around optimisation.

Faster simulations. More efficient software. Better utilisation. Lower runtimes. Larger systems.

For decades, progress has often been measured through improvements in performance and scale. Those advances have enabled remarkable achievements across science, engineering, healthcare, and industry. The assumption underlying much of this work is relatively simple: if something can be made faster, more efficient, or more capable, it probably should be.

Yet a growing number of conversations across HPC and AI suggest that optimisation is becoming a more complicated concept than it once appeared.

Researchers continue to push the boundaries of simulation fidelity and scale. Infrastructure teams are balancing performance with operational complexity. Organisations are introducing new requirements around governance, sustainability, security, and accessibility. At the same time, users increasingly expect systems to be easier to use, access, and integrate into existing workflows.

None of these goals is unreasonable. The challenge is that they do not always point in the same direction. A gain in one area often introduces new considerations elsewhere.

Containerisation, for example, can improve portability and reproducibility. Applications become easier to move between systems, and environments become easier to share. At the same time, questions emerge regarding performance, lifecycle management, software supply chains, and the best way to integrate with highly specialised infrastructure.

Trusted Research Environments present a similar pattern. They can improve governance, accountability, and confidence in the handling of sensitive data. Yet they also introduce constraints that may change how researchers interact with systems and collaborate with one another.

Even highly specialised hardware follows the same principle. Optimising for a particular workload can deliver remarkable performance improvements, but often at the cost of increased complexity, reduced flexibility, or additional operational burden.

In each case, the challenge is not whether optimisation is possible. The challenge is deciding which outcomes matter most.

Historically, technical performance often dominated these discussions. Faster systems enabled more science, more engineering, and more discovery. Performance remains critically important today, but it increasingly sits alongside a broader set of priorities:

  • Reliability and availability.
  • Sustainability and energy consumption.
  • Security and governance.
  • Ease of use and accessibility.
  • Long-term maintainability.
  • Workforce development and skills.

None of these priorities is new. What appears to be changing is their relative weight in the decision-making process.

As HPC becomes more deeply embedded within research, industry, and public services, success depends on more than extracting maximum performance from individual components. Organisations are increasingly evaluating systems as complete services, considering not only what technology can do, but how effectively people can use, support, govern, and sustain it over time.

This may be one reason why discussions around abstraction have become so prominent.

Abstraction allows people to focus on outcomes rather than implementation details. It can reduce barriers to entry, improve productivity, and make powerful technologies accessible to a wider audience. Yet abstraction also introduces trade-offs around visibility, control, and optimisation.

The answer is not to eliminate these trade-offs. That is rarely possible. Instead, the goal is to understand them clearly enough to make deliberate choices.

At Alces Flight, we often describe supercomputers as records of decisions. Every system reflects choices about architecture, software, workflows, governance, operations, and people. Long after procurement decisions have been made and technologies have evolved, those choices continue to shape how a system behaves and what it enables.

The same principle applies to optimisation. Every optimisation reflects a decision about what is being prioritised. Performance remains important. So do reliability, sustainability, usability, trust, and long-term resilience.

As HPC continues to evolve, the most significant question may no longer be how to optimise a particular component. It may be understanding what we are collectively choosing to optimise for.

The answer will influence not only the next generation of systems, but also the communities, skills, and services that grow around them.

This piece was inspired by the incredible work that went into the recent Durham HPC Days, with specific thanks to presenters Will Roper, John McCalpin, Rich Knepper, and Melyssa Fratkin. Will Roper’s slide was used as the image for this post.
