Aurora supercomputer heralds a new era of scientific innovation
Aurora's exascale capabilities are set to transform scientific research by accelerating breakthroughs across a wide range of disciplines, from cancer treatment to clean energy solutions.
Take a look at your smartphone. The processor inside it runs at about a gigahertz, a billion cycles per second. Now imagine a computer operating a billion times faster. That is exascale: a system that can perform a quintillion (10^18) calculations per second, or one exaFLOPS, a quintillion FLoating-point Operations Per Second. That's the power of the massive new Aurora exascale supercomputer at the U.S. Department of Energy's (DOE) Argonne National Laboratory.
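The arithmetic behind that comparison is simple enough to check. Here is a minimal sketch with illustrative numbers (clock cycles are only a loose proxy for floating-point operations, so this is an illustration, not a spec):

```python
# Back-of-the-envelope comparison; clock speed is only a rough stand-in
# for operations per second, so treat these figures as illustrative.
smartphone_clock_hz = 1e9   # ~1 GHz: a billion cycles per second
aurora_flops = 1e18         # 1 exaFLOPS: a quintillion operations per second

print(aurora_flops / smartphone_clock_hz)  # 1000000000.0, i.e. a billion times
```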
What's the point of having an unprecedentedly fast and powerful supercomputer? Well, supercomputers are critical to scientific innovation. These enormous machines run simulations of real-world situations, which is how we get advancements in everything from pharmaceutical design to cancer research to aeronautical engineering. Many of today's scientific innovations have their origins in a supercomputer simulation. And a more powerful supercomputer with advanced capabilities for artificial intelligence (AI)-driven research means more breakthroughs happening faster than ever before.
In the near future, the Aurora supercomputer will be ready for science at the Argonne Leadership Computing Facility (ALCF), a DOE Office of Science user facility. All 10,624 compute blades have been installed, and Argonne scientists are at work testing parts of the machine to ensure the hardware is running smoothly from stem to stern. But then what?
"If you think about doing benchtop science, go all the way back to your high school biology lab. You get the slide, slide it under the microscope and you look at it, and then you look up, talk to your classmates, and raise your hand, and your teacher comes over. I hope Aurora will again give us some of that capability, where we're in a more engaged conversation."—Michael Papka, ALCF director
We know Aurora, and exascale computing generally, will advance science in previously unimaginable ways. And if thinking about Aurora's labyrinthine collection of central processing unit (CPU) and graphics processing unit (GPU) chips and wires and cooling towers housed in gargantuan facilities makes your brain hurt, then trying to detail its many scientific applications might require a weeklong beachside vacation. The question isn't, "What will it do and why is it important?" It's more along the lines of, "How do you even begin to describe what it will do and convey its importance?"
With exascale computing poised to revolutionize research across many fields, it can be challenging to convey its widespread impact on science and engineering. When asked which of Aurora's early science projects he's most excited about, Timothy Williams, deputy director of Argonne's Computational Science division, demurred. "It's hard to pick favorites," he said.
Williams is a bespectacled, soft-spoken man who's exceedingly humble about his position. As part of his responsibilities, he manages the ALCF's Aurora Early Science Program (ESP), which supports several research teams in their efforts to prepare a diverse set of codes and software tools to run efficiently on Aurora ahead of its deployment. In other words, he oversees the research projects that have been granted access to Aurora before anyone else. And because he prepares those select teams and organizations to work on Aurora, he's one of the first people who gets to see what the machine can do.
Williams has a front-row seat for a historic occasion, but the real history will be made afterward, with the results of the earliest science projects on Aurora under the ESP. These encompass everything from fusion energy to cosmological exploration to tremendous feats of engineering.
And to get a sense of the scale of innovation, it's important to also consider DOE's Exascale Computing Project (ECP), which came to a close in 2023. The ECP—a collaborative effort of DOE's Office of Science and the National Nuclear Security Administration—brought together approximately 1,000 researchers to develop a capable exascale ecosystem for the nation that encompassed science applications, system software, hardware technologies and workforce development.
Argonne scientist Andrew Siegel, who served as director of applications development for ECP, expects the project's collaborative nature will extend to the dynamic science set to occur in the exascale era.
"A unique strength of ECP is the degree to which it fostered interactions with people outside of their area of expertise," he said. "We have project metrics where the software teams have to succeed by building things that people use, and application development teams have to get a certain amount of performance improvement on scientifically relevant problems. When you put it together, people are incentivized to help each other."
Early science on Aurora
Even though he's cagey about which early science projects he favors, Williams is willing to tease a few. Regarding simulations, Williams mentioned one that hyper-realistically maps blood flow in the human circulatory system. "The special scientific case here is understanding how cells are transported through that system," he said. "In particular, tumor cells from cancer that transport and then implant elsewhere in the body is the process of metastasis of cancer." By simulating this process more precisely, scientists and pharmaceutical manufacturers can speed up the discovery of new methods for cancer treatment.
Other early science projects require data-intensive computing. Williams elaborated on one that involves tokamak fusion reactors—a promising design for power plants that would be capable of producing abundant clean energy. A key challenge in tokamak design is the disruption: when one occurs, plasma energy can escape and strike small areas of the physical container, potentially causing severe damage. An early science project on Aurora processes enormous volumes of tokamak data and uses deep learning to predict disruptions before they occur, "fast enough to be able to engage the controls to prevent them," Williams said.
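To make the shape of that approach concrete, here is a minimal sketch of a disruption predictor in PyTorch: a small network scores a window of plasma diagnostic signals and raises an alarm when the predicted risk crosses a threshold. The channel count, window length, architecture, and threshold are illustrative assumptions, not the project's actual model.

```python
# Minimal sketch of a disruption predictor, assuming 8 diagnostic channels
# sampled over a 128-step window; the real project's model and data differ.
import torch
import torch.nn as nn

N_SIGNALS, WINDOW = 8, 128

model = nn.Sequential(
    nn.Flatten(),                       # (batch, signals, time) -> (batch, signals*time)
    nn.Linear(N_SIGNALS * WINDOW, 64),
    nn.ReLU(),
    nn.Linear(64, 1),
    nn.Sigmoid(),                       # probability that a disruption is imminent
)
# In practice the network would be trained on large archives of historical
# tokamak shots; this untrained stand-in only illustrates the data flow.

window = torch.randn(1, N_SIGNALS, WINDOW)  # stand-in for streaming diagnostics
risk = model(window).item()
if risk > 0.9:                              # assumed alarm threshold
    print("High disruption risk: engage plasma controls")
```

The essential constraint is latency: as Williams notes, the prediction has to arrive quickly enough for the reactor's control system to act on it.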
As Argonne continues its work to integrate supercomputing and experimental facilities, Aurora is expected to enable many scientific breakthroughs. But what role does a supercomputer play in research conducted in a physical lab? According to Michael Papka, Argonne deputy associate laboratory director and director of the ALCF, it's all about helping everyone be more productive.
Processing the massive amounts of data generated by large-scale experiments presents a significant challenge. The time required to analyze the data often extends beyond the duration of the experiment, meaning researchers may not become aware of issues until after the experiment has concluded.
"If I were an experimentalist, I'd want to know I'm getting the data I need for my experiment," Papka said. "You don't want to wait and look at it afterward and say, OK, that's the wrong data. In the past, an experiment would end; somebody would return to their home institution, start looking at the data, and say, I don't have what I need. We want the computational resources at ALCF to help amplify great science happening worldwide."
As Siegel explained, in the past, you'd submit code to run on a supercomputer in a queue, then wait to get a response back. But Aurora will allow researchers and scientists to interact with data streaming in real time. "It used to be very siloed," Siegel said. "There was no way to, on-the-fly, initiate and execute a code on the machine to do, for example, complex imaging in real time from data that was streaming from an experimental facility. This degree of integration is just not how things were done in the past."
The engine under Aurora's hood
In general, it's the hardware—the new technologies—that enables Aurora to facilitate such scientific innovation. Kalyan Kumaran, director of technology at ALCF, spoke specifically about the new technologies that bring Aurora to the forefront. Kumar (as he prefers to be called) supervises the non-recurring engineering collaboration between Intel, Hewlett Packard Enterprise and Argonne.
Among these innovations are the system's 21,248 Intel Xeon CPU Max Series and 63,744 Intel Data Center GPU Max Series processors. As Kumar explained, Intel's cutting-edge processors were designed specifically for Aurora to support three types of computational science research: simulation, data and learning.
"Aurora's GPUs seamlessly integrate traditional high performance computing and AI capabilities, enabling strong performance for complex workloads involving modeling and simulation, data analysis and scientific machine learning tasks," Kumar said. He also points out that each of Aurora's nodes have increased endpoints that provide more connectivity, and thus more bandwidth, within the supercomputer's network.
Then there's the Distributed Asynchronous Object Storage (DAOS) data storage system. Williams called it "Aurora's most sophisticated technological innovation." Put simply, scientific input and output that has historically been inefficient on conventional file systems will become more efficient on Aurora, without compromising traditional modes of data processing—a combination that will benefit data-intensive projects.
Both Kumar and Williams highlight the DAOS data storage system, but as with virtually any conversation around computing right now, AI is a significant focus of Aurora's capabilities, and the AI features built into the machine's GPUs are profound. Williams explains that AI's primary benefit will be to accelerate scientific discovery. He points to one early science project aiming to find candidate materials for solar cells.
"There are certain properties you can calculate using supercomputing that will tell you whether or not a material is a good candidate, but it's very expensive," he said. "The AI model in this workflow uses data from chemistry databases. What it has learned already is to be able to identify which materials need the expensive calculations, which ones can clearly be thrown out, and which ones need less expensive calculations to evaluate."
Siegel argued that the hardware is where the paradigm shift lies, because it expands scientific discovery across disciplines in much less time. The computing itself is still largely traditional, but these advances in hardware open up possibilities that cross a threshold older supercomputers could not reach. He uses wind turbines as an example.
"Instead of doing a simulation of the air interacting with a single blade of a turbine, you can simulate arrays of turbines," he said. "You can understand local micro-meteorology and its impact on how you set up wind farms."
The big picture
So are researchers ready to work on Aurora? Well, they already have been working on Aurora … or at least, parts of it. At the end of 2022, early science projects were given access to a miniature version of Aurora called Sunspot—which replicates two racks of the supercomputer—to approximate what it would be like to use a tiny fraction of the machine. Since then, researchers have worked on larger portions of Aurora as they've become available and operational.
When scientists are asked what Aurora means for the future, it is fascinating how varied the answers can be. The responses reach into the philosophical realm of the humanities, or a charmingly utopian idealism. Papka, for his part, hopes that Aurora will prompt a more interactive, back-to-basics approach to computer science.
"If you think about doing benchtop science, go all the way back to your high school biology lab," he said. "You get the slide, slide it under the microscope and you look at it, and then you look up, talk to your classmates, and raise your hand, and your teacher comes over. I hope Aurora will again give us some of that capability, where we're in a more engaged conversation." However, this humble request for elemental science does not stem from nostalgia. "Being able to actively engage the design of an airplane wing in real time, determining what happens to its fuel efficiency, it's much more dynamic than configuring something, throwing it into the system, and waiting for the answer to come back."
The result of this integration between supercomputing and various scientific fields, as Siegel sees it, isn't just increased scientific discovery—although he thinks that's inevitable—but newfound scientific creativity.
"More complex workflows enable more seamless discovery," Siegel said. "If you're coming to a period where it's very hard to get better performance out of computers, it's natural to start thinking, what other creative things can we do accelerate scientific discovery?"
Take Siegel's earlier example of a turbine. It's a mechanism that could be used to develop better engines in airplanes, but also to create better and more efficient wind farms. What Siegel hopes supercomputers like Aurora might do is move beyond treating each scientific field as a separate pursuit of knowledge and instead reveal the overlapping concerns within the array of scientific inquiries that can be explored together. In that sense, the medium becomes the message: Just as Aurora's thousands of GPUs and CPUs operate in harmony, scientific progress moves away from discrete moments of triumph and toward an inexorable stream of innovation.
Provided by Argonne National Laboratory