A core mission of the National Nuclear Security Administration (NNSA) is to ensure that the United States maintains a safe, secure, and reliable nuclear stockpile through the application of unparalleled science, technology, engineering, and manufacturing. The use of modeling and simulation is front and center in realizing science-based stockpile stewardship, especially since 1992, when the United States voluntarily ceased underground nuclear explosives testing. These simulations require leading-edge computer platforms, sophisticated physics and engineering application codes, and expertise in applied mathematics and computer science for the design, engineering qualification, surveillance, maintenance, and certification of the nuclear stockpile. NNSA’s Advanced Simulation and Computing (ASC) program¹ has been successful in providing high-performance computing (HPC) systems, software, methods, and workforce that are the foundation for this computational work. The combination of high-end computing facilities and expertise also makes the NNSA laboratories a national resource that can be and has been deployed for other critical priorities, on both an ongoing and an emergency basis. The demand for increased computing capability will continue unabated, stemming from the need for more detailed simulations of the aging stockpile, higher confidence in those simulations, and, potentially, increased use of artificial intelligence (AI) methods, which may involve highly expensive training of large models.
In 2022, the United States installed its first exascale computing system for the Department of Energy (DOE) Office of Science, with an NNSA system scheduled for 2023. The DOE Exascale Computing Project (ECP)² has developed new application capabilities, parallelization approaches, and software tools, while co-developing the computing systems in collaboration with vendor partners. NNSA is positioned to take full advantage of exascale computing, but demand for more computing will continue to grow beyond exascale, driven by both familiar applications and new mission drivers and new computational approaches that will use high-end computing. Visionary leaders and creativity will be needed to move existing codes to next-generation platforms, to reconsider the use of advanced computing for all of NNSA’s current and emerging mission problems, and to envision new types of computing systems, algorithmic techniques implemented in software, partnerships, and models of system acquisition.
¹ Los Alamos National Laboratory, 2021, “Nuclear Weapon Simulation and Computing,” Advanced Simulation and Computing (ASC) Program, https://www.lanl.gov/projects/advanced-simulation-computing.
OVERARCHING FINDING: The combination of increasing demands for computing with the technology and market challenges in HPC requires an intentional and thorough reevaluation of ASC’s approach to algorithms, software development, system design, computing platform acquisition, and workforce development. Business-as-usual will not be adequate.
The approach used to reach petascale and now exascale capabilities is unlikely to be sufficient for the next two decades. Instead, NNSA will need to reevaluate how its mission problems, not limited to physics simulations, are best solved through advanced computing, and rethink what type of models, algorithms, and data analysis techniques are suited to each problem; what computing capabilities will be needed; and how it can best acquire those capabilities.
Owing to a confluence of technology, marketplace, and workforce challenges, NNSA’s ASC program is at a critical crossroads. The program has for decades delivered impressive and state-of-the-art predictive simulation capabilities using in-house expertise in applied mathematics, computer science, and the physical sciences, along with research and development (R&D) investments in the computer vendor community. However, the current deployment model is not likely to be sufficient for future NNSA missions.
ROLE AND IMPORTANCE OF EXASCALE AND POST-EXASCALE COMPUTING FOR STOCKPILE STEWARDSHIP (CHAPTER 1)
Today’s increasingly complex geopolitical landscape has refocused attention on nuclear security. Meanwhile, the nation’s nuclear stockpile continues to age, requiring continual surveillance and the maintenance, redesign, and replacement of components. New demands may also arise, such as for alternative delivery mechanisms and to meet operational requirements for extreme environments. These challenges will require improved predictive capabilities and reduced uncertainties, more detailed models, and the inclusion of more detailed physics phenomena, whether obtained from first principles or learned from experimental data. In addition to more sophisticated simulations, advanced computing is essential for the analysis of data sets from NNSA’s experimental facilities, for the use of new AI methods, and for the control of complex experimental systems. Therefore, it has become clear that mission requirements for advanced computing will continue to grow in both complexity and scale in the future.
FINDING 1: The demands for advanced computing continue to grow and will exceed the capabilities of planned upgrades across the NNSA laboratory complex, even accounting for the exascale system scheduled for 2023.
FINDING 1.1: Future mission challenges, such as execution of integrated experiments, assessment of the effects of plutonium aging on the enduring stockpile, and facilitation of rapid design and development of new delivery modalities, will increase the importance of computation at and beyond the exascale level. Orders of magnitude improvement in application-level performance would allow for improved predictive capability, valuable exploration and iterative design processes, and improved confidence levels that will remain infeasible as long as a single hero calculation takes weeks to months to execute on an exascale system.
FINDING 1.2: HPC has traditionally played an important role in support of weapons systems engineering. Some emerging challenges in this arena, such as qualifying future weapons systems for reentry environments, will require new approaches to mathematics, algorithms, software, and system design.
FINDING 1.3: Assessments of margins and uncertainties for current weapons systems will require additional computational capability beyond exascale, a problem exacerbated by the aging of the stockpile. Enhanced computational capability will also be required in assessing margins and uncertainties should there emerge requirements for new military capabilities.
FINDING 1.4: The rapidly evolving geopolitical situation reinforces the need for computing leadership as an important element of deterrence, and motivates increasing future computing capabilities.
DISRUPTIONS TO THE COMPUTING TECHNOLOGY ECOSYSTEM FOR STOCKPILE STEWARDSHIP (CHAPTER 2)
On the technology front, the single-thread performance of microprocessors continues to be relatively flat, and improvements in transistor density are slowing. Processing elements increasingly rely on more abundant, finer-grained parallelism and increasingly specialized hardware features that can improve performance by tailoring to a given computational domain. These trends are creating a significant disruption in available hardware components, with commercial interests focused largely on AI, embedded systems, and cloud services. There is considerable uncertainty as to whether the processors emerging in response to these trends, with their lower precision and limited high-speed memory, can be productively applied to NNSA applications.³
The market is also changing. The enormous scale of cloud-computing vendors now means that many cutting-edge hardware designs are being developed and deployed for in-house uses and are not available for direct purchase. Last, the number of HPC integrators has shrunk, placing the current system acquisition model at risk. In addition to the increased demand for HPC, technological and market shifts are challenging the vendor partnerships and other acquisition approaches used by the NNSA laboratories.
FINDING 2: The computing technology and commercial landscapes are shifting rapidly, requiring a change in NNSA’s computing system procurement and deployment models.
FINDING 2.1: Semiconductor manufacturing is now largely in the hands of offshore vendors who may experience supply-chain risk; U.S. sources are lagging.
FINDING 2.2: All U.S. exascale systems are being produced by a single integrator, which introduces both technical and economic risks.
FINDING 2.3: The joint ECP created a software stack for moving systems software and applications to exascale platforms. Although DOE issued an initial call for proposals in 2023, there is not yet a plan to sustain that stack.
³ In the early 1990s, when the laboratories moved from parallel vector architectures such as those offered by Cray systems to massively parallel software based on MPI, the codes required massive rewrites. In the architectural transition anticipated by this report, it is likely that a similar investment in the software will be necessary to realize the potential of these new types of machines.
FINDING 2.4: Cloud providers are making significant investments in hardware and software innovation that are not aligned with NNSA requirements. The scale of these investments means that they have a much greater market influence than NNSA in terms of both technology and talent.
RECOMMENDATION 1: NNSA should develop and pursue new and aggressive comprehensive design, acquisition, and deployment strategies to yield computing systems matched to future mission needs. NNSA should document these strategies in a computing roadmap and have the roadmap reviewed by a blue-ribbon panel within a year after publication of this report and updated periodically thereafter.
RECOMMENDATION 1.1: The roadmap should lay out the case for future mission needs and associated computing requirements for both open and classified problems.
RECOMMENDATION 1.2: The roadmap should include any upfront research activities and how outcomes might affect later parts of the roadmap—for example, go/no-go decisions.
RECOMMENDATION 1.3: The roadmap should be explicit about traditional and nontraditional partnerships, including with commercial computing and cloud providers, and academia and government laboratories, and broader cross-government coordination, to ensure that NNSA has the influence and resources to develop and deploy the infrastructure needed to achieve mission success.
RECOMMENDATION 1.4: The roadmap should identify key government and laboratory leadership to develop and execute a unified organizational strategy.
RESEARCH AND DEVELOPMENT PRIORITIES (CHAPTER 3)
R&D activities have been a critical element of NNSA’s Science-Based Stockpile Stewardship strategy. Historically, these activities have led to such achievements as the development of better mathematical models, numerical algorithms, parallel programming tools, and HPC operating systems.
Both AI and quantum computing have received significant and growing national attention and research investments in recent years. AI methods have revolutionized computational approaches in other disciplines, and NNSA has demonstrated their applicability in some limited domains while exploring the significant open questions limiting broader impact across NNSA’s mission. At this time, NNSA can neither dismiss AI nor pivot entirely away from traditional modeling and simulation, and the most likely scenario is a complementary role for AI with simulation. Because of the uncertainty, AI research is critical, and the outcomes may influence post-exascale hardware roadmaps, industry partnerships, and applications capabilities. Quantum computing is also an exciting research area but is not sufficiently mature to be the basis for a computing strategy with a 20-year time horizon. Thus, while both are vital research areas, the future security of the nation’s nuclear deterrent is too important to rely solely on either of these, as yet unproven, technologies for nuclear weapons development and assessment.
FINDING 3: Bold and sustained research and development investments in hardware, software, and algorithms—including higher-risk research activities to explore new approaches—are critical if NNSA is to meet its future mission needs.
FINDING 3.1: Physics-based simulators will remain essential as the core of NNSA predictive simulation. However, given disruptions in computing technology and the HPC ecosystem combined with the end of the weak-scaling era, novel mathematical and computational science approaches will be needed to meet NNSA mission requirements.
FINDING 3.2: Verification, validation, and uncertainty quantification (VVUQ) and trustworthiness remain of paramount importance to NNSA applications. VVUQ will become increasingly important as simulation methodology shifts toward more complex systems that incorporate models of different fidelity, including data-driven approaches.
FINDING 3.3: Novel architectures can have a significant impact on NNSA computing; however, mathematical research will be needed to effectively exploit these new architectures. Involvement of applied mathematicians and computational scientists early in the development cycle for novel architectures will be important for reducing development time for these types of systems.
FINDING 3.4: An end to transistor density scaling is likely to motivate industry to develop novel computer architectures for which today’s numerical algorithms, software libraries, and programming models are ill suited.
FINDING 3.5: Recent advances in applied mathematics and computational science have the potential for impact on NNSA mission problems far beyond traditional roles in physics-based simulation.
FINDING 3.6: Co-design of hardware and systems for high-performance scientific computing applications has been a modest success to date; it will become more important in the future and will need to be deeper. Technological and market trends are likely to shift the balance of co-design to the laboratories, requiring more innovation and engineering in the areas of hardware design, system integration, and system software.
FINDING 3.7: Rapid innovation in AI methods, driven by advances in computing performance and growth in data sets, is producing frequent technological surprises that NNSA should continue to investigate and track. These advances may benefit the NNSA mission but will likely complement rather than replace traditional physics-based simulations in the post-exascale era.
FINDING 3.8: Quantum technology has the potential to improve the fundamental understanding of material properties needed by important NNSA applications. Analog quantum simulation or digital quantum simulation will likely be available before general quantum computers.
FINDING 3.9: Major breakthroughs in quantum algorithms and systems are needed to make quantum computing practical for multiphysics stockpile modeling. Quantum computing is more likely to serve as a special-purpose accelerator than to replace leading-edge computing.
RECOMMENDATION 2: NNSA should foster and pursue high-risk, high-reward research in applied mathematics, computer science, and computational science to cultivate radical innovation and ensure future intellectual leadership needed for its mission.
RECOMMENDATION 2.1: NNSA should strengthen efforts in applied mathematics and computational science R&D. Potential areas include using novel architectures, data-driven modeling, optimization, inverse problems, uncertainty quantification, reduced-order modeling, multiscale modeling, mathematical support for experiments, and digital twins.
RECOMMENDATION 2.2: NNSA should strengthen efforts in computer science research and development to build a substantial, sustained, and broad-based intramural research program that is positioned to address the technological challenges associated with post-exascale systems and co-design of those systems to ensure that the laboratories are positioned for leadership in computing breakthroughs relevant to NNSA mission problems.
RECOMMENDATION 2.3: NNSA should expand research in artificial intelligence to explore the use of these methods both for predictive science and for emerging applications, such as manufacturing and control of experiments, and develop machine learning techniques that provide the confidence in results required for NNSA applications.
RECOMMENDATION 2.4: NNSA should continue to invest in and track quantum computing research and development for future integration into its computational toolkit; these technologies should be considered an additional computational tool rather than a replacement for current approaches.
WORKFORCE NEEDS (CHAPTER 4)
Perhaps the most significant challenge facing the NNSA laboratory complex is in attracting and retaining top talent in areas that overlap and compete with the computing industry. Today’s computing technology and services industries offer higher salaries, greater resources, more flexible work environments, and the ability to focus on compelling intellectual opportunities. Meanwhile, security concerns affect recruitment of foreign talent, both directly limiting NNSA hiring and indirectly discouraging international participation in the broader U.S. computing ecosystem.
Workforce areas of need include experts in computer hardware and performance optimization, algorithms and applied mathematics, physics modeling, numerical computations, and software development. Even more challenging are emerging areas of scientific computing such as machine learning, hardware co-design, and quantum information science. NNSA’s ASC program and DOE’s ECP program have been unique resources for addressing national priorities, with teams of world experts that deploy large-scale computing for complex analysis and prediction, but to continue in this role NNSA and the associated national laboratories need to attract and train talent from an increasingly diverse workforce, offer competitive compensation packages, and provide an intellectually exciting and stable environment in which to work on cutting-edge R&D problems.
FINDING 4: NNSA’s laboratories face significant challenges in recruiting and retaining the highly creative workforce that NNSA needs, owing to competition from industry, a shrinking talent pipeline, and challenges in hiring diverse and international talent.
FINDING 4.1: The ASC program currently faces a challenge maintaining a competitive workforce. This challenge will continue to grow because of pipeline issues (the small number of U.S. citizens entering graduate-level science, technology, engineering, and mathematics fields), industry competition, and emerging computing talent choosing not to focus on scientific computing.
FINDING 4.2: The U.S. national security enterprise has benefited enormously from the inclusion of global talent, but incorporating international scholars in the NNSA community is challenged by important concerns about protecting sensitive information. Failure to weigh these security risks against the risk of excluding global talent can mean that the best candidates for the job are never hired.
FINDING 4.3: Addressing the challenges laid out in this report will require a nurturing environment that reduces distractions, funding uncertainty, and administrative burdens, while providing employees the time and flexibility to explore areas of interest and do the creative thinking required to solve these problems.
RECOMMENDATION 3: NNSA should develop an aggressive national strategy through partnership across agencies and academia to address its workforce challenge.
RECOMMENDATION 3.1: NNSA should make concerted efforts to create an environment that nurtures and retains existing staff; more aggressively grow the pipeline; create an efficient and modern, yet secure environment; advertise and grow existing workforce programs (such as the Predictive Science Academic Alliance Program and the Computational Science Graduate Fellowship); and collaborate with other federal agencies to support ambitious talent development programs at all career stages.
RECOMMENDATION 3.2: NNSA should also develop a deliberate strategy to attract an international workforce and to provide them with a welcoming environment while thoughtfully managing the attendant national security risks.