2 Disruptions to the Computing Technology Ecosystem for Stockpile Stewardship
Pages 35-62

From page 35...
... Meanwhile, there is credible evidence that China was the first country to deploy exascale computing systems, targeting its own national security interests. Moreover, technological shifts owing to the end of Dennard scaling and the slowing of Moore's law have raised questions about the technical and economic viability of continued reductions in transistor sizes and associated growth in computing performance, all at a time when the locus of semiconductor design is increasingly being driven by artificial intelligence (AI)
From page 36...
... Concurrently, the "hyperscalers" -- the largest of the cloud service providers -- have begun designing their own processors and accelerators, which are not available for purchase. And a growing market focus on improving the performance of machine learning is shifting the locus of hardware innovation.
From page 37...
... More recently, the demand for systems to address large machine learning problems has led to cloud offerings with high-speed networks, graphics processing units, and custom hardware. When discussing the use of cloud computing for the National Nuclear Security Administration's Advanced Simulation and Computing program, it is important to use these more precise terms and areas of innovation, rather than "cloud computing" as a catch-all concept.
From page 38...
... power dissipation had become such a major design constraint. Along with the end of Dennard scaling, this1

1 National Research Council, 2011, The Future of Computing Performance: Game Over or Next Level?
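To see why power dissipation became the binding constraint, the classical Dennard scaling relations (standard textbook background, not reproduced from this chapter) can be sketched as follows, with k > 1 the linear scale factor per process generation:

    % Dennard scaling: L -> L/k, V -> V/k, f -> k f, C -> C/k
    \[
      P_{\mathrm{dyn}} = C V^{2} f
      \;\longrightarrow\;
      \frac{C}{k}\left(\frac{V}{k}\right)^{2}(k f)
      = \frac{P_{\mathrm{dyn}}}{k^{2}}
    \]
    % Transistor density grows by k^2, so power density stays constant.
    % Once leakage prevents V from scaling further, per-transistor power
    % becomes (C/k) V^2 (k f) = P_dyn, i.e., unchanged, while density
    % still grows by k^2; power density then rises roughly as k^2, and
    % cooling and power delivery cap how much of a chip can be active.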
From page 39...
... • Lithographic reticle limits and the yield of working chips per semiconductor wafer place a practical ceiling on the size of silicon dies that can be manufactured. The emergence of chiplets -- integrating multiple chips, often from different vendors and fabrication processes, on a shared substrate -- is both a technical and an economic consequence of chip yields and reticle limits.
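A first-order way to see the economics is the standard Poisson defect-yield approximation (a textbook model, not from this report), under which die yield falls exponentially with area, so several small chiplets yield far better silicon than one reticle-sized die. The die areas and defect density below are illustrative assumptions:

    #include <cmath>
    #include <cstdio>

    // Poisson defect-yield model: Y = exp(-D0 * A),
    // with D0 = defects per cm^2 and A = die area in cm^2.
    double poissonYield(double defectsPerCm2, double areaCm2) {
        return std::exp(-defectsPerCm2 * areaCm2);
    }

    int main() {
        const double d0 = 0.2;          // assumed defect density (defects/cm^2)
        const double monolithic = 8.0;  // ~800 mm^2 die, near the reticle limit
        const double chiplet    = 2.0;  // ~200 mm^2, one quarter of the design

        double yMono = poissonYield(d0, monolithic); // ~20% good dies
        double yChip = poissonYield(d0, chiplet);    // ~67% good dies

        // Chiplets are tested individually ("known good die"), so a defect
        // scraps only one small die, not the whole design.
        std::printf("monolithic die yield: %.1f%%\n", 100.0 * yMono);
        std::printf("single chiplet yield: %.1f%%\n", 100.0 * yChip);
        std::printf("per-area yield ratio, chiplet vs monolithic: %.2fx\n",
                    yChip / yMono);
        // (Ignores packaging/bonding yield and inter-chiplet interconnect cost.)
        return 0;
    }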
From page 40...
... The increasing disparity between processing speeds and memory access times now challenges von Neumann designs and traditional approaches to parallelism.4
• CPUs and GPUs today implement fixed sets of operations -- that is, fixed Instruction Set Architectures that support a broad range of applications. For a sufficiently narrow class of applications, there are proven energy and performance advantages from building hardware that is simplified and specialized to a specific application.
From page 41...
... While weak scaling captures one aspect of how well an algorithm behaves and to what degree larger and larger computers will enable us to continue solving larger and larger problems, it neglects a critical aspect of time-dependent multiphysics simulations that characterize much of the National Nuclear Security Administration workload. Multiphysics simulations often track the evolution of a physical system over time by sequentially advancing the solution over a discrete time interval, referred to as a time step.
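To quantify the aspect weak scaling neglects, consider an explicit solver whose stable time step is tied to the mesh spacing by a CFL-type condition (a standard constraint; the factor-of-2 refinement below is illustrative):

    % CFL-type stability limit for explicit time stepping:
    %   \Delta t \le C\,\Delta x
    % Refining a 3D mesh by 2x in each dimension:
    \[
      N_{\text{cells}} \to 8\,N_{\text{cells}},
      \qquad
      \Delta t \to \tfrac{1}{2}\,\Delta t
      \;\Rightarrow\;
      N_{\text{steps}} \to 2\,N_{\text{steps}}
    \]
    \[
      W = N_{\text{cells}} \cdot N_{\text{steps}} \;\to\; 16\,W
    \]
    % Weak scaling accounts only for the 8x per-step growth; the extra 2x
    % in time steps is inherently sequential and lengthens time-to-solution
    % no matter how many nodes are added.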
From page 42...
... More serious is the decrease in byte-per-FLOP ratios for both overall memory capacity and memory bandwidth. This decrease implies that a growing class of problems is dominated by memory access, and the achievable rate of such calculations has decreased over time.
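The roofline model (a standard analysis tool; the machine numbers below are assumptions for illustration) makes this concrete: once a kernel's arithmetic intensity falls below the machine's FLOP-per-byte balance point, achievable performance is capped by memory bandwidth, not peak FLOPS.

    #include <algorithm>
    #include <cstdio>

    // Roofline model: attainable FLOP/s = min(peak, bandwidth * intensity),
    // where intensity is the kernel's FLOPs per byte of DRAM traffic.
    double roofline(double peakFlops, double bytesPerSec, double flopsPerByte) {
        return std::min(peakFlops, bytesPerSec * flopsPerByte);
    }

    int main() {
        const double peak = 10e12; // assumed 10 TFLOP/s double-precision peak
        const double bw   = 1e12;  // assumed 1 TB/s DRAM bandwidth

        // A 3D stencil sweep might perform ~0.25 FLOPs per byte moved.
        const double stencilIntensity = 0.25;
        double attainable = roofline(peak, bw, stencilIntensity);

        std::printf("attainable: %.2f TFLOP/s (%.1f%% of peak)\n",
                    attainable / 1e12, 100.0 * attainable / peak);
        // -> 0.25 TFLOP/s, only 2.5% of peak: the kernel is bandwidth-bound,
        //    so a falling byte-per-FLOP ratio directly reduces delivered
        //    performance even as peak FLOPS grows.
        return 0;
    }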
From page 43...
... Table 2-1 lists some of the common computational motifs used in LANL's ASC codes. For each motif, the table lists the type of parallelism, memory access pattern, communication and synchronization patterns, observed bottlenecks, and whether the motif uses dynamic data structures.
From page 44...
... Motif | Parallelism | Memory Access | Communications | Synchronization | Bottlenecks | Data Structures
Stencil operations on structured grids | Data parallel | Regular/dense | Neighboring boundary exchange | Point-to-point messages | Memory bandwidth bound | AMR
Stencil operations on unstructured grids | Data parallel | Irregular/dense | Neighboring boundary exchange | Point-to-point messages | Memory bandwidth bound | Sometimes, AMR
Particle methods | Data or thread parallel | Irregular/sparse | Neighboring boundary exchange; global or subset collectives | Point-to-point messages (divergent); global or subset barriers | Memory latency, network latency | Yes
Sparse linear algebra and nonlinear solvers | Data parallel | Irregular/sparse | Global or subset collectives | Global or subset barriers | Communication bound | Sometimes, AMR
Dense linear algebra | Data parallel | Regular/dense | Local operations | N/A | FLOPS, cache | No, static
Monte Carlo methods | Data or thread parallel | Irregular/sparse | Neighboring boundary exchange | Point-to-point messages (divergent) | Memory latency, network latency | Generally static
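To make the first row concrete, here is a minimal structured-grid stencil sweep (an illustrative sketch, not a fragment of any ASC code; grid shape and coefficients are arbitrary). Each point update performs only a few FLOPs per byte of memory traffic, which is why these motifs are memory bandwidth bound:

    #include <vector>

    // One Jacobi-style sweep of a 5-point stencil on an nx-by-ny grid
    // (interior points only; boundary handling omitted for brevity).
    void stencilSweep(const std::vector<double>& in, std::vector<double>& out,
                      int nx, int ny) {
        for (int j = 1; j < ny - 1; ++j) {
            for (int i = 1; i < nx - 1; ++i) {
                int idx = j * nx + i;
                // 3 adds + 1 multiply (4 FLOPs) per point, but 4 neighbor
                // reads + 1 write (~40 bytes): roughly 0.1 FLOP/byte, so
                // the loop is limited by memory bandwidth, not compute.
                out[idx] = 0.25 * (in[idx - 1] + in[idx + 1] +
                                   in[idx - nx] + in[idx + nx]);
            }
        }
    }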
From page 45...
... [Table: per-code hardware resource utilization, color-coded. Rows (ASC workloads): Flag 3D ALE AMR; Flag 3D ALE Static; xRAGE 3D AMR; PartiSN 42 Groups; Jayenne DDMC Hohlraum. Columns: L1, L2, L3, DRAM BW, Memory Latency, DP FLOPS, Vectorization, Non-FP. Three percentage values survive per row (e.g., Flag 3D ALE AMR: 3.70%, 11.20%, 96.30%), but their column assignments were lost in extraction.]

NOTES: Red indicates a hardware resource that is heavily utilized, orange indicates moderate utilization, and green indicates light utilization. 3D, three-dimensional; ALE, Arbitrary Lagrangian-Eulerian; AMR, adaptive mesh refinement; BW, bandwidth; DDMC, Discrete Diffusion Monte Carlo; DP, double precision; DRAM, dynamic random access memory; FLOPS, floating-point operations per second.
From page 46...
... Software Infrastructure Disruptions

For the past three decades, the global HPC community, including the NNSA and Office of Science laboratories, has leveraged a software and algorithm framework based on machines built as large collections of processing nodes that communicate via a message passing model, standardized as the Message Passing Interface (MPI)
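For readers less familiar with this model, a minimal sketch of the idiom nearly all such codes share: each MPI rank owns a slab of the domain and exchanges one layer of ghost cells with its neighbors every time step. This is illustrative C++ over the standard MPI C API (buffer size and initialization are arbitrary; error handling omitted):

    #include <mpi.h>
    #include <vector>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int n = 1024;                 // local slab width (illustrative)
        std::vector<double> u(n + 2, rank); // +2 ghost cells, one per side
        int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

        // Halo exchange: send my edge values, receive neighbors' into ghosts.
        MPI_Sendrecv(&u[1], 1, MPI_DOUBLE, left, 0,
                     &u[n + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&u[n], 1, MPI_DOUBLE, right, 1,
                     &u[0], 1, MPI_DOUBLE, left, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        // ... local stencil update over u[1..n] would go here ...

        MPI_Finalize();
        return 0;
    }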
From page 47...
... Moreover, these challenges pale in comparison to the software challenges associated with emerging specialized processors, which require tools to design and test the hardware, as well as either their own implementations of standard programming languages or entirely new languages. Although large commercial entities have the resources and expertise to design and implement such programming systems for large markets, the resulting programming systems are sometimes not well suited to NNSA workloads.
From page 48...
... In addition, ECP includes co-design centers that develop software supporting common computational motifs found in the applications, as well as a broad set of software projects that address other components of the software stack.
From page 49...
... MARKET ECOSYSTEM DISRUPTIONS

When DOE's Accelerated Strategic Computing Initiative (ASCI)7 and Advanced Scientific Computing Research (ASCR)
From page 50...
... Equally important, these companies focus on selling value-added services, not hardware, although all of them develop custom hardware to support their software services. Similarly, machine learning hardware and software are now a major focus of venture investments (e.g., Cerebras, Graphcore, Groq, Hailo, and SambaNova)
From page 51...
... In response, Intel, Micron, Qualcomm, and GlobalFoundries, among others, have announced new plans for domestic semiconductor fabrication facilities. The CHIPS and Science Act also requires recipients of U.S.
From page 52...
... It is now widely rumored that China has two exascale computing platforms that have not been "officially" reported or entered into the TOP500 competition, and several ACM Gordon Bell Prize submissions (used to measure performance on applications) were run on one of those systems, OceanLight, with impressive results.
From page 53...
... computing leadership for national priorities in a globalized world will require increasing investments and attention.

RETHINKING INNOVATIONS, ACQUISITION, AND DEPLOYMENT

Given the dramatic changes in hardware -- driven by a combination of semiconductor constraints, the cloud service provider market, and deep-learning workload demands; new software models arising from the explosive growth of infrastructure and platform services and deep learning; and computing ecosystem economics accruing from these hardware and software forces -- it seems likely that NNSA will need new approaches.
From page 54...
... Instead, NNSA should emphasize time-to-solution and identify the memory access motifs core to key applications as part of an end-to-end, hardware-software, co-design strategy.

Hardware and Architectural Innovation and Diversity

It is possible that the next generation of HPC systems can be built using evolutionary variants of system architectures, component technologies, interfaces, and memory hierarchies, albeit likely with high acquisition costs and limits on the fraction of peak hardware performance delivered to applications.
From page 55...
... This project includes a large applications development effort focused on developing mission-critical applications that could effectively use exascale hardware and take advantage of state-of-the-art algorithms and software techniques. The software technologies and co-design elements of the project were evaluated based on their adoption by applications, yielding a vertically integrated software stack focused on meeting application requirements in which interoperability of the different components was a key feature.
From page 56...
... While NNSA prioritizes performance and performance transparency with languages such as C++ and parallel extensions, much of industry and university education now focuses on languages such as Java and Python with their managed runtime systems, or Rust, as well as machine learning frameworks that hide parallelism. Scientific productivity has been identified as one of the top 10 exascale research challenges,16 and software productivity (the effort, time, and cost for software development, maintenance, and support)
From page 57...
... • Cultivate a new relationship with the cloud vendors, each of which does custom hardware design and significant self-integration. The benefit here is that one could attempt to leverage their workforce.
From page 58...
... Second, as previously described, the hyperscaler cloud providers are engaged in custom hardware development and will be more influential on the computing supply chain, including the semiconductor market, than NNSA alone or in partnership with the Office of Science. Moreover, as an increasing fraction of the workforce is being trained to use cloud services, there is the potential to attract and leverage this experience; conversely, NNSA risks being unable to draw such talent if its environment is viewed as outdated or less productive than the tools and systems used in the cloud.
From page 59...
... This is in marked contrast to the earlier "killer micro" world, where developers of custom processors faced daunting technical and economic challenges, needing to develop a complete software and hardware environment and keep pace with the relentless performance increases of the mainstream microprocessor market. In the past, many custom designs were tried and failed.
From page 60...
... More than incremental code refactoring, this must be a first-principles approach that considers alternative mathematical models to account for the limitations of weak scaling. This "beat them" strategy acknowledges that targeted, custom hardware specialization is required to meet NNSA's future HPC performance needs, something the mainstream market alone is increasingly unlikely to provide.
From page 61...
... Much as NNSA once worked collaboratively with vendors such as IBM and Cray to design and develop custom computing systems matched to NNSA needs, NNSA must again embrace collaborative ab initio system design, rather than specification development and product procurement. Such a model is likely to require more internal expertise in computer architecture, greater embrace of cloud software models, specification of novel and semi-custom architectures, end-to-end hardware prototyping at substantial scale for evaluation and testing, and partnership with nontraditional hardware and software vendors, notably AI and other hardware startups and cloud vendors.
From page 62...
... RECOMMENDATION 1.3: The roadmap should be explicit about traditional and nontraditional partnerships, including with commercial computing and cloud providers, academia, and government laboratories, as well as broader cross-government coordination, to ensure that NNSA has the influence and resources to develop and deploy the infrastructure needed to achieve mission success.

RECOMMENDATION 1.4: The roadmap should identify key government and laboratory leadership to develop and execute a unified organizational strategy.

