5 Investment Trade-offs in Advanced Computing
Pages 83-101

Each excerpt below is the single chunk of text identified as most significant on the corresponding page of the chapter.


From page 83...
... From the smallest-scale system to the largest leadership-scale system, one of the challenges of advanced computing today is meeting capacity requirements along two well-differentiated trajectories -- namely, high-throughput computing for "data volume"-driven workflows and highly parallel processing for "compute volume"-driven workflows. Although converged architectures may readily support requirements at the small and medium scales, at the upper end, leadership-scale systems may have to emphasize some attributes at the expense of others.
From page 84...
... must consider, as it balances the needs of existing computational users against a rapidly emerging data science community. The chapter then turns to another critical trade-off, between investments in production and investments to prepare for future needs (Section 5.2)
From page 85...
... As another example, a design that allocates more time to computing capabilities may complete its analysis faster but may not be able to store ...

1  Communication volume is used here as shorthand for the more accurate and complex representation of internode communication, including latency, point-to-point bandwidth, bisection bandwidth, network topology and routing, and similar characteristics. Latency in particular is critical for many applications; some algorithms require high bisection bandwidth.
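
To make the latency point in the footnote concrete, here is a minimal sketch (not from the report) of the common "alpha-beta" cost model for a single point-to-point message, T(n) = alpha + n/beta, where alpha is latency in seconds and beta is bandwidth in bytes per second. The numbers are placeholders, not measurements of any particular network.

```python
# Alpha-beta model for one point-to-point message; all constants are illustrative.
ALPHA = 1e-6      # assumed latency: 1 microsecond
BETA = 10e9       # assumed bandwidth: 10 GB/s

def message_time(n_bytes: float) -> float:
    """Estimated transfer time for one message of n_bytes."""
    return ALPHA + n_bytes / BETA

for n in (8, 8_192, 8_388_608):          # 8 B, 8 KiB, 8 MiB
    t = message_time(n)
    print(f"{n:>10} bytes: {t*1e6:10.2f} us   latency share: {ALPHA/t:6.1%}")
```

For small messages the latency term dominates almost entirely; bisection bandwidth and topology matter for collective and all-to-all patterns, which this per-message model does not capture.
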
From page 86...
... When instruments, computers, and archival sites are geographically distributed, the data they produce may be processed and consumed at multiple sites, requiring special attention to the wide-area networks needed to transfer the data, how data should be staged and consolidated, and so forth. For experiments, deciding how much data to save is a trade-off between the cost of saving and the cost
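
The excerpt above breaks off before naming the second cost in the save-or-not decision; the sketch below assumes, purely for illustration, that the alternative is regenerating the data when it is needed again. All function names and numbers are hypothetical.

```python
# Hypothetical sketch of a save-versus-regenerate trade-off; all quantities are placeholders.
def should_save(data_tb: float,
                storage_cost_per_tb_year: float,
                retention_years: float,
                regen_node_hours: float,
                cost_per_node_hour: float,
                expected_reuses: float) -> bool:
    """Return True if archiving is cheaper than regenerating the data on each reuse."""
    cost_to_save = data_tb * storage_cost_per_tb_year * retention_years
    cost_to_regen = regen_node_hours * cost_per_node_hour * expected_reuses
    return cost_to_save < cost_to_regen

# Example: 500 TB of derived data kept 5 years, versus a 20,000 node-hour rerun, reused twice.
print(should_save(500, 10.0, 5, 20_000, 1.0, 2))  # -> True (saving is cheaper)
```
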
From page 87...
... FIGURE 5.2  A simplified view of computing, including the three axes of compute performance, input/output (I/O) and file system performance, and internode communication (network)
From page 88...
... The Blue Waters project has enabled breakthrough scientific results in a range of areas, including an enhanced understanding of early galaxy formation, accelerating nanoscale device design, and characterizing Alzheimer's complex genetic networks.2 NSF also supports the development and integration of midscale HPC resources through its XSEDE program, which provides HPC capacity to the broader scientific community, along with resources for training, outreach, and visualization, and supports research in such areas as earthquake modeling and the simulation of black hole mergers.3 Further trade-offs concern the maturation of simulation science from one-off simulations of a select few critical points of a high-dimensional modeling space to ensemble calculations that can manage uncertainties ...

2  See Blue Waters, "Impact," https://bluewaters.ncsa.illinois.edu/impact-overview, accessed January 29, 2016.
3  XSEDE, "Impact," https://www.xsede.org/impact, accessed January 29, 2016.
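
To make the contrast between a one-off simulation and the ensemble calculations mentioned above concrete, here is a toy Python sketch; the "model" is a stand-in function, not code from Blue Waters or XSEDE. Sampling the parameter space yields a spread that quantifies uncertainty rather than a single number.

```python
# Toy contrast between a single simulation and an ensemble over perturbed parameters.
import random
import statistics

def model(params):
    """Placeholder for an expensive simulation evaluated at one parameter point."""
    return sum(p * p for p in params)

dim = 6                                   # a (small) high-dimensional parameter space
nominal = [0.5] * dim
print("one-off result:", model(nominal))

random.seed(0)
ensemble = []
for _ in range(200):                      # 200 ensemble members
    perturbed = [p + random.gauss(0, 0.1) for p in nominal]
    ensemble.append(model(perturbed))

print("ensemble mean :", round(statistics.mean(ensemble), 3))
print("ensemble stdev:", round(statistics.stdev(ensemble), 3))
```
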
From page 89...
... As a consequence of trends in both hardware and software, including multicore nodes with high degrees of parallelism and sophisticated algorithms that require more data sharing while performing fewer operations per unit of data, the communication-volume dimension is a key differentiator in how trade-offs need to be managed. Advanced systems for simulation science often require that a significant fraction of the cost budget be invested in low-latency, high-bandwidth communication networks to couple multicore processor nodes.
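
A rough, purely illustrative calculation of why falling operations-per-byte pushes the bottleneck toward the network: if a node can sustain a given peak flop rate but each flop needs remote data delivered over a network link, the link caps the achievable rate. The peak and link figures below are placeholders, not numbers from any NSF or vendor system.

```python
# Illustrative only: achievable flop rate when remote data must arrive over one network link.
def achievable_flops(peak_flops: float, link_bw_bytes: float, bytes_per_flop: float) -> float:
    network_bound = link_bw_bytes / bytes_per_flop if bytes_per_flop > 0 else float("inf")
    return min(peak_flops, network_bound)

peak = 1e12          # 1 Tflop/s per node (placeholder)
link = 12.5e9        # 12.5 GB/s injection bandwidth (placeholder)

for bpf in (0.001, 0.01, 0.1, 1.0):      # bytes of remote data needed per flop
    rate = achievable_flops(peak, link, bpf)
    print(f"{bpf:5.3f} bytes/flop -> {rate/1e9:10.1f} Gflop/s ({rate/peak:6.1%} of peak)")
```
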
From page 90...
... An issue complicating the discussion is that leadership-class systems for simulation science are operated mainly by research organizations and the government, while leadership-class systems for data science today are operated mainly by industry. Advances in HPC system architectures have generally been shared.
From page 91...
... As a simple example, the total online data storage for Blue Waters and XSEDE systems is in aggregate on the order of 100 PB, while online data storage systems at Google can be estimated at over tens of exabytes,6 two orders of magnitude larger. In addition, the architectures at Internet-scale commercial companies are designed for the continuous updating and reanalysis of data sets that can be tens to hundreds of petabytes in size, something that is again rare in the research environment.7 Presently costing several hundred million dollars, an exabyte of storage will become affordable for science applications within a few years because both disk and tape storage are still following an exponential increase in density and reduction in cost.
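
A quick check of the scale comparison, taking "tens of exabytes" at its low end of roughly 10 EB and using 1 EB = 1,000 PB:

\[
\frac{10\ \text{EB}}{100\ \text{PB}} = \frac{10{,}000\ \text{PB}}{100\ \text{PB}} = 100 = 10^{2},
\]

that is, two orders of magnitude.
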
From page 92...
... NSF needs to play a leadership role in both defining future advanced computing capabilities and enabling researchers to use those systems effectively. This is especially true in the current hardware environment, where architectures are diverging in order to continue increasing computing performance.
From page 93...
... It will thus be important for NSF and the research users it supports to be involved in the national discussion around exascale and other future-generation computing, including through the recently announced National Strategic Computing Initiative, for which NSF has been designated as a lead agency. At the same time, it will be especially important that NSF not only engage but actually help to lead the national and international activities that define and advance the future software ecosystems supporting simulation and data-driven science.
From page 94...
... Whenever considering trade-offs, it is important to keep in mind that designing for a broader overall workflow almost certainly means configuring a system that is not ideal for any individual workflow but that can run the entire workflow more effectively than other configurations. Thus, simply maximizing the performance or capability of one aspect, such as floating-point performance or data-handling capacity, will not provide useful guidance.
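
A toy illustration of this point, with entirely invented stage times: the configuration that is best for a single stage is not the one that completes the whole workflow fastest.

```python
# Invented numbers: rank configurations by end-to-end workflow time, not by one attribute.
configs = {
    "max-flops": {"ingest": 6.0, "simulate": 2.0, "analyze": 5.0},
    "max-io":    {"ingest": 1.5, "simulate": 6.0, "analyze": 2.0},
    "balanced":  {"ingest": 2.5, "simulate": 3.0, "analyze": 2.5},
}

for name, stages in configs.items():
    print(f"{name:10s} total workflow time: {sum(stages.values()):4.1f} h  ({stages})")

best = min(configs, key=lambda n: sum(configs[n].values()))
print("best for the overall workflow:", best)   # -> "balanced"
```
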
From page 95...
... In fact, the systems that dominate the Graph500 benchmark9 are all large HPC systems, even though this benchmark involves no floating-point computation. Similarly, there are other features, such as large memory size, high memory bandwidth, and low memory latency, that are desirable in leadership-class systems for a wide range of problems, be they data-centric or simulation/compute-centric.
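
The Graph500 kernel is breadth-first search; the minimal sketch below (an illustration, not the benchmark's reference code) shows why such workloads stress memory size and latency rather than floating-point units: the work is graph traversal and integer bookkeeping, with no floating-point arithmetic at all.

```python
# Minimal breadth-first search; every visited check is a memory lookup, and no flops occur.
from collections import deque

def bfs_levels(adjacency: dict, root: int) -> dict:
    """Return the BFS level of every vertex reachable from `root`."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for w in adjacency.get(v, []):
            if w not in level:
                level[w] = level[v] + 1
                queue.append(w)
    return level

graph = {0: [1, 2], 1: [3], 2: [3, 4], 3: [5], 4: [5], 5: []}
print(bfs_levels(graph, 0))  # {0: 0, 1: 1, 2: 1, 3: 2, 4: 2, 5: 3}
```
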
From page 96...
... 5.6.2  Trading FLOP/s for Data Handling and Memory Size per Requirements Analysis
In the short run, even as it develops a more systematic requirements process, NSF needs to ensure continued access to advanced computing resources (which include both data and compute and the expertise to support the users), informed by feedback from the research communities it supports.
From page 97...
... An example of a new architecture that required many applications to be rewritten is the successful adoption of distributed memory parallel computers, along with message-passing programming, more than 20 years ago, which enabled an entire class of science applications. A related issue is that of scientist productivity versus achieved application performance, balanced against efficient use of expensive, shared computational resources.
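
As a reminder of what the message-passing model mentioned above looks like in practice, here is a minimal sketch using mpi4py; it illustrates the programming style only and is not code from the report. It would be launched with something like "mpirun -n 2 python example.py".

```python
# Minimal message-passing example: each rank owns its own memory,
# and data moves only via explicit send/receive calls.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    local_data = {"partition": 0, "values": [1.0, 2.0, 3.0]}
    comm.send(local_data, dest=1, tag=11)
    print("rank 0 sent its partition to rank 1")
elif rank == 1:
    received = comm.recv(source=0, tag=11)
    print("rank 1 received:", received)
```
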
From page 98...
... NSF has already established several services that support application developers in making better use of the systems, both for XSEDE and for the PRAC teams on Blue Waters. The initial investments in the SISI program are a good start.
From page 99...
... Note that the Blue Waters procurement was one of the few for leadership-class systems that required overall application performance, including I/O, as part of the evaluation criteria; as a result, this system has more I/O capability than most systems with the same level of floating-point performance and is, in fact, as powerful for I/O operations as the leadership-class systems planned by DOE for 2016-2017.

5.6.6  General-Purpose Versus Special-Purpose Systems
There are some applications that on their own use a significant fraction of NSF's advanced computing resources.
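
Returning to the procurement point above (evaluating overall application performance, including I/O, rather than floating-point rate alone), the invented comparison below contrasts two hypothetical machines; neither corresponds to Blue Waters, any DOE system, or any real procurement.

```python
# Invented comparison: rank machines by total application time, compute plus I/O.
def app_time_hours(work_pflop: float, sustained_pflops: float,
                   data_pb: float, io_bw_tb_s: float) -> float:
    """Total time = compute time + time to move the application's data."""
    compute_s = work_pflop / sustained_pflops   # PFLOP / (PFLOP/s) = seconds
    io_s = (data_pb * 1_000) / io_bw_tb_s       # PB -> TB, divided by TB/s
    return (compute_s + io_s) / 3600.0

# Machine A: faster compute, weak I/O.  Machine B: slower compute, strong I/O.
a = app_time_hours(work_pflop=72_000, sustained_pflops=10.0, data_pb=4.0, io_bw_tb_s=0.2)
b = app_time_hours(work_pflop=72_000, sustained_pflops=8.0,  data_pb=4.0, io_bw_tb_s=1.0)
print(f"machine A: {a:4.1f} h    machine B: {b:4.1f} h")    # B wins despite fewer FLOP/s
```
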
From page 100...
... Such systems are needed to support current NSF science; by ensuring adequate I/O support, as well as interconnect performance and memory, such a system can also address many data science applications. These systems must also include support for experts to ensure that science teams can make efficient use of them.
From page 101...
... The Beacon system at the National Institute for Computational Sciences, partly funded by NSF, provided access to Intel Xeon Phi processors before they were deployed in production systems by TACC. The team that proposed Beacon included researchers from several scientific disciplines, including chemistry and high-energy physics.

