
Massive Data Sets: Proceedings of a Workshop (1996)

Chapter: Visualizing Large Datasets

Suggested Citation:"Visualizing Large Datasets." National Research Council. 1996. Massive Data Sets: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/5505.

Visualizing Large Data Sets

Stephen G. Eick*

Bell Laboratories (A Division of Lucent Technologies)

Abstract

Visualization is a key technology for understanding large datasets. It is useful throughout the analysis process: for exploratory descriptive analysis, as an aid in model building, and for presenting the analysis results. Our approach to visualizing abstract, non-geometric data involves domain-specific representations, multiple linked views, color, and a highly interactive user interface that uses filtering and focusing to reduce visual clutter. We have developed a software infrastructure embodying our design principles for producing novel, high-quality visualizations of corporate datasets.

1 Introduction

Just as spreadsheets revolutionized our ability to understand small amounts of data, visualization will revolutionize the way we understand large datasets. Our research focuses on extracting the information latent in large databases using visual techniques. The difficulty in extracting this information lies in understanding the complexity of the databases. To aid in this task, we have created many novel, highly interactive visualizations of large datasets. This involved developing the techniques, software tools, and infrastructure to mine knowledge from corporate databases so that it can be put to competitive and commercial advantage.

* AT&T Bell Laboratories, Rm 1G-351, 1000 East Warrenville Road, Naperville, IL 60566, email: eick@research.att.com


2 Domain-Specific Representation

A key component of an effective visualization is the visual representation of the data. The representation determines how the items in the dataset are rendered on the computer display. The best representations are often domain-specific: scatterplots for statistical data, maps for spatial data, and node and link diagrams for network data, for example. Inventing a representation for a new domain is a difficult, creative, and iterative process.¹ The representation should take full advantage of perceptual cues such as size, position, color, and depth, and may even use motion and sound.

3 High Information Density

Our representations are often compact, color-coded glyphs positioned spatially. By using compact glyphs that overplot gracefully, we can pack a lot of information into an image and thereby display a large dataset. A high-resolution 1280×1024 workstation monitor has over 1,300,000 pixels. Our goal is to use every pixel to display data, thereby maximizing the information content in the image.

In some cases it is possible to display an entire dataset on a single screen, thereby eliminating the difficult navigation problems associated with panning and zooming interfaces that focus on small portions of the database.
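The pixel budget above can be checked with a little arithmetic. The following is a minimal sketch, not the layout used by any of the systems described here: the function names and the row-major, one-pixel-per-record layout are illustrative assumptions.

```python
from __future__ import annotations

# Display size quoted in the text; everything else here is a hypothetical illustration.
SCREEN_WIDTH = 1280
SCREEN_HEIGHT = 1024


def pixel_count(width: int = SCREEN_WIDTH, height: int = SCREEN_HEIGHT) -> int:
    """Total number of addressable pixels on the display."""
    return width * height


def fits_on_screen(n_records: int, pixels_per_glyph: int = 1) -> bool:
    """True if every record can get its own glyph without any overplotting."""
    return n_records * pixels_per_glyph <= pixel_count()


def glyph_position(index: int, width: int = SCREEN_WIDTH) -> tuple[int, int]:
    """Row-major (x, y) pixel for the record at `index` (assumed layout, one pixel each)."""
    return index % width, index // width


if __name__ == "__main__":
    print(pixel_count())                 # 1280 * 1024 = 1310720
    print(fits_on_screen(1_000_000))     # one pixel per record
    print(fits_on_screen(1_000_000, 4))  # a 2x2 glyph per record
```

Under these assumptions a million records fit only when each glyph occupies a single pixel; larger glyphs force either overplotting or filtering.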

4 Interactive Filters

Often information-dense displays become overly cluttered with too much detail. One approach to solving the display clutter problem involves interactive filters that reduce the amount of information shown on the display. Humans have sophisticated pattern recognition capabilities, perhaps due to our evolution, and are very efficient at manipulating interactive controls to reduce visual clutter. We exploit this to effortlessly solve the complex computational problems involved with determining when a display is too busy for an easy interpretation. Our approach is to leverage people's natural abilities by designing user interface controls that parameterize the display complexity.
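One way to picture a control that parameterizes display complexity is a range slider that decides which records are drawn. The sketch below is an assumption for illustration only: `Record`, `apply_range_filter`, and `display_fraction` are hypothetical names, not part of the systems described in this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """A hypothetical data item with one numeric attribute to filter on."""
    label: str
    value: float


def apply_range_filter(records, low, high):
    """Keep only the records whose value lies inside the slider range [low, high]."""
    return [r for r in records if low <= r.value <= high]


def display_fraction(records, low, high):
    """Fraction of the dataset still drawn: a rough proxy for display complexity."""
    if not records:
        return 0.0
    return len(apply_range_filter(records, low, high)) / len(records)


if __name__ == "__main__":
    data = [Record("a", 1.0), Record("b", 5.0), Record("c", 9.0), Record("d", 12.0)]
    # The user narrows the slider until the display is readable.
    visible = apply_range_filter(data, 4.0, 10.0)
    print([r.label for r in visible], display_fraction(data, 4.0, 10.0))
```

The program never decides when the display is too busy; the user does, by moving the slider until the picture is readable.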

1 See the Figures for examples from some domains that we have considered.


5 Multiple Linked Views

The power of our representations is magnified through the use of interaction and linked views. Each view, whether custom or standard (color keys, bar charts, box plots, histograms, scatter plots, etc.), functions both as a display and a control panel. Selecting and filtering data in one view instantly propagates to the other views, thereby providing additional insights. Linking multiple views interactively provides an integrated visualization far more powerful than the sum of the individual views.
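The linking described above can be sketched with a shared selection model that every view both writes to and listens to. This is a minimal illustration of the general observer pattern, not the design of the authors' library; `SelectionModel`, `View`, and `brush` are hypothetical names.

```python
class SelectionModel:
    """Shared state: the set of record ids currently selected in any view."""

    def __init__(self):
        self._selected = frozenset()
        self._listeners = []

    def subscribe(self, callback):
        """Register a callable that receives the new selection on every change."""
        self._listeners.append(callback)

    def select(self, ids):
        """Replace the selection and notify every subscribed view."""
        self._selected = frozenset(ids)
        for callback in self._listeners:
            callback(self._selected)


class View:
    """A view is both a display (it highlights) and a control panel (it brushes)."""

    def __init__(self, name, model):
        self.name = name
        self.model = model
        self.highlighted = frozenset()
        model.subscribe(self._on_selection)

    def _on_selection(self, selected):
        # A real view would redraw here; this sketch only records the state.
        self.highlighted = selected

    def brush(self, ids):
        """The user selects records in this view; the model propagates the change."""
        self.model.select(ids)


if __name__ == "__main__":
    model = SelectionModel()
    bar_chart = View("bar chart", model)
    scatter = View("scatter plot", model)
    bar_chart.brush({1, 2})
    print(sorted(scatter.highlighted))
```

Because views talk only to the shared model, a new view can be added without changing the existing ones.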

6 Systems

Our systems have been used to successfully analyze and present software version control information, file system sizes, budgets, network traffic patterns, consumer shopping patterns, relational database integrity constraints, resource usage on a compute server, and more. A single screen in our systems presents between 10,000 and 1,000,000 records. Some of the more interesting systems we have built include:

  1. SeeSoft™: lines of text in files [Eic94] (Figure 1)
  2. SeeSlice: program slices and code coverage [BE94] (Figure 2)
  3. SeeLog: time-stamped log reports [EL95] (Figure 3)
  4. SeeData: relational data [AEP95] (Figure 4)
  5. SeeNet: geographic network data [BEW95] (Figures 5 and 6)
  6. NicheWorks™: abstract networks [EW93] (Figure 7)
  7. SeeDiff™: file system differences
  8. SeeLib: bibliographic databases [EJW94] (Figure 9)
  9. SeeSys: hierarchical software modules [BE95] (Figure 10)
  10. SeeSales™: retail sales inventory and forecasts (Figure 11)
  11. SeeTree: hierarchical data

Figure 1. Lines of Code Colored by Age

Figure 2. Forward Program Slice

Figure 3. Log File View

Figure 4. Relational Database View

Figure 5. Christmas Morning Long-Distance Traffic

Figure 6. World Wide Internet Traffic


Figure 7. Market Basket Analysis

Figure 8. Demographic Information

Figure 9. Document Retrieval

Figure 10. Hierarchical System View

Figure 11. Sales by Week and Event

Figure 12. Organization Productivity by Week

  12. SeeFraud: network calling fraud

Since the needs of each user are unique, the best visualizations are task-oriented. The most successful visualizations help frame interesting questions as well as answer them. Our visualizations:

  • Make use of existing data. In many cases large databases of vital importance to an organization already exist. Our visualizations extract meaningful information from this data.
  • Are directed toward real problems with targeted users. Our efforts are motivated by business needs and address real problems.
  • Focus on understanding and insight. Results are more important than any particular technique.
  • Are used throughout the analysis process, including the initial data exploration, intermediate model formulation, and final result presentation.

7 Software and Technology

Underlying all of our visualizations is a common infrastructure embodied in a C++ library that handles interaction, graphics, and view linking. This C++ Visualization Library helps us to:

  • Minimize our development time,
  • Encapsulate expertise and design principles,
  • Build cross-platform systems (UNIX/X11, OpenGL, and PC/Windows), and
  • Keep visualization application code small.

8 Conclusion

Visualization is a key technology that can help users understand the complexity in industrial-sized systems. We have exploited this technology to investigate a variety of large and complex data sets. Interactive data visualization is complementary to other analytic, model-based approaches and will become a widely used tool for extracting the information contained in large complex datasets.

Acknowledgments

The research presented here represents the joint efforts of Jackie Antis, Dave Atkins, Tom Ball, Brian Johnson, Ken Cox, Nate Dean, Paul Lucas, John Pyrce, and Graham Wills.

References

[AEP95] Jacqueline M. Antis, Stephen G. Eick, and John D. Pyrce. Visualizing the structure of relational databases. IEEE Software, accepted for publication, 1995.

[BE94] Thomas Ball and Stephen G. Eick. Visualizing program slices. In 1994 IEEE Symposium on Visual Languages, pages 288-295, St. Louis, Missouri, 4 October 1994.

[BE95] Marla J. Baker and Stephen G. Eick. Space-filling software displays. Journal of Visual Languages and Computing, 6(2), June 1995.

[BEW95] Richard A. Becker, Stephen G. Eick, and Allan R. Wilks. Visualizing network data. IEEE Transactions on Visualization and Computer Graphics, 1(1):16-28, March 1995.

[Eic94] Stephen G. Eick. Graphically displaying text. Journal of Computational and Graphical Statistics, 3(2):127-142, June 1994.


[EJW94] Stephen G. Eick, Eric E. Sumner Jr., and Graham J. Wills. Visualizing bibliographic databases. In John P. Lee and Georges G. Grinstein, editors, Database Issues for Data Visualization, pages 186-193. Springer-Verlag, October 1994. Lecture Notes in Computer Science.

[EL95] Stephen G. Eick and Paul J. Lucas. Displaying trace files. Software: Practice and Experience, accepted for publication, 1995.

[EW93] Stephen G. Eick and Graham J. Wills. Navigating large networks with hierarchies. In Visualization '93 Conference Proceedings, pages 204-210, San Jose, California, 25-29 October 1993.
