
Suggested Citation: "Chapter 3: Developing Effective Visualizations." National Academies of Sciences, Engineering, and Medicine. 2017. Data Visualization Methods for Transportation Agencies. Washington, DC: The National Academies Press. doi: 10.17226/24755.


Consider each step in the process of developing an effective visualization in order to give the finished product focus and meaning. First, acquire and refine a dataset (a process called "data wrangling" in the visualization community), analyze the data, and identify patterns and findings that you can call out visually. To hone your message, identify your intent and audience: who needs to know about your data, and what do you want them to think and do about it? For example, do you want them to change their daily behavior or try to change a law? With data in hand and a clear intent, identify and execute a strategy, using appropriate charts and communicating with clarity. Finally, use the best tools to implement and share the project effectively within your organization's practice.

3.1 · Data Wrangling
Find your data and make it yours

Before you begin visualizing data, you must find, acquire, and prepare it. Analysis and visualization require accurate data that are well structured for your task. The process of transforming raw data inputs into presentable datasets has come to be called "data wrangling."

Martin Wattenberg and Fernanda Viegas, cofounders of the IBM ManyEyes project, note that it is important to work with real rather than mocked-up data, since manufactured data will rarely contain the nuances of the real thing. Wattenberg compares working with real data to getting feedback from real people. Information is Beautiful creator David McCandless observes that the data may sometimes seem boring; in these cases the practitioner may be able to find additional data to normalize, compare, or merge, or the boredom might be a cue to ask deeper questions. Jeffrey Heer, co-creator of the Trifacta data-wrangling tool, has cited survey results showing that industry data analysts spend between 50 and 80 percent of their productive time formatting and integrating data.
His team uses a process for data wrangling that includes the following tasks:
• Discover the content and patterns in your data. "Sketching" your data can provide a positive feedback loop. Illustrating a dataset can make outliers and patterns obvious where a spreadsheet might hide them, and it simplifies a key thought process: are the outliers mistakes, or do they point to a real phenomenon?
• Structure the data to have only the needed attributes, named and formatted in a way that maximizes comprehension;
• Clean the data to eliminate meaningless or undesirable outliers (e.g., null values reported as 0 or 99);
• Enrich the data with relevant additions that illuminate trends or provide necessary context; and
• Validate the prior steps by, at a minimum, assessing whether each attribute is formatted properly and falls within logical constraints (e.g., percentages sum to 100).

Volume of Data
A transportation practitioner is as likely to need to display a single data point as to portray millions. For example, a report to the residents of a town might convey the 0-9 NBI condition rating of a local bridge. This one data point can be placed in context (e.g., a bar chart scaled from 0 to 9), translated (e.g., into a pair of contrasting icons), or illustrated (e.g., diagrams or photographs showing the damage that drives the rating). On the other hand, the same agency may wish to convey a dozen condition metrics for thousands of bridges through a single visualization. Methods that would make little sense for a single data point, such as geographic search, mouse-over information windows, and animation, become sensible for larger datasets.

Volume of data is closely tied to enrichment: when you have a small dataset, you may need to add data to provide context and visual interest. For example, West Virginia DOT visualized four alternatives for replacing the Dick Henderson Memorial Bridge, as shown in Figure 6.

Figure 6: Build Alternatives Comparison for the Dick Henderson Bridge (WVDOT)

With four alternatives to compare, the data points themselves provided full context and fulfilled the designer's intent: to compare the cost, closure time, and maximum grade of each design while also demonstrating the aesthetics of each. By contrast, once Alabama DOT selected a single alternative for the Mobile River Bridge, its intent became selling the project to neighbors by demonstrating the visual impact of the structure. To do this, the Visualization Team enriched the bridge model with four square miles of downtown Mobile, allowing residents to "see" the bridge from their doorsteps. Figure 7 portrays an overview.

Figure 7: 3D Model for the Mobile River Bridge (ALDOT) https://informedinfrastructure.com/18532/building-a-blockbuster-bridge/

Acquiring Data
Data may be available in-house, but they are rarely already clean and in the ideal format. If data are acquired from a vendor, the format may be negotiable, but adapting them to the chosen visualization platform may still take some effort. Our survey respondents often visualized in-house data but also often augmented them with free data from several common sources, including:
• Highway Performance Monitoring System (HPMS): HPMS is an information system maintained by the Federal Highway Administration (FHWA) that is built from required annual submissions by DOTs. Statistics including mileage, pavement condition, traffic volume, and functional classification can be found at http://www.fhwa.dot.gov/policyinformation/statistics.cfm. Note that the website can be challenging to navigate and does not function with all browsers; we recommend Microsoft Internet Explorer;
• National Bridge Inventory (NBI): Like HPMS, NBI is compiled from annual DOT submissions to FHWA. For each bridge of sufficient size, states are required to provide physical characteristics (e.g., type, length, height), as well as the results of an annual inspection and condition assessment. The data can be downloaded at https://www.fhwa.dot.gov/bridge/nbi/ascii.cfm. A delimited-format file is available behind the link for each year, for easy upload into Microsoft Excel or any other data wrangling and analysis tool;
• US Census and the American Community Survey (ACS): The US Census takes place every 10 years, and participation is compulsory for all US residents. To fill in the intervening years, the Census Bureau conducts the ACS annually using a sample of households in each census tract (approximately 3.5 million addresses per year). ACS results are generally combined over a 3- or 5-year period. The demographic and employment data (including vehicle ownership and commute mode choice) from the Census and ACS are available through American FactFinder at http://factfinder.census.gov/; and
• GIS Sources: Where an agency standard basemap does not exist, Esri provides options to customers of its ArcGIS package, while OpenStreetMap relies on a worldwide network of contributors to provide a basemap for free with attribution. These basemaps are primarily accessed through a GIS tool. Beyond basemaps, most states maintain free GIS datasets and formatted layers for public use that are easily found through an online search, as do federal agencies such as the US Geological Survey (USGS) and the US Census Bureau (through its Topologically Integrated Geographic Encoding and Referencing, or TIGER, files).

Data may also be acquired through a "data scraper," a procedural routine (typically web-based) that extracts data from websites and documents and converts it into a tabular format (e.g., www.import.io). The practitioner can use these tools to collect and store a live feed over an extended period of time, either for retrospective analysis or to develop a visualization of live data.
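The structure, clean, and validate tasks from Heer's list can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Trifacta's or any agency's procedure: the delimited extract is hypothetical, its column names (STRUCTURE_ID, DECK_RATING, and so on) are invented rather than taken from the real NBI record layout, and 99 is assumed to be the null sentinel for the rating field (0 is left alone because it is a legitimate value on a 0-9 scale).

```python
import csv
import io

# Hypothetical delimited extract. Column names are illustrative only and
# do not follow the real NBI record layout.
RAW = """STRUCTURE_ID,COUNTY,DECK_RATING,YEAR_BUILT,UNUSED_NOTE
B001,Kanawha,7,1968,x
B002,Kanawha,99,1975,y
B003,Cabell,4,1931,z
B004,Cabell,12,2001,w
"""

NULL_SENTINELS = {"99"}  # assumed code meaning "no data" for this field


def wrangle(text):
    """Structure, clean, and validate rows; return (clean_rows, rejected_ids)."""
    clean, rejected = [], []
    for row in csv.DictReader(io.StringIO(text)):
        raw_rating = row["DECK_RATING"].strip()
        # Structure: keep only the needed attributes, under readable names.
        record = {
            "bridge_id": row["STRUCTURE_ID"],
            "county": row["COUNTY"],
            # Clean: turn the sentinel code into a real null value.
            "deck_rating": None if raw_rating in NULL_SENTINELS else int(raw_rating),
            "year_built": int(row["YEAR_BUILT"]),
        }
        # Validate: a non-null rating must fall on the 0-9 scale.
        rating = record["deck_rating"]
        if rating is not None and not 0 <= rating <= 9:
            rejected.append(record["bridge_id"])
            continue
        clean.append(record)
    return clean, rejected


def percent_by_county(records):
    """Share of rated (non-null) bridges in each county, in percent."""
    rated = [r for r in records if r["deck_rating"] is not None]
    counts = {}
    for r in rated:
        counts[r["county"]] = counts.get(r["county"], 0) + 1
    return {county: 100 * n / len(rated) for county, n in counts.items()}


rows, rejected = wrangle(RAW)
shares = percent_by_county(rows)
# Validate again: shares of a whole should sum to 100 (allowing float rounding).
assert abs(sum(shares.values()) - 100) < 1e-6
print(len(rows), rejected, shares)
```

Run as written, it keeps three of the four rows, converts B002's 99 to a null, rejects B004's out-of-range rating of 12, and prints `3 ['B004'] {'Kanawha': 50.0, 'Cabell': 50.0}`. Dropping UNUSED_NOTE during structuring is deliberate: attributes that are not needed are removed early, so later steps never have to reason about them.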
Data Wrangling in the Real World
Brian Card and Mike Barry: "Visualizing MBTA Data"

Brian Card and Mike Barry, creators of the prominent visualization project "Visualizing MBTA Data," described their development process in a lecture at Simmons College in Boston (January 15, 2015). The video of their presentation is available from the WGBH Forum Network at http://forum-network.org/lectures/data-visualization-how-do-it-and-do-it-well/. (The video is a production of WGBH Educational Foundation © 2015.) Key points of interest, with time stamps of their locations in the video, include:
• Brief Overview of the Product (1:52): Card demonstrated an animated map of train positions, a static line chart (with time and space on the axes) annotated with important events, a heatmap of station entries and exits over time, and a scatterplot of overall transit times (including on-train travel and wait times) for each pair of stations over the course of the day;
• Research (4:10): The team discussed which elements of the MBTA would be interesting for users and the public to experience visually. Card emphasized the importance of identifying objectives in advance, because "once you have a dataset, you start thinking in terms of what's easy to do with that data, instead of what's important." Barry and Card chose to focus on locating congestion and delay, illustrating the impact of large events and snowstorms, and giving each user a takeaway about his or her own commute;
• Brainstorming (8:00): Barry and Card brainstormed illustrations iteratively by sketching them on paper and uploading them to Google Docs for comment;
• Data Acquisition (9:10): A snapshot of train positions is publicly available from the MBTA. Barry and Card periodically downloaded the snapshots to form a month-long dataset. Each member of the team independently collected a redundant set of records every minute, and merging the datasets filled in records that any one collection had missed, as shown in Figure 8;
Figure 8: Use of Redundant Datasets in "Visualizing MBTA Data" (Brian Card)
• Interpreting Data Elements (12:00): Card noted a key record in the train location file: predictions of time-to-station. While the train might not actually take that amount of time (measured in seconds) to reach the station, the value hitting zero indicates that it has arrived. Barry and Card interpreted the data slightly differently from its intended use, but used that interpretation to calculate actual arrival times;

• Data Wrangling Tools (14:40): Barry and Card used Node.js (a JavaScript runtime) to process their files in JSON format. The visualizations themselves are built in D3.js, and the code is stored in Bitbucket because, at the time, GitHub (a more commonly used competitor) made all draft code public; and
• Iteration (16:10): "Not all of the ideas that look good on paper look good with real data… we had 6,000 JSON files and no idea what our dataset looked like. The only way that we could look at it was by building visualizations." Barry and Card built many draft visualizations and tested them against their objectives. If an attempt did not tell their intended story, Barry and Card not only tried again but also tried to identify elements of the failed attempt that were interesting and could inform future attempts.

Barry takes over the presentation at 17:00 and describes the team's chart type and stylistic choices. We pick up the description in Section 3.4.

3.2 · Intent and Audience
What's your story, and who needs to hear it?

Conceptualizing and planning a visualization project is about telling a story, so you can frame it around your intent and audience:
• Intent is the "nugget of truth" that a visualization must make obvious. This visualization may be the only thing your audience knows about this topic. What do you want that to be, and what do you want them to do as a result?
• Your audience should be comfortable with your tone and level of technical language, so align both to your audience's role and experience. Make comparisons, allusions, and references that tell your audience "I get where you're coming from, and I'm meeting you there."

Keep your desired outcome (an element of intent) in mind throughout the process. Is your intent simply to inform your audience about a topic, or do you want them to take action? If so, what type of action?
Do you need to highlight certain elements of a dataset not only because they are interesting, but also because they relate to an important proposal or initiative for which you want to gain support?

Reviewing your data before you start may lead you to an insight to explore through visualization. Amanda Cox, editor of "The Upshot" at The New York Times, recommends that you "learn to sketch with data," by which she means creating rapid, low-fidelity sketches of various visualizations to identify patterns and findings that will interest your intended audience. This way of designing allows you to put tangible products in front of people for discussion.

Intent
"Intent" is the question you want to answer or the outcome you want to encourage. Generally speaking, a visualization will convey a fact or an argument about a topic. For transportation practitioners, frequent topics include proposed projects, assets (e.g., bridges, roadways, bike lanes), the traveling public, and budgets. In many cases, the transportation practitioner must assume that the audience's entire understanding of a topic will be driven by a particular illustration. Being firm in your objectives can help you build a focused visual.

In a blog post entitled "visualizing opportunity," visualization author Cole Nussbaumer Knaflic demonstrates how a focus on communication leads from a formatted table to a more intuitive view of key characteristics and elements within the dataset. We summarize her process here, beginning with Figure 9.
Figure 9: Initial Formatted Table, "visualizing opportunity" (Cole Nussbaumer Knaflic) http://www.storytellingwithdata.com/blog/2015/9/16/visualizing-opportunity

Nussbaumer Knaflic makes the following immediate refinements:
• The blue background represents a meaningless variation in color, so she removes it;
• The sample size does not lead a reader to any interesting conclusions (i.e., it is not part of her intent), so she moves it to a footnote; and
• For a focused visualization, she applies a heatmap to the more easily understood metric: average score.

After these refinements, the table appears as shown in Figure 10.

Figure 10: Intermediate Formatted Table, "visualizing opportunity" (Cole Nussbaumer Knaflic)

She then notes that her objective is to show opportunity: how much better could we be doing in each category? She therefore changes her chart type to a stacked bar with a transparent gap between reported and benchmark performance, yielding the final product shown in Figure 11.

Figure 11: Final Formatted Table, "visualizing opportunity" (Cole Nussbaumer Knaflic)

Through this process, Nussbaumer Knaflic has clarified the context of her data, focused the audience on the most important metric, and communicated additional information about that metric (the opportunity for improvement) by visualizing the data rather than stating it.

Audience
A beautiful and informative visualization does no good if its target audience cannot understand it. An overly technical illustration will not effectively reach an audience of laypeople. A designer can positively influence audience response by playing to the audience's known interests through:
• Visual Cues: Section 3.4 will discuss the use of human-recognizable objects. Beyond using familiar imagery, you may wish to tie the cues directly to your audience. Pictograms of local landmarks, as well as icons and color schemes taken from an agency, state, local university, or sports team, can communicate your desire to connect with them;
• Tone: Beyond avoiding technical jargon, your tone should be intentional. If your audience is expecting something casual, formal language will fail to resonate, and vice versa; and
• References: With almost any data project for a local, regional, or state agency, information should be compared locally unless the intent is to place local data in a national or international context.
As an example of how these concepts can be applied, Chris Hedden, Dan Krechmer, and Ron Basile of Cambridge Systematics produced a cartoon-based slide presentation (Figure 12) to inform transportation planners about connected and self-driving cars.

Figure 12: "The Top Five Things Planners Need to Know About Self-Driving Vehicles" https://www.camsys.com/insights/top-5-things-planners-need-know-about-self-driving-vehicles

Despite the technical audience, they chose a casual approach to convey the inevitable ubiquity of the technology and the high-level nature of the slides, and to capture an audience that might avoid the topic because it was widely perceived as too complex to address. The audience became open to taking in the technical details because they were presented in an accessible manner. The document achieved record numbers of views and inquiries, suggesting that it motivated people to delve further into the topic.

3.3 · Analysis
Are you and your data telling the same story?

Your analytical and aesthetic decisions should reflect the nature of your dataset. Explore how much data you have, how many ways they can vary, and whether you need to illustrate uncertainty. Selecting a chart type or homing in on a "look" without considering the data may make your visualization difficult to comprehend. Analysis is part of a feedback loop with data wrangling and intent: if you realize that your data don't tell the story you wanted, do you clean, manipulate, or add data, or do you re-evaluate the argument you are making? Do your outliers signal errors, or do they have meaning that you need to consider? Are your data generally trustworthy, or do you need to show uncertainty?

Visualizing for an Audience of You
As with the other elements in the feedback loop, one way to make analysis easier is to visualize early and often. Doing so will help you understand the data and, as a result, use them more appropriately. You are creating a visualization because it will illuminate patterns and increase clarity for your audience, so take advantage of that for yourself! In a March 2010 interview with acmqueue, Fernanda Viegas notes the importance of identifying patterns through iterative visualizations:

"[We] spent the whole summer trying to figure out a good way to visualize [Wikipedia] editors, but we kept getting these not-very-useful results. At one point we tried just to get a sense of the shape of the data using bar charts, line graphs, and stack graphs, but that wouldn't tell us anything either. Eventually, we decided to try out a very weird technique, which was mapping streams of text to colors. This makes you lose a lot of information because text is really rich and you can only use so many colors. All of a sudden we saw patterns.
Someone was going around all of Wikipedia correcting typos; another person was working on images; another was working on stub sorting… Looking back, we feel that the very first experiments we did with the data were on too high of a level. They were abstracting too much away from the data and not giving you this sort of messiness that Wikipedia has, which is everybody's there, every day making minute changes… that add up to patterns. This notion of how close to the data you want to be and what is your question (what is the story you want to tell?) seems to be really important."

Data Literacy
To be data literate, you must understand what your data can and cannot be made to communicate, and identify where relevant uncertainty can be shown visually. A lack of absolute certainty is not an impediment to effective visualization, and not all uncertainty needs to be illustrated. Furthermore, data literacy can aid the analysis-intent feedback loop: a logical problem often offers an opportunity to improve your message.

Discussions of data literacy and appeals to critical thinking come in many forms and from many commentators. In his contribution to The Data Journalism Handbook, Nicolas Kayser-Bril outlines some of the pitfalls of drawing unsupported conclusions:

"When writing about an average, always think 'an average of what?' Is the reference population homogenous? Uneven distribution patterns explain why most people drive better than average, for instance. Many people have zero or just one accident over their lifetime. A few reckless drivers have a great many, pushing the average number of accidents way higher than most people experience." (http://datajournalismhandbook.org/1.0/en/understanding_data_0.html)

Applied to a transportation context, this principle means that the majority of intersections may experience below-average accident rates, or the majority of bridges may have above-average maintenance records.
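Kayser-Bril's point about skewed distributions is easy to check numerically. In this minimal sketch the crash counts are invented purely for illustration (they are not real data): most intersections have zero, one, or two crashes, while two have many, which pulls the mean well above the median.

```python
from statistics import mean, median

# Hypothetical crash counts per intersection (illustrative, not real data).
crashes = [0, 0, 0, 1, 1, 1, 1, 2, 2, 15, 27]

avg = mean(crashes)    # pulled upward by the two high-crash locations
mid = median(crashes)  # the "typical" intersection
below = sum(1 for c in crashes if c < avg)

print(f"mean={avg:.1f}, median={mid}, below average: {below} of {len(crashes)}")
```

This prints `mean=4.5, median=1, below average: 9 of 11`: nine of eleven intersections are "below average." For data shaped like this, plotting the median, or each location's distance from it, may describe the typical location more honestly than the mean does.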
When visualizing these datasets, be prepared both to respond to an audience that points out these "logical flaws" and to reflect them in your intent. Do you want to visualize the difference from the average, or can you reduce your sample by focusing only on the problem locations? The same source offers a second example:

"Articles about the benefits of drinking tea are commonplace… although the effects of tea are seriously studied by some, many pieces of research fail to take into account lifestyle factors, such as diet, occupation, or sports. In most countries, tea is a beverage for the health-conscious upper classes. If researchers don't control for lifestyle factors in tea studies, they tell us nothing more than 'rich people are healthier, and they drink more tea.'"

Once again applying the principle to transportation, a map of mode choice across a region may show lower-income areas commuting by transit more often than by single-occupancy vehicle, except in areas near centers of service and manufacturing employment (which have shifts outside transit operating hours). It would be insufficient to draw conclusions about mode choice in these neighborhoods without accounting for these demographic and employment patterns; adding them gives you the opportunity to provide your audience with useful insight and illuminate new parts of your data.

Beyond showing the audience that the data do not support certain conclusions, you can also develop and visualize scenarios based on varying assumptions, as demonstrated by the Victoria Transport Policy Institute in Figure 13.

Figure 13: "Autonomous Vehicle Sales, Fleet and Travel Projections" (VTPI) http://www.vtpi.org/avip.pdf

General approaches for visualizing uncertainty include:
• Using a visualization strategy that clearly communicates that the data are not meant to be exact (e.g., shapes instead of columns on a column chart);
• Fading edges, increasing transparency, or otherwise altering the appearance of conventional data points (as shown in Figure 14); and
• Including error bars (or an alternative approach, the "Cat's Eye," shown in Figure 15).

Figure 14: 3D Worldwide Air Pollution Map, where Color Indicates Confidence (Kai Pothkow, Britta Weber, and Hans-Christian Hege, "Probabilistic Marching Cubes." Computer Graphics Forum, 30(3):931-940, 2011.)

Figure 15: "Cat's Eye" Approach to Visualizing Statistical Error (Geoff Cumming) http://www.psychologicalscience.org/index.php/publications/observer/2014/march-14/theres-life-beyond-05.html
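Conventional error bars (and, in a different visual form, the Cat's Eye) are commonly drawn around a confidence interval for an estimate. As a minimal sketch, assuming a normal approximation (for a sample as small as this one, a t-based interval would be somewhat wider), the interval endpoints can be computed with Python's standard library. The travel times are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_ci(values, level=0.95):
    """Return (mean, lower, upper) for a normal-approximation confidence interval."""
    m = mean(values)
    se = stdev(values) / sqrt(len(values))     # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95 percent
    return m, m - z * se, m + z * se


# Hypothetical peak-hour travel times in minutes (illustrative only).
travel_times = [22, 25, 19, 30, 27, 24, 21, 26, 28, 23]
m, lo, hi = mean_ci(travel_times)
print(f"mean {m:.1f} min, 95% CI {lo:.1f} to {hi:.1f} min")
```

The lower and upper values are what a charting tool would take as the ends of an error bar; a Cat's Eye plot would instead shade the shape of the distribution across the same range.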

Using Visualization to Drive Analysis
Beyond performing analysis to drive your visualization, recognize your visualization's potential to inform and facilitate analysis by others. For example, the Delaware Valley Regional Planning Commission (DVRPC) developed the Ridescore metric for bicycle accessibility at Philadelphia-area commuter rail stations. Not only does the metric combine many measures of accessibility into one easily consumed number, it also allows the data to be presented in a single map. The screenshot in Figure 16 shows this map, which leaves the immediate impression that bicycle accessibility improves closer to the city center, while also identifying outliers: suburban stations with superior access for cyclists. The same interface displays the constituent scores when a user clicks on a station.

Figure 16: Ridescore (Delaware Valley Regional Planning Commission) http://www.dvrpc.org/webmaps/ridescore/

Virginia DOT (VDOT) provides another example in Figure 17. DOTs are adopting dashboards to illustrate system performance, either in static form (i.e., to report performance to the public) or in interactive form (i.e., to allow planners and budget-makers to project the consequences of their decisions). Dashboards can greatly facilitate performance-based planning and budgeting, a key mandate of recent federal legislation.

Figure 17: The VDOT Dashboard (Virginia DOT) http://dashboard.virginiadot.org/default.aspx
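The text does not describe how DVRPC computes Ridescore, so the following is only a generic illustration of combining several accessibility measures into one number: each hypothetical measure is min-max scaled to 0-100 and the results are averaged with assumed weights. Station names, measures, and weights are all invented for this sketch.

```python
# Hypothetical station measures; higher is better for every measure here.
stations = {
    "Station A": {"bike_lanes_km": 12.0, "bike_parking": 40, "low_stress_links": 0.8},
    "Station B": {"bike_lanes_km": 3.0, "bike_parking": 10, "low_stress_links": 0.3},
    "Station C": {"bike_lanes_km": 7.5, "bike_parking": 25, "low_stress_links": 0.5},
}
# Assumed weights, chosen only for illustration.
WEIGHTS = {"bike_lanes_km": 0.4, "bike_parking": 0.2, "low_stress_links": 0.4}


def composite_scores(data, weights):
    """Min-max scale each measure to 0-100, then take a weighted average."""
    scores = {name: 0.0 for name in data}
    for measure, w in weights.items():
        values = [row[measure] for row in data.values()]
        lo, hi = min(values), max(values)
        span = hi - lo or 1.0  # avoid dividing by zero when all values match
        for name, row in data.items():
            scores[name] += w * 100 * (row[measure] - lo) / span
    total_w = sum(weights.values())
    return {name: round(s / total_w, 1) for name, s in scores.items()}


print(composite_scores(stations, WEIGHTS))
```

One design consequence worth noting: min-max scaling makes every score relative to the stations in the set, so adding or removing a station can change all the others' scores. Scaling against fixed reference values avoids that, at the cost of choosing those references.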

3.4 · Choosing a Strategy
Bringing your story to life on the page

Your strategy for visualizing your data covers not only the chart type or types that you include, but also how you customize your charts and illustrations to reflect your intent, your audience, and the elements of your data. Overall, your tasks when choosing a strategy include:
• Selecting a chart type or types;
• Selecting a medium;
• Differentiating your data points; and
• Ensuring that your visualization is useful, clear, and memorable for your audience.

Chapter 2 addressed chart types and their use cases in detail. This section focuses on the other three tasks.

Selecting a Medium
Your medium has a profound impact on your design. Zooming and filtering data are impossible if the medium is static. If your visualization is intended for a large-scale poster or presentation board, you can either expand the dimensions of a single visualization or make a greater number of simpler charts. The form and dimensions of the page or screen can and should drive the arrangement and even the inclusion of information: if information is placed where the audience will have to scroll down, flip a page, or turn around to see it, they may not see it. If the visualization is to be delivered in a printed book, information on facing pages (which form a spread) is far easier to take in at once than information on the two sides of the same sheet.

The possibility of publishing content in web-based documents opens new opportunities for your audience to tour through information and for presenting interactive visualizations naturally in the course of a document. The Washington Post produced a classic example of this approach in its 2014 feature "Reimagining Union Station." The story uses the full width of the user's screen, with content appearing in multiple panels and at multiple widths to ease mobile viewing.
Visualizations include photographs, static illustrations, maps, charts showing demographic and economic data, and interactive renderings. (http://www.washingtonpost.com/wp-srv/special/business/reimagining-union-station/)

A similar visual production would not have been possible in a printed newspaper, but the level of technical detail (including the budget and funding approach for the project) and reporting would not have been possible in a largely text-free medium (such as presentation boards or a slide show).

Differentiation
Every visual distinction should communicate useful information to the audience. Each element of your data's appearance should reflect an attribute that (a) varies and (b) is important to show varying. We refer to these attributes as "dimensions." In designing your visualization, you will need to decide which dimensions to depict, and how many. Taking NBI bridge data for a state as an example:
• With zero dimensions, the visualization shows how many bridges there are. This could be accomplished with a stylized number, with a collection of small bridge icons, or with a proportionally sized box (in reference to some outside point of comparison);
• One dimension could be location (e.g., a map of bridges), NBI condition (e.g., a bar chart), type (e.g., a pie chart or treemap), and so forth;
• Two dimensions could be any pair of the above. For instance, location and condition could be visualized at the same time using a choropleth, with regions colored by average condition; and
• Three dimensions could add another variable. For instance, if time were added to the above, the choropleth map could be animated to show changes in average condition in each region over time.

Tamara Munzner and Torsten Möller discuss dimensions in the language of "marks and channels." To them, a mark is a "basic graphical element or geometric primitive": a point, line, area, or volume. A channel is a means of controlling a mark's appearance. Möller's slide presentation on the topic lists position, size, shape, orientation, and hue/saturation/lightness as channels.
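The two-dimension NBI example above (a choropleth of average condition) maps location to position, by way of each region's outline, and condition to a color channel. A minimal sketch, assuming a handful of hypothetical bridge records and an arbitrary single-hue ramp (the hue, saturation, and lightness range are illustrative choices), aggregates average condition per region and converts it to a hex fill that a mapping tool could apply to each region.

```python
from collections import defaultdict
from colorsys import hls_to_rgb

# Hypothetical (region, 0-9 condition rating) records, for illustration only.
bridges = [("North", 7), ("North", 6), ("South", 3), ("South", 4), ("East", 8)]


def average_by_region(records):
    """Aggregate the condition dimension within each region (the location dimension)."""
    totals = defaultdict(lambda: [0, 0])  # region -> [sum of ratings, count]
    for region, rating in records:
        totals[region][0] += rating
        totals[region][1] += 1
    return {region: s / n for region, (s, n) in totals.items()}


def rating_to_fill(avg, hue=0.6, saturation=0.6):
    """Map a 0-9 average to the lightness channel: poorer condition is drawn darker."""
    lightness = 0.25 + 0.6 * (avg / 9)  # 0 -> 0.25 (dark), 9 -> 0.85 (light)
    r, g, b = hls_to_rgb(hue, lightness, saturation)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))


for region, avg in average_by_region(bridges).items():
    print(region, round(avg, 1), rating_to_fill(avg))
```

Only lightness varies here; hue and saturation are held constant so that they do not imply differences the data do not contain, in keeping with the rule that every visual distinction should carry meaning.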

Recognizing that orientation is fundamentally an element of shape, and accounting for the possibility of data points appearing or disappearing in an animation or a series of images, you can change five things about the appearance of your data points:
1. Position: You can change where a data point is located on the page along up to three axes;
2. Color: As noted by Möller, elements of color include hue, saturation, and lightness. Some image editing programs will also let you change transparency and add patterns in place of solid colors;
3. Shape: Shapes include not only simple geometry but also human-recognizable objects. Shape also includes rotation and orientation;
4. Size: Elements can be proportionally sized in terms of length, width, or area; and
5. Existence: Assuming an animation or a series of static images, data points can appear and disappear between frames.

Because only these five visual characteristics of a data point can change, a visualization can represent at most five dimensions. If your data include 30 dimensions, you will need to iterate through data wrangling, intent and audience, and analysis to identify the five (at most) that tell the best story.

It is possible to make visual choices that carry little useful meaning and detract from comprehension. Many visualization tools, for instance, default to showing each record in a different color based on ID number or name. Referring back to Section 3.2, Cole Nussbaumer Knaflic noted that the blue background on her table's header row constituted meaningless color, so she removed it. Arbitrary visual choices suggest that you lack clarity about your data and intent.

Memorability and Comprehension
The academic community has produced innovative and important guidance for visualization. The MassVIS team at MIT (http://massvis.mit.edu/) provides an additional set of recommendations for maximizing recognition and recall.
After conducting online experiments that tested subjects' attention to and retention of visualized information, the researchers concluded that:
• Memorable visualizations have memorable content. While sparse designs with significant white space may be more attractive, something needs to jump out and stick with people. This can be relevant background imagery, bright colors, a unique typeface, etc.;
• Titles and text are key elements. According to the MIT team's research, the most memorable part of a visualization is the title. Their results also support placing labels next to the data (as opposed to below the axis) and using limited, effective captions;
• Human-recognizable objects (e.g., pictograms) can add to effectiveness. Instead of text-based labels, designers should consider using visual cues or pictures. This extends to bars, columns, and lines as well: making them resemble a related object improves retention; and
• Redundancy improves comprehension. Repeat elements such as titles, captions, labels, and pictograms as much as is appropriate among related visualizations.

Figure 18 shows an excerpt from the Florida Transportation Plan that provides memorable imagery and colors, emphasizes important text, and uses human-recognizable objects.

Figure 18: Excerpt from the Florida Transportation Plan (Florida DOT, http://floridatransportationplan.com/)

Another key concept from our review of academic research is "congruence": the idea that visual design decisions should convey a meaning similar to the one conveyed by the data. For example, a chart of hybrid car ownership that used green to depict regions with the fewest vehicles and brown to depict regions with the most would exhibit poor congruence. To reference the discussion in Section 3.1, congruence may be audience-dependent. For example, if you are presenting data to a DOT that places its state in a national context, you may choose to represent its state with a color or icon familiar to the audience, such as the main color from the state flag. Chapter 4 provides more detail on when and how to tailor your style to your audience.

Visualization Strategy in the Real World

Brian Card and Mike Barry: "Visualizing MBTA Data"

The first 17 minutes of Barry and Card's seminar at Simmons College are discussed in Section 3.2. Moving on from data wrangling, they discussed their strategy and process for visualizing the data.

• Organizing the Information (17:40): Barry recounts that "each of the different views of the data answered a different question better than the other views did." It wasn't possible to have a single overview. Barry and Card noted that their favorite visualizations were tall webpages that navigate using scrolling (as opposed to links) and chose that approach;
• Innovating through Development (22:30): Barry and Card recognized that not only do some ideas fail to pan out in implementation, but some ideas they did not consider promising turn out to look great. Their mantra was "when that happens, just use it everywhere." Barry gives the example of the line-based system map, which was originally to appear in only one location but was so successful that they added it to multiple other views.
In another example, the team experimented with changing the appearance of visualizations and highlighting information in response to the reader hovering over parts of the text. Again, it was effective enough to implement widely;
• Seeking Feedback (23:40): "Ways that people use your visualization incorrectly give you really useful feedback. The trick is that they're correct and your visualization is wrong." Barry and Card connected with a data visualization professional and sought his insight before completing their project;
• Accounting for Screen Size (27:42): Barry and Card developed their visualizations on a MacBook. The test users viewed the project on larger and smaller screens and accordingly recommended that they either "use more of the real estate" or shrink their content to prevent scrolling. The team resolved this with Bootstrap, a web coding library that allows a developer to automatically adapt content to fit screen size. They tested the project with all modern browsers;
• Answering the Core Question (29:30): Barry and Card added one more visualization at the end of the project. Shown in Figure 19, it allows the user to select any two stops and observe the range of transit and wait times (and from there the travel time). They felt that a core question, "How long will my commute take?", had gone unanswered. Barry refers to their model as a "martini glass": you start out with wide-reaching overviews, narrow in on specific attributes and data points, and finish by widening back out and allowing for exploration and personalization; and
• Implementation (31:00): Barry and Card hosted their work at GitHub Pages due to its simplicity, lack of cost, and unlimited traffic accommodation. They added a date and header and used AddThis to include sharing buttons (partly to lend the site credibility for people stumbling across it). They implemented Google Analytics to track unique visits and visitors. Finally, they added tags to tell social media networks how to render an image, description, and title when the page is shared.

Figure 19: Travel Time Scatterplot from "Visualizing MBTA Data" (Mike Barry and Brian Card, http://mbtaviz.github.io/)

3.5 · Tools and Implementation

Maximizing your visualization toolbox

There are a growing number of tools for creating data visualizations. You can draw simple graphics by hand or create them in a straightforward image editor such as Microsoft Paint or PowerPoint. You can build data-driven visuals in basic tools like Microsoft Excel and advanced tools like Tableau, or build interactive online visualizations using Tableau or a coding library like D3.js. You can also use multiple tools in the process of creating a single visualization. Choosing the right tool depends on your strategy and your level of expertise. This section describes many of the most useful visualization tools covering a range of strategies and skill levels. We use our professional judgment to define the ease of use of each of the tools.

Common Tools and How to Use Them

Map Tools

Creating sophisticated maps has become relatively easy with modern GIS tools.
Esri has long been the major player in GIS, but recently open source projects have brought powerful mapping tools within reach of everyone's desktop.

Esri's ArcGIS is the gold standard in GIS software. It is a full-fledged professional tool, but even novice users can create simple maps. Developers can create custom interactive web pages and apps using ArcGIS servers, APIs, and software development kits (SDKs).

• Platforms: Windows (desktop and server) | Online via web | API for developing apps and web pages.
• Cost (as of April 30, 2016): Desktop: $1,500 and up | Online: $2,500 for five users and up | Server: $5,000 and up for perpetual license | $100 for personal use | Discounts for non-government organizations, non-profits, and schools.

• Support: Esri provides online documentation and self-service and paid support | Esri Developers Network | Esri-related conferences and user groups | Extensive community of users | Books | Commercial support.
• Publishing online: Via Esri cloud (requires service credits) or your own ArcGIS server.

QGIS is a powerful free and open source GIS. Its capabilities are constantly evolving and can be extended through various free plugins. You can publish your maps on the web if you have access to the necessary equipment and expertise.

• Platforms: Windows | Mac OS X | Linux | Android.
• Cost: Free, open source (GNU General Public License).
• Support: Online community | Online documentation and tutorials | Books | Commercial support.
• Publishing online: QGIS Server and Web Client | Export to Leaflet or other servers.

General Tools

The multi-purpose office tools allow users to build many of the most basic data visualizations and, with practice, they can make elegant visualizations.

Microsoft's office products include Excel, PowerPoint, Visio, and Power BI. Excel is often a first stop for exploratory data analysis and data wrangling, and can produce a number of data visualizations. PowerPoint can be a good way to combine various visualizations with text to create infographics and visual presentations. Visio is useful for creating drawings. Power BI is a general-purpose visualization environment with a free version that can be published online via a subscription service.

• Platforms: Windows (desktop and cloud) | Mac OS X | Windows (Power BI).
• Cost (as of April 30, 2016): Free (Power BI desktop and service) | $150 and up (Office) or $70 per year and up (Office 365, cloud) | $300 and up (Visio) or $13 per user per month (Visio for Office 365).
• Support: Microsoft provides online documentation and tutorials | Active user community.
• Publishing online: Power BI can publish to the Power BI service.
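Spreadsheets exported from Excel are often "wide," with one column per year, while general visualization tools tend to work most naturally with "long" data that has one row per observation. A minimal sketch of that reshaping, using only Python's standard `csv` module, is below; the district names, years, and `miles_resurfaced` field are hypothetical.

```python
import csv
import io


def wide_to_long(csv_text: str, id_column: str, value_name: str) -> list[dict]:
    """Reshape one-column-per-year rows into one row per (id, year, value)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    long_rows = []
    for row in reader:
        row_id = row[id_column].strip()
        for column, cell in row.items():
            # Skip the ID column and any overflow cells DictReader stores under None.
            if column == id_column or column is None:
                continue
            cell = (cell or "").strip()
            if not cell:
                continue  # leave blanks out rather than inventing zeros
            long_rows.append({
                id_column: row_id,
                "year": int(column),
                value_name: float(cell.replace(",", "")),  # drop thousands separators
            })
    return long_rows


# Hypothetical export: miles resurfaced by district and year.
exported = """district,2014,2015
District 1,"1,250",1300
District 2,980,
"""
rows = wide_to_long(exported, "district", "miles_resurfaced")
```

Leaving the blank 2015 cell for District 2 out of the result, rather than recording it as zero, keeps a missing value from being drawn as a real one.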
Adobe's Illustrator, Photoshop, and InDesign are often used to polish and enhance visualizations created with other products. You also can use Illustrator to produce some basic visualizations.

• Platforms: Windows | Mac OS X.
• Cost (as of April 30, 2016): Part of Creative Cloud, starting at $9.99 per month for a single application.
• Support: Adobe provides online documentation and tutorials | Active user community.
• Publishing online: Not available.

General visualization tools allow you to upload data from a variety of sources (e.g., Microsoft Excel, comma-delimited files, R). Once the data are in place, the application can illustrate them in dozens of ways with limited customization. Finished visualizations can be exported for use in reports and presentations. Some tools facilitate hosting for interactive projects.

Tableau is a general-purpose visualization environment with powerful tools for creating interactive data visualizations. You can combine visualizations into dashboards and stories. The free version, Tableau Public, allows you to publish and reference your visualizations on the Tableau Public site (as long as you are willing to let viewers download your data).

• Platforms: Windows (desktop and server) | Mac OS X | Online via web.

• Cost (as of April 30, 2016): $999 (personal desktop) | $1,999 (professional desktop) | $10,000+ (server) | $500 per user per year (online) | Free (Tableau Public) | Discounts for non-profits and educational use.
• Support: Tableau provides online documentation and self-service support as well as paid support | Tableau-related conferences and user groups | Extensive community of users | Examples readily available (visualizations on Tableau Public can be downloaded and reverse-engineered).
• Publishing online: Tableau Public | Tableau Online or Server | Hosted visualizations can be embedded in other web pages.

Qlik is a general-purpose visualization environment with powerful and easy-to-use tools for creating interactive data visualizations. With a paid version or cloud hosting, you can embed visualizations or share them on the web. Qlik provides an API that enables you to mash up and extend visualizations in sophisticated web applications.

• Platforms: Windows (desktop) | Online via web | API for developing apps and web pages.
• Cost (as of April 30, 2016): Desktop: free for personal or internal business use | $20 per user per month for Qlik Sense Cloud | $1,500 per token (one user or ten logins per month) | QlikView Enterprise (server) priced on hybrid server and client access model.
• Support: Qlik provides online forums, consulting, training, and conferences | Active user community.
• Publishing online: Qlik Sense Cloud (share with up to five others, 250 MB free).

For Developers

Custom, interactive visualizations like those seen in The New York Times generally are developed in JavaScript (the programming language that runs in web browsers), often with the help of JavaScript libraries. To build visualizations using these libraries, you will need software programming skills and comfort with web publishing.

Data-Driven Documents, or D3.js, is an open-source JavaScript library that provides powerful visualization components.
If you have strong web-development skills, you can find an example visualization that fits your strategy, copy the code, and build your own.

• Platforms: JavaScript | Runs in all recent web browsers.
• Cost (as of April 30, 2016): Free, open source.
• Support: D3.js provides online documentation | Active user community | Vast gallery of examples, many with source code shown.
• Publishing online: JavaScript scripts in a webpage, any web server.

You can add charts and graphs to Google Sheets, and you can access those same visualizations and data through various APIs. Google Maps is accessible via API, enabling various map-based visualizations. Fusion Tables is an application to gather, explore, and share data tables. It helps you find public data, visualize it, and host it online.

• Platforms: JavaScript | Runs in all recent web browsers.
• Cost (as of April 30, 2016): Free, under terms of Google APIs Terms of Service (https://developers.google.com/terms/).
• Support: Google provides online documentation and forums | Active user community.
• Publishing online: JavaScript scripts in a webpage, any web server.
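Web maps built with the Google Maps APIs, with D3.js, or with a QGIS export to Leaflet can all load point data as GeoJSON, a JSON format for geographic features. The sketch below builds a GeoJSON FeatureCollection from a list of stations; the station names, coordinates, and `boardings` property are hypothetical. GeoJSON writes each position as longitude first, then latitude, the reverse of the familiar "latitude, longitude" order.

```python
import json


def to_geojson(stations: list[dict]) -> str:
    """Build a GeoJSON FeatureCollection of point features."""
    features = []
    for s in stations:
        lat, lon = float(s["lat"]), float(s["lon"])
        # Basic sanity check on valid ranges; it catches some, not all,
        # cases of swapped latitude and longitude columns.
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            raise ValueError(f"coordinates out of range for {s['name']}")
        features.append({
            "type": "Feature",
            # GeoJSON lists positions as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"name": s["name"], "boardings": s["boardings"]},
        })
    return json.dumps({"type": "FeatureCollection", "features": features}, indent=2)


# Hypothetical stations and daily boardings.
stations = [
    {"name": "Station A", "lat": 42.3555, "lon": -71.0605, "boardings": 15000},
    {"name": "Station B", "lat": 42.3663, "lon": -71.0621, "boardings": 9000},
]
geojson_text = to_geojson(stations)
```

Keeping attributes such as boardings in `properties` lets the web map style each point (for example, by size) without a second lookup.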

Data Wrangling

Trifacta enables analysts of all skill levels to work with and manipulate complex data. As much as 80 percent of effort in a visualization project can be absorbed by cleaning and formatting your data, and Trifacta automates parts of that task. Whether you are accessing complex big data or a simple spreadsheet, Trifacta can help you prepare it for a visualization tool like Tableau.

• Platforms: Windows | Mac OS X.
• Cost (as of April 30, 2016): Free (except Wrangler Enterprise, which provides data wrangling for Hadoop).
• Support: Trifacta provides online training, videos, and basic documentation | Active user community.
• Publishing online: Not applicable.

R is a powerful tool for data wrangling and statistical computing that also creates data visualizations. It works like a software development environment: the basic package includes a command-line editor and interpreter. RStudio provides a graphical development environment but still requires you to write scripts. Several graphics packages make creating plots and charts fairly easy, and Shiny (also from RStudio) produces interactive web pages.

• Platforms: Linux/Unix | Windows | Mac OS X.
• Cost (as of April 30, 2016): Free, open source (GNU General Public License).
• Support: R provides online documentation | Active user community.
• Publishing online: Through packages like Shiny by RStudio (which has both free and supported versions).

Tips for Implementing Advanced Visualization

Advanced data visualization can be engaging, beautiful, and informative. It can form the basis for how people think about an entire topic. This type of visualization requires a toolkit of web development, statistical analysis, software programming, and graphic design skills. It is enticing to imagine taking an online course, learning a JavaScript coding library, and building a fancy visualization. That will not be possible without first understanding the basics of each skill.
This level of comprehension will help you to develop the capabilities in house, bring in the right kind of employee, or hire the right vendor to accomplish the work on your behalf. To get you started, we provide a job description for an online visualization professional on our website: vizguide.camsys.com/.
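Web-publishing skills include small details like the last implementation step Barry and Card describe: tags that tell social media networks which title, description, and image to show when a page is shared. The widely used Open Graph protocol does this with `<meta property="og:...">` tags in a page's `<head>`. The Python sketch below generates such tags; we are assuming Open Graph is the kind of tagging meant, and the title and URLs are placeholders.

```python
from html import escape


def open_graph_tags(title: str, description: str, image_url: str, page_url: str) -> str:
    """Return Open Graph <meta> tags to paste into a page's <head>."""
    properties = {
        "og:title": title,
        "og:description": description,
        "og:image": image_url,   # preview image shown when the link is shared
        "og:url": page_url,      # canonical address of the page
        "og:type": "website",
    }
    # escape() with quote=True also encodes double quotes, keeping attributes intact.
    return "\n".join(
        f'<meta property="{prop}" content="{escape(value, quote=True)}">'
        for prop, value in properties.items()
    )


tags = open_graph_tags(
    title="Visualizing Transit Data",
    description="How long will my commute take? Explore travel times between stops.",
    image_url="https://example.org/transit-viz/preview.png",  # placeholder URL
    page_url="https://example.org/transit-viz/",              # placeholder URL
)
```

Escaping matters because titles often contain characters such as "&" that would otherwise produce malformed HTML.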

3.6 · Putting It All Together

One practitioner's example

Members of our team worked with the New Hampshire DOT to develop a Sankey Diagram for the department's Transportation Asset Management Plan. The chart shows the flow of funds from revenue sources on the left, through funds and programs in the center, to uses on the right, all proportionally sized and colored by revenue source. Figure 20 shows the chart, and the bullets that follow walk through how we considered the elements of this Guide to produce it.

Figure 20: New Hampshire Funding Flows, Typical Year (New Hampshire DOT, 2015)

• Data Wrangling: We held a workshop to explain the types of data we needed and what we planned to do with it. We collected written documents (e.g., Citizen's Guide to the Transportation System and annual reports for the Turnpike and DOT) and spreadsheets (e.g., a comprehensive budget book) describing cash flows. We had a sense of who the audience would be and the story we wanted to tell, so we refined the data to use common revenue, program, and expenditure categories and names. This took some effort.
• Intent and Audience: The audience for this chart includes the public, FHWA, internal staff, and the legislature. The intent was to explain to this audience how money is spent on different asset management programs, by asset (i.e., how much did you spend on maintenance and how did you pay for it?). We wanted to highlight connections among revenue, programs, and investment categories. As we sketched with stacked bar charts, we could see how revenue tied to programs but not how it related to expenditures. We needed something that showed more connections.
• Analysis: The Sankey requires that the flows balance: the total flowing into each fund or program must equal the total flowing out. The DOT does not manage its income and investments this way, so we needed to make some assumptions to tie them together. We went back and modified the data, creating a hypothetical fiscal year that explicitly ties the flows together through the whole process.
We checked with the fiscal folks to make sure that these assumptions were appropriate.
• Choosing a Strategy: The Sankey Diagram was effective at communicating our intent to our audience. We wanted to make clear how the revenue sources flowed through the diagram, so we kept each source in the same color throughout (e.g., all toll revenues are in blue). We added text throughout to help the reader understand the chart. We also experimented with the organization of the flows to ensure readability.
• Tools and Implementation: We used Excel to wrangle the data. We generated the diagram using SankeyMatic, a free online tool built in JavaScript but easy to learn for those without coding experience. The final graphic was built in Adobe Illustrator by tracing a screenshot of the raw diagram; this gave us much more control over the look and feel of the chart.
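The Data Wrangling and Analysis steps in this example can be checked mechanically before any drawing starts. The sketch below maps inconsistent category names onto common ones, sums duplicate flows, and reports any fund or program whose inflow does not equal its outflow. It then writes each flow as a `Source [amount] Target` line, which is our understanding of SankeyMatic's input format. The category names, the `CANONICAL` table, and the amounts are hypothetical, not New Hampshire DOT data.

```python
from collections import defaultdict

# Hypothetical cleanup table: source documents label the same category in
# different ways, so every variant maps to one canonical name.
CANONICAL = {
    "turnpike tolls": "Tolls",
    "toll revenue": "Tolls",
    "federal aid": "Federal Funds",
    "fhwa reimbursement": "Federal Funds",
}


def canonical(name: str) -> str:
    cleaned = name.strip()
    return CANONICAL.get(cleaned.lower(), cleaned)


def build_flows(raw_flows):
    """Normalize names and sum duplicate (source, target) flows."""
    flows = defaultdict(float)
    for source, amount, target in raw_flows:
        flows[(canonical(source), canonical(target))] += amount
    return dict(flows)


def unbalanced_nodes(flows, tolerance=0.01):
    """Return {node: inflow - outflow} for middle nodes whose totals differ."""
    inflow, outflow = defaultdict(float), defaultdict(float)
    for (source, target), amount in flows.items():
        outflow[source] += amount
        inflow[target] += amount
    # Only nodes with flows both in and out (funds, programs) must balance;
    # revenue sources on the left and uses on the right are the diagram's ends.
    middle = set(inflow) & set(outflow)
    return {node: inflow[node] - outflow[node]
            for node in sorted(middle)
            if abs(inflow[node] - outflow[node]) > tolerance}


def _fmt(amount: float) -> str:
    # At most two decimals, no trailing zeros: 100.0 -> "100", 12.5 -> "12.5"
    return f"{amount:.2f}".rstrip("0").rstrip(".")


def to_sankeymatic(flows) -> str:
    """Write one 'Source [amount] Target' line per flow."""
    return "\n".join(f"{s} [{_fmt(a)}] {t}" for (s, t), a in flows.items())


# Hypothetical amounts ($ millions).
raw = [
    ("Turnpike Tolls", 60, "Turnpike Fund"),
    ("Toll revenue", 40, "Turnpike Fund"),
    ("Federal aid", 150, "Highway Fund"),
    ("Turnpike Fund", 100, "Maintenance"),
    ("Highway Fund", 120, "Construction"),
    ("Highway Fund", 20, "Maintenance"),
]
flows = build_flows(raw)
problems = unbalanced_nodes(flows)
sankey_text = to_sankeymatic(flows)
```

Here the check flags `Highway Fund`, which receives 150 but sends out 140. Resolving that is a data decision (another use, or a different assumption, confirmed with fiscal staff), not a coding one.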


