Flight Risk Visualization

Understanding Plane Crashes Through Representation

The power of flight has undoubtedly been one of man’s crowning achievements. However, despite the great technological leaps and bounds we have made, every time we step on a plane we are putting faith in a whole team of people who have designed, maintained, and operated a vaguely bird-shaped object made out of sheet metal.

Primal fears aside, flying is by far one of the safest ways to travel. You have probably heard that all-too-familiar statistic that you’re more likely to be struck by lightning or win the lottery (or was it being struck by lightning while winning the lottery?) than die in an airplane crash. Nevertheless, accidents do occur, and understanding why they occur gives airlines valuable information on how to make the safest mode of transportation even better.

In a fascinating visualization provided by the BBC, users are able to scan through all of the airline accidents (domestic and foreign) that have happened since 1993. The visualization does not clearly cite its data sources in the interface, but the data may have come from government agency reports like the ones provided by the Federal Aviation Administration (FAA). While no source data is exposed to the user, as viewers we can reconstruct some of the relevant data fields that were collected and organized into a concise database. Specifically, the visualization relies on data points like:

  • date
  • type of airplane
  • number of fatalities
  • cause of the accident (e.g., weather, foul play, mechanical failure, etc.)
  • certainty of the cause of accident (e.g., certain, suspected, unknown)
  • phase of flight (e.g., take off, landing, cruising altitude, etc.)
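To make the reconstructed fields concrete, here is a minimal sketch of what one record might look like. The class name, field names, and category strings are all assumptions made for illustration, and the example record is an invented placeholder, not a real accident; the actual schema behind the BBC visualization is not published in the interface.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Accident:
    """One row of the reconstructed dataset (all field names are hypothetical)."""
    when: date            # date of the accident
    aircraft_type: str    # type of airplane
    fatalities: int       # number of fatalities
    cause: str            # e.g. "weather", "criminal", "mechanical", "pilot_error", "unknown"
    certainty: str        # "certain", "suspected", or "unknown"
    phase: str            # e.g. "take_off", "landing", "cruise"


# Placeholder record, invented purely to show the shape of the data.
example = Accident(
    when=date(2000, 1, 1),
    aircraft_type="Example-100",
    fatalities=12,
    cause="weather",
    certainty="suspected",
    phase="cruise",
)
```

Keeping cause and certainty as separate fields mirrors the way the interface treats them as two independent visual channels.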

From this data, the interface maps these variables rather expertly onto some important representational features. All of the accidents are presented in a linear fashion from newest to oldest, and the number of fatalities is indicated by the relative size of the circle for each accident. That is, the larger the circle, the more fatalities occurred on that particular flight.


In addition to mapping fatalities to the size of each circle, the interface encodes the cause of each accident in the color of each circle. These colors match those shown over to the right, where yellow is pilot error, blue is weather, orange is mechanical failure, orange is of criminal origin, and grey is unknown. The opacity of each circle also allows the user to ascertain the certainty of each cause. That is, the darker the circle, the more certain the cause of the accident.
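The encoding described above (time to position, fatalities to size, cause to color, certainty to opacity) can be sketched as a small mapping function. This is an illustration under stated assumptions, not the BBC's implementation: the dictionary keys, the palette values, the opacity levels, and the choice to scale circle area (rather than radius) with fatalities are all hypothetical, and the sample records are invented placeholders.

```python
import math

# Placeholder palette: the exact colors used by the interface are not
# confirmed here, so treat every value as an assumption.
CAUSE_COLOR = {
    "pilot_error": "yellow",
    "weather": "blue",
    "mechanical": "orange",
    "criminal": "red",      # placeholder choice, not taken from the article
    "unknown": "grey",
}

# Hypothetical opacity levels: more certain causes are drawn more opaque.
CERTAINTY_ALPHA = {"certain": 1.0, "suspected": 0.6, "unknown": 0.3}


def to_mark(record, scale=3.0):
    """Turn one accident record into the visual attributes of a circle."""
    return {
        # The square root makes circle *area* proportional to fatalities.
        "radius": scale * math.sqrt(record["fatalities"]),
        "color": CAUSE_COLOR[record["cause"]],
        "alpha": CERTAINTY_ALPHA[record["certainty"]],
    }


def layout(records):
    """Order records newest to oldest, then map each one to a circle."""
    ordered = sorted(records, key=lambda r: r["date"], reverse=True)
    return [to_mark(r) for r in ordered]


# Invented placeholder records (ISO date strings sort chronologically).
sample = [
    {"date": "1999-03-01", "fatalities": 4, "cause": "weather", "certainty": "certain"},
    {"date": "2005-07-15", "fatalities": 16, "cause": "pilot_error", "certainty": "suspected"},
]
marks = layout(sample)
```

Scaling area rather than radius is a common convention for bubble-style marks, since readers tend to compare circles by area; whether the original visualization does this is not stated.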

The user can dive down into the data by selecting various features in the interface. If I am interested in one particular accident, I can click on it to view detailed information about the flight: the number of deaths, the date, the type of aircraft, the location, the airline carrier, the cause of the accident, and the certainty of that cause.

The anxious flyer can filter all accidents displayed along various dimensions. Say, for example, we were interested in showing the number of flights that crashed as a result of pilot error. At the top of the interface, users can filter the data using the filter selection bar.

Selecting the cause (in this case the yellow button with the icon of a pilot) reduces the data to show only the accidents attributed to pilot error. The data is still presented linearly from newest to oldest, employing the same mapping conventions as in the aggregated data (e.g., size → number of fatalities).

The data can also be filtered by the phase of flight in which the accident occurred. For the overly anxious flyer, knowing when these accidents happen may be of interest. The user can click to show the accidents that occurred while grounded, taking off, climbing, flying at cruising altitude, descending, or landing. Selecting cruising altitude, for instance, shows only the accidents that occurred at that phase. The data can be reduced further by enabling multiple filters at once. For example, we can show only the accidents that occurred at cruising altitude as a result of weather.
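The filtering behavior described above, including combining a cause filter with a phase filter, can be sketched as a simple predicate over records. The function name, dictionary keys, and category strings are hypothetical, and the sample records are invented placeholders rather than real accidents.

```python
def filter_accidents(records, cause=None, phase=None):
    """Keep records matching every filter that is set; None means 'no filter'."""
    return [
        r for r in records
        if (cause is None or r["cause"] == cause)
        and (phase is None or r["phase"] == phase)
    ]


# Invented placeholder records, used only to exercise the filters.
sample = [
    {"id": 1, "cause": "weather", "phase": "cruise"},
    {"id": 2, "cause": "pilot_error", "phase": "landing"},
    {"id": 3, "cause": "weather", "phase": "landing"},
    {"id": 4, "cause": "mechanical", "phase": "cruise"},
]

weather_only = filter_accidents(sample, cause="weather")
cruise_only = filter_accidents(sample, phase="cruise")
weather_at_cruise = filter_accidents(sample, cause="weather", phase="cruise")
```

Because the filters are combined with a logical AND, each additional filter can only narrow the result, which matches the way the displayed set shrinks as more buttons are enabled.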

This visualization is an excellent example of how several dimensions of a dataset can be presented to the user concisely and efficiently. The interface cleanly maps variables like time, number of deaths, and cause of accident to visually salient dimensions like location, size, and color, respectively. The dimensions are clearly distinguished from one another, and contextual information that is not directly presented visually can still be obtained by interacting with the interface.

The interface also uses principled design features, such as using color to link data points to categorical divisions in the dataset. These colors are then carried over to the graph on the right-hand side, which re-represents trends in the data in another format. Those trends might have been difficult to deduce from the representation on the left alone.

However, the visualization is not without areas for improvement. One problem that I consistently ran into was the use of opacity to indicate the certainty of the cause of each accident. The designer of this visualization makes some assumptions about the user’s ability to perceive a single level of opacity consistently across different colors. This is not a trivial task, and it may lead the user to incorrectly interpret two differently colored accidents as having the same certainty of cause when in fact they do not. The circles also overlap in various places, making these opacity similarity judgements even more complicated.
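A rough numerical sketch shows one plausible reason why equal opacity does not look equal across colors. It blends a color over a background with standard alpha compositing and then computes relative luminance with the usual sRGB formula. The white background, the pure yellow and blue values, and the 50% opacity are assumptions chosen for illustration; luminance is only a proxy for how strong a mark appears, not a full model of perception.

```python
def composite(rgb, alpha, background=(255, 255, 255)):
    """Blend a color over a background at the given opacity (source-over)."""
    return tuple(alpha * c + (1 - alpha) * b for c, b in zip(rgb, background))


def relative_luminance(rgb):
    """Relative luminance of an sRGB color with channels in the range 0-255."""
    def linear(channel):
        s = channel / 255
        return s / 12.92 if s <= 0.04045 else ((s + 0.055) / 1.055) ** 2.4

    r, g, b = (linear(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def stacked_alpha(alpha, layers):
    """Effective opacity where several equally transparent circles overlap."""
    return 1 - (1 - alpha) ** layers


YELLOW = (255, 255, 0)   # assumed stand-in for the pilot error color
BLUE = (0, 0, 255)       # assumed stand-in for the weather color

# Same opacity, very different luminance once blended over white.
yellow_lum = relative_luminance(composite(YELLOW, 0.5))
blue_lum = relative_luminance(composite(BLUE, 0.5))
```

Under these assumptions the half-transparent yellow ends up close to the luminance of the white background while the half-transparent blue stays much darker, so the two can read as different certainty levels even though their opacity is identical. Overlap compounds the problem: `stacked_alpha(0.5, 2)` evaluates to 0.75, so two overlapping 50% circles look more certain than either one alone.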

Moreover, while there is a nice match between the numerical values in the dataset and the graph to the right, the graph is not arranged in a way that matches the representation of time in the main interface. For example, if top to bottom maps onto going back in time in the main visualization window, then it would be most useful for top to bottom to represent the same movement through time in the graph. Instead, the graph does the opposite: top to bottom maps to moving toward the future. While I would not argue that this is an insurmountable hurdle, this design tradeoff might lead a subset of users to mismap between the main visualization area and the plot when determining trends.