Visualizing Emptiness: Reflections on a Preoccupation with Missing Values
The first question to be answered for any visualization is always: what data to use? I spent a lot of time looking for outward-facing data about Wikipedia. When I finally found data about Wikipedia traffic by country, I knew I had the connections I needed between the world and the world of Wikipedia.
I cleaned the data with R and visualized it with Processing, both open source tools. The top represents countries, colored by region and, more broadly, by global north (blue) and south (red). The bottom represents languages. Each connection represents more than 100,000 page requests in the year from April 2010 to March 2011. The visualization is interactive: countries and regions can be highlighted and sorted by population, pageviews, pageviews per person, and internet access. All of the underlying data is available on rollover.
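To make the threshold and the sort options concrete, here is a minimal sketch in Python. It is not the R or Processing code behind the visualization. The `Country` class, the `connections` helper, and all of the numbers are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass

# "Over 100,000 page requests" is the cutoff described in the text.
THRESHOLD = 100_000


@dataclass
class Country:
    """Hypothetical record: one country and its yearly requests per language."""
    name: str
    population: int
    requests_by_language: dict  # language code -> page requests in the year

    @property
    def pageviews(self) -> int:
        return sum(self.requests_by_language.values())

    @property
    def pageviews_per_person(self) -> float:
        return self.pageviews / self.population


def connections(countries, threshold=THRESHOLD):
    """Return (country, language, requests) for every pair above the threshold."""
    return [
        (c.name, lang, n)
        for c in countries
        for lang, n in c.requests_by_language.items()
        if n > threshold
    ]


# Made-up numbers, not real Wikipedia traffic.
countries = [
    Country("A", 1_000_000, {"en": 500_000, "fr": 50_000}),
    Country("B", 10_000_000, {"en": 200_000, "es": 100_000}),
]

drawn = connections(countries)
by_population = sorted(countries, key=lambda c: c.population, reverse=True)
by_views_per_person = sorted(
    countries, key=lambda c: c.pageviews_per_person, reverse=True
)
```

Every country-language pair at or below the cutoff simply never gets drawn, which is one way the empty spaces in the picture come about.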
I think that visualization is amazing for its ability to force us to see what’s missing: the missing values in a collection of data. Anyone who has experience with data analysis, especially with analyzing other people’s data, knows the feeling of being totally preoccupied with missing values. How are they represented in the dataset? How should we deal with them – bootstrap to fill them in, or throw out the associated data completely? I find that visualization trains my mind to notice what’s missing.
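The two questions above, how missing values are represented and what to do with them, can be sketched in a few lines of Python. This is an illustration only and assumes several things: the marker strings in `MISSING_MARKERS` are common conventions rather than anything taken from the Wikipedia traffic data, and `fill_by_resampling` is just one simple reading of "bootstrap to fill them in" (drawing replacements at random from the observed values).

```python
import random

# Assumed markers; real datasets encode missing values in many other ways.
MISSING_MARKERS = {"", "NA", "N/A", "-"}


def parse(raw):
    """Return a float, or None when the raw cell looks like a missing value."""
    if raw is None:
        return None
    text = str(raw).strip()
    if text in MISSING_MARKERS:
        return None
    return float(text)


def drop_missing(values):
    """Option 1: throw out the missing entries completely."""
    return [v for v in values if v is not None]


def fill_by_resampling(values, rng):
    """Option 2: replace each missing entry with a random observed value."""
    observed = drop_missing(values)
    if not observed:
        raise ValueError("no observed values to resample from")
    return [v if v is not None else rng.choice(observed) for v in values]


# Made-up raw column with three different encodings of "missing".
raw = ["12.5", "NA", "", "40", None, "7"]
values = [parse(r) for r in raw]
missing_positions = [i for i, v in enumerate(values) if v is None]

dropped = drop_missing(values)
filled = fill_by_resampling(values, random.Random(0))
```

Either choice hides the gaps from later analysis, so it helps to keep `missing_positions` around in order to show where the data was never there.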
Missed Connections in the Global South
When I sort by region, I can force you to see the emptiness, the missed connections in the global south. The more I do visualization work, the more I notice who’s missing, not just globally, but personally.
There are people in the empty spaces of the visualization who want to be Wikipedia editors, who want to contribute, but don’t know it exists, or don’t see a way in. Openness is easy – you can just attach a license and say something is open. Accessibility is hard – it requires someone to take responsibility, to commit sustained effort. So – the goal I propose is: we meet back in 10 years and see the circle FILLED. No more missing values, no more missed connections, no more empty spaces. With the quantity of Wikipedia data being collected, we will be able to see, rather than speculate on, exactly how a diversity of voices has changed patterns of edits, the content, and the connections of Wikipedia. We will all have a Wikipedia for everyone, that reflects the collaborative contributions of everyone.
Quotes from the jury
Erik Zachte, data analyst for WMF, says:
I find this visualization extremely elegant, even mesmerizing. It is a joy to play with the different options, and to watch how the screen responds. Part of its appeal is its complexity: It resonates with how many people see Wikipedia – colossal and manifold, it is not so easy to grasp its inner workings. Coupled with the orderly presentation, this complexity invites the user to dive in, and perhaps be the first to find some new treasure, some hidden pattern.
Moritz Stefaner, information visualizer, commented:
The visualization is very rich in data and navigation modes. I much applaud the audacity to include this much data, navigation modes, and detail information, this has certainly been a great effort. The amount and density of the data is staggering.
“A Thousand Fibers Connect Us” is released under a Creative Commons BY-SA license and the underlying code will be published under an open license shortly.
Tilman Bayer, WMF Movement Communications
Dario Taraborelli, WMF Senior Research Analyst and WikiViz co-chair