Analisis con Tableau: The process behind a Viz

Saturday, April 23, 2016

The process behind a Viz

Few subjects are as broad as the one Tableau chose for its first Iron Viz contest this year: food. Precisely because the subject is so broad, when I first saw the post announcing the contest I thought "it will not be easy". And indeed it has not been, because if we talk about food we could talk about production, transportation, consumption, price, availability, habits ... from a global or local perspective, and in general or for specific products. So in this post, which accompanies the visualization I am submitting to the contest, I want to talk not just about the visualization itself but about the entire process behind a viz, and especially these four main challenges: choosing what to analyze and finding the data, deciding the approach to the data, the analysis, and finally the visualization. I hope you find it interesting. You can also find a version of this post in Spanish here. Let's go.


1. Choosing the topic


Initially I thought that deciding what to analyze would take me much longer than it eventually did. The fact is that I knew quite quickly the approach I wanted for the visualization. I didn't want to analyze what was happening in a particular country, nor take a quantified-self approach using data from several months ago, when I recorded all the food I ate. I wanted an approach as global as possible, focusing on the differences between the countries of the world and finding some big-picture insights.



A common problem with this approach is that, in general and regardless of the topic, it is not easy to find databases that cover all or at least most of the countries in the world, for the same period and without a lot of missing values. After some initial Google searches without much luck, the FAO came to mind and I started looking at its statistics section. Good decision. Its website has a lot of very detailed information on production, trade, prices, emissions from agriculture and environmental indicators, and everything seemed quite robust in terms of the countries covered and the time series. Now the challenge was to choose, among all that data, something interesting enough to analyze and visualize. Something I love about data analysis is the thought that it can help improve things, whether by making people aware of a topic, calling them to action or making them more interested in something.

The first step to changing anything is to be aware of it. And for me, raising awareness is one of the key objectives of data analysis.

After a while searching the website, I discovered the FAO Food Balance data, which includes figures for food supply in kcal / capita / day (it's important to clarify that these figures represent only the average supply available for the total population and do not necessarily indicate what people actually consume), by product and by country, from 1961 to 2013. I had no doubts: that was the information to analyze.

2. Between big numbers and the detail


OK, I have my data. But before starting the analysis, and because of the great variety of data available (supply of almost 100 products, in 200 countries, over more than 50 years), it's important to ask yourself some questions:

  • Should I focus on the big numbers and the worldwide big picture, analyzing the main differences between regions or countries, or should I dig into the data for more concrete and subtle differences, even between products?
  • Should I search for differences over time or just analyze a specific period?
  • Or should I do all of that, either keeping the final visualization close to a general overview of the data without going into too much detail, or giving up that general approach to gain depth in the analysis?


Deciding what story to tell with the data available was an important point, especially given how little time I had to analyze the data and create the visualization. In addition, after my submission the previous year, this year I wanted to stick to a single dashboard on a single screen. So I knew I would end up with a lot of interesting information and insights that I could not include without overburdening the visualization.



At this stage, it's easy to get lost among all the data available and start analyzing without order or purpose. So it's important to remember the approach I set out at the beginning: a global view of the food supply around the world, showing the differences between regions and countries.

3. Analysis: the key for a good visualization


I had the data and the approach decided. It was time to start asking questions and trying to answer them. The first questions that came to my mind were: what countries and continents ...

  • ... have a greater food supply in calories/capita/day nowadays?
  • ... concentrate their food supply in fewer products?
  • ... have a greater variety of products?
  • ... experienced bigger changes in food supply quantity over the years?
  • ... have a similar composition of food supply by product and trends over time?
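
As an illustration of how the second question in the list could be turned into a number, here is a minimal Python sketch that measures the share of a country's total supply coming from its largest products. This is only one possible measure, not the one used in the viz; the function name `top_share`, the parameter `k` and the figures are hypothetical and are not FAO data.

```python
def top_share(supply_by_product, k=3):
    """Return the share (0.0 to 1.0) of total supply that comes
    from the k products with the largest supply."""
    total = sum(supply_by_product.values())
    if total == 0:
        return 0.0
    largest = sorted(supply_by_product.values(), reverse=True)[:k]
    return sum(largest) / total


# Hypothetical supply in kcal / capita / day for one country.
country = {
    "Wheat": 900.0,
    "Rice": 600.0,
    "Sugar": 300.0,
    "Milk": 150.0,
    "Eggs": 50.0,
}
share = top_share(country, k=2)
print(f"Top 2 products: {share:.0%} of total supply")
```

A higher share would suggest a supply concentrated in fewer products, which makes countries easy to rank against each other.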

Personally, I always try to write a list of several questions to answer before starting the analysis. It's a good way to stay focused on your approach and objectives, especially when time is limited and I have total freedom to create the visualization. But it is very important not to focus exclusively on the questions raised initially. They should serve as a starting point, but the analysis itself should generate additional questions and new insights to explore. One question should always lead to another, to a new way of analyzing the information or to a deeper look at the facts the data is showing.

It is time then to interrogate the data, and right now I don't know of any better tool than Tableau for that. The great advantage of Tableau is that it lets me ask questions of the data from almost any perspective I can think of, and it makes it especially easy to continue analyzing even when I think I've found what I was looking for. That is another of the points I consider most important for building a good visualization, while always making sure that I leave enough time for the design.


4. Visualizing from different perspectives


For me it's essential to view the same information in as many different ways as possible: trends over time, aggregations, totals, averages, percentages, annual changes, outliers ... Having more views makes it more difficult to choose between them, but it also helps me build in my head the skeleton of what the viz will finally include and look like, find the best way to communicate a message, or discover interesting facts that I didn't notice at first. So if my viz has 5 charts in total, I probably created 10 times as many during my analysis.

Here are a couple of examples of graphics finally included in the visualization and other versions that I finally discarded:




I was really unsure about the map, as the HexMap better shows the measures for the small islands of the Caribbean and Oceania and the small countries of Eastern Europe, which tend to almost disappear on the standard map because of the size of countries such as Russia, China, Canada or the USA.

Also, during the analysis and visualization process I made additional decisions to make the visualization easier for the final user. For example, the database includes data from the 1960s, but I focused the analysis on 1993 onwards. The main reason was the lack of data for several countries before 1993. The small number of countries with data available for 2012 and 2013 made me decide to filter out those two years as well.
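
A coverage check like the one behind this decision can be sketched in a few lines of Python. This is only an illustration, not the steps actually followed in Tableau: the record layout, the function names and the `min_countries` threshold are assumptions, and the rows are made up rather than taken from the FAO data.

```python
from collections import defaultdict


def countries_per_year(records):
    """Count how many countries have a non-missing value in each year.

    records: iterable of (country, year, value) tuples, where value
    is None when the figure is missing.
    """
    seen = defaultdict(set)
    for country, year, value in records:
        if value is not None:
            seen[year].add(country)
    return {year: len(countries) for year, countries in sorted(seen.items())}


def years_with_coverage(records, min_countries):
    """Keep only the years with data for at least min_countries countries."""
    counts = countries_per_year(records)
    return [year for year, n in counts.items() if n >= min_countries]


# Hypothetical rows, for illustration only.
rows = [
    ("A", 1992, 2500.0), ("B", 1992, None),
    ("A", 1993, 2550.0), ("B", 1993, 2100.0),
    ("A", 2012, 2700.0), ("B", 2012, None),
]
kept = years_with_coverage(rows, min_countries=2)
print(kept)
```

Years at either end of the series that fall below the threshold are the natural candidates to filter out.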

The food groups were created using this information from the FAO as the source, which meant grouping all the products one by one. Countries, in turn, were grouped by creating a map with point marks and selecting them with the lasso tool.

During the analysis of the regions it was obvious that the food supply in Africa was much lower than in Europe or North America, so I wanted to do a more complex analysis and look at this in more detail. The idea was to visualize the differences for each of the 97 products. But instead of showing the total amount in kcal / capita / day per product, I thought it was more interesting to analyze how each country differs from all the others. So I made the following table, which compares each country's value per product with the average of all countries plus or minus one standard deviation, to show quickly for each country which products have a supply significantly higher and, even more important, which have a supply significantly lower than the average of all countries.




What stands out is that African countries show a shortfall in more products. In addition, the available supply generally seems to reflect each country's diet quite accurately. In Spain, for example, what stands out is a greater supply of olive oil, pork, oranges, potatoes, eggs, tomatoes, onions, beer and wine, among other products. I also decided to push to the background all products that are within the limits of the standard deviation, to further highlight those below and above. The table calculation that compares country values with the average plus or minus one standard deviation is:
IF [Total per country per year] > WINDOW_AVG([Total per country per year]) + WINDOW_STDEV([Total per country per year]) THEN "above"
ELSEIF [Total per country per year] < WINDOW_AVG([Total per country per year]) - WINDOW_STDEV([Total per country per year]) THEN "below"
ELSE "within limits"
END
In the table calculation options, it must be set to compute using Country, restarting every Product.
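
For readers who want to check the logic outside Tableau, the same classification can be sketched in Python. This is a minimal sketch, not the code behind the viz: the function name `classify_by_product`, the nested-dictionary layout and the figures are hypothetical. It uses the sample standard deviation, which is what WINDOW_STDEV returns, and treats each product as its own window across all countries.

```python
from statistics import mean, stdev


def classify_by_product(supply):
    """Label each country per product as above, below or within limits.

    supply maps product -> {country: kcal / capita / day}.
    Returns product -> {country: label}.
    """
    labels = {}
    for product, by_country in supply.items():
        values = list(by_country.values())
        avg = mean(values)
        # Sample standard deviation; needs at least two values.
        sd = stdev(values) if len(values) > 1 else 0.0
        labels[product] = {}
        for country, value in by_country.items():
            if value > avg + sd:
                labels[product][country] = "above"
            elif value < avg - sd:
                labels[product][country] = "below"
            else:
                labels[product][country] = "within limits"
    return labels


# Hypothetical figures, for illustration only (not FAO data).
example = {
    "Olive oil": {"A": 200.0, "B": 10.0, "C": 12.0, "D": 8.0, "E": 10.0},
}
result = classify_by_product(example)
for country, label in result["Olive oil"].items():
    print(country, label)
```

Looping product by product plays the role of "restarting every Product", and taking the mean and deviation over the countries corresponds to computing along Country.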


Finally, I wanted to add some insights that I found interesting during the data analysis but, because of the lack of space on the dashboard, I decided to create a new sheet that invites users to hover over a text and shows the insights in the tooltip. I think it's also a good way to make users pay more attention to the insights: because they have to hover over the text, they are already showing some interest in reading them.



Click on the next image to see the visualization.



Now I only have to wait and see if I win the Iron Viz. Don't forget to vote on Twitter.
