As with most things in life, “passion” is the great motivator. Reading this book one day, I realized the pages were filled with data: beer style, ABV, IBU, color!
Turns out all this wonderful data was available on the web site – and so began the Viz and Fizz™ project. Yes, guess you could say I have a “passion” for beer.
I paid $5 (Fiverr.com
) to have someone cut/paste the information from 90 web pages into an Excel template.
I’m sure everyone has some data in this crosstab form, but for Tableau, it’s better to have the data “flattened.”
The Transpose tool allows you to pivot the orientation of the data table so you may view horizontal data fields on a vertical axis
'I began to envision this as an analytics story and thought ahead to the Tableau Conference in Las Vegas. That’s when my ideas got a lot more ambitious.'
“Eating my own dog food”
For our client work, part of my company's ASTRAL™
is to think about the “T,” tying threads of data -- what
other data can enrich the analysis. There’s good data at The BeerAdvocate
, but it wasn’t going to be as simple as collecting the beer data:
- The BeerAdvocate rankings update frequently
- A search for “Las Vegas” on Untappd returns almost 11,000 locations!
Both of these required a level of automation. Enter Alteryx! It’s perfect for collecting, parsing, arranging, and blending from multiple sources. This was an intense exercise in parsing and I found this site to be super helpful for testing regular expressions
. Would be great if Alteryx had a function like this. Here’s the plan I executed…
Alteryx has a function for capturing the HTML for a URL Then you have to process the HTML to strip out the data. This is good because I could run this routine and get the most up-to-date rankings.
The Download tool retrieves data from a URL to be used in downstream processing or to be saved to a file.
Here’s what the page looks like and the data I wanted. Word of caution, never assume -- even for a public facing service that isn't expecting you to scrape data off their site -- that they will have the data set up perfectly for you :). Turns out it's possible to have an entry without an ABV assigned, and that blew up my initial parsing algorithm.
This was the most challenging (I don’t recall if I had a celebration beer after figuring this out). First problem, Untappd only returns 25 locations on each page and greets you with a friendly prompt to "show more". Let's see, 11,000/25 = 439 clicks (already had one page). You got it, $5 more on Fiverr and I had the HTML source of one page with all the locations.
The data I really wanted is under each location listing -- the top beers.
I want to match these against the BeerAdvocate 250.
Alteryx in action
First step was to sift through the HTML page and extract the URL of the 11,000 locations within Untappd.com. Then I could use the Alteryx download tool to get the HTML page for each location! Yes, seems like a big deal, but not really once you have the workflow -- it was actually the simplest workflow of this project. One problem I ran into was trying to put the results into an Excel file; bad idea. Alteryx’s native yxdb file is much better.
Overall, I created five workflows to simplify my own work, but also so it would be easier for you, yes you, to see how all the pieces come together. There is a link at the bottom of the post to download all the files. I encourage you to go through each of them diligently.
With all this rich data, it would be tempting to create a sophisticated visual with all kinds of filters and colorful imagery. But, as I ask my client, “Now that having the data is easy, what do you want to accomplish, what DECISION are you trying to make?”
'I wanted to make it easy for someone (self included) to find the Las Vegas venues serving up beers on the BeerAdvocate top 250 list.'
As I thought about this a more, "easy" came to have another meaning – on my phone.
This might be the first Tableau work designed especially for the iPhone 6!
Having limited screen space to work with forced me through design tradeoffs and a lot of hard thinking. I hope you can see the power in “less is more” and are enjoying one of these awesome beers during TCC.
(note: the iPhone 6 width is 750 pixels, but I had to set the dashboard at 375 pixels to fit. Must have to do with something like PPI.)