How to Use Tableau: Turning Complexity into Simplicity Part 2

This is Part 2 of a blog series on using Tableau to turn complex data into information that is easy to
understand. Part 1 of the series, which focused on telling a story through visual images, can be
accessed by clicking here. In Part 2 we focus on using Tableau to create tabular data that
will undergo further quantitative analysis.
INTRODUCTION

During the past seven years at QualPro, Tableau Desktop has played an integral role in analyzing,
organizing, categorizing, and preparing data for additional computational (rather than visual)
analysis. Although this usage is not highly publicized as a principal strength of Tableau, its
capabilities in this mode are just as valuable as its visual-analysis capabilities.
Tableau’s ability to create specific data tables for additional analysis is outstanding, is native to the
software, and is intuitively simple. Tableau has many intrinsic capabilities that let you keep track of
your data and organize it efficiently. For example, Tableau automatically acts as a bookkeeper, saving
you time in data pre-processing and in custom programming.
Here are six examples that demonstrate this functionality:
1. Easily Organize Data into Buckets
2. Rapidly Evaluate High Data Volume Systems
3. Replace Excel VLOOKUP
4. Rapidly Reshape Data
5. Easily Aggregate Data at Desired Time Levels
6. Replace Pivot Tables
EXAMPLE 1: TABLEAU ALLOWS YOU TO EASILY PLACE ITEMS INTO USER-DEFINED GROUPS (OR “BUCKETS”)

Many times data files are provided to you, but some of the data fields that you need for analysis are not
included in the file. If the data file is small, it is easy to add the fields you want by adding another
column to the database and using an Excel function (such as VLOOKUP) to populate the values in the new
column. If data files are very large (>1M rows) and in a csv format, you really don’t have this option
unless you write custom code to add the new columns. Rather than doing that custom
programming, it is easy to use Tableau calculated fields to create the groups for you.
Creating groups is sometimes referred to as creating “buckets”. Buckets are exceptionally helpful for
organizing data that has to be analyzed in groups. Through the years, we have created many types of
calculated-field buckets, including volume, time, experimental settings, geographic regions, sales
categories, and more. In fact, the only limit on creating buckets is your own imagination. If
you find yourself needing to group the data in some way that isn’t in the database, a bucket will
help you solve the problem. One tip: use the word “bucket” in the name of the
calculated field so that you know what that field contains. For example, the name
“MVT_Time_Period_Bucket” is a self-descriptive name that will not be confused with other calculated fields
that you may have.
Two example bucket definitions are shown below to illustrate the general applicability of this concept.
The first example is a time-based bucket that assembles data relative to the timing of a process-improvement experiment, as shown in Figure 1. This type of bucket allows the aggregation of measures
into specific, user-defined time blocks. The length of the time blocks is controlled using the date
functions available in Tableau calculated fields. As shown in Figure 1, four specific time blocks are
defined by the “if, then, else” structure of the calculated field. When using time-based buckets, it is a
good idea to check your bucket settings before you complete your work. You can check the
accuracy of the calculated time bucket by adding a column for the number of days within each bucket, as
shown in Figure 2. In this example, the screening and refining experiments each lasted 42 days (6
weeks), so it is clear that the dates entered into the bucket formulation were correct. If a mistake
were made in typing one of the dates, it could be hidden and could lead to erroneous
results if you didn’t check the length of each time bucket.
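To make the structure concrete, here is a minimal sketch of such a calculated field using Tableau’s
IF/ELSEIF syntax; the field name, bucket labels, and dates are hypothetical placeholders rather than the
actual values in Figure 1:

    // Hypothetical time-based bucket; substitute your own date field
    // and experiment boundaries.
    IF [Date] <= #2013-02-11# THEN "Baseline"
    ELSEIF [Date] <= #2013-03-25# THEN "Screening Experiment"  // 42 days
    ELSEIF [Date] <= #2013-05-06# THEN "Refining Experiment"   // 42 days
    ELSE "Post-Experiment"
    END

For the length check described above, an aggregate field such as
DATEDIFF('day', MIN([Date]), MAX([Date])) + 1, placed alongside the bucket, reproduces the day counts
shown in Figure 2.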
Figure 1 - A Calculated Field for a Time-Based Bucket.
Figure 2 - Results of Using the Time-Based Bucket.
The second example shows the use of two different types of buckets. The first is a production-level
bucket for DMA (Designated Market Area) data, based on the average production rate in a month
within a DMA, as shown in Figure 3. For each DMA, there can be hundreds of production facilities that
contribute data to the overall DMA average. The reason for making these groupings is to
assemble DMA-based groups of similar performance so that regional comparisons in performance can be
made among similarly sized DMAs. With over 200 DMAs and millions of lines of data in this example,
buckets like these can make analysis much more straightforward, because the behavior of large
DMAs can be very different from that of medium-sized or small DMAs. A second, time-based bucket is also used to
generate the average monthly production over two distinct eleven-month periods, Oct 2011 to Aug
2012 and Oct 2012 to Aug 2013. The output table shown in Figure 4 can then be exported via cross-tab
to Excel for further quantitative analysis, such as calculating the percent change between the two time periods.
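A sketch of what such a production-level bucket can look like appears below; it assumes the data source
already carries an average-rate field, and the field name and cut points are hypothetical placeholders
rather than the actual values in Figure 3:

    // Hypothetical production-volume bucket; the thresholds are
    // placeholders for the client-specific cut points.
    IF [Avg Monthly Production] >= 10000 THEN "Large DMA"
    ELSEIF [Avg Monthly Production] >= 2500 THEN "Medium DMA"
    ELSE "Small DMA"
    END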
Figure 3 - A Production-Volume-Based Bucket.
Figure 4 - Results of Using Production-Volume and Time Buckets.
EXAMPLE 2: RAPIDLY EVALUATE HIGH DATA VOLUME SYSTEMS

Prior to conducting process-improvement studies, historical data analysis is typically used to evaluate data
stability and quality, as well as the accuracy of the measurement system used to collect the data.
This work is completed to assess the stability of the process being investigated before experimental
testing occurs. When large amounts of data are generated in high-volume production settings, statistical
analysis must be performed rapidly to enable a proper experimental design.
An example of this situation comes from a paper production facility. At this site, an automated scanner
records 6 paper-quality measurements every minute at 594 locations across a paper roll. This scanner
generates 154 million data points per month, or nearly 2 billion measurements per year. Tableau is used to
take these giant data sets and produce tables (and, of course, charts) that summarize statistical variation
within the data sets and isolate anomalous data and/or time periods.
Once Tableau is configured for the data source and the worksheets are set up, it is simply a matter of
appending new data to the csv file and refreshing the extracts to update the statistical tables and/or control
charts for the machine. Monitoring the process in this way becomes routine. Charts such as the one shown in
Figure 5 represent data from one location along the paper roll, but they show how the measured variable
changes over the course of a month. Once these charts are made, the anomalous data can be extracted
from Tableau to isolate the problematic time frames, as shown in Figure 6, and this information can
be sent to the client for problem resolution.
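One simple way to isolate anomalous readings is a flag field that can be dropped onto the filter shelf.
The sketch below assumes a hypothetical measurement field and placeholder control limits; the real limits
would come from the control-chart analysis:

    // Hypothetical anomaly flag; 95.0 and 105.0 stand in for the
    // actual control limits computed for the machine.
    IF [Measurement] < 95.0 OR [Measurement] > 105.0
    THEN "Anomalous"
    ELSE "Normal"
    END

Filtering on the “Anomalous” value of such a field produces a table of problem time frames like the one in
Figure 6.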
Figure 5 - Using Various Time Aggregations on High-Volume Data.
Figure 6 - Anomalous Data Isolated by Tableau.

EXAMPLE 3: TABLEAU IS A REPLACEMENT FOR EXCEL VLOOKUPS

If you find yourself still using the VLOOKUP function in Excel, this example is for you! VLOOKUPs allow
you to perform a type of data blending within Excel: the function populates a
new column of data that you want to have in your visualization. Figure 7 shows an example data file that
needs to have its last column (Experimental Recipe) populated because you are going to aggregate results
on this field. This is accomplished with the VLOOKUP function, as shown in Figure 7.
Figure 7 - Using VLOOKUP Within Excel to Add Another Column of Data.
This works fine for small data sets (<1M rows), but what happens if your file is millions of lines long?
The VLOOKUP work in Excel is unnecessary (and impractical at that size) because a simple calculated field in
Tableau will accomplish the same task without having to add the Experimental Recipe field to the data
file. Figures 8 to 10 show the three-step procedure for accomplishing this task.
Figure 8 - Original Data Set Without Experimental Recipe Information.
Figure 9 - Experimental Design Table Showing Recipes Assigned to DMAs.
Figure 10 - Tableau Calculated Field for Mapping Recipes to DMAs.
With the calculated field in Figure 10 installed in the Tableau workbook, the Experimental Recipe can be
used as a dimension, and total sales and total margin can be summed automatically. The data table is
created very quickly in Tableau and sent over to another program for computational analysis.
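The general shape of such a recipe-mapping field is a chain of DMA tests. The sketch below is
hypothetical; the DMA names and recipe labels stand in for the actual assignments shown in Figures 9
and 10:

    // Hypothetical recipe-mapping field replacing an Excel VLOOKUP;
    // each branch assigns one experimental recipe to a set of DMAs.
    IF [DMA] = "Knoxville" OR [DMA] = "Chattanooga" THEN "Recipe 1"
    ELSEIF [DMA] = "Nashville" OR [DMA] = "Memphis" THEN "Recipe 2"
    ELSE "Not Tested"
    END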
The beauty of this approach is that a lot of bookkeeping is handled automatically for you. Generally,
there is a variable number of stores per DMA, and all of the results from the stores will be aggregated
automatically. In large DMAs there may be hundreds of stores that feed information into the aggregated
results by recipe. In this example, the experimental recipe represents conditions that are applied
uniformly to all stores within a DMA, so they are all treated the same. If you have non-uniform test
conditions within a DMA, you might need to assign the Experimental Recipes by store number rather
than by DMA, since some stores are tested and others are not. In both of these cases, Tableau is acting as
your bookkeeper, doing all the work for you based on either the DMA or the store number.
Potential drawbacks to this approach relate to poor client-side data quality, including (1) misspelled or
inconsistent DMA names over time, (2) stores being assigned to different DMAs over time, and (3)
stores simply assigned to the wrong DMAs. For these reasons, it is a good idea to use Tableau to draw
maps of the stores by DMA to make sure that they are assigned correctly before proceeding with your
experiment. QualPro has found numerous examples of poor data quality in large databases that
contain DMA information. You should also run a store count by DMA to make sure that it isn’t changing
dramatically across your experimental time periods.
EXAMPLE 4: TABLEAU ALLOWS YOU TO RAPIDLY RESHAPE DATA

In late 2008, we wrote a utility we called the “Tableau Pivot Reformatter” to help us rearrange incoming
client data that was given to us in a pivot-table style format (also referred to as a cross-tab format). In
many circumstances, clients aggregate their data using pivot tables and pass these on to subcontractors,
thinking that this data format is useful for analysis. Sometimes data is simply created and stored in a
pivot-table style format, especially in manufacturing systems that have scanners or other data-collection
devices throughout the process. However, pivot-table style formats are not very useful in
Tableau; in essence, pivot-table data has to be un-pivoted to be used most effectively in
Tableau. After receiving many pivot-table data sets from various clients, we developed the reformatting
utility and used it for years before Tableau released a link to a similar tool.
The “Tableau Data Reshaper” is one of the best tools for getting data ready for Tableau analysis.
This Microsoft Excel add-in utility is available via download by clicking this link, and it is
highly recommended for Tableau users who receive a variety of incoming data from multiple sources.
The tool has four primary features that make it very useful. Figure 11 shows its capabilities,
which include reshaping data, transforming a pivot table into a regular table, filling in merged cells,
and directly opening an Excel sheet in Tableau.
Figure 11 - The Tableau Data Reshaping Tool Menu.
The data-reshaping option is the most used of these features, and it is a powerful utility that is a huge time
saver. If you receive data in a pivot-style format such as that shown in Figure 12, you can use this tool to
quickly change the data into a more useful three-column input for Tableau. Figure 13 shows the
reshaping tool in action, and Figure 14 shows the resulting three-column data structure, which is then
loaded into Tableau.
Figure 12 - Pivot-Table Formatted Data Example.
Figure 13 - The Tableau Data Reshaping Tool in Action.
Figure 14 - The Results of Using the Tableau Data Reshaping Tool.
The data reshaping tool has a couple of great features that make it indispensable. First, if you have a
pivot-table style data set that will un-pivot to a size greater than Excel’s maximum of 1,048,576 rows, the
utility will automatically write the un-pivoted data to a csv file rather than to multiple Excel worksheets.
Second, your original data is left intact, and the results of the un-pivoting operation are written to a new
worksheet titled “original_worksheet_name-Tableau”, where “original_worksheet_name” is
the name of the worksheet holding your original data. If you run the utility more than once, the new worksheets will be
suffixed “-Tableau2”, “-Tableau3”, and so on. QualPro testing of this utility over the past couple of years, on a
wide variety of data sets, indicates that it will reshape data at a rate of about 1 million lines per
minute on a typical professional workstation. The performance of this tool will depend upon the
CPU and hard-drive speeds of the computer you are using.
EXAMPLE 5: TABLEAU ALLOWS YOU TO EASILY AGGREGATE AT ANY DESIRED TIME LEVEL

One of the most powerful features of Tableau is its handling of dates and times. When conducting
process-improvement experiments, QualPro will sometimes calculate experimental results on a daily,
weekly, monthly, quarterly, or annual basis. Creating the tables for these various time periods is
automatic in Tableau and depends only on how you pick the date setting on the columns shelf. Figures
15 to 19 show the results of aggregating at these five time levels. Once you have the table
properly configured in Tableau, it can be exported as a cross-tab to Excel (Worksheet|Export|Crosstab
to Excel) so that the data can undergo further quantitative analysis.
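For continuous dates, picking a level on the columns shelf is roughly equivalent to truncating each date
to that level, which you can also do explicitly in a calculated field. A minimal sketch with a
hypothetical date field:

    // Hypothetical explicit version of the columns-shelf date setting;
    // swap 'week' for 'day', 'month', 'quarter', or 'year' as needed.
    DATETRUNC('week', [Order Date])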
Figure 15 - Data Aggregated by Day.
Figure 16 - Data Aggregated by Week.
Figure 17 - Data Aggregated by Month.
Figure 18 - Data Aggregated by Quarter.
Figure 19 - Data Aggregated by Year.
EXAMPLE 6: TABLEAU IS A REPLACEMENT FOR PIVOT TABLES

We have saved the best example for last. If you create pivot tables in Excel, simply quit doing it. It isn’t
necessary anymore. Just use Tableau to create the same tables in about a tenth of the time, without any
confusion and without any need to worry about whether the tables have been updated. No examples
are needed; the process in Tableau is just that simple. Resign your role as the company pivot-table
master and learn to use Tableau Desktop. You can thank us later for this advice.