Tableau as Traditional BI

Tableau was one of the early self-service BI vendors, and has made its impact by allowing any end user not only to access data, but to interact with it, ask their own questions, and build their own vizzes. The idea is to empower users to get what they want out of data, rather than what you prescribe for them.

That said, having sold Tableau for 4.5 years, I’ve come to realize that no matter how great self-service BI is, there are certain pieces of “Traditional BI” that people can’t let go of, no matter how value-less they seem. After hearing the same request enough times, sometimes you just have to cave to tradition and allow the Cognos users their one feature request.

“The less there is to justify a tradition…the harder it is to get rid of it” — Mark Twain

Luckily, the 19.3 release of Tableau came with two features which make this incredibly easy. The ability to use Python in your Prep flows means anyone who can write Python can distribute CSVs: a 20-line script can send the relevant data to exactly the users you want. Not everyone can write Python, though, which is where the reusable steps option in Prep Builder 19.3 comes in: it makes it easy to socialize this content, giving any user a ready-made skeleton around your Python flow.

The last piece of the puzzle is choosing the end users who should receive the report. By building a consistent table which can be incorporated into your flow, you can centralize this piece as well. I’ll walk through the steps below.

  • Build a Distribution Table. This can be in a database or a simple CSV document, and should have three columns: one with a list of Groups you’d like to send the email to, one with the name of the Site those Groups exist on, and one with the File Name that the CSV should be distributed with.
  • Create a Tableau Prep flow. This can be as simple as a single file that you want to send as a CSV or it could be a 100-step Prep flow coming from 20 datasources.
  • Filter your Distribution Table so it returns only one row: the desired groups for distribution.
  • Join your Flow Result to your Distribution Table. There are no matching columns here, so you’ll need to join on 1=1. This will perform a cross join, adding the Group Names to every row of your Prep Flow.
  • Add a Script step (more details below).
  • Add an output, publish your Flow, and schedule it to refresh on the schedule of your choosing!

Once you’ve followed the above steps, Tableau will, at the appropriate interval that you’ve chosen, run your entire ETL flow, send a CSV to all of the appropriate users, and publish the .hyper file to Tableau Server. You can even make this reusable by publishing all of the “scheduling” steps up to your Tableau Server so other people can reuse them.

Publish the entire flow except the data input step. This makes it a reusable distribution tool for others to download and plug into their own flows!
Right-click and insert the published steps into any flow you’re working on.
Drag the output of your existing flow as the input to your inserted flow. Change the filters on your Distribution Table to the desired schedule and simply publish your new flow. It’ll send to the chosen groups on the schedule of your choosing.

So what’s the magic of the Python script? All of the code is available here, and it does a couple of things.

  1. Logs in to Tableau Server.
  2. Gets the list of all users on the Site.
  3. Finds which users are in the specified Groups.
  4. Compiles a list of those addresses.
  5. Sends an email to all of those people!

I’ve broken it out into two functions.

  • emailer()
  • GroupMailer()

The emailer() function is pretty simple. It takes two arguments: a list of emails and a dataframe. Tableau Prep’s Python integration requires that you create a function which takes in a dataframe and returns a dataframe. In this case, we’re just taking in the data that you’d like to distribute. We do no modifications at all to it, just send it via email and return it to Tableau Prep.
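As a reference point, here’s a minimal sketch of what such a function might look like. This isn’t the published script; the SMTP host, sender address, and file name are placeholders (in the real flow, the file name comes from the Distribution Table).

import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.mime.application import MIMEApplication

def emailer(emails, df):
    filename = 'report.csv'  # placeholder; in the real flow this comes from the File Name column

    # Build a message with the dataframe attached as a CSV
    msg = MIMEMultipart()
    msg['Subject'] = 'Your scheduled report'
    msg['From'] = 'reports@example.com'  # placeholder sender
    msg['To'] = ', '.join(emails)
    msg.attach(MIMEText('Please find the attached report.'))
    attachment = MIMEApplication(df.to_csv(index=False).encode('utf-8'), Name=filename)
    attachment['Content-Disposition'] = 'attachment; filename="%s"' % filename
    msg.attach(attachment)

    # Send via your SMTP relay (host and port are placeholders)
    with smtplib.SMTP('smtp.example.com', 25) as server:
        server.sendmail(msg['From'], emails, msg.as_string())

    # Return the dataframe unchanged so Prep can keep working with it
    return df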

The GroupMailer function is the more complex one. It leverages a couple calls from Tableau’s REST API to find all of the necessary email addresses and compile them into a list, which is then used in the emailer() function. You could, of course, create your own list of email addresses and pass that in instead.
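To give a sense of what those calls look like, here’s a hedged sketch using the tableauserverclient library (the linked script is the source of truth; the server URL, credentials, and the column names pulled from the cross-joined Distribution Table are assumptions):

import tableauserverclient as TSC

def GroupMailer(df):
    # These values come from the cross-joined Distribution Table
    # (column names assumed to match the table described above)
    site_name = df['Site'].iloc[0]
    group_names = [g.strip() for g in df['Groups'].iloc[0].split(',')]

    # Sign in to Tableau Server via the REST API
    auth = TSC.TableauAuth('username', 'password', site_id=site_name)  # placeholders
    server = TSC.Server('https://your-tableau-server', use_server_version=True)

    emails = []
    with server.auth.sign_in(auth):
        # Walk every group on the Site and keep the ones we were asked to target
        for group in TSC.Pager(server.groups):
            if group.name in group_names:
                server.groups.populate_users(group)
                emails.extend(user.email for user in group.users)

    # Hand the compiled list to emailer() and pass the data through untouched
    return emailer(emails, df)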

To facilitate that, I built a separate function called PersonMailer(). It functions almost exactly like the GroupMailer() function, but allows you to pass in comma-separated email addresses instead of relying on Tableau to generate them. The downside of this is that it’s harder to scale (reports often go to thousands of users), but the upside is that these users don’t need to be licensed on Tableau Server! We’re simply using Python to send the emails out, so if you need to send it to unlicensed users, distribution lists, or dummy email addresses, this function should work perfectly for you.
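A rough sketch of that variant (again, the linked code is the source of truth; the 'Emails' column name is purely illustrative, and it reuses the emailer() helper from above):

def PersonMailer(df):
    # Split a comma-separated string of addresses into a list
    # ('Emails' is an illustrative column name, not one from the Distribution Table above)
    emails = [addr.strip() for addr in df['Emails'].iloc[0].split(',')]
    # Send the CSV and pass the data through unchanged
    return emailer(emails, df)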

Web Data…Conductors?

Tableau 19.3 released this week, and with it came a whole host of features, including Server Management, Explain Data, and Data Catalog. Data Catalog (a part of the Data Management Add-on) allows you to see what data is being used on your Tableau instance, how people are using it, what tables they’re using in conjunction with what, and all of the associated lineage. On top of that, it allows you to write alerts, notify data users of outages/delays, and predict the impact of deprecating individual data assets. All of these features have created a renewed interest in the Data Management add-on, which also includes Prep Conductor.

One of the new features released within Prep in 2019.3 is the ability to use Python/R within your Prep flows. Now my experience with Python is effectively 0, but there is a really easy and cool use case worth documenting. Tableau has long had the ability to connect to API-based data through Web Data Connectors or the Extract/Hyper API, but both of these remove you from the Tableau interface. Hosting Web Data Connectors can be a hassle and require extra admin work, and the Hyper API exists entirely outside of Tableau, giving you little visibility into when (or if) tasks finish. The Python Prep node requires only that you create a function which takes a dataframe as an argument and returns a dataframe, which means you can now build entirely (or partly) web-based data connections in-flow. The steps are below.

  1. Create a function which takes a dataframe as an argument.
  2. In that function, ping the necessary API.
  3. Convert the return from that API into a dataframe.
  4. Define the schema of that new dataframe.
  5. Save your Python work as a .py file.
  6. Create a Prep flow with an input, a Script node, and an output.
  7. Publish it!

1. Create a Python function. In a text editor of your choice, you’ll simply define a function. It must take a dataframe as an argument, as this will be the data passed from your Prep flow into the function.
def APIcall(df):

2. Ping the necessary API. In this case, I’m using a stock-ticker API from alphavantage. You should get your own API key, but mine is published here. This API call returns a dictionary of CRM (our new overlord) stock data history. For this, I’m using the requests library.

def APIcall(df):
    r = requests.get("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=CRM&apikey=UT5SGBK00NBXYLK1&outputsize=full")

3. Convert the return from that API call into a dataframe. To do this, I’m using a couple of pieces. I use the json library to convert the string response from the API into a dict, then use pandas to convert the dict to a dataframe.

def APIcall(df):
    r = requests.get("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=CRM&apikey=UT5SGBK00NBXYLK1&outputsize=full")
    dfWorking = pd.DataFrame();
    data = json.loads(r.text)
    data = data["Time Series (Daily)"]
    dfWorking = pd.DataFrame.from_dict(data, orient='index')
    return dfWorking

This returns all of my data, but my dates are being used as an index, not a column.

def APIcall(df):
    r = requests.get("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=CRM&apikey=UT5SGBK00NBXYLK1&outputsize=full")
    dfWorking = pd.DataFrame();
    data = json.loads(r.text)
    data = data["Time Series (Daily)"]
    dfWorking = pd.DataFrame.from_dict(data, orient='index')
    dfWorking['date'] = dfWorking.index
    return dfWorking

4. Define (and match) the schemas. The Tableau help article here shows how to define the schema that we’re returning. On top of that, though, we need to make sure that our dataframe has the appropriate types. Even though the stock prices look like decimals, the API returned them as strings. First, I recast those values as floats, then I define the schema of the dataframe I’ll send back to Tableau. Make sure you also import all of the necessary libraries.

import requests;
import pandas as pd;
import json;
def APIcall(df):
    r = requests.get("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=CRM&apikey=UT5SGBK00NBXYLK1&outputsize=full")
    dfWorking = pd.DataFrame();
    data = json.loads(r.text)
    data = data["Time Series (Daily)"]
    dfWorking = pd.DataFrame.from_dict(data, orient='index')
    dfWorking['date'] = dfWorking.index
    dfWorking = dfWorking.astype({'5. volume': 'int32'})
    recast = ['1. open', '2. high', '3. low', '4. close'];
    for f in recast:
        dfWorking = dfWorking.astype({f: 'double'})
    return dfWorking
def get_output_schema():
    return pd.DataFrame({
        'date' : prep_date(),
        '1. open' : prep_decimal(),
        '2. high' : prep_decimal(),
        '3. low' : prep_decimal(),
        '4. close' : prep_decimal(),
        '5. volume' : prep_decimal(),
    });

5. Create a Prep flow. This part is easy. Simply open Prep and connect to data. Even though the data we return will be entirely API-based, Tableau requires that you connect to a set of data (and it has to have at least one row). In my case, I used Superstore. Turns out you really can demo anything using only Superstore. You’ll need a TabPy server set up, but the rest is easy. Simply connect to any dataset, run your newly-created Python script in a Script step, and create an output on Server. Now schedule that to refresh and you’ll get API-based data with all of the monitoring Tableau has to offer!

So how does this really work? Tableau takes in a dataframe from your datasource, throws out that data, and replaces it with your new dataframe. What else can we do with this? All sorts of things. Now you’ve got your API-based data in a Prep flow. Want to union it to something? Run cross-db joins? Pivot it? Join it to a published datasource so you can correlate stock prices with the times your customers purchase? The world is your oyster. Of course, you can also make more complex scripts. For example, you could simply incorporate a for-loop into this script and return the data for any number of tickers that you want. To find the history of Tableau, for example, I need both the CRM and DATA tickers. The script below takes a list of tickers and outputs a hyper file with all of the stock data for each company.

import requests;
import pandas as pd;
import json;
def APIcall(df):
	tickers = ["NKE", "CRM"];
	dfWorking = pd.DataFrame();
	recast = ['1. open', '2. high', '3. low', '4. close'];
	for i in tickers:
		r = requests.get("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=" + i + "&apikey=UT5SGBK00NBXYLK1&outputsize=full")
		data = r.text
		data = json.loads(data)
		data = data["Time Series (Daily)"]
		newFrame = pd.DataFrame.from_dict(data,orient='index')
		newFrame['date'] = newFrame.index
		newFrame['ticker'] = i
		dfWorking = dfWorking.append(newFrame)
	for f in recast:
		dfWorking = dfWorking.astype({f: 'double'})
	dfWorking = dfWorking.astype({'5. volume': 'int32'})
	return dfWorking


def get_output_schema():
    return pd.DataFrame({
        'date' : prep_date(),
        '1. open' : prep_decimal(),
        '2. high' : prep_decimal(),
        '3. low' : prep_decimal(),
        '4. close' : prep_decimal(),
        '5. volume' : prep_decimal(),
        'ticker' : prep_string()
    });

Make a viz out of that dataset and see exactly how valuable Tableau was to Salesforce! Or set up your own script and automate other cool workflows. Want to send out your dataset as a CSV? Simply incorporate that into the Python script. Even though it has to take in and return a dataframe, that doesn’t mean all it can do is ETL. Have fun with it!
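For example, a small pass-through function like the one below (or a couple of extra lines just before the return in APIcall()) could drop the same dataframe to a network share as a CSV; the UNC path is a placeholder:

def export_csv(df):
    # Write the data somewhere useful, then hand it back to Prep unchanged.
    # The path is a placeholder; use a location your flow's Run As User can write to.
    df.to_csv(r'\\network\share\stock_history.csv', index=False)
    return df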

Publishing TDS Datasources Using Tableau Prep

Though Tableau originated as a visualization tool, it has added significant ETL processes over the last couple versions. With version 18.1 it added Tableau Prep and the ability to build ETL flows, and 19.1 added Prep Conductor, which comes with the ability to automate workflows to run on a schedule. One current limitation, however, is that Tableau Prep outputs a .hyper file, not a .tdsx file. What’s the difference here?

In Tableau, a .hyper file is a raw data file. It contains the results of the data from the datasources as well as any calculations which can be materialized at the individual row level (calculations like string manipulations, differences between two columns, etc.). Calculations which can’t be materialized on individual rows, however, aren’t stored in a .hyper file, but instead are saved in a .tds file (Tableau Datasource). This file contains the logic for level of detail calculations, aggregate calculations (such as ratios), and the username-based calculations often used for row level security. A .tdsx file is the combination of the raw data (.hyper file) and the associated logic (.tds file). Tableau Prep, however, doesn’t allow for the customization of .tds files. If you want to add aggregate calculations, you can do so in Desktop, but when Conductor runs your flow, it will overwrite your entire Datasource, replacing your .tds file with a generic one and losing all of your calculations in the process. Below is a walk-through of how to avoid that behavior.

Before we go any further, it’s worth noting that this workflow will probably be streamlined at some point, but that for now, this is the easiest way of creating a Datasource with data from Prep and .tds-based logic.

  1. Create a Prep flow which outputs a .hyper file to a network-mapped location.
    1. In the Output step of your Prep flow, do not select “Publish as a data source”, but instead choose “Save to File”. You need to ensure that your Prep inputs and outputs are using UNC file paths, so it will continue to work when published to Server.
  2. Publish and schedule the flow.
    1. Simply publish your flow to Tableau Server. You’ll need to ensure that your Run As User has access to the file input/output locations, and that those file locations are safelisted for Prep Conductor.
    2. Though we’ll tie this flow to a schedule, we won’t actually be relying on the schedule’s timing to run the flow. Therefore, you’ll want to make it a schedule that you don’t use for anything else and only runs very infrequently. I set mine to run monthly on a schedule named “PrepScriptSchedule”. The reason we need to tie it to a schedule (even though we aren’t relying on timing) is that tabcmd allows us to run a scheduled task.
  3. Open the output of the flow in Tableau Desktop.
  4. Create your Datasource modifications in Desktop (create calculations, Datasource filters, hierarchies)
  5. Publish the Datasource.
  6. Using tabcmd, refresh the .hyper file and publish it without overwriting the Datasource.
    1. If you’re not already using tabcmd, you’ll need to install it.
    2. Log in to the Server using tabcmd login.
    3. Run the Prep flow using tabcmd runschedule.
      1. Because we’re running a schedule (not executing a task on Tableau Server), we’ll need to build in a wait time for our script. This step has started the Prep Flow, but we’ll need to pause until it finishes creating the file.
    4. Pause the script until the flow is complete using SLEEP. This command takes an argument which is the number of seconds to pause your script. You should make sure that the number you input here is higher than the time your Prep Flow takes to run.
    5. Using the tabcmd publish command, point to the .hyper file output from the Prep flow and overwrite the Datasource in question. Use the --replace option to avoid overwriting the .tds, instead just overwriting the source data contained in the .hyper file.
tabcmd login -s https://<server-name> -u <username> -p <password> -t <siteName>
tabcmd runschedule "PrepScriptSchedule"
sleep 1000
tabcmd publish "\\network\filepath\prepoutput.hyper" -n <targetDatasource> --replace

It’s an easy script to run, and can be run on the schedule of your choice using any task scheduler (most likely Windows Task Scheduler or as a cron job). Using the above script we can create Tableau Datasources with Prep ETL, Desktop metadata, and Server security, and refresh it all on a schedule. Go forth and enjoy your complex data structures with complex governance tied in!
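If you’d rather keep the whole routine in one Python file (to add logging, say, or reuse it across Datasources), the same four commands can be driven from the standard library’s subprocess module. This is just a sketch with the server, credentials, schedule name, and paths as placeholders, and it assumes tabcmd is on the PATH:

import subprocess
import time

def refresh_datasource():
    # Log in, kick off the Prep schedule, wait for the flow to finish, then republish the .hyper
    subprocess.run(['tabcmd', 'login', '-s', 'https://your-server',
                    '-u', 'username', '-p', 'password', '-t', 'siteName'], check=True)
    subprocess.run(['tabcmd', 'runschedule', 'PrepScriptSchedule'], check=True)
    time.sleep(1000)  # longer than your Prep flow takes to run
    subprocess.run(['tabcmd', 'publish', r'\\network\filepath\prepoutput.hyper',
                    '-n', 'targetDatasource', '--replace'], check=True)

if __name__ == '__main__':
    refresh_datasource()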

Filter Yourself: Using Reflexive Sheet Actions for Drilldowns or Disappearing Acts

Tableau’s dashboard actions allow easy interactivity between sheets. New users to Tableau are familiar with clicking the “Use as Filter” icon and seeing that everything they click just magically…works. More experienced users have likely gone into the Dashboard Actions menu and created their own Filter actions. Dashboard actions can easily filter across different fields, reference different datasources, or even take the user to a different dashboard. With Set Actions, Tableau has even enabled interactivity that didn’t previously exist. Set Actions make it easy for a sheet to even filter itself, adding and removing members to a set which is used as a filter.

The one thing that IS difficult in Tableau, however, is having sheets target themselves as filters. When creating a new dashboard action, you can set a sheet to target itself by setting it as both the Source and Target sheet, but it won’t change the behavior.

Even though the Map is both Source and Target, clicking a State doesn’t filter itself.

For the most part, we don’t want actions to target the source sheet. If the above action did select a single State, the top sheet would become useless (and create a confusing UI).

There are times, however, when this type of filtering makes sense. I’ve got an org chart dashboard which shows a selected employee, their manager, and everyone who works for them.

A reasonable follow-up question to this is “Who else works for Josh?”. I’d like to click on Josh McDaniels and filter the entire dashboard to show his manager, his employees, and his information. This requires having my Manager sheet filter itself when Josh is clicked on.

We’ve already shown, however, that selecting a sheet as Source and Target does nothing. Instead, the correct setup requires setting it as the source sheet, but changing the way we target it. Rather than selecting the Target as a sheet which is part of the dashboard, we target it as an individual sheet. We also need to target a different field…we’re clicking on a manager, but we want to treat him as an employee once the dashboard updates.

The target isn’t the “sheet as part of the dashboard”. Instead, choose the sheet as a standalone piece.

It’s important to set “Clearing the selection will:” to “Leave the Filter”. When a sheet filters itself, it first filters everything out. Because the sheet now returns an empty set on which nothing can be selected, Tableau (frustratingly) treats this as clearing the selection. We need the filter to persist through this phase of filtering.

So the combination of targeting a specific sheet and leaving the filter allows us to self-target with a dashboard action. This is immensely useful in any hierarchy, where we can easily click up or down through the hierarchy without ever having to reset the dashboard. Where else are self-actions useful? Hiding and showing a menu is another great use case.

Using the same org-chart dashboard example, we have to start by selecting an employee. Our end-user can log in, select an employee, and have the dashboard populate for that employee. From there, they can navigate to any point in the hierarchy by simply drilling up and down…but if they want to navigate to an employee in a totally different part of the company, they may have to go all the way to the top and then guess their way through upper management. Instead, we could provide them an interface to select an employee. Once they make a selection, the menu disappears. Again, we can accomplish this with a self-targeted action.

First, though, we’ll need two calculations. I call them [Dummy Dimension] and [Dummy Dimension 2]. [Dummy Dimension] is simply the integer 0, and [Dummy Dimension 2] is simply the integer 1. Now I set up a self-filter using the above methodology. To make the sheet hide itself, though, we have to set up a filter condition which returns an empty set.

Clicking on this sheet will now create an action which finds rows where [Dummy Dimension] = [Dummy Dimension 2]. As far as I’ve ever found, 1<>0, so the whole sheet is hidden.

Pairing this with other actions allows this click to hide the “menu” sheet as well as driving interactivity on the rest of the dashboard. Of course, you may need to get your menu back at some point. When you click on this action, it grabs the value 0 from [Dummy Dimension], but it only sets the filter on the field [Dummy Dimension 2]. The resulting filter is the equivalent of filtering [Dummy Dimension 2] for the value 1. Any filter which filters on [Dummy Dimension 2] by filtering for 1 instead will therefore replace this filter…re-showing the sheet.

Overall, you can use dashboard actions to create all kinds of magic filtering on a single dashboard. Whether it’s a sheet hiding itself, bringing that sheet back, or using actions to create in-place drill-up and drill-down through hierarchies, Dashboard Actions are far more flexible than the one-click option of “Use As Filter”. The dashboard I linked at the beginning of this post uses 12 different dashboard actions…across only 5 sheets. Can you do it in fewer?

Devil in the Details: Understanding Level of Detail Equations and Filtering

Tableau introduced level of detail (LOD) equations with Tableau 9.0, and with them came a huge number of new use cases.  LODs are an easy alternative to table calculations when looking for two-pass aggregations.  They don’t depend (much) on your viz layout, making them more easily reusable than table calcs.  One of the coolest pieces of functionality, however, is the flexibility provided with filters.  The addition of FIXED calculations gave Context filters additional functionality by adding something between Context filters and Dimension filters in the Tableau Order of Operations.

In short, the additional functionality could be summarized by two sentences.

  1. FIXED calculations compute before filters.
  2. Context filters still impact FIXED equations.

For most use cases, this holds true:  FIXED equations gave us a new layer of computation between filter levels.  If you play around with filters for long enough, however, you’ll note that there are times this isn’t true.  Take the below example.  I’ve written three calcs.

  1. SUM([Sales])
  2. {FIXED [Region] : SUM([Sales])}
  3. {FIXED : SUM([Sales])}

With nothing else on the viz, these all return the same results, as expected.  Filtering on State, however, returns surprising results.

When we filter to just one State (Alabama), our calculations return 3 different results.  Have we been lied to?!

Even though Dimension filters come after FIXED expressions in the order of operations, we can see that a Dimension filter has impacted the results of a FIXED equation…but not both FIXED equations.  What’s happened here?  The short answer is that our filter filtered out the entirety of a Region, and we have an equation FIXED on Region.  The longer answer requires understanding how LODs work and a look at the SQL issued by FIXED calculations.

At its core, Tableau takes your drag-and-drop commands and turns them into SQL.  There are tons of great resources on Tableau for the SQL writer available, so we’ll look specifically at the SQL issued by FIXED commands here.

When we drag a normal metric to a visualization, Tableau issues a query which is SELECT SUM(Metric) GROUP BY <list of dimensions on viz>.  The combination of dimensions on the visualization is what we call the “visualization level of detail” and determines the granularity of displayed data.  Declaring dimensions in an LOD allows us to create a different level of detail for the calculation itself.  At the end of the day, of course, the LOD is displayed on a chart, so we need to reconcile the viz level of detail with the calculation level of detail.  In this case, our Region calc is at a more granular level than our viz, so Tableau aggregates up for us.  Looking at the SQL just for that calculation, we see the below.  (I’ve cleaned up the SQL just because Tableau has some funny re-aliasing that goes on.  This isn’t verbatim what Tableau issues, but it’s the same query.)

SELECT SUM(t1.LODCalc)
FROM (
   SELECT Region AS Region,
   SUM(Sales) AS LODCalc FROM Orders 
   GROUP BY Region
) t1

Tableau first selects our Sales per Region, then aggregates them all together.  Because SUM is an additive metric, it returns the same as simply doing SUM(Sales).  Now let’s look at the query when we add SUM(Sales) to the viz.

SELECT t2.LODCalc , t0.Sales
FROM (
   SELECT SUM(Sales) AS Sales
   FROM Orders
) t0
CROSS JOIN (
   SELECT SUM(t1.Metric) AS LODCalc
   FROM (
      SELECT Region, SUM(Sales) AS Metric
      FROM Orders
      GROUP BY Region
   ) t1
) t2

Again, let’s walk through it step by step.  First, Tableau creates a table of sales by region (and names it t1).  Because our viz is at the full-table level, we don’t need this data disaggregated at all, so Tableau aggregates that up to a single row (and names it t2).  Then, Tableau cross-joins that to the SUM(Sales) total.  It results in a one-row table.

Now let’s look at it as a real use case.  I want to know the average sales in the Region for each State.  It’s simply {FIXED [Region] : AVG([Sales])}.

SELECT State, SUM(t1.RegionalAvg)
FROM (
   SELECT Region, State
   FROM Orders
   GROUP BY Region, State
) t0
INNER JOIN (
   SELECT Region, AVG(Sales) AS RegionalAvg
   FROM Orders
   GROUP BY Region
) t1 ON t1.Region = t0.Region
GROUP BY State

This is where the important step first surfaces.  When we use the LOD on a viz, we inner join the LOD portion back to the original data set on the dimension which we FIXED to.  It’s the same step-by-step process as the SQL statements before.

  1. Create a table of AVG(Sales) per Region
  2. Join it back to our other data (in this case, just State name) on the appropriate dimension (in this case, Region)

In this case, we’re joining back on Region = Region.  If we filter down to a specific state, however, the join won’t return the data from any other Regions.  Our INNER JOIN on Region = Region now only returns data from the specific State we filtered to.  In this case, though, we only needed data from the South, because that’s all that’s being displayed on our viz.

If we weren’t looking at it at the State level, however, Tableau still loses the data.  Let’s go back to the original chart with 3 bars.  This time, I’ll filter just to Alabama.

The FIXED SUM(Sales) bar still returns our total sales.  SUM(Sales) returns just the data from Alabama, as we expect.  The FIXED Region calc, however, returns sales from the entire region that Alabama is a part of.  Let’s look at the query for SUM(Sales) and FIXED Region together.

SELECT t3.RolledUp, t0.Sales
FROM (
   SELECT SUM(Sales) AS Sales
   FROM Orders
   WHERE State = 'Alabama'
) t0
CROSS JOIN (
   SELECT SUM(t2.RegionalTotal) AS RolledUp
   FROM (
      SELECT Region
      FROM Orders
      WHERE State = 'Alabama'
      GROUP BY Region
   ) t1
   INNER JOIN (
      SELECT Region, SUM(Sales) AS RegionalTotal
      FROM Orders
      GROUP BY Region
   ) t2 ON t1.Region = t2.Region
) t3

Again, we can walk through this step by step.  First, Tableau builds a table of Sales by Region (t2).  Next, it joins that on to a list of our available states (t1) on Region = Region.  In this case, we only have one state (Alabama) and therefore only one region (South).  This is where the dimension filter impacts our LOD.  We’ve computed the sales for each region…but when we start to bring it back to the viz, the data gets excluded by a simple dimension filter.

Circling back to the original point…is it fair to say that a FIXED equation isn’t impacted by a normal dimension filter?  For most use cases, yes it is.  If, however, we filter out (whether explicitly or coincidentally) an entire member of the dimension we’ve fixed to, Tableau won’t have any way to bring that data back.  The whole concept is built on INNER JOINS, so we need the dimension to be present in our resulting data.  Relying on FIXED equations to cheat your dimension filters is great as long as you’ve vetted your use case, but it isn’t a bullet-proof approach to LOD filtering.

End-of-Period Totals

Totals in Tableau have the ability to aggregate your data a number of different ways, which takes care of 90% of the use cases for totals.  Often, however, people want to look at end-of-period totals.  Imagine you run a store and count inventory at the end of every week.  You may want to build a crosstab that shows inventory at the end of each week, but also add totals at the month, quarter, and year level.  The yearly total obviously shouldn’t show the summation of every week for the year…that doesn’t help you figure out what you have on-hand.  Don’t worry, it’s possible in Tableau!

See how the last week of each month matches the total below it.  In the yellow box, we see the last week of the quarter matches the monthly AND quarterly totals.

The end-of-period total is useful for inventory managers, any type of finance (end of day balances), and anybody tracking non-additive time series metrics.  Before we get to the solution, it’s important to understand how totals work.  Once you’ve got that down, the rest is just figuring out logic and syntax.

The logic is simple.  If we’re looking at a single cell (the granularity of your data), just return that data.  If we’re looking at any total, only return the data from the most recent date.  Normally, we’d cohort out the most recent date using something like the below.

RUNNING_MAX(ATTR([Date]))

We’d then need to compare that date to a row-level date, though.  There’s our problem.  If you try the below calc, you’ll end up with an error for comparing row-level data to aggregate data.

IF RUNNING_MAX(ATTR([Date])) = [Date] THEN [Inventory] ELSE 0 END

Table calcs will always return aggregates.  What we need to do is find a way to return the running-max date as a row level result so we can use it in comparisons.  To do this, we can use EXCLUDE or FIXED.  Normally, I use FIXED equations when given the option, simply because they’re less viz-dependent.  You can build one calc and use it on 15 different sheets, almost regardless of their dimensionality.  In this case, however, we’re custom-building a calc for the sheet anyway.  FIXED has funnier behavior with filters, so in this case I’ll use EXCLUDE.

The first thing we need to do is ensure that we return the appropriate value for individual cells.

IF COUNTD([Week-ID]) = 1 THEN SUM([Inventory])
END

Next we need to take into account monthly totals.

IF COUNTD([Week-ID]) = 1 THEN SUM([Inventory])
   ELSEIF COUNTD(DATEPART('month',[Date])) = 1 THEN
       SUM(IF [Week-ID] = {EXCLUDE [Week-ID] : MAX([Week-ID])} THEN [Inventory] END)
END

Looking at the above calc, we accomplish a couple of things in sequence.  First, we need to identify the monthly total.  We know all single cells were caught by line 1 of the calculation, so anything else will be a total.  We specifically want to identify a monthly total, though.  Simply checking if COUNTD of months = 1 will ensure we’re looking at the monthly total.

Next, we need to isolate the most recent week.  We do this using our EXCLUDE function.  Our viz is calculated at the level of year, quarter, month, and week.  We need to find the highest week-id of the month.  To calculate at that level, we simply need to exclude week-id from the calculation.

Now that we’ve identified the highest week value, we need to find the inventory value associated with it.  This is where our if-then logic comes in.  Only one week will match the highest week-id, so IF <Exclude function returning max week id> = [Week-ID] THEN [Inventory] END returns the inventory associated with that week.

Month totals are now computing correctly, but quarter totals are wrong.

Looking at the above image, you see we’ve successfully fixed some of our totals!  There are still some to go, however.  The quarterly total is currently displaying as the sum of our monthly totals…and this makes sense.  Remember, our EXCLUDE statement returned data at the level of year-quarter-month, excluding week.  When Tableau queries for this total, it gets back three numbers (one for each month) and then sums them together.  To get our quarterly numbers to compute correctly, we need to add the below into our calculation.

ELSEIF COUNTD(DATEPART('quarter',[Date])) = 1 THEN
    SUM(  IF [Week-ID] = { EXCLUDE [Week-ID], DATEPART('month',[Date]) : MAX([Week-ID])} THEN [Inventory] END)

This takes care of our quarterly totals, but our yearly totals will still mis-compute.  You’ll need to add one of these clauses for each level of totals that you want to display, and add them sequentially from most granular to most aggregate.  In the end, your calc should look like the below.

IF COUNTD([Week-ID]) = 1 THEN SUM([Inventory])
ELSEIF COUNTD(MONTH([Date])) = 1 THEN
    SUM(IF [Week-ID] = { EXCLUDE [Week-ID] : MAX([Week-ID])} THEN [Inventory] END)
ELSEIF COUNTD(DATEPART('quarter',[Date])) = 1 THEN
    SUM(IF [Week-ID] = { EXCLUDE [Week-ID], DATEPART('month',[Date]) : MAX([Week-ID])} THEN [Inventory] END)
ELSEIF COUNTD(YEAR([Date])) = 1 THEN
    SUM(IF [Week-ID] = { EXCLUDE [Week-ID], DATEPART('month',[Date]), DATEPART('quarter',[Date]) : MAX([Week-ID])} THEN [Inventory] END)
ELSE
    SUM(IF [Week-ID] = { EXCLUDE [Week-ID], DATEPART('month',[Date]), DATEPART('quarter',[Date]), DATEPART('year',[Date]) : MAX([Week-ID])} THEN [Inventory] END)
END

There you have it.  This calc returns your most recent value as your total, instead of accepting Tableau’s defaults.  Because we used EXCLUDE, it will respond to all of your filters, regardless of whether they’re in context or not.  I’ve found this solution to hold up fairly well in production, but keep in mind that if you’re running this against billions of rows of data, it may slow down.  Most end-of-day datasets don’t grow to that size (there have only been about 10,000 days since the world wide web became publicly available), so plug this in and let me know how it works!

Custom Totals in Tableau

Tableau has built-in functionality to compute totals for almost any visualization.  Simply drag Totals onto your viz, and Tableau will add a totals portion to the end of your viz.  At first glance, you might think these totals were all computed similarly to table calculations – Tableau looks over the results of the query and computes based off of that.  It’s a simple two-pass aggregation and Tableau can do it all in memory.  If you tinker around a little more, however, you’ll find that that’s not the case.  Take the below example.

We’re looking at AVG([Sales]) in three separate categories, and it returns 349.7, 119.3, and 452.7. The reference line (an average line) returns 307.3.

Now let’s ask (seemingly) the same question a different way by removing the reference line and adding totals.

The total returns a different value… 229.8. What’s happening here?

Well, if we sum up the three values (349.7 + 119.3 + 452.7 = 921.7) and divide by 3, we get 307.3.  The reference line is returning an average of averages (and somewhere, a statistician rolls over in his grave).  The total bar, however, returns 229.8.  Removing Category from the Rows shelf shows us that Tableau is returning the overall row-level average of sales.

What this proves to us is that Tableau is actually going back to the database and sending a separate query for the totals.  What this means for us is that we can cheat totals.  If we know that Tableau sends a query back for totals, we could create a calculation which returns different results for the totals bar.

The first thing we need to do is find a way to flag the totals bar in the query.  Because my above visualization shows one bar per category, we know that COUNTD([Category]) will return 1.  The totals bar, however, will return 3.  Putting the logic into a calculation allows us to return…well…anything as our total.

Sales looking bad?  Try the below!

IF COUNTD([Category]) = 1 THEN SUM([Sales]) ELSE MIN(10000000000) END

On a more practical side, however, it allows us to customize the behavior to what we need.  Maybe you want each individual bar to display the average sales, but the grand total bar to display the total sales.

IF COUNTD([Category]) = 1 THEN AVG([Sales]) ELSE SUM([Sales]) END

Any (aggregate) business logic you want can now easily be built in there.  As with any Tableau hack, of course, it leads to a couple more hacks as well, including custom waterfall charts and custom end-of-day totals.