Though Tableau originated as a visualization tool, it has added significant ETL capabilities over the last few versions. Version 2018.1 introduced Tableau Prep and the ability to build ETL flows, and 2019.1 added Prep Conductor, which can run those flows automatically on a schedule. One current limitation, however, is that Tableau Prep outputs a .hyper file, not a .tdsx file. What’s the difference?
In Tableau, a .hyper file is a raw data file. It contains the data pulled from the underlying data sources, as well as any calculations which can be materialized at the individual row level (string manipulations, differences between two columns, etc.). Calculations which can’t be materialized on individual rows aren’t stored in the .hyper file; instead, they are saved in a .tds file (Tableau Datasource). This file contains the logic for level of detail calculations, aggregate calculations (such as ratios), and the username-based calculations often used for row level security. A .tdsx file is the combination of the raw data (.hyper file) and the associated logic (.tds file). Tableau Prep, however, doesn’t allow you to customize the .tds file. You can add aggregate calculations in Desktop, but when Conductor runs your flow, it will overwrite your entire Datasource, replacing your .tds file with a generic one and losing all of your calculations in the process. Below is a walk-through of how to avoid that behavior.
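If you want to see this packaging for yourself, a .tdsx is an ordinary zip archive, so any zip tool can list what is inside it. The snippet below is a small sketch that assumes the `unzip` command-line utility is installed; the function name `tdsx_contents` is made up for illustration.

```shell
#!/bin/bash
# tdsx_contents FILE
# List the files packaged inside a .tdsx archive (a .tdsx is a zip file).
# Assumes the `unzip` command-line tool is available on the PATH.
tdsx_contents() {
    unzip -l "$1"
}
```

Running `tdsx_contents` against a packaged Datasource that you have downloaded should show a .tds file alongside the .hyper extract.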
Before we go any further, it’s worth noting that this workflow will probably be streamlined at some point. For now, though, this is the easiest way to create a Datasource that combines data from Prep with .tds-based logic.
- Create a Prep flow which outputs a .hyper file to a network-mapped location.
  - In the Output step of your Prep flow, do not select “Publish as a data source”; instead choose “Save to File”. Make sure that your Prep inputs and outputs use UNC file paths, so that the flow continues to work once it is published to Server.
- Publish and schedule the flow.
  - Publish your flow to Tableau Server. You’ll need to ensure that your Run As User has access to the file input/output locations, and that those locations are safelisted for Prep Conductor.
  - Though we’ll tie this flow to a schedule, we won’t actually be relying on the schedule’s timing to run the flow. Therefore, you’ll want a schedule that you don’t use for anything else and that runs only very infrequently. I set mine to run monthly on a schedule named “PrepScriptSchedule”. The reason we need to tie the flow to a schedule (even though we aren’t relying on its timing) is that tabcmd lets us run a schedule on demand.
- Open the output of the flow in Tableau Desktop.
  - Create your Datasource modifications in Desktop (calculations, Datasource filters, hierarchies).
- Publish the Datasource.
- Using tabcmd, refresh the .hyper file and publish it without overwriting the Datasource.
  - If you’re not already using tabcmd, you’ll need to install it.
  - Log in to the Server using tabcmd login.
  - Run the Prep flow using tabcmd runschedule.
    - Because we’re only triggering a schedule (not waiting on a task to complete on Tableau Server), we’ll need to build a wait time into our script. This step starts the Prep flow, but the script has to pause until the flow finishes creating the file.
  - Pause the script until the flow is complete using sleep. This command takes one argument: the number of seconds to pause your script. Make sure that the number you use is higher than the time your Prep flow takes to run.
  - Using the tabcmd publish command, point to the .hyper file output from the Prep flow and name the published Datasource as the target. Use the --replace option so that only the source data contained in the .hyper file is replaced and the .tds is left intact.
tabcmd login -s https://<server-name> -u <username> -p <password> -t <siteName>
tabcmd runschedule "PrepScriptSchedule"
sleep 1000
tabcmd publish "\\network\filepath\prepoutput.hyper" -n <targetDatasource> --replace
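The fixed sleep in the script above works, but it has to be longer than the slowest run you expect. One possible variation, sketched here for bash, is to poll the output file and publish only once it has been rewritten. This is an illustration under assumptions rather than a tested replacement: the helper names `wait_for_newer` and `refresh_datasource` and all of the environment variable names are hypothetical, and the script assumes that the machine running it can see the Prep output path and that its clock roughly agrees with the file server’s.

```shell
#!/bin/bash
# Sketch only. The environment variables below are hypothetical names:
#   TABLEAU_SERVER, TABLEAU_USER, TABLEAU_PASSWORD, TABLEAU_SITE
#   PREP_OUTPUT  path to the .hyper file written by the Prep flow
#   TARGET_DS    name of the published Datasource to refresh

# wait_for_newer FILE REF [TIMEOUT] [INTERVAL]
# Poll every INTERVAL seconds until FILE exists and is newer than REF.
# Returns 0 once that is true, or 1 if TIMEOUT seconds pass first.
wait_for_newer() {
    local file=$1 ref=$2 timeout=${3:-3600} interval=${4:-30} waited=0
    while [ "$waited" -lt "$timeout" ]; do
        if [ -e "$file" ] && [ "$file" -nt "$ref" ]; then
            return 0
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 1
}

# Log in, trigger the schedule, wait for fresh output, then publish.
# Stops at the first step that fails and returns that step's exit status.
refresh_datasource() {
    local marker status=0
    marker=$(mktemp) || return 1   # timestamp taken before the flow starts

    tabcmd login -s "$TABLEAU_SERVER" -u "$TABLEAU_USER" \
            -p "$TABLEAU_PASSWORD" -t "$TABLEAU_SITE" \
        && tabcmd runschedule "PrepScriptSchedule" \
        && wait_for_newer "$PREP_OUTPUT" "$marker" 3600 30 \
        && tabcmd publish "$PREP_OUTPUT" -n "$TARGET_DS" --replace \
        || status=$?

    rm -f "$marker"
    return "$status"
}
```

With the variables exported, calling `refresh_datasource` at the end of the script runs the whole sequence. Note that a newer timestamp only tells you the flow has started writing the file, not that it has finished, so a short extra sleep before publishing may still be a sensible safety margin.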
It’s an easy script, and it can be run on the schedule of your choice using any task scheduler (most likely Windows Task Scheduler or a cron job). Using the above script we can create Tableau Datasources with Prep ETL, Desktop metadata, and Server security, and refresh it all on a schedule. Go forth and enjoy your complex data structures with complex governance tied in!
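If you schedule the script with cron, one thing to watch for is overlap: because the script spends a long time waiting, a second run could start before the first has published. Below is a rough sketch of a bash wrapper that skips a run while a previous one is still in progress. The function name `run_once`, the lock directory, and every path in the sample crontab line are hypothetical.

```shell
#!/bin/bash
# Example crontab entry (hypothetical paths), running at 02:00 every day:
#   0 2 * * * /opt/scripts/refresh_wrapper.sh >> /var/log/tableau_refresh.log 2>&1

# run_once LOCKDIR COMMAND [ARGS...]
# Run COMMAND unless LOCKDIR already exists, which is taken to mean that a
# previous run is still in progress. mkdir is used as the lock because it
# fails if the directory is already there.
run_once() {
    local lockdir=$1 status
    shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "previous run still in progress, skipping" >&2
        return 75   # arbitrary non-zero status meaning "skipped"
    fi
    "$@"
    status=$?
    rmdir "$lockdir"
    return "$status"
}
```

A wrapper script would then end with something like `run_once /tmp/tableau_refresh.lock /opt/scripts/refresh.sh`. If a run is killed partway through, the lock directory is left behind and has to be removed by hand before the next run will start.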