Do you ever get frustrated by how slow an Alteryx workflow is to run? Ever wonder how do you improve Alteryx processing speed? Or, Why does my Alteryx Workflow run so slow?
Follow these steps to improve Alteryx Workflow Processing Speed.
What is Alteryx?
Alteryx is a business intelligence and ETL tool that’s widely used for data integration, prepping and blending data, and data analysis. It has a wide range of features, and is one of the more mature business intelligence tools on the market today. Processes are built using workflows that ingest data from a range of sources, Excel, SQL, Flat Files, or anything that has an available API. Learn more about the Alteryx Analytics Platform
Alteryx provides some information on the topic of workflow optimization, but it’s fairly basic.
How do You Improve Alteryx Performance?
The following list includes various methods to configure Alteryx, ways to setup workflows, and walks through some IT infrastructure considerations that may improve Alteryx processing speed. First, it’s important to understand a little bit about how Alteryx works.
How does Alteryx Process Data?
It’s important to note that Alteryx Desktop and Alteryx Server are not Cloud Software solutions. Alteryx Designer and Server pull data in from .xlsx or other files saved locally on a computer, a network drive, or cloud storage such as Google Drive, or Microsoft SharePoint.
Alteryx can connect to SQL databases, data warehouses or data lakes but the information is pulled into wherever the Alteryx Designer or Server is installed.
Alteryx (AYX) recently released the Alteryx Analytics Cloud Platform which in its current form is the Trifacta back-end with a simpler Alteryx user interface on top. Alteryx purchased Trifacta in Q1 of 2022. In its current state, Alteryx Cloud Platform has very few features compared to Alteryx Designer and will likely take a while for them to build out the functionality of traditional Alteryx Designer.
1. Enable Performance Profiling in Alteryx
Alteryx Performance Profiling is enabled by clicking on the Canvas (White Space) in a workflow, and going to Runtime Settings in the Configuration Panel.
Performance profiling will track how long it takes each Tool in your Workflow to process. The results are presented in the Workflow Results pane at the bottom of your workflow. This is the most important step in beginning the workflow optimization process as there are many ways to achieve the same result in Alteryx. Some processes are slow than others, and you can re-design your workflow with performance in mind. For example, if a Data Cleanse tool takes a long time to process a trim, you might try a Formula Tool or Multi-Field Tool to perform the same task.
2. Avoid Using Processor Intensive Tools
You may find some tools that commonly take up a large amount of processing time as a percentage of the total. Not all of these tools can be avoided, or the amount of time to work around them would not be worth it. These are a few common culprits of poor performance.
- Sort Tool
- Cross Tab and Transpose Tools
- Summarize Tool
- Join Tool
- Data Cleanse
- Browse Tools
These tools typically require a lot of computational power, or require Alteryx to render the entire dataset in its calculation engine. Sorting is not a complex task, but it does have to know what all of the values in a field to sort it accurately.
Here are some ways to avoid some of these tools:
- Eliminate duplicate tools. You may not need to run a Sort until late in your workflow but could have it in multiple locations.
- Reduce Join Complexity. If your workflow is joining on multiple criteria, you can try to reduce the number of fields in the join. You could also try consolidating multiple join tools with a Multi-Join Tool.
- Join on Integers instead of Strings. Try changing the primary key that join tools are based on. Joining on integers could be faster than doubles or strings. This technique could require some creativity like assigning a row ID prior to joining your datasets.
- Summarize Data Early in a Workflow. Try to summarize data early on. It will decrease the amount of data feeding into tools that are typically slower, such as a join or data cleanse tool. Formulas will have less rows to calculate on and will allow Alteryx to process more information in memory.
- Avoid the Data Cleanse Tool. Most common functions of the Data Cleanse Tool can be performed using a Formula Tool, such as a Trim formula. A Multi-Field Formula can be used to apply the same formula to multiple columns at the same time. You can even change the data type of columns using Multi-Field. (The Data Cleanse Tool is a giant Macro, you can open it from the toolbar by right clicking on it and going to open macro)
- Disable All Browse Tools. Browse Tools will render your full dataset and are incredibly resource intensive. Turn them off or remove them from your workflow prior to running or publishing to Alteryx Server.
All Browse Tools in a Workflow can be disabled under the Runtime Settings
3. Cache and Run an Alteryx Workflow
Cache and Run a workflow while developing it. Alteryx will store the data locally, so it won’t have to load it every time. This is most useful when working with large datasets and large workflows. It can drop runtime from minutes to seconds.
After you Cache and Run a workflow the tools up to the point that it’s cached will show up in bubbles. Data in this part of the workflow will not update in real time. If you need to update the data, right click on a tool in a bubble and clear the cache.
4. Save Large Datasets to a .YXDB File
If the data you are working with is coming from a database that takes a long time to query, try bringing the data into a workflow and output it as a .yxdb file. This is the Alteryx database file format. It’s very efficient and can be read by other workflows. Use the new file as an input to your larger workflow and replace it with a live connection when you’re done developing.
5. Limit the Amount of Data in a Workflow Early On
The less data that Alteryx is processing, the faster it will process. Use the Select Tool to drop unnecessary columns. Use the Auto-Field tool to automatically detect the smallest possible data type. In this case, data size refers to the amount of memory that the data takes up. You can find more on data types here: Data Types | Alteryx Help
6. Process as Much Data as Possible in Memory
Computers use two types of storage. Super-fast memory, and not as fast Disk Storage. Alteryx will process as much as it can in Memory then store any overflowing data to storage as a temp file. You want to use as much memory as possible. Here are a few ways how:
Override Default Memory Settings in Alteryx Designer
Go to: Options > User Settings > Edit User Settings from Alteryx Designer, and you can manually bump up the amount of Maximum Memory that will be used for processing. By default Alteryx limits the amount of resources to 1/4 of physical memory that a machine has.
Note: This method could cause some stability issues if too much memory is used by a workflow and there are not enough resources to keep Windows or other software running.
Increase the Amount of Available Memory
Some users will consistently struggle with slow running workflows on their computers. Don’t forget to check the system specifications. Make sure that you go above beyond the minimum required specifications. While you can run Alteryx Designer with 8 gigs of ram, you will not be happy with the performance.
7. Run Workflows on Server or a Virtual Machine
If a workflow is slow on your laptop or desktop computer. Try publishing it to Alteryx Server. Typically, Servers have much more powerful hardware and can run workflows a lot faster than the machines that they are designed on.
Try running Alteryx on a Virtual Machine
This method takes a bit of setup and can require help from your IT department. Most Cloud Service Providers such as Microsoft Azure allow you to rent a computer in one of their datacenters.
Virtual Machines can be rented by the month or by the Second. If you’re not running workflows 24/7, then you might be able to run them on much more capable hardware during work hours or only when you’re actively designing a workflow. It can be a chore to setup some of the additional infrastructure in the cloud, like establishing connections to data warehouses or mapping cloud storage separate from cloud resources. Here is an example of the specs and hourly pricing of one of the Azure Tiers available.
Note: If you have access to Alteryx Server and a workflow runs slow on your Desktop Designer, you could try scheduling it off peak hours like the middle of the night and let it run while you’re not at work.
8. Use the Alteryx AMP Engine
AMP is the new standard for running Alteryx Workflows. If you’re on an older version of Alteryx Designer, it may not be enabled by default. AMP is a multi-threaded processing method that can significantly reduce processing time. The one caveat is that it can also re-order your data compared to what you’re used to seeing. Don’t panic, turn on AMP Engine Compatibility Mode and it will use traditional processing for some tools which still giving you the speed benefit. Turn on the Alteryx Amp Engine under Canvas > Runtime Settings.
9. Use In-Database Tools and Dynamic Input to Limit SQL Queries
Ingesting data from a database can be very time consuming when working with large datasets. If your SQL data source supports it, you can use In-Database Tools. This sub-section of Alteryx Tools use a combination of SQL queries and temp tables to improve processing speed. They also push processing from your computer back to the SQL Server, which will typically be running on more powerful hardware.
Limit Your SQL Queries during Input or use Dynamic Input
The Alteryx Input Tool allows you to use custom SQL queries as a starting point for your workflow. By using a WHERE clause, you can reduce the amount of data. Try brining in a couple hundred rows of your data to determine which columns you want to bring in. Then go back to the starting query and filter it for only those columns. Data could also be filtered on specific date ranges, and then opened up to a wider date range when your workflow is ready for production.
Use the Dynamic Input Tool to Dynamically Update your Where Clause
If date ranges, or other parts of your SQL query need to be dynamic. The Dynamic Input Tool can be used to pass these parameters back into the SQL Query. You could also use a separate workflow to first determine the filters that are needed, or you can try to embed it into the same workflow. The process sequencing may be tricky, as Alteryx will try and pull in all data sources at the same time instead of in sequence. To get around this you can try to chain workflows, encompass parts of a workflow in a container to turn on and off, or try Crew Macros
10. Don’t use Alteryx for all Use Cases
This may seem blasphemous in an article about speeding up Alteryx Workflows, but it’s important to note that Alteryx is not a cloud first platform. There is a reason that they purchased Trifacta which is used as a cloud integration tool.
There are other tools purpose built to work with big data sets. Snowflake, Microsoft Azure Synapse, and Google Big Query all have built-in data integration tools as part of their platforms. It may be faster to prepare data closer to the dataset, summarize it then send it to Alteryx for analytical analysis or to further prep for downstream business intelligence functions.
We expect that this will become less of a problem as Alteryx Cloud matures as a product, but it’s unclear how Alteryx Desktop, Servier in their current state will work with Alteryx Cloud if it all.
How can I use SQL if I don’t know SQL?
Chat GPT developed by OpenAI is one of our favorite new technologies. It’s a natural language processing AI that can take written questions and give you useful answers. Chat GPT was trained on a very large dataset that includes information about many common programming languages.
What are Some More Alteryx Tips and Tricks?
Alteryx can be a fantastic platform for analytics automation and data analytics. It can also take some time to learn the tool and get up to speed. Even with years of experience there seems to always be a new time saving feature to discover. We compiled a list of our favorite Alteryx Tips and Ticks so you can save time and get up to speed fast!