How to Group and Summarize Data in Alteryx

Data is grouped and aggregated in Alteryx using the Summarize Tool. It supports a number of different aggregation methods which vary based on the data types of the data going into the tool. For large data sets connecting to SQL it can be more efficient to write group by statements directly in a SQL Query instead of processing groups directly in Alteryx.

Data can be grouped, summarized and aggregated using an Alteryx Summarize Tool

Grouping is done to combine similar values together, often as a hierarchy going from the highest-level group to the lowest level group. While this could be done with a SQL statement when importing data into Alteryx it can also be accomplished using a Summarize tool.

Let’s dive in.

Alteryx Data Types and Grouping

It’s important to start by understanding the difference between Alteryx data types as they determine the types of grouping and aggregations that are available. In general, only numeric values will be able to have math functions applied, sum, average, median, mode. Text data types can be grouped or concatenated, and dates can be calculated with min, max.

These are just some of the examples, and there are overlap such as being able to count multiple data types. However, when you send data into a Summarize tool for grouping the options available will be driven by the data type of each field prior to going into the Summarize tool.

To avoid errors and to ensure that you have the correct grouping or aggregation option available, you may need to update the data type with a Select Tool prior to going into a Summarize Tool.

How to Group with the Alteryx Summarize Tool

To group data in Alteryx, add a Summarize tool from the Transform section of the Alteryx Toolbar to your workflow. Select a field or multiple fields from the top of the configuration panel, click the “Add” button in the middle and choose an aggregation method to move it to the bottom of the panel.

Fields are grouped in the order of top to bottom in the Summarize tool which converts to left to right in workflow results. The aggregation methods available will depend on the data type of each field going into the Summarize Tool.

The Summarize Tool is available in the Transform Section of the Alteryx Toolbar. Tools are organized in alphabetical order from left to right.

The Summarize Tool is available in the Transform section of the Alteryx Toolbar

The Summarize tool is configured by moving available fields from the top of the configuration panel to the bottom of the configuration panel by adding actions. Select a field, or multi-select several fields then click on “Add” and choose an aggregation method.

Example of a Summarize tool added to a workflow

A SQL Group function is re-created with the Group By action. However, you can also use different aggregation methods similar to how you would configure a Pivot Table in Excel.

Adding fields from the top of the Summarize available field list to the group by list uusing aggregation methods

Once configured your field or list of fields will be displayed at the bottom of the configuration panel. When multiple fields are present it will create a grouping hierarchy, where the field at the top of the list will be the first tier and each subsequent one will be nested below it in your workflow results.

Example of a configured summarize tool

Be aware that you can add multiple columns to a Summarize Tool, and you can also use the same column multiple times with different aggregation methods. For example, at times you may want to have the Sum of values and a Count of values using the same column. The columns used in the Summarize tool do not need to be unique.

The example below shows the results of a Summarize tool. Any un-selected column is removed from the data set after the Summarize tool which will group data, remove duplicates, and aggregate numbers depending on how the tool has been configured.

Example of data before and after being grouped in Alteryx

Removing duplicate values is a common use case for the Summarize tool but because it limits the number of columns moving forward into a data set there are other methods which may be more suitable for removing duplicate values.

Grouping by Multiple Criteria in Alteryx

To group by multiple fields in Alteryx, use a Summarize tool from the Transform section to group data. Use the “Add” button to add multiple levels of grouping and data will be aggregated from top to bottom in the list the corresponds to left to right in the workflow results.

The example below shows 3 fields added to a Summarize tool, and the results of the aggregated data in the workflow results.

Example of grouping and aggregating by multiple fields in Alteryx

Re-naming and Re-Ordering Grouping

When adding numeric fields and applying aggregations, Alteryx will automatically add a prefix to the column, such as Sum_ to the column name. This is default behavior, but you can click into the “Output Field” name, highlight the text and rename it prior to the column leaving the Summarize Tool.

The buttons on the right side of the bottom of the Summarize configuration will allow you to move column order up and down which also changes the grouping order of the output.

How to change the order of columns or remove columns from an Alteryx Summarize tool

The ( – ) circle with a minus sign lets you remove or delete fields from your grouping.

Troubleshooting Summarize Options Not Available

If an aggregation method in an Alteryx Summarize tool is greyed out and unavailable, check the data type of the field being added to the summarize actions. String fields cannot have numeric actions applied to them, and will be greyed out to prevent workflow run errors.

The Sum option not available using an Alteryx Summarize tool due to being a string data type

Even if the values in a field are numbers, the data type will still determine which functions are available. Add a Select tool prior to the Summarize tool to adjust data types to make the required aggregation method available.

Processing in SQL vs Alteryx

While Alteryx allows you to Group data using the Summarize tool similar to how you would Group data in SQL there is a key difference. In SQL processing is offloaded to the SQL server, but processing in Alteryx will transfer all data to the machine that has Alteryx Designer for local processing causing a significant difference on performance.

When working with large data sets that are SQL based, it’s recommended to Group the data using a SQL query prior to importing the data into Alteryx Designer to reduce the time spent on data transfer and processing.

SQL statements can be modified in an Alteryx Input Tool prior to importing data from a SQL database.

Conclusion

Alteryx users can utilize the Summarize Tool in the Transform section of the Alteryx Toolbar to Group data similar to how they would use a Group By statement in SQL. The benefit of Alteryx is that the process is no code. You can group or aggregate data using multiple aggregation methods, which are determined by data type.

When working with large data sets and connections to SQL servers, we recommend considering writing the Group By statement directly into the SQL Query to avoid large amounts of data having to be transferred into the machine running Alteryx Designer. While Alteryx is highly efficient and transforming data, it may not be as efficient as executing queries inside of a SQL server.

Scroll to Top