Azure Data Factory: Transforming Data with Wrangling Data Flows

This blog will teach you how to use wrangling data flows in Azure Data Factory to transform data with a spreadsheet-like interface and Power Query M scripts.

1. Introduction

Data transformation is the process of converting data from one format or structure to another, according to defined rules or logic. It is essential for data analysis, data integration, data quality, and data visualization.

Azure Data Factory is a cloud-based data integration service that allows you to create data-driven workflows for moving and transforming data at scale. Azure Data Factory supports various types of data transformation activities, such as mapping data flows, wrangling data flows, SQL scripts, Databricks notebooks, and Azure Functions.

In this blog, you will learn how to use wrangling data flows in Azure Data Factory to transform data with a spreadsheet-like interface and Power Query M scripts. Wrangling data flows are a graphical and interactive way of transforming data without writing code. You can use wrangling data flows to perform tasks such as filtering, sorting, grouping, pivoting, merging, and splitting data. You can also write and debug Power Query M scripts to apply custom logic and transformations to your data. Power Query M is a powerful and expressive data manipulation language that is used in Excel, Power BI, and other Microsoft products.

You will also learn how to preview and validate your data transformation results before publishing and executing your wrangling data flow. Previewing and validating your data transformation results can help you ensure the accuracy and quality of your data, as well as troubleshoot any errors or issues in your wrangling data flow.

By the end of this blog, you will be able to create and run your own wrangling data flow in Azure Data Factory and transform data with ease and efficiency.

Are you ready to get started? Let’s dive in!

2. What is Wrangling Data Flow?

A wrangling data flow is a type of data transformation activity in Azure Data Factory that allows you to transform data with a graphical and interactive interface. Wrangling data flows are based on Power Query M, a data manipulation language that is used in Excel, Power BI, and other Microsoft products.

With wrangling data flows, you can perform various data transformation tasks, such as:

  • Filtering, sorting, and grouping data by columns or values
  • Pivoting and unpivoting data to change the shape of your data
  • Merging and appending data from multiple sources
  • Splitting and combining columns or values
  • Adding or removing columns or rows
  • Applying data type conversions, formatting, and validations
  • Extracting and parsing text, dates, numbers, and other data elements
  • Applying conditional logic, calculations, and aggregations
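
Several of these tasks map directly onto Power Query M functions. As an illustrative sketch only (the file path, column names, and data are made-up examples, and in a wrangling data flow the source step is generated from your dataset rather than written by hand), a query that filters, sorts, and groups sales data might look like this:

```
// Hypothetical example: filter, sort, and group a CSV of sales data.
let
    Source = Csv.Document(File.Contents("sales.csv"), [Delimiter = ",", Encoding = 65001]),
    Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
    Typed = Table.TransformColumnTypes(Promoted, {{"Region", type text}, {"Amount", type number}}),
    // Keep only rows with a positive amount
    Filtered = Table.SelectRows(Typed, each [Amount] > 0),
    // Sort by amount, largest first
    Sorted = Table.Sort(Filtered, {{"Amount", Order.Descending}}),
    // Group by region and sum the amounts
    Grouped = Table.Group(Sorted, {"Region"}, {{"TotalAmount", each List.Sum([Amount]), type number}})
in
    Grouped
```

Each step in the let expression corresponds to one applied step in the wrangling data flow interface, which is what makes the graphical and scripted views interchangeable.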

Wrangling data flows are designed to be easy and intuitive to use, without requiring any coding skills. You can use a spreadsheet-like interface to view and edit your data, and apply transformations by using the ribbon, the context menu, or the formula bar. You can also use the query editor to write and debug Power Query M scripts, if you prefer a more advanced and flexible way of transforming your data.

One of the main benefits of wrangling data flows is that you can preview and validate your data transformation results at any step of the process. You can see the impact of each transformation on your data, and compare the input and output data of each step. You can also validate your data quality and integrity by using the data profile and column statistics features.

Wrangling data flows are a powerful and convenient way of transforming data in Azure Data Factory. They can help you prepare and shape your data for further analysis, integration, or visualization.

How do you create a wrangling data flow in Azure Data Factory? Let’s find out in the next section!

3. How to Create a Wrangling Data Flow in Azure Data Factory

To create a wrangling data flow in Azure Data Factory, you need to follow these steps:

  1. Create a data factory resource in the Azure portal, if you don’t have one already.
  2. Create a linked service to connect to your data source, such as Azure Blob Storage, Azure Data Lake Storage, or Azure SQL Database.
  3. Create a dataset to define the schema and format of your data source.
  4. Create a data flow activity and select the wrangling data flow option.
  5. Add a source transformation and select your dataset as the input.
  6. Add and configure other transformations to transform your data according to your needs.
  7. Add a sink transformation and select your destination dataset as the output.
  8. Preview and validate your data transformation results at each step.
  9. Publish and execute your wrangling data flow.
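
Behind the authoring UI, the data flow activity you create in step 4 is stored as pipeline JSON. The fragment below is a rough sketch only: the activity and reference names are invented, and the exact activity type and property names may vary by service version, so treat it as orientation rather than a definitive template.

```
{
  "name": "TransformSalesData",
  "type": "ExecuteWranglingDataflow",
  "typeProperties": {
    "dataflow": {
      "referenceName": "WrangleSales",
      "type": "DataFlowReference"
    },
    "compute": {
      "coreCount": 8,
      "computeType": "General"
    }
  }
}
```

You normally never edit this JSON by hand; the portal generates it as you work through the steps above, and publishing (step 9) saves it to the service.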

In the following sections, we will go through each step in more detail and show you how to use the wrangling data flow interface and features.

Are you ready to transform your data with wrangling data flows? Let’s begin!

4. How to Use the Spreadsheet-like Interface to Transform Data

Once you have created a wrangling data flow and added a source transformation, you can use the spreadsheet-like interface to view and edit your data, and apply transformations by using the ribbon, the context menu, or the formula bar.

The spreadsheet-like interface consists of three main components:

  • The data grid, which shows your data in a tabular format, with columns and rows. You can select, sort, filter, and resize columns, and edit values directly in the data grid.
  • The ribbon, which contains various buttons and menus for applying common transformations, such as filtering, grouping, pivoting, merging, splitting, and formatting. You can also access the query editor, the data profile, and the column statistics from the ribbon.
  • The formula bar, which allows you to enter and edit Power Query M expressions to apply custom transformations to your data. You can also use the formula bar to rename columns, change data types, and add comments.

To apply a transformation to your data, you can either use the ribbon, the context menu, or the formula bar. For example, if you want to filter your data by a column value, you can do one of the following:

  • Use the ribbon: Click on the Filter button in the Home tab, and select the column and the value you want to filter by.
  • Use the context menu: Right-click on the column header, select Filter by Value, and choose the value you want to filter by.
  • Use the formula bar: Enter a Power Query M expression that filters your data by the column and the value, such as = Table.SelectRows(#"Source", each [ColumnName] = "Value"), where ColumnName and Value are the column name and the value you want to filter by, respectively.
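
In context, the formula-bar expression above becomes one step in the query's let expression. Here is a minimal, self-contained sketch, where the inline sample data, ColumnName, and Value are placeholders standing in for your real source and filter criteria:

```
let
    // In a real wrangling data flow, Source is generated from your input dataset;
    // inline records are used here only so the example is self-contained.
    Source = Table.FromRecords({
        [ColumnName = "Value", Other = 1],
        [ColumnName = "Different", Other = 2]
    }),
    // The filter step added via the ribbon, context menu, or formula bar
    #"Filtered Rows" = Table.SelectRows(Source, each [ColumnName] = "Value")
in
    #"Filtered Rows"
```

Whichever of the three methods you use, the result is the same: a new #"Filtered Rows" step appears in the Applied Steps pane.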

Each time you apply a transformation to your data, a new step is added to the Applied Steps pane on the right side of the interface. You can view, edit, delete, or reorder the steps in the Applied Steps pane. You can also see the Power Query M script for each step by clicking on the Script button on the top right corner of the interface.

Using the spreadsheet-like interface, you can transform your data with ease and flexibility, without writing any code. You can also preview and validate your data transformation results at each step, as we will see in the next section.

How do you write and debug Power Query M scripts to apply custom transformations to your data? Let’s find out in the next section!

5. How to Write and Debug Power Query M Scripts

Power Query M is a data manipulation language that is used in Excel, Power BI, and other Microsoft products. Power Query M allows you to apply custom logic and transformations to your data, using functions, variables, operators, and expressions.

To write and debug Power Query M scripts in wrangling data flows, you can use the query editor, which is a text-based interface that shows the Power Query M script for each step of your data transformation. You can access the query editor by clicking on the Query Editor button in the Home tab of the ribbon.

The query editor consists of three main components:

  • The query list, which shows the list of queries in your wrangling data flow. You can select, rename, duplicate, or delete queries from the query list.
  • The formula bar, which allows you to enter and edit Power Query M expressions for each step of your data transformation. You can also use the formula bar to rename columns, change data types, and add comments.
  • The script editor, which shows the full Power Query M script for the selected query. You can edit, format, and validate the script in the script editor.

To write a Power Query M script, you can either use the formula bar or the script editor. For example, if you want to add a new column that calculates the average of two existing columns, you can do one of the following:

  • Use the formula bar: Enter a Power Query M expression that adds a new column with the average calculation, such as = Table.AddColumn(#"Previous Step", "Average", each ([Column1] + [Column2]) / 2), where Previous Step is the name of the previous step, and Column1 and Column2 are the names of the existing columns.
  • Use the script editor: Add the same Power Query M expression as a new named step in the query’s let expression, ending the previous step with a comma and updating the final in clause to return the new step.
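
Putting the average-column example into a full query makes the let-expression structure concrete. The inline sample data below is invented purely so the sketch is self-contained:

```
let
    // Placeholder source; in practice this step comes from your dataset
    Source = Table.FromRecords({
        [Column1 = 10, Column2 = 20],
        [Column1 = 5, Column2 = 15]
    }),
    // Add a computed column averaging the two existing columns
    #"Added Average" = Table.AddColumn(Source, "Average",
        each ([Column1] + [Column2]) / 2, type number)
in
    #"Added Average"
```

Note that each step references the previous step by name and the steps are separated by commas; the in clause determines which step's result the query returns.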

To debug a Power Query M script, you can use the following features:

  • The error icon, which shows a red cross next to the step name if there is an error in the Power Query M expression. You can hover over the error icon to see the error message and the location of the error.
  • The information icon, which shows a blue circle next to the step name if there is a warning or a suggestion for the Power Query M expression. You can hover over the information icon to see the warning or the suggestion message and the location of the issue.
  • The validate button, which allows you to check the syntax and semantics of your Power Query M script. You can click on the validate button in the Home tab of the ribbon, or press Ctrl+Shift+V on your keyboard. If there are any errors or warnings, you will see a message box with the details and the location of the issues.
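
Beyond the editor’s validation features, Power Query M itself provides try ... otherwise for handling errors at evaluation time, which is useful when some rows are expected to fail a conversion. A hedged sketch, with made-up sample data and column names:

```
let
    // Sample text values, one of which cannot be parsed as a number
    Source = Table.FromRecords({
        [Raw = "42"],
        [Raw = "not a number"]
    }),
    // Convert text to a number, substituting null when conversion fails
    Parsed = Table.AddColumn(Source, "Parsed",
        each try Number.FromText([Raw]) otherwise null, type nullable number)
in
    Parsed
```

Wrapping a risky expression this way keeps a single bad row from failing the whole step, and the resulting nulls show up clearly in the data profile described in the next section.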

Using the query editor, you can write and debug Power Query M scripts to apply custom transformations to your data. You can also preview and validate your data transformation results at each step, as we will see in the next section.

How do you preview and validate your data transformation results before publishing and executing your wrangling data flow? Let’s find out in the next section!

6. How to Preview and Validate Data Transformation Results

One of the most useful features of wrangling data flows is that you can preview and validate your data transformation results at any step of the process. Previewing and validating your data transformation results can help you ensure the accuracy and quality of your data, as well as troubleshoot any errors or issues in your wrangling data flow.

To preview your data transformation results, you can use the data grid, which shows your data in a tabular format, with columns and rows. You can select any step in the Applied Steps pane, and see the input and output data of that step in the data grid. You can also compare the input and output data of two steps by selecting them and clicking on the Compare button in the Home tab of the ribbon.

To validate your data transformation results, you can use the data profile and the column statistics features, which provide you with useful information and insights about your data quality and integrity. You can access the data profile and the column statistics from the Data Profile and the Column Statistics buttons in the Home tab of the ribbon.

The data profile shows you a summary of your data, such as the number of rows, columns, and errors, as well as the data type distribution and the data quality score. The data quality score is a measure of how clean and consistent your data is, based on factors such as null values, duplicates, and outliers. You can also see the data profile for each column by clicking on the column header.

The column statistics show you detailed information and statistics about each column, such as the minimum, maximum, average, median, standard deviation, and percentile values, as well as the frequency distribution and the histogram chart. You can also see the column statistics for multiple columns by selecting them and clicking on the Column Statistics button.
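
Power Query M also exposes a function, Table.Profile, that computes per-column summary statistics (minimum, maximum, average, standard deviation, row count, null count, and distinct count) comparable to what the column statistics pane shows. A small sketch with invented data:

```
let
    // Made-up sample data, including a null to show up in NullCount
    Source = Table.FromRecords({
        [Amount = 10],
        [Amount = 20],
        [Amount = null]
    }),
    // Returns one row per column with Min, Max, Average,
    // StandardDeviation, Count, NullCount, and DistinctCount
    Profile = Table.Profile(Source)
in
    Profile
```

This can be handy when you want the statistics as data, for example to land a profiling table in your sink alongside the transformed output.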

Using the data profile and the column statistics, you can validate your data transformation results and identify any potential issues or anomalies in your data. You can also apply filters, sorting, and grouping to your data to further explore and analyze your data.

Previewing and validating your data transformation results is an important step in ensuring the success of your wrangling data flow. It can help you verify that your data is transformed correctly and meets your expectations and requirements.

Once you are satisfied with the results, all that remains is to publish and execute your wrangling data flow. Let’s wrap up with a recap in the next and final section!

7. Conclusion

In this blog, you have learned how to use wrangling data flows in Azure Data Factory to transform data with a spreadsheet-like interface and Power Query M scripts. You have also learned how to preview and validate your data transformation results before publishing and executing your wrangling data flow.

Wrangling data flows are a powerful and convenient way of transforming data in Azure Data Factory. They can help you prepare and shape your data for further analysis, integration, or visualization. You can use wrangling data flows to perform various data transformation tasks, such as filtering, sorting, grouping, pivoting, merging, splitting, and formatting. You can also write and debug Power Query M scripts to apply custom logic and transformations to your data.

Previewing and validating your data transformation results is an important step in ensuring the success of your wrangling data flow. It can help you verify that your data is transformed correctly and meets your expectations and requirements. You can use the data profile and the column statistics features to validate your data quality and integrity, and identify any potential issues or anomalies in your data.

To create and run your own wrangling data flow in Azure Data Factory, you need to follow these steps:

  1. Create a data factory resource in the Azure portal, if you don’t have one already.
  2. Create a linked service to connect to your data source, such as Azure Blob Storage, Azure Data Lake Storage, or Azure SQL Database.
  3. Create a dataset to define the schema and format of your data source.
  4. Create a data flow activity and select the wrangling data flow option.
  5. Add a source transformation and select your dataset as the input.
  6. Add and configure other transformations to transform your data according to your needs.
  7. Add a sink transformation and select your destination dataset as the output.
  8. Preview and validate your data transformation results at each step.
  9. Publish and execute your wrangling data flow.

We hope you have enjoyed this blog and learned something new and useful. If you have any questions or feedback, please feel free to leave a comment below. Thank you for reading!
