Azure Data Factory — Create pipeline

Vitali Dedkov
4 min read · Mar 2, 2021

In this post I hope to give you an idea of how to create a simple pipeline in Azure Data Factory.

Azure Data Factory is an orchestration tool provided by Azure that lets users build workflows to move data from point A to point B.

This article is for those who have heard about Azure Data Factory and are hoping for a good guide on how to create a pipeline themselves.

With that out of the way, let’s go through and actually create our first pipeline.

Prerequisites

Before creating this pipeline, make sure that you have the following:

  1. An Azure account (either through your company or your own).
  2. An Azure Data Factory created in the Azure portal.

Keep in mind that creating an Azure Data Factory pipeline is not the same as creating the Azure Data Factory resource in the Azure portal. When I say “creating a pipeline,” I mean building a pipeline inside a data factory that has already been created in the Azure portal.

Let’s get started!

Logging in

All great journeys begin with a first step, and our first step is to open adf.azure.com and find the Azure Data Factory that you have created. It will usually be listed under a specific subscription.

Once you find your Azure Data Factory, click “Continue” to log into it.

[Image: “First step to greatness!!!”]

Navigating the home page

Once you have logged into Azure Data Factory, either click on the pencil icon on the left-hand side or click on the “Create pipeline” icon on the welcome page.

[Image: “If only everything was as simple as this.”]

What we are doing in this step is called “authoring” a pipeline. While there are a lot of different actions you can take from the home page, I usually prefer to go straight to the Author section and start developing a pipeline.

“Authoring” your first pipeline

Now that we are in “developer” mode, creating a new pipeline is actually very easy.

You will notice that we currently have a very big 0 next to the “Pipeline” section.

[Image: “It’s okay, no one is judging.”]

To rectify that, click on that 0 and then select “New pipeline”.

And that is it. You should now see that a new pipeline called “pipeline1” has been created, and the “Activities” pane and the “Properties” tab will open up alongside the canvas.

[Image: “Taking that one small step!!!”]
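Behind the scenes, that brand-new pipeline is just an empty JSON definition. If your version of the ADF studio exposes the pipeline’s code view, a blank pipeline should look roughly like this (a sketch, not an exact export):

```json
{
  "name": "pipeline1",
  "properties": {
    "activities": []
  }
}
```

Everything we do in the UI from here on amounts to filling in that properties object.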

Back in the UI, as you can see, a lot of new things appear once you have created that pipeline. Let’s go through the different tabs you might see.

Properties tab

This tab can be thought of as holding everything you need to describe your pipeline. The name should be short, but descriptive enough to let you know what this pipeline does.

In the description section, feel free to put in as many details as you want, but again, make sure you don’t write a novel.

Concurrency manages how many runs of this pipeline can execute in parallel. Unless you have a specific reason to limit it, I would just leave this one alone; in my opinion, parallelism should be controlled through the actual pipeline activities or through proper chaining of activities.
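For reference, the name, description, and concurrency you set in this tab all end up in the pipeline’s JSON definition. Here is a hedged sketch, with a made-up name and description for illustration:

```json
{
  "name": "CopySalesData",
  "properties": {
    "description": "Copies the daily sales extract from landing to curated storage.",
    "concurrency": 1,
    "activities": []
  }
}
```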

Activities tab

This section can be thought of as a list of all the things (or activities) that your pipeline needs to do. Each activity is itself a contained unit. For example, click and hold the Copy activity and drag it onto the canvas.

Once it is placed, you will notice that there are multiple tabs available to you, but the main two are “Source” and “Sink”. Think of these as where you are getting your data from (Source) and where you are going to write/copy the data to (Sink).
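In the pipeline’s JSON definition, that Copy activity becomes an entry in the activities array, and the Source and Sink tabs map to its source and sink settings. A rough sketch, assuming two Blob Storage datasets; SourceDataset and SinkDataset are placeholder names I made up, not something ADF creates for you:

```json
{
  "name": "CopyFromAtoB",
  "type": "Copy",
  "description": "Illustrative sketch only; the dataset names are placeholders.",
  "inputs": [ { "referenceName": "SourceDataset", "type": "DatasetReference" } ],
  "outputs": [ { "referenceName": "SinkDataset", "type": "DatasetReference" } ],
  "typeProperties": {
    "source": { "type": "BlobSource" },
    "sink": { "type": "BlobSink" }
  }
}
```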

Now click the little trashcan icon to delete this activity for now.

Summary and next step

Now that we have created a blank pipeline, we won’t be able to save it yet. If you click the “Publish all” button, it will show an error saying that we can’t publish a blank pipeline.

In my next post, I will go through adding a source and a sink, and then publishing/saving the new pipeline.
