This blog post is a continuation of Part 1, Using Azure Data Factory to Copy Data Between Azure File Shares. So let's get cracking with the storage account configuration.
Storage Account Configuration
Let's start off with the basics. We will have two storage accounts:
- vmfwepsts001, which is the source data store
- vmfwedsts001, which is the sink data store
Within each storage account we have three file shares:
- documents
- images
- videos
When configured, each storage account should look something like this:
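As an aside, if you would rather script the file share creation than click through the portal, a minimal sketch using the azure-storage-file-share Python package might look like this. The connection string is a placeholder for the one on your storage account's Access keys blade, and you would run it once per storage account:

```python
# Sketch only: creates the three file shares in one storage account.
# Replace the connection string with the one from your own storage account.
from azure.storage.fileshare import ShareClient

conn_str = "<storage account connection string>"  # placeholder

for share_name in ("documents", "images", "videos"):
    share = ShareClient.from_connection_string(conn_str, share_name=share_name)
    share.create_share()  # raises ResourceExistsError if the share already exists
```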

Right, let's move on to the Data Factory configuration.
Data Factory Configuration
I have created a V2 Data Factory called vmfwepdf001. Next let’s click on Author & Monitor as shown below.
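As an aside, if you prefer to create the factory from code rather than the portal, a rough sketch using the azure-mgmt-datafactory and azure-identity Python packages might look like the following; the subscription ID, resource group and region are placeholders for your own values:

```python
# Sketch only: creates a V2 Data Factory. The IDs and region below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import Factory

subscription_id = "<subscription-id>"   # placeholder
resource_group = "<resource-group>"     # placeholder

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)
factory = adf_client.factories.create_or_update(
    resource_group, "vmfwepdf001", Factory(location="westeurope")  # region assumed
)
print(factory.provisioning_state)
```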
Clicking Author & Monitor redirects us to the Azure Data Factory landing page, where we need to select 'Copy Data'.
We need to give the pipeline a name; in this instance, I have chosen Document Share Copy. To keep the file shares in sync, we are going to run the copy on a schedule, using a trigger type of 'Schedule'.
Depending on how often you want the pipeline to run, you can schedule the task as frequently as every minute, with no end date. I have chosen to run it daily, as shown in the screenshot below.
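For reference, the trigger the wizard creates is just JSON under the hood. A rough sketch of a daily schedule trigger definition, written here as a Python dict, looks something like the following; the trigger name and start time are placeholders, and the wizard's exact output may differ slightly:

```python
# Sketch of a daily ScheduleTrigger definition; the name and startTime are placeholders.
daily_trigger = {
    "name": "DailyCopyTrigger",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",   # could also be Minute, Hour, Week or Month
                "interval": 1,
                "startTime": "2019-01-01T00:00:00Z",
                "timeZone": "UTC"
                # no endTime, so the trigger keeps firing indefinitely
            }
        },
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "Document Share Copy",
                    "type": "PipelineReference"
                }
            }
        ]
    }
}
```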
When you're ready, click Next. We are now ready to select our Source Data Store, which will be 'Azure File Storage'. To enable Azure Data Factory to access the storage account, we need to create a new connection.
A New Linked Service pop-up will appear; ensure you select Azure File Storage. Give the Linked Service a name; I have used 'ProductionDocuments'. You can create a custom Integration Runtime if you need the data processing to occur in a specific Azure region. In this instance, I'm going to leave it as 'AutoResolveIntegrationRuntime'.
Azure Data Factory requires the Host to be in a specific format: //storageaccountname.file.core.windows.net/filesharename
The user name is your storage account name, and the password is your storage account access key.
The screenshot below shows the configuration.
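Under the hood the linked service is also just JSON. A rough sketch of what 'ProductionDocuments' ends up looking like is shown below as a Python dict; the access key is a placeholder, and the exact JSON the wizard emits may differ slightly:

```python
# Sketch of the Azure File Storage linked service definition; the key is a placeholder.
production_documents = {
    "name": "ProductionDocuments",
    "properties": {
        "type": "AzureFileStorage",
        "typeProperties": {
            # Host format: //storageaccountname.file.core.windows.net/filesharename
            "host": "//vmfwepsts001.file.core.windows.net/documents",
            "userId": "vmfwepsts001",                    # storage account name
            "password": {
                "type": "SecureString",
                "value": "<storage account access key>"  # placeholder
            }
        },
        "connectVia": {
            "referenceName": "AutoResolveIntegrationRuntime",
            "type": "IntegrationRuntimeReference"
        }
    }
}
```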
If you have entered everything correctly, you should receive a green tick when you click 'Test Connection'. Click Next and then Next again; it will test your connection once more.
When you are greeted with the 'input file or folder' screen, we need to define a few pieces of information as follows:
- File or Folder – leave this blank unless you want to focus on a specific file or sub-folder
- File Loading Behaviour – this is really a design decision between 'Load all files' and 'Incremental load: LastModifiedDate'
- Copy File Recursively – copies all files and sub-folders; I would suggest selecting this
- Compression Type – None
Once configured it should look something like this:
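These options map onto the source side of the copy activity. A rough sketch of that fragment, using the Azure File Storage connector's FileSystemSource syntax and written as a Python dict for illustration (the wizard's actual output may differ):

```python
# Sketch of the source half of the copy activity; property names taken from the
# Azure File Storage connector documentation.
copy_source = {
    "type": "FileSystemSource",
    "recursive": True   # "Copy file recursively": include all files and sub-folders
    # "Incremental load: LastModifiedDate" would instead filter on modified date,
    # e.g. via a modifiedDatetimeStart value, rather than loading every file each run.
}
```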
Follow the same process for the Destination Data Store. When you get to the 'output file or folder' screen, we need to define a few settings as follows:
- File or Folder – leave this blank unless you want to focus on a specific file or sub-folder
- Compression Type – None
- Copy Behaviour – Preserve hierarchy, which means we will preserve the folder structure
Once configured it should look like this:
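Putting source and sink together, the copy activity inside the pipeline ends up looking roughly like the sketch below; the dataset reference names are hypothetical placeholders, and the wizard's real output will differ in detail:

```python
# Sketch of the copy activity definition; the dataset names are hypothetical.
copy_activity = {
    "name": "Document Share Copy",
    "type": "Copy",
    "inputs": [{"referenceName": "SourceDocuments", "type": "DatasetReference"}],
    "outputs": [{"referenceName": "DestinationDocuments", "type": "DatasetReference"}],
    "typeProperties": {
        "source": {"type": "FileSystemSource", "recursive": True},
        "sink": {
            "type": "FileSystemSink",
            "copyBehavior": "PreserveHierarchy"   # keep the source folder structure
        }
    }
}
```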
Click Next, then Next again, and you will see a summary of your configuration. Click Next once more and your Data Factory deployment should complete.
Does It Work?
Let's check that this works. I have loaded a few files into my Production Storage Account under the documents share.
On the Azure Data Factory landing page, click the pencil icon (top left) > Pipelines > Document Share Copy > Trigger > Trigger Now, as per the screenshot below.
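You can also kick the pipeline off from code rather than the portal; here is a minimal sketch with the azure-mgmt-datafactory Python SDK, where the subscription and resource group are placeholders:

```python
# Sketch only: triggers the pipeline on demand and reads back the run status.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

subscription_id = "<subscription-id>"   # placeholder
resource_group = "<resource-group>"     # placeholder

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)
run = adf_client.pipelines.create_run(resource_group, "vmfwepdf001", "Document Share Copy")
status = adf_client.pipeline_runs.get(resource_group, "vmfwepdf001", run.run_id).status
print(f"Run {run.run_id}: {status}")
```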
Checking my Development Storage Account, I now have the three files available. Success!
I hope you found this post useful; tune in for more in the near future.