Using Azure Data Factory to Copy Data Between Azure File Shares – Part 2

This blog post is a continuation of Part 1, Using Azure Data Factory to Copy Data Between Azure File Shares. So let's get cracking with the storage account configuration.

Storage Account Configuration

Let's start with the basics. We have two storage accounts:

  • vmfwepsts001, which is the source data store
  • vmfwedsts001, which is the sink data store

Within each storage account we have three file shares:

  • documents
  • images
  • videos

When configured, each storage account should look something like this.
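If you would rather script this part of the setup, the shares can be created with the azure-storage-file-share Python package. A minimal sketch, assuming placeholder connection strings (and keys) for the two accounts:

```python
# pip install azure-storage-file-share
from azure.storage.fileshare import ShareClient

# Placeholder connection strings for the two accounts – substitute your own keys.
accounts = {
    "vmfwepsts001": "DefaultEndpointsProtocol=https;AccountName=vmfwepsts001;AccountKey=<key>;EndpointSuffix=core.windows.net",
    "vmfwedsts001": "DefaultEndpointsProtocol=https;AccountName=vmfwedsts001;AccountKey=<key>;EndpointSuffix=core.windows.net",
}

for account, conn_str in accounts.items():
    for share_name in ("documents", "images", "videos"):
        share = ShareClient.from_connection_string(conn_str, share_name=share_name)
        share.create_share()  # raises ResourceExistsError if the share already exists
        print(f"Created share '{share_name}' in {account}")
```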

Right, let's move on to the Data Factory configuration.

Data Factory Configuration

I have created a V2 Data Factory called vmfwepdf001. Next, let's click on Author & Monitor, as shown below.

data factory 02.PNG
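As an aside, the factory itself can also be created from code with the azure-mgmt-datafactory and azure-identity packages. A minimal sketch, assuming a placeholder subscription and resource group ('westeurope' is my assumption for the region; substitute your own):

```python
# pip install azure-mgmt-datafactory azure-identity
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import Factory

subscription_id = "<subscription-id>"   # placeholder – your subscription
resource_group = "<resource-group>"     # placeholder – the factory's resource group

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# This management client creates V2 factories; only a name and region are needed.
factory = adf_client.factories.create_or_update(
    resource_group, "vmfwepdf001", Factory(location="westeurope")
)
print(factory.provisioning_state)
```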

Clicking Author & Monitor redirects us to the Azure Data Factory landing page, where we need to select 'Copy Data'.

data factory 03.PNG

We need to give the pipeline a name; in this instance, I have chosen Document Share Copy. To keep the file shares in sync, we are going to run the pipeline on a schedule, so the trigger type is 'Schedule'.

Depending on how often you want the pipeline to run, the trigger can fire as often as every minute, with no end date if required. I have chosen a daily recurrence, as shown in the screenshot below.

data factory 04.PNG
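The wizard builds this trigger for us, but for reference, here is roughly the equivalent in the azure-mgmt-datafactory models, reusing the adf_client and resource_group from the earlier sketch. The trigger name 'DailyCopyTrigger' is my own invention:

```python
from datetime import datetime, timedelta
from azure.mgmt.datafactory.models import (
    PipelineReference, ScheduleTrigger, ScheduleTriggerRecurrence,
    TriggerPipelineReference, TriggerResource,
)

# Daily recurrence with no end date – omit end_time and the trigger runs until stopped.
recurrence = ScheduleTriggerRecurrence(
    frequency="Day",
    interval=1,
    start_time=datetime.utcnow() + timedelta(minutes=15),
    time_zone="UTC",
)
trigger = ScheduleTrigger(
    recurrence=recurrence,
    pipelines=[TriggerPipelineReference(
        pipeline_reference=PipelineReference(reference_name="Document Share Copy")
    )],
)
adf_client.triggers.create_or_update(
    resource_group, "vmfwepdf001", "DailyCopyTrigger",
    TriggerResource(properties=trigger),
)
# Triggers are created in a stopped state; recent SDK versions start them with:
# adf_client.triggers.begin_start(resource_group, "vmfwepdf001", "DailyCopyTrigger")
```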

When you're ready, click Next. We are now ready to select our source data store, which will be 'Azure File Storage'. To enable Azure Data Factory to access the storage account, we need to create a new connection.

data factory 05.PNG

A New Linked Service pop-up will appear; ensure you select Azure File Storage. Give the linked service a name; I have used 'ProductionDocuments'. You can create a custom integration runtime to make the data processing occur in a specific Azure region if required. In this instance, I'm going to leave it as 'AutoResolveIntegrationRuntime'.

Azure Data Factory requires the Host to be in a specific format: //storageaccountname.file.core.windows.net/filesharename

The user name is your storage account name and the password is your storage account access key.

The below screenshot provides the configuration.

data factory 06
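For completeness, here is a sketch of the same linked service defined through the SDK, again reusing adf_client from earlier. The host, user id, and password map directly onto the fields in the screenshot; it is worth checking the azure-mgmt-datafactory docs to confirm your SDK version exposes these exact parameters:

```python
from azure.mgmt.datafactory.models import (
    AzureFileStorageLinkedService, LinkedServiceResource, SecureString,
)

# Host uses the //account.file.core.windows.net/share format described above;
# the user id is the storage account name and the password is an access key.
linked_service = AzureFileStorageLinkedService(
    host="//vmfwepsts001.file.core.windows.net/documents",
    user_id="vmfwepsts001",
    password=SecureString(value="<storage-account-access-key>"),
)
adf_client.linked_services.create_or_update(
    resource_group, "vmfwepdf001", "ProductionDocuments",
    LinkedServiceResource(properties=linked_service),
)
```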

If you have entered everything correctly, when you click on 'Test Connection' you should receive a green tick! Click Next and then Next again; the wizard will test your connection once more.

When you are greeted with the 'input file or folder' screen, we need to define a few pieces of information as follows:

  • File or Folder – leave this blank unless you want to focus on a specific file or sub-folder
  • File Loading Behaviour – this is really a design decision between 'Load all files' and 'Incremental load: LastModifiedDate'
  • Copy File Recursively – copies all files and subfolders; I would suggest selecting this
  • Compression Type – None

Once configured, it should look something like this:

data factory 08.PNG

Follow the same process for the destination data store. When you get to the 'output file or folder' screen, we need to define a few settings as follows:

  • File or Folder – leave this blank unless you want to focus on a specific file or sub-folder
  • Compression Type – None
  • Copy Behaviour – 'Preserve hierarchy', which means the source folder structure is preserved in the sink

Once configured, it should look like this:

data factory 07.PNG
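Under the covers, the wizard is building a pipeline with a single copy activity carrying exactly these settings. A sketch of the equivalent SDK objects; the dataset names 'SourceDocuments' and 'SinkDocuments' are hypothetical stand-ins for the datasets the wizard generates:

```python
from azure.mgmt.datafactory.models import (
    CopyActivity, DatasetReference, FileSystemSink, FileSystemSource,
    PipelineResource,
)

# One copy activity: read recursively from the source dataset and
# preserve the folder hierarchy at the sink.
copy_activity = CopyActivity(
    name="CopyDocuments",
    inputs=[DatasetReference(reference_name="SourceDocuments")],
    outputs=[DatasetReference(reference_name="SinkDocuments")],
    source=FileSystemSource(recursive=True),
    sink=FileSystemSink(copy_behavior="PreserveHierarchy"),
)

adf_client.pipelines.create_or_update(
    resource_group, "vmfwepdf001", "Document Share Copy",
    PipelineResource(activities=[copy_activity]),
)
```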

Click Next, then Next again, and you will see a summary of your configuration. Click Next once more and you should see your Data Factory deployment complete.

data factory 09.PNG

Does It Work?

Let's check that this works. I have loaded a few files into the documents share in my production storage account.

data factory 10.PNG

On the Azure Data Factory landing page, click the Pencil (top left) > Pipelines > Document Share Copy > Trigger > Trigger Now, as per the screenshot below.

data factory 11.PNG
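'Trigger Now' also has a direct SDK equivalent, again reusing adf_client from the earlier sketches:

```python
# Fire the pipeline once, then poll its status.
run = adf_client.pipelines.create_run(
    resource_group, "vmfwepdf001", "Document Share Copy"
)
status = adf_client.pipeline_runs.get(resource_group, "vmfwepdf001", run.run_id)
print(status.status)  # e.g. 'InProgress', then eventually 'Succeeded'
```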

Checking my development storage account, I now have the three files available. Success!

data factory 12.PNG
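If you prefer to verify from code rather than the portal, listing the share contents uses the same azure-storage-file-share package as before; dev_conn_str is a placeholder for the vmfwedsts001 connection string:

```python
from azure.storage.fileshare import ShareClient

# List everything at the root of the 'documents' share in the sink account.
share = ShareClient.from_connection_string(dev_conn_str, share_name="documents")
for item in share.list_directories_and_files():
    print(item.name)
```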

I hope you found this post useful; tune in for some more in the near future.
