Using Azure Data Factory to Copy Data Between Azure File Shares – Part 3

This blog post is a continuation of Part 1 Using Azure Data Factory to Copy Data Between Azure File Shares and
Part 2 Using Azure Data Factory to Copy Data Between Azure File Shares. In this final part, we are going to configure alerts to send an email when a pipeline run fails.

First of all, select your Data Factory and then select Alerts > New Alert Rule.

In the previous configuration, the Azure Data Factory pipeline runs once a day. With this in mind, we are going to select ‘Add Condition’ and then ‘Failed Pipeline Runs’.

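Scroll down and select Alert Logic. Ensure the condition is set to Greater Than with an aggregation of Total and a threshold of 1. This essentially defines that if failures occur, an action is performed. Bear in mind that Greater Than is a strict comparison, so a threshold of 1 only triggers on a second failure within the evaluation period; use a threshold of 0 or Greater Than or Equal To if a single failure should raise the alert.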

Under ‘Evaluation based on’, select 12 Hours for the period and Every Hour for the frequency. The period is the window of time that is examined and the frequency is how often the condition is evaluated. It should look something like this:
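As a rough illustration of the alert logic and evaluation window described above, the check can be sketched in a few lines of Python. This is only a model of the behaviour, not how Azure Monitor is implemented; the function name, its parameters and the sample timestamps are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def alert_fires(failure_times, now, period=timedelta(hours=12),
                threshold=1, operator="gt"):
    """Return True when the total failed runs in the look-back period breach the threshold."""
    window_start = now - period
    # "Total" aggregation: count every failure inside the look-back window.
    total = sum(1 for t in failure_times if window_start < t <= now)
    if operator == "gt":  # Greater Than
        return total > threshold
    if operator == "ge":  # Greater Than or Equal To
        return total >= threshold
    raise ValueError(f"unsupported operator: {operator}")


# Hypothetical data: one failed daily run at 02:00, evaluated at 12:00 the same day.
failures = [datetime(2019, 1, 1, 2, 0)]
now = datetime(2019, 1, 1, 12, 0)
print(alert_fires(failures, now, threshold=1, operator="gt"))
print(alert_fires(failures, now, threshold=1, operator="ge"))
```

With a single failure in the window, the strict Greater Than comparison against 1 does not fire, whereas Greater Than or Equal To does. It is worth bearing this in mind when choosing the threshold for a pipeline that only runs once a day.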

Next we need to create an Action Group so when the above condition is met, an action is taken. I have called my Action Group VMF-WE-DFAG01, which stands for VMFocus, West Europe, DataFactory, ActionGroup 01.

For the short name, I have used Copy Failure. Note that this can be no longer than 12 characters.
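If you create a number of Action Groups, it can help to check the names before typing them into the portal. The sketch below is a minimal example which assumes the naming convention described above and a short name limit of 12 characters; the function names are hypothetical and are not part of any Azure SDK.

```python
MAX_SHORT_NAME_LENGTH = 12  # assumed limit for an Action Group short name


def build_action_group_name(company, region, service, index):
    """Compose a name such as VMF-WE-DFAG01 from its parts."""
    return f"{company}-{region}-{service}{index:02d}"


def validate_short_name(short_name):
    """Raise ValueError if the short name is empty or too long."""
    if not short_name or len(short_name) > MAX_SHORT_NAME_LENGTH:
        raise ValueError(
            f"short name must be 1 to {MAX_SHORT_NAME_LENGTH} characters: {short_name!r}"
        )
    return short_name


print(build_action_group_name("VMF", "WE", "DFAG", 1))
print(validate_short_name("Copy Failure"))
```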

Finally, I have chosen the ‘Action Type’ as Email/SMS/Push and entered the appropriate contact details. Once done, it should look something like this.

After a short while, you will receive an email from Microsoft Azure to confirm that you have been added to an Action Group.

Finally we want to give the Alert Rule a Name and a Description, such as the below.

That’s it, your Azure Data Factory is all configured and ready for production use!

Using Azure Data Factory to Copy Data Between Azure File Shares – Part 2

This blog post is a continuation of Part 1 Using Azure Data Factory to Copy Data Between Azure File Shares. So let’s get cracking with the storage account configuration.

Storage Account Configuration

Let’s start off with the basics. We will have two storage accounts, which are:

  • vmfwepsts001 which is the source datastore
  • vmfwedsts001 which is the sink datastore

Within each storage account we have three file shares:

  • documents
  • images
  • videos

When configured, each storage account should look something like this.

Right, let’s move on to the Data Factory configuration.

Data Factory Configuration

I have created a V2 Data Factory called vmfwepdf001.  Next let’s click on Author & Monitor as shown below.

data factory 02.PNG

This will now redirect us to the Azure Data Factory landing page.  We need to select ‘Copy Data’.

data factory 03.PNG

We need to give the pipeline a name; in this instance, I have chosen Document Share Copy.  To keep the file shares in ‘sync’ we are going to run the task on a schedule, using a trigger type of ‘Schedule’.

Depending on how often you want the pipeline to run, you can run the task every minute if required with no end date.  I have chosen a daily basis as shown in the screenshot below.

data factory 04.PNG
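For reference, a daily schedule like the one above can also be expressed as a trigger definition. The sketch below builds a dictionary shaped like a Data Factory schedule trigger and prints it as JSON. The field names reflect my understanding of the trigger format, so treat them as an assumption and compare them against a trigger exported from your own Data Factory; the trigger name and start time are hypothetical.

```python
import json


def build_daily_trigger(trigger_name, pipeline_name, start_time):
    """Build a dictionary shaped like a Data Factory schedule trigger definition."""
    return {
        "name": trigger_name,
        "properties": {
            "type": "ScheduleTrigger",
            "typeProperties": {
                "recurrence": {
                    "frequency": "Day",  # run on a daily basis
                    "interval": 1,       # every 1 day
                    "startTime": start_time,
                    "timeZone": "UTC",
                    # no "endTime" key, so the schedule has no end date
                }
            },
            "pipelines": [
                {
                    "pipelineReference": {
                        "type": "PipelineReference",
                        "referenceName": pipeline_name,
                    }
                }
            ],
        },
    }


# Hypothetical trigger name and start time; the pipeline name is the one used in this post.
trigger = build_daily_trigger(
    "DailyDocumentCopy", "Document Share Copy", "2019-01-01T00:00:00Z"
)
print(json.dumps(trigger, indent=2))
```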

When you’re ready, click Next.  We are now ready to select our Source Data Store, which will be ‘Azure File Storage’.  To enable Azure Data Factory to access the Storage Account we need to create a New Connection.

data factory 05.PNG

A New Linked Service popup box will appear; ensure you select Azure File Storage.  Give the Linked Service a name; I have used ‘ProductionDocuments’. You can create a custom Integration Runtime to allow the data processing to occur in a specific Azure Region if required.  In this instance, I’m going to leave it as ‘AutoResolveIntegrationRuntime’.

Azure Data Factory requires the Host to be in a specific format which is //storageaccountname.file.core.windows.net/filesharename

The user name is your storage account name and the password is your storage account access key.

The below screenshot provides the configuration.

data factory 06
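The host format is easy to get wrong, so here is a minimal sketch that builds the values described above. The function names and dictionary keys are hypothetical labels for this example rather than Data Factory property names, and the access key shown is a placeholder.

```python
import re


def build_file_share_host(storage_account, file_share):
    """Return the host in the //account.file.core.windows.net/share format."""
    # Storage account names are 3 to 24 lowercase letters and numbers.
    if not re.fullmatch(r"[a-z0-9]{3,24}", storage_account):
        raise ValueError(f"invalid storage account name: {storage_account!r}")
    return f"//{storage_account}.file.core.windows.net/{file_share}"


def build_connection_settings(storage_account, file_share, access_key):
    """Collect the three values entered into the Linked Service form."""
    return {
        "host": build_file_share_host(storage_account, file_share),
        "user_name": storage_account,  # the user name is the storage account name
        "password": access_key,        # the password is the storage account access key
    }


settings = build_connection_settings("vmfwepsts001", "documents", "<access-key>")
print(settings["host"])
```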

If you have entered everything correctly, when you click on ‘Test Connection’ you should receive a Green Tick! Click Next and then Next again; it will test your connection again.

When you are greeted with the ‘Input file or folder’ screen, we need to define a few pieces of information as follows:

  • File or Folder – leave this blank unless you want to focus on a specific file or sub-folder
  • File Loading Behaviour – this is really a design decision between ‘Load all files’ and ‘Incremental load: LastModifiedDate’
  • Copy File Recursively – copies all files and subfolders; I would suggest selecting this
  • Compression Type – None

Once configured it should look something like this:

data factory 08.PNG
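To make the File Loading Behaviour decision a little more concrete, the difference between loading all files and an incremental load based on the last modified date can be sketched as below. This is a simplified model and not the logic Data Factory runs internally; the function name and the sample file list are assumptions.

```python
from datetime import datetime


def select_files_to_copy(files, last_run=None):
    """Pick the files to copy from (relative_path, last_modified) pairs.

    With no last_run every file is selected (load all files); otherwise only
    files modified after the previous run are selected (incremental load).
    """
    if last_run is None:
        return [path for path, _ in files]
    return [path for path, modified in files if modified > last_run]


# Hypothetical contents of the documents file share.
files = [
    ("reports/january.docx", datetime(2019, 1, 10, 9, 0)),
    ("reports/february.docx", datetime(2019, 2, 10, 9, 0)),
    ("readme.txt", datetime(2019, 2, 11, 15, 30)),
]

print(select_files_to_copy(files))
print(select_files_to_copy(files, last_run=datetime(2019, 2, 1)))
```

An incremental load keeps each daily run small, whereas loading all files re-copies everything and is the simpler option for small shares.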

Follow the same process for the Destination Data Store. When you get to the ‘Output file or folder’ screen, we need to define a few settings as follows:

  • File or Folder – leave this blank unless you want to focus on a specific file or sub-folder
  • Compression Type – None
  • Copy Behaviour – Preserve hierarchy which means we will preserve the folder structure

Once configured it should look like this:

data factory 07.PNG

Click Next and then Next again and you will see a Summary of your configuration.  Click Next and you should see that your Data Factory deployment has completed.

data factory 09.PNG

Does It Work?

Let’s check that this works.  I have loaded a few files into my Production Storage Account under Documents.

data factory 10.PNG

On the Azure Data Factory Landing page, click the Pencil (top left) > Select Pipelines > Document Share Copy > Trigger > Trigger Now as per the screenshot below.

data factory 11.PNG

Checking my Development Storage Account, I now have the three files available, success!

data factory 12.PNG

I hope you found this post useful. Tune in for some more in the near future.

Using Azure Data Factory to Copy Data Between Azure File Shares – Part 1

I was set an interesting challenge by a customer to copy the data in their Production Subscription Azure File Shares into their Development Subscription Azure File Shares. The reason behind this was to ensure that any uploads to their Production environment are kept in line with the Development environment, enabling testing to be performed on ‘live’ data.

The customer wanted something which was easy to manage, which provided visibility of data movement tasks within the Azure Portal without needing to manage and maintain PowerShell scripts.

The answer to this was Azure Data Factory.

What Is Azure Data Factory?

Azure Data Factory is a managed data integration service that enables data-driven workflows, either between on-premises and public cloud or within public clouds.

Pipelines

A pipeline is a logical grouping of activities that together perform a task.  The activities within the pipeline define actions to perform on data. 

Data Factory supports three types of activities: data movement activities, data transformation activities and control activities. In this use case, data movement activities will be used to copy data from the source data store to the destination data sink.

Linked Service

Linked Services are used to link data stores to the Azure Data Factory.  The ‘data set’ represents the structure of the data, while the linked service defines the connection to the external data source. The diagram below provides a logical overview of this.

Integration Runtime

For copy activities, an integration runtime is required, with the source and sink linked services defining the direction of data flow.  To ensure data locality, a custom integration runtime will be used within West Europe.

Source Datastore

Each file share within the vmfwepsts001 Storage Account is an individual linked service.  Therefore, four source linked services will be defined for data, documents, images and videos.

Sink Datastore

Each destination file share within the vmfwedsts001 Storage Account is an individual linked service.  Therefore, four sink linked services will be defined for data, documents, images and videos.

Copy behaviour to the sink datastore can be undertaken using three methods:

  • Preserve Hierarchy – the relative path of the source file to the source folder is identical to the relative path of the target file to the target folder
  • Flatten Hierarchy – all files from the source folder are placed in the first level of the target folder.  The target files have auto generated names
  • Merge Files – merges all files from the source folder into one file, using an auto generated name

To maintain the file and folder structure, preserve hierarchy copy behaviour will be used.
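To illustrate the difference between the three copy behaviours, the sketch below maps a list of source paths to the paths they would end up with in the sink. It is a simplified model only: the function name is hypothetical, and the generated file names are placeholders which will not match the names Data Factory actually generates.

```python
from pathlib import PurePosixPath


def plan_copy(source_files, behaviour):
    """Map each source path to its target path for a given copy behaviour."""
    if behaviour == "preserve":
        # Relative paths are kept, so the folder structure is identical.
        return {src: src for src in source_files}
    if behaviour == "flatten":
        # Every file lands in the first level of the target folder with a
        # generated name (placeholder naming scheme for this example).
        return {
            src: f"file_{i}{PurePosixPath(src).suffix}"
            for i, src in enumerate(source_files, start=1)
        }
    if behaviour == "merge":
        # All files are merged into a single generated file.
        return {src: "merged_output" for src in source_files}
    raise ValueError(f"unknown copy behaviour: {behaviour}")


source = ["contracts/2019/supplier.docx", "contracts/template.docx", "notes.txt"]
for behaviour in ("preserve", "flatten", "merge"):
    print(behaviour, plan_copy(source, behaviour))
```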

Tune in for the next blog post when we will cover the configuration settings.

Azure CDN: Custom Cache Rules

It was just over a couple of years ago when I wrote the Azure CDN Concept blog post.

I was recently asked by a customer to apply caching rules to only a specific set of file extensions using a custom domain name.  So with this in mind, I thought I would share the process with you.

Step 1 – Which CDN?

Microsoft Azure provides a number of CDN products, so we need to find the correct one to meet our requirements, which are custom caching rules and custom domain HTTPS.

Looking at the Compare Azure CDN Product Features page it shows that only Standard Verizon and Premium Verizon will meet the requirements.

In this case, I will start by using Standard Verizon; we can migrate to Premium Verizon if needed.

Step 2 – Caching Rules

Azure CDN uses the HTTP caching specification, RFC 7234.  It should be noted that not all resources can be cached; in particular, Standard Verizon only deals with:

  • HTTP Status Codes 200
  • HTTP Methods GET
  • File Size Limits 300GB
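Put another way, a response is only a candidate for caching when it meets all three conditions. A minimal sketch of that check is shown below; the function name is hypothetical and I have assumed that the 300GB limit is measured in binary gigabytes.

```python
MAX_CACHEABLE_BYTES = 300 * 1024 ** 3  # 300GB, assuming binary gigabytes


def is_cacheable(status_code, method, size_bytes):
    """Return True when a response meets the three conditions listed above."""
    return (
        status_code == 200
        and method.upper() == "GET"
        and size_bytes <= MAX_CACHEABLE_BYTES
    )


print(is_cacheable(200, "GET", 48_000))   # a small image returned successfully
print(is_cacheable(404, "GET", 1_200))    # error responses are not cached
print(is_cacheable(200, "POST", 48_000))  # only GET requests are cached
```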

By default, Standard Verizon caches any response with an HTTP 200 status code for 7 days.  To override this, we need to enable Global Caching Rules, which affect the caching behaviour for all requests.

In this case we want to set the caching behaviour to ‘Bypass Cache’, meaning that no content will be cached.

Next we then set our specific Custom Caching Rules which supersede the Global Caching Rules using File Extension types for example:

We are now utilising the Standard Verizon CDN to only cache jpg, jpeg, png and gif file extensions.

Final Thought

In a nutshell, Custom Caching Rules override Global Caching Rules, which override Default Caching Rules.

Think of it like a game of Top Trumps. For those of you who don’t know what this is, I would suggest adding a pack to your Christmas list!
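The order of precedence can be sketched as a simple lookup: check for a matching custom rule first, then fall back to the global rule, then to the default. The function name and the behaviour labels below are assumptions made for this example, and the custom rule label is a placeholder because the actual behaviour and duration depend on how you configure the rule.

```python
from pathlib import PurePosixPath

DEFAULT_RULE = "cache for 7 days"


def resolve_caching_rule(url_path, custom_rules=None, global_rule=None):
    """Return the caching behaviour that wins for a given URL path."""
    extension = PurePosixPath(url_path).suffix.lstrip(".").lower()
    if custom_rules and extension in custom_rules:
        return custom_rules[extension]  # custom rules beat everything else
    if global_rule is not None:
        return global_rule              # global rule beats the default
    return DEFAULT_RULE


# Placeholder behaviour label for the four image extensions used in this post.
custom = {ext: "cache (custom rule)" for ext in ("jpg", "jpeg", "png", "gif")}

print(resolve_caching_rule("/images/logo.png", custom, "bypass cache"))
print(resolve_caching_rule("/index.html", custom, "bypass cache"))
print(resolve_caching_rule("/index.html"))
```

With the global rule set to bypass the cache, only the four image extensions are cached and everything else goes back to the origin.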

Standard SSD: Azure Backup Failure

I have been undertaking a customer deployment and thought I would share this nugget of information which may save you some time.

Standard SSD

Even though Standard SSDs are now GA as per this article, we are unable to back up VMs with Standard SSD, receiving two error messages in total.

The first error occurs when the initial job to configure the backup fails with the message ‘Deployment to resource group ‘name’ failed.  Additional details from the underlying API that might be helpful: At least one resource deployment operation failed.  Please list deployment operations for details.  Please see https://aka.ms/arm-debug for usage details.’

Azure Backup 01

Digging a bit deeper, we receive the Error Code ‘UserErrorGuestAgentStatusUnavailable’ with a recommended action of ‘Ensure the VM has network connectivity and the VM agent is update and running.  For more information, please refer to https://aka.ms/guestagent-status-unavailable’.

A quick reboot of the VM resolves the initial ‘configure backup’ error with Standard SSDs.

We then go to protect the VM and undertake the initial backup and this is where the problem occurs.  After two plus hours, you will receive an error notification which states ‘The storage type is not supported by Azure Backup’.

Azure Backup 02

This is a known issue and is documented in the ‘Prepare Resource Manager Deployed VMs‘ article under the section ‘Limitations when backing up and restoring a VM’.

So for now, you can deploy VMs with Standard SSD but you can’t back up the entire VM using Azure IaaS VM Backup!

Update

Azure Backup now supports Standard SSD; see the blog post here.