When I think about storage, I normally visualize hard drives in client devices, some type of enterprise shared storage, or a hyper-converged appliance that supports a single person or business in meeting its requirements for storage capacity, performance, integrity and recoverability.
To talk about Azure Storage, I need to shift my perspective slightly. Azure Storage is truly a web-scale storage solution, currently supporting over 50 trillion (50,000,000,000,000) objects, which is a lot!
To understand Azure Storage we need to understand how it all fits together. When you have cloud multi-tenancy you need a way to ensure that only you have access to your own data (unless of course you choose to share it). This is the fundamental underpinning of Azure Storage: your Azure Storage Account.
Azure Storage Account
An Azure Storage Account is the gateway to accessing storage in Azure. When it is created, you receive a unique namespace which is linked to the type of storage you are going to use, for example http://storagevmfocus.blob.core.windows.net. This in turn links to your storage billing, which is based around four factors:
- Storage usage
- Replication of data
- Read and write operations
- Data transferred out to other Azure regions
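To make the namespace idea concrete, here is a minimal Python sketch that derives the endpoint for each storage service from an account name. The function name is my own and purely illustrative, and I have assumed https rather than the http shown in the example above.

```python
# Sketch: derive the per-service endpoints for a storage account name.
# storage_endpoints is a hypothetical helper, not part of any Azure SDK.
SERVICES = ("blob", "table", "queue", "file")


def storage_endpoints(account_name: str) -> dict[str, str]:
    """Return the public endpoint for each storage service of the account."""
    return {
        service: f"https://{account_name}.{service}.core.windows.net"
        for service in SERVICES
    }


if __name__ == "__main__":
    for service, url in storage_endpoints("storagevmfocus").items():
        print(f"{service:5} -> {url}")
```

The account name is the only part that changes, which is why it has to be unique across Azure.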
Azure Storage Types
Now that we know that the starting point is an Azure Storage Account, what types of storage can Azure offer? These are broken down into four areas (I’m beginning to think Microsoft like the number four) which are Blob, Table, Queue and File.
Blob storage is the name given to Microsoft’s cost-effective cloud storage. It is used to store large amounts of unstructured data, for example:
- Azure VM Hard Drives
- Media Files
- Backup Files
Blobs are further broken down into different categories to ensure that the storage is optimized for its intended workloads.
Table storage is Microsoft’s NoSQL offering, a distributed scale-out store. Essentially it’s a repository for metadata that is captured and then needs to be accessed quickly. Example use cases for Table Storage are shown in the diagram below.
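As a rough illustration of why lookups are quick, Table storage entities are addressed by a PartitionKey and a RowKey. The sketch below mimics that with a plain Python dictionary; it is an in-memory stand-in rather than the Azure SDK, and the helper names and sample entities are made up.

```python
from typing import Optional

# In-memory stand-in for a table: (PartitionKey, RowKey) -> entity properties.
table: dict[tuple[str, str], dict] = {}


def upsert(partition_key: str, row_key: str, **properties) -> None:
    """Insert or replace the entity stored under the two keys."""
    table[(partition_key, row_key)] = dict(properties)


def get(partition_key: str, row_key: str) -> Optional[dict]:
    """Point lookup by both keys; returns None when no entity exists."""
    return table.get((partition_key, row_key))


# Entities in the same table do not need to share a schema.
upsert("customers", "0001", name="Contoso", tier="gold")
upsert("customers", "0002", name="Fabrikam")
```

A lookup that supplies both keys goes straight to the entity, which is the access pattern this kind of store is built around.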
Queue storage is essentially a reliable messaging solution that passes information between different tiers of an application.
When held in the ‘queue’, data is kept until it is passed asynchronously to the application. Example use cases for Queue Storage are:
- Communication between Websites and Applications
- Hybrid communications between on-premises and Azure applications
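The decoupling that a queue provides can be sketched with Python's standard library. Here queue.Queue stands in for an Azure queue, and the tier names and messages are hypothetical; a real application would use the Queue service rather than an in-process queue.

```python
import queue
import threading

messages = queue.Queue()  # stand-in for an Azure queue
processed = []


def web_tier(order_ids):
    """Front end: drop messages on the queue and return immediately."""
    for order_id in order_ids:
        messages.put(f"order:{order_id}")


def worker_tier():
    """Back end: pick messages up whenever it is ready to process them."""
    while True:
        message = messages.get()
        if message is None:  # sentinel that tells the worker to stop
            break
        processed.append(message.upper())


worker = threading.Thread(target=worker_tier)
worker.start()
web_tier([1, 2, 3])
messages.put(None)
worker.join()
```

Neither tier calls the other directly, so either side can be slow or briefly unavailable without losing messages.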
File storage is your traditional SMB 2.1 or 3.0 file share that we are used to accessing on a daily basis, for example \\vmfocus\customer_files\. Access to the file share can be granted from on-premises using the net use command with the storage key, for example:
net use z: \\vmfocusprodstorage.file.core.windows.net\vmfocusfileshare /u:vmfocusprodstorage m1G1Xatnb9NgzEjCrx1gBtQ/xpyFR4N71i6imkt38VvKCWB2bK9X==
This can then be added to a group policy login script for users.
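If you are generating login scripts for several shares, the command can be assembled from its parts. This is a small sketch that follows the same shape as the net use example above; the function name is my own and the key is a placeholder, since a real storage key should not be hard-coded.

```python
def net_use_command(drive: str, account: str, share: str, key: str) -> str:
    """Build a net use command that maps an Azure file share to a drive letter."""
    # UNC path takes the form \\<account>.file.core.windows.net\<share>
    unc_path = "\\\\{}.file.core.windows.net\\{}".format(account, share)
    return f"net use {drive}: {unc_path} /u:{account} {key}"


if __name__ == "__main__":
    print(net_use_command("z", "vmfocusprodstorage", "vmfocusfileshare", "<storage-key>"))
```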
An application would access the file share from an on-premises location using the REST API.
My understanding is that although SMB 3.0 file shares provide encryption and persistency, they do not provide Role Based Access Control (RBAC) via Active Directory Users & Groups.
Great, you say, I have moved some of my on-premises storage to Azure, but how do I make sure that data is available? Well, Azure offers the concept of Storage Redundancy, which is broken down into, yes you guessed it, four areas:
- Locally redundant storage (LRS)
- Zone redundant storage (ZRS)
- Geo redundant storage (GRS)
- Read access geo redundant storage (RA-GRS)
Locally Redundant Storage (LRS)
Data is held within the same datacentre, but it is replicated three times. Each replica sits inside separate fault and update domains. This uses the same concept as Availability Sets, but for storage.
Use cases include:
- Protection from hardware failures
- Provide redundancy inside a local datacentre to meet compliance or regulatory requirements
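The fault domain idea can be modelled in a few lines. This is a simplified sketch with made-up domain names, not how Azure actually places replicas; it only shows that one replica per fault domain leaves two copies standing after any single domain failure.

```python
def place_replicas(fault_domains: list[str], copies: int = 3) -> list[str]:
    """Put each replica in its own fault domain."""
    if len(fault_domains) < copies:
        raise ValueError("need at least one fault domain per replica")
    return fault_domains[:copies]


def surviving_replicas(placement: list[str], failed_domain: str) -> int:
    """Count the replicas that remain after one fault domain is lost."""
    return sum(1 for domain in placement if domain != failed_domain)


placement = place_replicas(["fd-0", "fd-1", "fd-2"])
remaining = surviving_replicas(placement, "fd-1")
```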
Zone Redundant Storage (ZRS)
Data is held across two or three datacentres, either in the same region or across regions.
Use cases include:
- Protection from hardware failures
- Provides a higher level of fault tolerance above LRS
Geo Redundant Storage (GRS)
Data is replicated to a second region. It is replicated three times in the primary region, as with LRS, and then replicated asynchronously to a secondary region. The purpose behind this is to ensure continued storage performance: waiting for an acknowledgement from a replicated region would slow storage responses down.
Read Access Geo Redundant Storage (RA-GRS)
Works in the same way as GRS. However, you have read access to the data at the secondary location. This can be useful for data mining operations where you don’t want to run against the primary set of data.
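The trade-off can be sketched as a toy model: a write is acknowledged as soon as the primary region has it, and the copy to the secondary region happens later. The class and method names here are hypothetical and the model ignores everything except the ordering of events.

```python
from collections import deque


class GeoReplicatedStore:
    """Toy model of a primary region with a lagging secondary region."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}
        self._pending = deque()  # writes not yet copied to the secondary

    def write(self, key, value):
        """Commit to the primary and acknowledge without waiting for the secondary."""
        self.primary[key] = value
        self._pending.append((key, value))
        return "ack"

    def replicate(self):
        """Background step: drain pending writes to the secondary region."""
        while self._pending:
            key, value = self._pending.popleft()
            self.secondary[key] = value


store = GeoReplicatedStore()
ack = store.write("report.csv", b"data")
lagging = "report.csv" not in store.secondary  # secondary has not caught up yet
store.replicate()
```

The price of the fast acknowledgement is that the secondary region can briefly be behind the primary.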
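With read access enabled, the secondary copy is reached through its own endpoint, which uses the account name with a -secondary suffix. A minimal sketch, using a hypothetical helper name and the blob service as the example:

```python
def blob_endpoints(account_name: str) -> dict[str, str]:
    """Return the primary and read-only secondary blob endpoints for an account."""
    return {
        "primary": f"https://{account_name}.blob.core.windows.net",
        "secondary": f"https://{account_name}-secondary.blob.core.windows.net",
    }


if __name__ == "__main__":
    for role, url in blob_endpoints("vmfocusprodstorage").items():
        print(f"{role:9} -> {url}")
```

Pointing reporting or data mining jobs at the secondary endpoint keeps that read load away from the primary.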
Azure storage is huge and needs to be thought about very carefully to ensure that no single part of the chain becomes the bottleneck. For example, you might have sufficient disk I/O, but the limiting factor is the network throughput from source to target.
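One way to reason about this is that end-to-end throughput is capped by the slowest stage in the chain. The sketch below picks out that stage; the stage names and MB/s figures are invented purely for illustration.

```python
def effective_throughput(stages: dict[str, float]) -> tuple[str, float]:
    """Return the slowest stage and its rate, which caps the whole chain."""
    name = min(stages, key=stages.get)
    return name, stages[name]


# Hypothetical figures in MB/s, not measurements.
stages = {"source disk": 500.0, "network": 120.0, "target disk": 400.0}
bottleneck, rate = effective_throughput(stages)
```

In this made-up case, faster disks would change nothing until the network link is addressed.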
This leads us onto the next topic of discussion which is Virtual Machines.