Just in Time Virtual Machine Access


Consider for a moment the attack vectors on your virtual machines.  You may have some ports exposed to the public internet, however these are likely to be protected by Next Generation Firewalls and perhaps even a DDoS scrubbing service from your ISP.

Perhaps the largest attack vector is your management ports, such as SSH, RDP and WMI to name but a few.  When these ports are open, anyone can attempt to gain access, whether they are authorised or not.

This is where ‘Just in Time Virtual Machine Access’ steps in to reduce your overall attack surface.  Access to management ports is closed by default and is only granted from either trusted IPs or per request.

How Does It Work?

Just in Time (JIT) works in conjunction with Network Security Groups (NSG) and Role Based Access Control (RBAC) to open up management ports on a timed basis.

  • Works for VMs which are both publicly and privately accessible
  • Requires write access to the VM

The second point makes perfect sense: we have customers who have read access to certain elements within the Azure portal to review logs or performance charts, but aren’t allowed access to the virtual machines.

To gain access on a desired management port, the requester must have ‘Contributor’ rights to the VM, which means that the following points need to be considered:

  • The requester requires an Azure AD Account
  • RBAC configuration
  • Access for third parties using Azure B2B

Configuration Choices

At the time of writing, you can define the following conditions per VM policy (a sketch of the resulting policy shape follows the list):

  • Port
  • Protocol (TCP/UDP)
  • Allowed Sources – either per request or an IP range
  • Maximum Request Time – 1 to 24 hours
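To make that concrete, here is a rough sketch of the shape such a policy takes. The field names (ports, protocol, allowedSourceAddressPrefix, maxRequestAccessDuration) reflect my reading of the Security Center JIT payload, and the subscription, resource group and VM names are placeholders, so treat it as illustrative rather than a definitive schema.

```python
# Hypothetical sketch of a JIT VM access policy definition.
# Field names follow my reading of the Security Center JIT REST payload
# and should be verified against the current API reference.
import json

SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"  # placeholder
RESOURCE_GROUP = "rg-demo"                                 # placeholder
VM_NAME = "vm-demo"                                        # placeholder

vm_id = (
    f"/subscriptions/{SUBSCRIPTION_ID}/resourceGroups/{RESOURCE_GROUP}"
    f"/providers/Microsoft.Compute/virtualMachines/{VM_NAME}"
)

jit_policy = {
    "virtualMachines": [
        {
            "id": vm_id,
            "ports": [
                {
                    "number": 3389,                      # RDP
                    "protocol": "TCP",
                    "allowedSourceAddressPrefix": "*",   # or a trusted IP range
                    "maxRequestAccessDuration": "PT3H",  # ISO 8601, 1 to 24 hours
                },
                {
                    "number": 22,                        # SSH
                    "protocol": "TCP",
                    "allowedSourceAddressPrefix": "203.0.113.0/24",
                    "maxRequestAccessDuration": "PT1H",
                },
            ],
        }
    ]
}

print(json.dumps(jit_policy, indent=2))
```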

Requesting Access

Once the JIT policy has been applied to the VM, a user logs into the Azure portal and opens Security Center.  From there they select ‘Just in time VM access’, choose the VM and click ‘Request Access’, specifying the ports and time frame required.
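The request itself then boils down to a small payload naming the VM, the port, the source IP and how long access is needed. Again, the field names below are assumptions mirroring the policy shape above rather than a verified walkthrough.

```python
# Hypothetical sketch of a JIT access request payload. Field names are assumed
# (mirroring the policy shape above) and should be checked against the current
# Security Center API reference before use.
import datetime as dt
import json

VM_ID = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/rg-demo/providers/Microsoft.Compute/virtualMachines/vm-demo"
)  # placeholder resource ID

end_time = dt.datetime.utcnow() + dt.timedelta(hours=3)  # within the policy maximum

access_request = {
    "virtualMachines": [
        {
            "id": VM_ID,
            "ports": [
                {
                    "number": 3389,
                    "allowedSourceAddressPrefix": "198.51.100.10",  # requester's public IP
                    "endTimeUtc": end_time.strftime("%Y-%m-%dT%H:%M:%SZ"),
                }
            ],
        }
    ],
    "justification": "Patch deployment",  # assumed free-text field
}

print(json.dumps(access_request, indent=2))
```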

Final Thoughts

The process to enable JIT is straightforward but does require some detailed consideration of how RBAC is configured.

Requesting access to a VM is currently quite clunky; it would be great if a dedicated JIT portal was available for this purpose.

StorSimple Overview and Subscription Model


Microsoft have changed the commercial model for the StorSimple devices.  In the previous iteration it was based on an upfront commit to Azure consumption of either $60K or $100K.  Under the new model it is subscription based (an annualised comparison follows the list), which means that:

  • For $1,333 or £999 per month you can bag yourself an 8100 device
  • For $1,916 or £1,436 per month you can have an 8600 device
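As a quick back-of-an-envelope comparison (my arithmetic, using the monthly figures above and ignoring Azure storage consumption):

```python
# Rough annualised cost of the StorSimple subscription model (illustrative only;
# prices taken from the monthly USD figures above, excluding Azure storage consumption).
monthly_prices_usd = {"8100": 1333, "8600": 1916}

for model, per_month in monthly_prices_usd.items():
    per_year = per_month * 12
    print(f"StorSimple {model}: ${per_month:,}/month -> ${per_year:,}/year")

# Compare with the previous model's upfront Azure consumption commit of $60K or $100K.
```

So roughly $16K or $23K per year, versus the previous $60K or $100K upfront commit.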

StorSimple Overview

StorSimple is a Cloud-integrated Storage (CiS) solution that stores highly active or heavily used data locally while it moves older and less frequently used data into the cloud.   StorSimple is designed to be a best-of-both-worlds solution for storage, backup, and recovery. While on-premises storage is more appropriate for data that undergoes real-time processing, cloud storage is the better option for archiving and housing your periodic backups and infrequently used files.

  • Data transmission between the StorSimple system and Microsoft Azure Storage is encrypted using SSL, supporting up to AES 256-bit session encryption during data transfers.
  • The StorSimple 8600 model offers up to 500TB of allocatable storage, split between the local device (approx. 38TB before compression) and the cloud.
  • Microsoft Azure StorSimple automatically arranges data in logical tiers based on current usage, age, and relationship to other data. Data that is most active is stored locally, while less active and inactive data is automatically migrated to the cloud.
  • Microsoft Azure StorSimple uses deduplication and data compression to further reduce storage requirements. Deduplication reduces the overall amount of data stored by eliminating redundancy in the stored data set. As information changes, StorSimple ignores the unchanged data and captures only the changes. In addition, StorSimple reduces the amount of stored data by identifying and removing unnecessary information.

Disaster Recovery

StorSimple provides device failover using the backup copies of on-premises volumes held within Microsoft Azure.  During a failure scenario the Azure-based StorSimple Device Manager rehydrates the secondary StorSimple appliance with the data held within the cloud-based storage account.

It should be noted that, depending on the backup schedule, data change rate and network bandwidth to Microsoft Azure, some data loss is possible.
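To put that in context, here is a rough, illustrative calculation (the figures are made up) of how long it might take to push a day's changed data to Azure on a given uplink, which together with the backup schedule bounds the potential data loss window.

```python
# Illustrative only: estimate how long it takes to push changed data to Azure,
# which (together with the backup schedule) bounds the potential data loss window.
daily_change_gb = 200   # assumed daily data change rate
uplink_mbps = 100       # assumed usable bandwidth to Azure (megabits per second)

uplink_gb_per_hour = uplink_mbps / 8 / 1024 * 3600   # Mb/s -> GB/hour
hours_to_upload = daily_change_gb / uplink_gb_per_hour

print(f"Uplink throughput              : {uplink_gb_per_hour:.1f} GB/hour")
print(f"Time to upload a day's changes : {hours_to_upload:.1f} hours")
# If cloud snapshots run once a day, anything changed since the last completed
# snapshot (up to ~24 hours of data, plus upload time) is at risk in a failover.
```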

[Figure: StorSimple conceptual design]

Microsoft Azure – Auto Scaling

The ability to dynamically scale in a public cloud was one of the mantras I used to hear a couple of years ago.  When reality struck and customers realised that their monolithic applications wouldn’t be suitable for this construct, they accepted they would need to re-architect.

Wind forward a couple of years and the use of Microsoft Azure Auto Scaling has become a reality, so with this in mind  I thought it would be a good idea to share a blog post on the subject.

What Is Auto Scaling?

Auto Scaling is the process of increasing either the number of instances (scale out/in) or the compute power (scale up/down) when a level of demand is reached.

Scale Up/Down

Scale Up or Down is targeted at increasing or decreasing the compute assigned to a VM.  Microsoft have a number of ways in which you can ‘scale up’ on the Azure platform.  To vertically scale you can use any of the following:

  • Manual Process – Simply keep your VHD and deploy a new VM with greater resources.
  • Azure Automation – For VMs which are not identical, you can use Azure Automation with webhooks to monitor conditions (e.g. CPU over ‘x’ for longer than ‘y’) and then scale the VM up within the same series (a sketch of the resize step follows the note below).
  • Scale Sets – For VMs which are identical, you can use Scale Sets, a PaaS offering which ensures that fault domains, update domains and load balancing are built in.

Note that resizing a VM using any of these approaches (manual process, Azure Automation or Scale Sets) will require a VM restart.
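For the Azure Automation route, the resize operation itself is small.  The sketch below shows the equivalent step using the Python SDK (azure-identity and azure-mgmt-compute) rather than a runbook; the subscription, resource group, VM name and target size are placeholders, and it assumes a recent SDK version where long-running operations are exposed as begin_* methods.

```python
# Sketch: scale a VM up to a larger size in the same series.
# Assumes azure-identity and azure-mgmt-compute are installed and that the
# placeholder values below are replaced with real ones.
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"  # placeholder
RESOURCE_GROUP = "rg-demo"                                 # placeholder
VM_NAME = "vm-demo"                                        # placeholder
TARGET_SIZE = "Standard_D4s_v3"                            # next size up in the series

compute = ComputeManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

vm = compute.virtual_machines.get(RESOURCE_GROUP, VM_NAME)
vm.hardware_profile.vm_size = TARGET_SIZE

# Updating the size triggers a restart of the VM, as noted above.
compute.virtual_machines.begin_create_or_update(RESOURCE_GROUP, VM_NAME, vm).result()
print(f"{VM_NAME} resized to {TARGET_SIZE}")
```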

The diagram below provides a logical overview of a Scale Set.

[Diagram: logical overview of a Scale Set]

Scale Out/In

Scale Out or In is targeted at increasing or decreasing the number of instances, which could be made up of VMs, Service Fabric, App Service or Cloud Services.

A common approach is to use VMs for applications which support scale out/in: typically a piece of middleware that performs number crunching but holds no data, or perhaps a worker role that is used to transport data from point A to point B.

For websites it is more common to use App Service ‘Web Apps’, which in a nutshell is a PaaS offering; the hosting tier chosen (Standard, Premium or Isolated) dictates the maximum number of instances and the level of Auto Scale support.
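To make the scale out/in mechanics concrete, the sketch below expresses a simple pair of rules (add an instance when average CPU exceeds 70% over 10 minutes, remove one when it drops below 30%) as a plain Python structure that mirrors the shape of an Azure autoscale profile.  The field names approximate the ARM autoscale settings schema and the resource ID is a placeholder, so treat it as illustrative.

```python
# Illustrative shape of an autoscale profile (field names approximate the ARM
# autoscale settings schema; verify against the Azure Monitor documentation).
import json

TARGET_RESOURCE_ID = (
    "/subscriptions/<sub>/resourceGroups/rg-demo"
    "/providers/Microsoft.Web/serverfarms/plan-demo"
)  # placeholder App Service plan ID

def cpu_rule(operator, threshold, direction):
    """Build one scale rule: compare average CPU over 10 minutes and scale by 1."""
    return {
        "metricTrigger": {
            "metricName": "CpuPercentage",
            "metricResourceUri": TARGET_RESOURCE_ID,
            "timeGrain": "PT1M",
            "statistic": "Average",
            "timeWindow": "PT10M",
            "timeAggregation": "Average",
            "operator": operator,
            "threshold": threshold,
        },
        "scaleAction": {
            "direction": direction,
            "type": "ChangeCount",
            "value": "1",
            "cooldown": "PT10M",
        },
    }

autoscale_profile = {
    "name": "default",
    "capacity": {"minimum": "2", "maximum": "10", "default": "2"},
    "rules": [
        cpu_rule("GreaterThan", 70, "Increase"),   # scale out
        cpu_rule("LessThan", 30, "Decrease"),      # scale in
    ],
}

print(json.dumps(autoscale_profile, indent=2))
```

A scheduled profile for known peak periods uses broadly the same structure with a recurrence attached, which ties in with the point about schedules below.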

Considerations

Auto Scaling requires time to scale up or out; it doesn’t respond to a single spike in CPU usage but instead looks at averages over a 45 minute period.  Therefore, if you know when a peak workload is likely, it can be more efficient to drive Auto Scaling from a schedule.

To ensure that a runaway process doesn’t cause costs to spiral out of control, use tags, a separate Azure subscription, email alerting or perhaps even a limit on the number of instances Auto Scale can reach.

Azure Migrate – Initial Thoughts

When Microsoft announced Azure Migrate at Ignite, I was enticed and signed up for the limited preview.  Having been accepted onto the program, I thought I would share my initial thoughts.

What Is Azure Migrate?

It’s a set of tools provided by Microsoft that gives you a high-level overview of your on-premises virtual machines and a possible migration approach to Microsoft Azure.

Its components are as follows:

  • An OVA file, based on Windows Server 2012 R2, which runs a Collector that connects to vCenter to extract information
  • Unique credentials that are entered into the Collector so it can securely report information back to the Azure Migrate service within your Azure subscription
  • An assessment that enables you to group virtual machines by readiness for Azure along with expected monthly costs
  • An Azure readiness assessment per virtual machine with data customization
  • Data export to Microsoft Excel to enable further manipulation of the information
  • Integration with the OMS Service Map solution pack to provide application dependency mapping, communication paths, performance data and update requirements


On-Premises Support

In the limited preview, support for on-premises systems is limited to vCenter 5.5 and 6.0.  However, I ran the Collector against a vCenter Server Appliance 6.5 without any issues.

Guest operating system support extends to those supported by Microsoft Azure, which makes sense.

Known Issues

As this is a limited preview, I’m sure that these issues will be resolved in due course.

  • Windows Server 2016 showing as an ‘Unsupported OS’ in Azure Readiness report
  • SQL not providing a link to the Azure Database Migration Service
  • For 182 VMs the suggested tool is always ‘Requires Deep Discovery’

Final Thought

Azure Migrate will be a good starting point (when it is generally available) for a high-level overview of readiness for Azure.  It will still require human intervention to overlay application considerations and ensure workloads are natively highly available to meet customer SLAs.

Storage Spaces Direct Overview

Storage Spaces Direct is an area which I have been meaning to look into, but for one reason or another it has slipped through the gaps until now.

What Is Storage Spaces Direct?

Storage Spaces Direct is a shared-nothing, software-defined storage solution which is part of the Windows Server 2016 operating system.  It creates a pool of storage using the local drives from a collection of (two or more) individual servers.

The storage pool is used to create volumes which have built-in resilience, so if a server or hard drive fails, data remains online and accessible.

What Is The Secret Sauce?

The secret sauce is the ‘storage bus’, which is essentially the transport layer that provides the interaction between the physical disks across the network using SMB3. It allows each of the hosts to see all disks as if they were its own local disks, using Cluster Ports and the Cluster Block Filter.

Cluster Ports acts like an initiator in iSCSI terms and the Cluster Block Filter is the target; this allows each disk to be presented to each host as if it were its own.

[Diagram: the Storage Bus]

For a Microsoft-supported platform you will need a 10GbE network with RDMA-capable adapters (either iWARP or RoCE) for the storage bus.

Disks

When it comes to Storage Spaces Direct, not all disks are equal and there are a number of disk configurations which can be used.  Drive choices are as follows:

  • All Flash NVMe
  • All Flash SSD
  • NVMe for Cache and SSD for Capacity (writes are cached, reads are not)
  • NVMe for Cache and HDD for Capacity
  • SSD for Cache and HDD for Capacity (could look at using more expensive SSD for cache and cheaper SSD for capacity)
  • NVMe for Cache and SSD and HDD for Capacity

In an SSD and HDD configuration, the Storage Bus Layer Cache binds SSD to HDD to create a read/write cache.

Using NVMe-based drives will provide circa 3x the performance at typically 50% lower CPU cycles versus SSD, but they come at a far greater cost.

It should be noted that, as a minimum, 2 x SSD and 4 x HDD are needed for a Microsoft-supported configuration.

Hardware

The hardware must be on the Windows Server Catalog and certified for Windows Server 2016.  Both the HPE DL380 Gen10 and Gen9 are supported, along with the HPE DL360 Gen10 and Gen9.  When deploying Storage Spaces Direct you need to ensure that cluster creation passes all validation tests to be supported by Microsoft.

  • All servers need to be the same make and model
  • Minimum of an Intel Nehalem processor
  • 4GB of RAM per TB of cache drive capacity on each server to store metadata, e.g. 2 x 1TB SSD per server means 8GB of RAM dedicated to Storage Spaces Direct
  • 2 x NICs that are RDMA capable with either iWARP or RoCE, dedicated to the storage bus
  • All servers must have the same drive configuration (type, size and firmware)
  • SSDs must have power loss protection (enterprise grade)
  • A simple pass-through SAS HBA for SAS and SATA drives

Things to Note

  • The cache layer is completely consumed by the Cluster Shared Volume and is not available to store data on
  • Microsoft’s recommendation is to make the number of capacity drives a multiple of the number of cache drives, e.g. with 2 x SSD per server use either 4 x HDD or 6 x HDD per server
  • Microsoft recommends a single storage pool per cluster, e.g. all the disks across 4 x Hyper-V hosts contribute to a single storage pool
  • For a 2-server deployment the only resilience choice is a two-way mirror.  Essentially data is written to two different drives in two different servers, meaning your usable capacity is reduced by 50%.
  • For a deployment of 3 or more servers, Microsoft recommends a three-way mirror.  Essentially three copies of data are kept across 3 x drives on 3 x servers, reducing usable capacity to 33%.  You can use single parity (akin to RAID 5) but Microsoft do not recommend this.
  • Typically a 10% cache-to-capacity ratio is recommended, e.g. with 4 x 4TB capacity drives (16TB) you should use 2 x 800GB of cache (1.6TB)
  • When the storage pool is configured, Microsoft recommends leaving one HDD’s worth of capacity unallocated for immediate in-place rebuilds of failed drives.  So with 4 x 4TB drives you would leave 4TB unallocated in reserve (the worked example after this list pulls these numbers together)
  • The recommendation is to limit storage capacity per server to 100TB, to reduce the resync of data after downtime, reboots or updates
  • Microsoft recommends using ReFS for Storage Spaces Direct for its performance accelerations and built-in protection against data corruption; however, it does not yet support deduplication.  See more details here: https://docs.microsoft.com/en-us/windows-server/storage/refs/refs-overview
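Pulling a few of those recommendations together, here is a quick worked example using my own assumed configuration: a 4-node cluster with 4 x 4TB HDD and 2 x 800GB SSD per server, a three-way mirror, and the rebuild reserve applied per server (an assumption on my part).

```python
# Worked example (my own arithmetic) for an assumed 4-node cluster:
# each server has 4 x 4TB HDD for capacity and 2 x 800GB SSD for cache.
servers = 4
hdd_per_server, hdd_tb = 4, 4.0
ssd_per_server, ssd_tb = 2, 0.8

raw_tb = servers * hdd_per_server * hdd_tb   # raw capacity in the pool
cache_tb = servers * ssd_per_server * ssd_tb # cache (not usable for data)
reserve_tb = servers * hdd_tb                # one capacity drive's worth per server,
                                             # left unallocated for in-place rebuilds
usable_tb = (raw_tb - reserve_tb) / 3        # three-way mirror keeps three copies

print(f"Raw capacity          : {raw_tb:.1f} TB")
print(f"Cache                 : {cache_tb:.1f} TB ({cache_tb / raw_tb:.0%} of capacity)")
print(f"Reserve for rebuilds  : {reserve_tb:.1f} TB")
print(f"Usable (3-way mirror) : {usable_tb:.1f} TB")
print(f"RAM for metadata      : {ssd_per_server * ssd_tb * 4:.1f} GB per server")
```

This lands neatly on the 10% cache-to-capacity guideline and shows how quickly raw capacity shrinks once mirroring and the rebuild reserve are taken into account.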