Tuesday, May 27, 2014

Cloudy I/O Performance - Increasing Azure IOPS (Part 2 of 2)

Notes: This is part 2 of a 2 part post. You can find part 1 here. Since publication of this article Microsoft has introduced new machine types with higher performing storage. More information can be found here.

Foreword


In the last article we discussed a repeatable testing methodology to quantify storage performance in the cloud, and in this article we'll put that methodology into practice. I've done substantial testing in Azure and aim to illustrate what your options are for scaling performance at this point in time.

Scope

I undertook this project to see what can be done to increase disk I/O in Windows Azure IaaS. Upon researching the topic I found several interesting articles. Among those are:
There seems to be little consensus regarding disk striping in Windows Azure IaaS. Some blogs recommend it, while some of Microsoft's own writing seems to discourage it. After combing through the options, the following points and open questions stand out:
  • Disk striping (software RAID 0) may or may not increase performance, depending on your workload.
  • Striping will increase I/O capacity to a degree (which we'll test here).
  • Which software striping solution works better: legacy (Windows software RAID from 2000 to present) or Storage Spaces (new software "RAID" in Windows 2012 and up)?
  • How does NTFS cluster size impact performance?
  • If striping, disable geo-replication; Microsoft explicitly warns against using geo-replication with this solution.
  • If possible, use native application load distribution rather than disk striping to split I/O. (For example, split SQL database files across disks.)
  • Some articles say you need multiple storage accounts to get maximum performance. This is not true; as of 6/7/2012 the storage account target is 20,000 IOPS per account. Unless you will exceed 20,000 IOPS, keep all your disks in one account for the sake of simplicity. The tests below show that splitting disks across accounts makes no appreciable difference.
With that said, I want to quantify the solution for my given scaling problem with the notion that if the tests are simple enough to run, this approach can be used for any future scaling problem as well.
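
As a rough sanity check on the single storage account point above, the arithmetic can be sketched in PowerShell. The 20,000 IOPS account target and the 500 (or 300 for basic VMs) per-disk cap are the figures quoted in this article; the function name is hypothetical and exists only for illustration.

```powershell
# Hypothetical helper: how many data disks fit under one storage account's IOPS target.
function Get-MaxDisksPerAccount {
    param(
        [int]$AccountIopsTarget = 20000,  # per-account target quoted in this article
        [int]$PerDiskIops = 500           # per-disk cap for standard VMs quoted in this article
    )
    # Whole disks only, so round down
    return [math]::Floor([double]$AccountIopsTarget / $PerDiskIops)
}

Get-MaxDisksPerAccount                    # 40 disks at 500 IOPS each
Get-MaxDisksPerAccount -PerDiskIops 300   # 66 disks at 300 IOPS each
```

In other words, a two- to four-disk stripe sits far below the point where a second account would matter on paper.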

Putting it All Together

We'll use the testing methodology outlined in part 1 of this article to collect the results. In this case we need to first add disks and set up stripes in Azure Windows VMs.
Note: To jump straight to Azure disk performance tests, scroll to the bottom of this article.

Create New Disks and Attach to Designated VM

In order to run all the tests listed below, you need to know how to create new disks and attach them to your virtual machine. My favorite solution to this is to use a locally created dynamic VHD and upload it to the location you would like using PowerShell. Let's go through the process of attaching one disk as a primer:
  1. Decide which storage account you will use for these disks. If you plan on doing striping of any kind, ensure the storage account is set to "Locally Redundant" replication (Storage->Desired Storage Account->Configure), as "Geo Redundant" is not supported. Since the replication setting applies to all blobs in that account (Azure stores disks as blobs), you may want a dedicated account for these disks so your others can stay Geo Redundant.


  2. Determine which container will hold your Azure disk blob: open the Azure management portal, navigate to Storage->Desired Storage Account->Containers, and copy the container URL to your clipboard. To keep things simple you may want to create a new storage container; if so, create it now and use its URL.
  3. Using Hyper-V (on Windows 2008 or higher, including Windows 8) create an empty dynamically expanding VHD of your desired size. For my testing I have been using 10GB disks. Note 1: Do not create a VHDX; Azure uses the older VHD format. Note 2: You'll need to re-create the VHD for each disk if you intend to use Storage Spaces, as each disk must have a unique ID.
    #$sourceVHD is where the empty vhd file will be created
    $sourceVHD = "D:\Skydrive\Projects\Azure\AzureEmpty10G_Disk.vhd"
    #create a dynamically expanding 10GB VHD; change size as appropriate
    New-VHD -Path $sourceVHD -SizeBytes 10GB -Dynamic

  4. This disk will be uploaded to the container we selected in step 2. Determine the name you want the disk to be referenced by in Azure and execute the following script:
    #import Azure cmdlets
    Import-Module Azure
    #specify your subscription so PS knows what account to upload the data to
    Select-AzureSubscription "mysubscriptionname"
    #$sourceVHD should be the location of your empty vhd file
    $sourceVHD = "D:\Skydrive\Projects\Azure\AzureEmpty10G_Disk.vhd"
    #$destinationVHD should be the URL of the container and the name of the vhd you want created in your account. For subsequent disks you need to change the VHD name.
    $destinationVHD = "http://yourstorageacctname.blob.core.windows.net/vhds/data02.vhd"
    #now upload it.
    Add-AzureVhd -LocalFilePath $sourceVHD -Destination $destinationVHD

  5. Add this new disk as available to VMs by navigating to Virtual Machines->Disks->Create
  6. Enter the desired management name for this disk, input or browse to the URL of the VHD you just uploaded, and click the check box.
  7. Attach the disk to your VM by navigating to Virtual Machines->Ensure your desired VM is highlighted->Attach->Attach Disk
  8. Select the disk we just added. Your cache preference will depend on the application. In my case caching is off, but you will want to use the methodology outlined in the first part of this article to test caching impact for your application. Note that a change of cache status requires a VM reboot.
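
The portal steps for registering and attaching the disk can also be scripted, which helps when adding several disks. The following is a minimal sketch, assuming the classic (Service Management) Azure PowerShell module used in the upload script and a subscription that is already selected. The cloud service name, VM name, disk name, and LUN are hypothetical placeholders, not values from my environment.

```powershell
# Assumes the classic Azure module and a subscription already selected
Import-Module Azure

# Hypothetical names; substitute your own cloud service, VM, and disk names
$serviceName = "mycloudservice"
$vmName      = "myvm"
$diskName    = "data02"
$mediaUrl    = "http://yourstorageacctname.blob.core.windows.net/vhds/data02.vhd"

# Register the uploaded VHD as a disk (the scripted form of Virtual Machines->Disks->Create)
Add-AzureDisk -DiskName $diskName -MediaLocation $mediaUrl -Label $diskName

# Attach the registered disk to the VM with host caching off.
# LUN 0 is assumed to be free; pick the next unused LUN for additional disks.
Get-AzureVM -ServiceName $serviceName -Name $vmName |
    Add-AzureDataDisk -Import -DiskName $diskName -LUN 0 -HostCaching None |
    Update-AzureVM
```

Host caching is set to None here only because that matches my test configuration; test the cache setting against your own workload before settling on it.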
Now for a brief tutorial on how to set up our two types of striped disks; you'll likely only be using one of the two but I'll cover both just in case. Performance results of each are outlined later in this article.

Set Up a Traditional Software Stripe in Windows

Setting up a traditional software stripe is easy. I've tested this on Windows 2003 and higher.

  1. Log on to your VM as an admin and open the Disk Management tool.
  2. If prompted, allow the initialization of the disks.
  3. Right-click on one of the newly created empty volumes and select New Striped Volume.


  4. Select the desired disks and continue.


  5. Format the new striped volume as NTFS. Make sure to pay attention to the cluster size (allocation unit size); results are below.
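
For repeatable builds, the same legacy stripe can be created without the GUI by feeding a script to diskpart. This is only a sketch under stated assumptions: the disk numbers (2 and 3), the drive letter, the volume label, and the script file name are all hypothetical, and the disks are assumed to be online, initialized, and empty. It is destructive, so confirm the disk numbers with "list disk" in diskpart before running anything like it.

```powershell
# WARNING: destructive. Disk numbers 2 and 3 are assumptions; verify with "list disk" first.
$diskpartScript = @"
select disk 2
convert dynamic
select disk 3
convert dynamic
create volume stripe disk=2,3
format fs=ntfs unit=32K label="Stripe" quick
assign letter=S
"@

# Write the commands to a temporary file and run diskpart against it (requires an elevated prompt)
$scriptPath = Join-Path $env:TEMP "create-stripe.txt"
Set-Content -Path $scriptPath -Value $diskpartScript -Encoding ASCII
diskpart /s $scriptPath
```

The unit=32K value mirrors the 32k cluster size that performed best in my tests; change it to match whatever your own results favor.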

Set Up a Storage Spaces Software Stripe in Windows 2012 or Higher

Microsoft introduced a new approach to disk pooling in Windows Server 2012 and Windows 8 called Storage Spaces. This interesting new tech allows for a myriad of configuration options, including disk tiering, which can be useful for on-premises servers. In this case we'll be using the "simple" layout, which is similar to disk striping.
  1. Open Server Manager and navigate to File and Storage Services -> Volumes -> Storage Pools
  2. Under Storage Pools you should see "Primordial". (As opposed to "Unused Disks". I'm guessing someone was pretty proud of that.) Right click it and select "New Storage Pool".


  3. Walk through the Wizard selecting each disk you would like to be part of the pool.


  4. On the results page, ensure "Create a virtual disk when this wizard closes" is selected and click "Close".


  5. Walk through the Virtual Disk Wizard, specifying a meaningful name and selecting simple storage layout and fixed provisioning.


  6. On the results page, ensure "Create a volume when this wizard closes" is selected and click "Close".
  7. Complete the New Volume Wizard specifying your desired drive letter and desired NTFS cluster size.
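
The wizard steps above have a rough PowerShell equivalent in the Storage module that ships with Windows Server 2012 and later. The sketch below assumes every poolable disk on the VM should join the pool; the pool name and virtual disk name are hypothetical.

```powershell
# Assumes Windows Server 2012 or later; run from an elevated prompt.
# Collect every disk that is eligible for pooling (assumed to be the new, empty data disks).
$disks = @(Get-PhysicalDisk -CanPool $true)

# Create the pool ("DataPool" is a hypothetical name)
New-StoragePool -FriendlyName "DataPool" `
    -StorageSubSystemFriendlyName "Storage Spaces*" `
    -PhysicalDisks $disks

# Simple layout with fixed provisioning, matching the wizard choices above
New-VirtualDisk -StoragePoolFriendlyName "DataPool" -FriendlyName "DataStripe" `
    -ResiliencySettingName Simple -ProvisioningType Fixed -UseMaximumSize

# Initialize, partition, and format with a 32 KB NTFS allocation unit
Get-VirtualDisk -FriendlyName "DataStripe" | Get-Disk |
    Initialize-Disk -PartitionStyle GPT -PassThru |
    New-Partition -AssignDriveLetter -UseMaximumSize |
    Format-Volume -FileSystem NTFS -AllocationUnitSize 32768 -Confirm:$false
```

Check the output of Get-PhysicalDisk before creating the pool so that only the disks you intend to stripe are included.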


Run Tests/Collect Results!

Now that we have our disks configured, we need to run our tests. For instructions on how to do so, see part 1 of this topic here.
When analyzing the IOMeter output you will want to pay special attention to the following metrics:
  • IOPS (Cumulative, Read, Write; higher is better)
  • MBps (Cumulative, Read, Write; higher is better)
  • Response Time (Avg, Avg Read, Avg Write; lower is better)
If putting the data together for a report, Excel works nicely as I'll display below.

Results

Now for the most important part, the findings. Tests performed:

Sector Size Tests:

  • 1 Disk, 4k Sector Size (default)
  • 1 Disk, 8k Sector Size
  • 1 Disk, 16k Sector Size
  • 1 Disk, 32k Sector Size
  • 1 Disk, 64k Sector Size
  • 3 Disks, 4k Sector Size (results confirmation test)
  • 3 Disks, 32k Sector Size (results confirmation test)


Table 1-Cluster Size Tests
Table 2-Cluster Size Verification
The sector size tests (strictly speaking, the NTFS cluster size chosen at format time) echo what others have observed with Azure: since IOPS are capped at 500 per disk (or 300 for basic VMs), larger sector sizes can result in higher throughput. In my case 32k was the sweet spot; depending on your workload your results may vary slightly. I have consistently seen slightly higher performance with larger sector sizes in Azure.
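
To see why a fixed IOPS cap rewards larger transfers, it helps to work out the theoretical ceiling: throughput is simply IOPS multiplied by the size of each I/O. The sketch below is a simplified model, not a reproduction of my measurements. It assumes every I/O moves exactly the number of kilobytes given, which real workloads will not do, and the function name is hypothetical.

```powershell
# Hypothetical helper: throughput ceiling in MB/s for a given IOPS cap and I/O size.
function Get-ThroughputCeilingMBps {
    param([int]$IopsCap, [int]$IoSizeKB)
    # IOPS x KB per I/O = KB/s; divide by 1024 for MB/s
    return ($IopsCap * $IoSizeKB) / 1024
}

# Ceiling for each size tested above, at the 500 IOPS cap
foreach ($sizeKB in 4, 8, 16, 32, 64) {
    $mbps = Get-ThroughputCeilingMBps -IopsCap 500 -IoSizeKB $sizeKB
    "{0,2} KB per I/O at 500 IOPS: {1:N2} MB/s ceiling" -f $sizeKB, $mbps
}
```

At 500 IOPS, for instance, 32 KB transfers top out at 15.625 MB/s. The model only describes an upper bound, so it does not explain why 32k beat 64k in my tests; that is where measurement has to take over.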

Legacy Disk Striping Tests:

  • 1 Disk, 32k Sector Size
  • 2 Disks, Striped Volume, 32k Sector Size
  • 2 Disks in 2 Storage Accounts, Striped Volume, 32k Sector Size (Multiple Storage Account Test)
  • 3 Disks, Striped Volume, 32k Sector Size
  • 4 Disks, Striped Volume, 32k Sector Size
<See Bar Charts Below Under Disk Striping Methodology>
Table 3-Legacy Striping and Storage Account Tests
You can see that with one disk we get 500 IOPS as expected. From there we can see a scaling trend that is most definitely not linear. Two disks result in 33% higher performance, while three disks add an additional 23% (64% over one disk). Adding the fourth disk actually results in a drop, coming in 5% lower than three disks and 56% higher than one.
Additionally, we also see that splitting disks across storage accounts makes no appreciable difference.  Note: Bar charts for this results section have been combined into the graphs below.
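
One way to read those percentages is as scaling efficiency: the observed gain divided by what perfectly linear scaling would have delivered. The sketch below only re-derives numbers from the percentages quoted above; it adds no new measurements, and the function name is hypothetical.

```powershell
# Hypothetical helper: fraction of ideal linear scaling achieved by an N-disk stripe.
function Get-ScalingEfficiency {
    param([int]$Disks, [double]$GainOverOneDisk)
    # Ideal scaling would be N times one disk; observed is (1 + gain) times one disk
    return (1 + $GainOverOneDisk) / $Disks
}

# Gains over a single disk, as quoted in the text above
$results = @(
    @{ Disks = 2; Gain = 0.33 },
    @{ Disks = 3; Gain = 0.64 },
    @{ Disks = 4; Gain = 0.56 }
)

foreach ($r in $results) {
    $efficiency = Get-ScalingEfficiency -Disks $r.Disks -GainOverOneDisk $r.Gain
    "{0} disks: {1:N3} of ideal linear scaling" -f $r.Disks, $efficiency
}
```

By this measure the legacy stripe delivers roughly two thirds of the ideal at two disks, a little over half at three, and under 40% at four.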

Disk Striping Methodology Tests:

  • 2 Disks, Striped Volume, 32k Sector Size
  • 2 Disks, Storage Spaces Simple, 32k Sector Size
  • 3 Disks, Striped Volume, 32k Sector Size
  • 3 Disks, Storage Spaces Simple, 32k Sector Size
  • 4 Disks, Striped Volume, 32k Sector Size
  • 4 Disks, Storage Spaces Simple, 32k Sector Size




Table 4-Legacy Striping vs. Storage Spaces Test
Now we compare legacy striping to the newly introduced Storage Spaces. Two-disk scaling is a definitive win for Storage Spaces, while beyond that legacy striping generally performs better (save for max latency). In my opinion a two-disk Storage Spaces stripe is the sweet spot here (56% IOPS improvement!), considering that with more disks we add complexity that doesn't pan out on the performance side.

Conclusion

I hope you have found these results interesting; I certainly have. Even if you choose not to run these tests yourself I hope my results prove helpful when sizing your machines. Since the access pattern I used is relatively universal it should be applicable in most scenarios.
Software-level disk striping works relatively well in Microsoft Azure to increase per-disk performance in lieu of a provider-level solution similar to Amazon EBS provisioned IOPS. Splitting the workload across logical disks or VMs is preferred but not applicable to all workloads. When employing this solution make sure you select only locally redundant replication, because Microsoft warns that geo-redundant replication may cause data consistency issues on the replication target.
For additional information see the links near the top of this article. Thanks for reading!

13 comments:

Unknown said...

This is a nice post, very thorough and detailed comparison. Especially useful is the debunking of the separate storage accounts recommendation, and I am curious about the falloff in IOPS after 3 disks.

Toby Meyer said...

Thanks Duane! I'm guessing (not substantiated) that after 3 or so the overhead of managing that many disks at a software level can't keep up with the pace of the disks. I have yet to see a software RAID that doesn't have diminishing scaling, and this seems to be no exception.

Glad you enjoyed it!

Unknown said...

Thanks for posting your test result!

Can I know the size of VM you use? is it A2 standard?

Also, it looks like SQL IaaS offers much better performance compared to Azure SQL Databases.

A2 + SQL Web = 170USD, and it offers comparable performance to P2, which is 930USD per month:

http://cbailiss.wordpress.com/2014/09/16/performance-in-new-azure-sql-database-performance-tiers/

Do you feel the same too about performance between the 2 offerings?

Toby Meyer said...

Hey @ Samuel Chou, great observation.

That's a bit sticky; getting like for like performance is something that will take a bit more work, but overall it does seem like it's possible to get decent performance out of SQL in IaaS in medium->small implementations. Since the IO falls off above 3 disk one may need to pursue Azure SQL or the new SSD instances.

Also keep in mind that if you were to use IaaS and install SQL you would need to pay licensing, which is included with Azure SQL, so that makes the direct comparison increasingly difficult. Azure SQL also has better native performance monitoring for anyone who isn't comfortable doing perfmon themselves.

Overall, you bring up a great point and have linked a great post as well. The sheer number of options available in Azure and Amazon are impressive, and it's comparisons like this that make me excited to be in this field. At the end of the day the solution that makes sense will depend mostly on your client's needs.

Thanks!

Jean B said...

Thanks for this post,

It confirms all the tests I've done on my account.

What I can see with my tests is that on my current dedicated server, a virtual one but In my own office, performances are about 4 or 5 times better for the read speed : about 60MBps VS 13MBps on azure.

So, I think it's difficult to migrate my productions servers on Azure with such performances issues.

Have you already migrated heavy applications/websites on the Azure platform ?

Toby Meyer said...

Hi @Jean,

Well, it depends. I have done some deployments, but not everything is cut out for IaaS deployments, particularly I/O heavy workloads. That said, Microsoft has just released "D" series VMs backed by SSDs, and they perform substantially better. For your I/O heavy workloads it's worth taking a look. You can learn more at ScottGu's site here. Make sure to check the comments on that post as well, there is some good info there.

Good luck!

Jean B said...

Thank you Toby,

I'll take a look at these D series !

Mike Malter said...

Have you done any work with hot spares? In the storage space, when you identify a drive to include, you can specify a hot spare. This is a level above the virtual drive where you pick striping or parity. Would the inclusion of a hot spare in a storage space to be used for a striped array help? What is the risk of creating a striped array in Azure? Thanks.

Toby Meyer said...

Hi Mike!

If any of the drives are corrupted for any reason all data would be lost. That said, as of now a striped array is "supported" in Microsoft Azure IaaS provided geo-replication is disabled. I assume that MSFT handles fault tolerance at a level lower than the IaaS implementation itself, meaning that they're accepting the responsibility of "disk" availability. Since striping drives is effectively RAID 0, a hot spare wouldn't help as by the time that a spare is needed the data is already corrupted. This would work for a striping with parity set, but the performance there wasn't high enough to even include it in this article.

Striping disks in Azure is more complex, and thus more risky. At this point I would advise looking at the G-Series VMs if you need substantially more IOps. This solution will continue to work to the performance levels realized though, and for less $$. I have this solution employed for a few customers and they have been satisfied with the price/performance.

Unknown said...

We have had good luck using the new premium storage with our VMs. We use them for elasticsearch.

Toby Meyer said...

Thanks for the feedback Matt, glad to hear it. Elasticsearch looks pretty cool, I haven't had a chance to focus on it as of yet.

Fortunately the content of my(this) article is getting less and less relevant as time goes on. :)
