itToby

Friday, June 23, 2017

Automating Service Principal Setup in Azure Active Directory

When automating tasks in Azure, you'll often need a service principal. Setting these up using the UI feels like driving my car to the mailbox. Let's automate this.

The final product when we finish automating everything.

Terms Bingo

Before getting into the article, let's get a couple of basic terms out of the way:

Active Directory (AD): Microsoft's on-premises solution for managing users, computers, etc. introduced in Windows 2000.
Azure Active Directory (AAD): Microsoft's cloud solution for managing users, applications, and more. Hosted on Microsoft's Azure platform, this can integrate with on-premises Active Directory if desired, but is not required.
(AAD) Tenant: An instance of Azure Active Directory for a customer is called a "tenant". Most customers will have one tenant, but larger organizations may have multiple for varying reasons.

A Service Principal?

In the Active Directory world, automated tasks are performed by service accounts, be they traditional dedicated accounts or (group) managed service accounts. The key to securely performing these tasks is ensuring that unique security principals are used for each task, ensuring they have only the amount of access they need to perform the task in question, and that they're monitored proactively.

Azure Active Directory does not have the concept of service accounts, but there is a functional equivalent: Service Principals. Service principals are comprised of:

Azure Active Directory Registered Application: Registering an application in AAD is a way to then grant permissions (using a Service Principal) to that application within Azure and/or Azure Active Directory. This can be an application hosted in Azure, externally, or in our case, an automation task of another nature. An AAD registered application can also be used by other tenants (with a Service Principal on their side), but as we're talking about automating our own tasks that is outside the scope of this article.
The Service Principal itself: The service principal is an association with an AAD Application that allows for granting of permissions within that tenant. In our case we'll be discussing AAD registered apps and service principals in a 1:1 ratio.

Azure Entitlement Domains

To facilitate the principle of least privilege, we need to understand the levels of granularity by which permissions are assigned in Azure. There are three levels rights can be assigned at:

Subscription: One can grant a service principal, user, or other object access at an entire subscription level. There are very few cases where doing so would adhere to the principle of least privilege, so don't do this unless you have no option.
Resource Group: A resource group is a logical grouping of Azure resources. Depending on how you split your resource groups, this is likely a common place to assign privileges.
Object: Privileges can be assigned on a per-object basis as well, but managing security on a per-object basis is very complex and usually only done when absolutely necessary from a security perspective.
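To make the three levels concrete, here is a minimal sketch of how each one maps to an Azure Resource Manager scope path. The function name Get-AzureScopePath and the sample IDs are made up for illustration; only the path layout (/subscriptions/.../resourceGroups/.../providers/...) comes from Azure itself.

```powershell
# Hypothetical helper: builds the scope path for a subscription, resource group, or single object.
function Get-AzureScopePath {
    param(
        [Parameter(Mandatory = $true)][string]$SubscriptionId,
        [string]$ResourceGroupName,
        [string]$ResourceProvider,   # e.g. Microsoft.KeyVault
        [string]$ResourceType,       # e.g. vaults
        [string]$ResourceName
    )
    # Subscription level: the broadest scope.
    $scope = "/subscriptions/$SubscriptionId"
    # Resource group level: narrows the scope to one logical grouping.
    if ($ResourceGroupName) { $scope += "/resourceGroups/$ResourceGroupName" }
    # Object level: only valid when the group and all three resource parts are supplied.
    if ($ResourceGroupName -and $ResourceProvider -and $ResourceType -and $ResourceName) {
        $scope += "/providers/$ResourceProvider/$ResourceType/$ResourceName"
    }
    return $scope
}

# Sample values only; substitute your own subscription ID and names.
$sub = '00000000-0000-0000-0000-000000000000'
Get-AzureScopePath -SubscriptionId $sub
Get-AzureScopePath -SubscriptionId $sub -ResourceGroupName 'rg-dev'
Get-AzureScopePath -SubscriptionId $sub -ResourceGroupName 'rg-dev' -ResourceProvider 'Microsoft.KeyVault' -ResourceType 'vaults' -ResourceName 'kv-dev'
```

A path like this can be handed to the -Scope parameter of New-AzureRmRoleAssignment when you want to be explicit about where a role applies.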

Sorry son, your RBAC doesn't give you access to the keyvault.

Usage Examples

So when would we use a service principal? Here are a couple examples:

Webjobs that interact with other resources: If you have webjobs that interact with other resources in your subscription you may choose to use a service principal to access those resources. An excellent example of this is the Let's Encrypt! web app extension.
Automated Tasks: Azure automation and/or external jobs running against Azure can leverage service principals to authenticate and perform their work. Code deployment platforms such as OctopusDeploy and VSTS are perfect examples.

Let Us Do This

And by this I mean the point of the article.

To make quick work of this from an automation perspective, we'll make a quick PowerShell function we can re-use in other scripts. The main PowerShell cmdlets we'll be leveraging are:

New-AzureRmADApplication
New-AzureRmADServicePrincipal

Just running these two commands is easy enough, but that's not all that useful from an automation perspective, where we want to chain multiple operations that rely on the creation or existence of a service principal. To that end, we'll make a function that accepts a desired service principal name, checks whether it exists, and creates it if not. We'll also generate a password if one isn't specified and report whether the service principal already existed. Everything is returned in a single object so our scripts can take appropriate action in every scenario.

Critical Note: The code below contains reference to a function that is not included, so before copying/pasting please read at least the "Note" sections in the code discussion below.

Update 1/2018: AzureRM 5.0 cmdlets require a securestring for New-AzureRmADApplication whereas it was not supported previously.

HERE COMES ANOTHER SCRIPT!

################################################################################
# Register-AzureServicePrincipal
# Given the correct input, does one of the following: 
# 1> checks for existence of application registration
# 2> checks to see if the app is registered as a service principal
# 3> if neither of those is true, creates the app and service principal
# > outputs an object with all details possible. If the app already exists the 
# password will be null because we can't look it up. 
#  INPUT: servicePrincipalName, the displayname of the desired App/ServicePrincipal and the desired password.
#       The password field is optional and if omitted a 40 character random password will be generated and returned. 
#  OUTPUT: an object containing the following NoteProperties
#       > ClientID: the GUID representing the application ID
#       > ServicePrincipalID: the GUID representing the Service Principal association
#       > SPNNames: The service principal names of the SP
#       > ServicePrincipalPassword: A securestring of the Application Password. NOTE: This will be NULL if the app is already registered in AD as we cannot retrieve it.
#       > ServicePrincipalAlreadyExists: boolean to indicate if the sp already existed or not
# USAGE NOTES: Assumes already logged into Azure with proper permissions and that the desired subscription is selected. 

Function Register-AzureServicePrincipal{
    param(
        # The name for the service principal. We won't make this mandatory to allow for manual entry mode with guidance. Obviously it needs to be specified for automation. 
        [string]$servicePrincipalName,
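        # The password (plain string) if you choose to specify it, otherwise the script will generate one for you. 
        # Left untyped on purpose: this variable is later reassigned to a securestring (or $null), and a [string] constraint would coerce those values back to text.
        $servicePrincipalPassword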
    )
    # Set the regex for the input validation on the SPN
    $SPNNamingStandard='^[--z]{5,40}$'
    Write-Host "Provisioning AzureAD App/Service Principal"
    Write-Warning "The account operating this script MUST have the role Subscription Admin or Owner in the desired subscription"
    $ErrorActionPreference = "Stop" # Error handling is not yet sufficient; try/catch the stuff below!
    if (!$servicePrincipalName){
        do {
            Write-Host "SPN naming standard is (in RegEx): $SPNNamingStandard"
            $servicePrincipalName=Read-Host "Service Principal Name not specified on startup; Please enter desired name or type GUID and press enter for a guid based random name"
            if ($servicePrincipalName -eq "GUID"){
                $guid=([guid]::NewGuid()).toString()
                $servicePrincipalName="SPN-$guid"
            }
        } until ($servicePrincipalName -match $SPNNamingStandard)
    }
    # handle command line specification of GUID 
    if ($servicePrincipalName -eq "GUID"){
        $guid=([guid]::NewGuid()).toString()
        $servicePrincipalName="SPN-$guid"
    }
    # set URL and IdentifierUris
    $homePage = "http://" + $servicePrincipalName
    $identifierUri = $homePage

    Write-Host "Desired Service Principal Name is $servicePrincipalName `n"

    # Now we need to determine if 1> the Application exists and 2> if it has been registered as a service principal. This will guide our execution through the end of the function. 
    $appExists=Get-AzureRmADApplication -DisplayNameStartWith $servicePrincipalName -ErrorAction SilentlyContinue
    # check for SPN only if app exists. SPN can't exist without app so no reason to check if not. 
    if ($appExists){$spnExists=Get-AzureRmADServicePrincipal | Where-Object {$_.ApplicationId -eq $appExists.ApplicationId} -ErrorAction SilentlyContinue}

    # we only need a password if the app hasn't been created yet.
    if (!$appExists){
        # Generate a password if needed
        if (!$servicePrincipalPassword){
            $servicePrincipalPassword=New-RandomPassword -passwordLength 40
        }
        # NOTE! We had a convertto-securestring here but as it turns out new-azurermadapplication doesn't take a securestring, only a string
        # NOTEUPDATE! AzureRM 5.0 and higher requires a securestring (yay!) This has been updated but notes left here for reference.
        $servicePrincipalPassword=ConvertTo-SecureString $servicePrincipalPassword -AsPlainText -Force
    }
    # we set this to NULL as a "valid" return as the appID already exists and we can't lookup the password from here 
    else {$servicePrincipalPassword=$null}

    # Create the App if it wasn't already
    if (!$appExists){
        $azureADApplication=New-AzureRmADApplication -DisplayName $servicePrincipalName -HomePage $homePage -IdentifierUris $identifierUri -Password $servicePrincipalPassword
        Write-Host "Azure AAD Application creation completed successfully"
    }
    # if it already exists we'll just redirect the variable
    else{$azureADApplication=$appExists}
    $appID=$azureADApplication.ApplicationId

    # Create new SPN if needed
    if (!$spnExists){
        $spn=New-AzureRmADServicePrincipal -ApplicationId $appId
        Write-Host "SPN creation completed successfully"
    }
    else{$spn=$spnExists}
    $spnNames=$spn.ServicePrincipalNames

    # Create object to store information. 
    $outputObject=New-Object -TypeName PSObject
    $outputObject | Add-Member -MemberType NoteProperty -Name ServicePrincipalName -Value $servicePrincipalName
    $outputObject | Add-Member -MemberType NoteProperty -Name ClientID -Value $appID
    $outputObject | Add-Member -MemberType NoteProperty -Name ServicePrincipalID -Value $spn.Id
    $outputObject | Add-Member -MemberType NoteProperty -Name SPNNames -Value $spnNames
    $outputObject | Add-Member -MemberType NoteProperty -Name ServicePrincipalPassword -Value $servicePrincipalPassword
    if ($appExists -and $spnexists){$outputObject | Add-Member -MemberType NoteProperty -Name ServicePrincipalAlreadyExists -Value $true}
    else {$outputObject | Add-Member -MemberType NoteProperty -Name ServicePrincipalAlreadyExists -Value $false}

    return $outputObject
}
################################################################################

Discussion

    param(
        # The name for the service principal. 
        [string]$servicePrincipalName,
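        # The password (plain string) if you choose to specify it, otherwise the script will generate one for you. 
        # Left untyped on purpose: this variable is later reassigned to a securestring (or $null), and a [string] constraint would coerce those values back to text.
        $servicePrincipalPassword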
    )

This is where the input to the function is defined. As you'll see below, the password is optional, but the servicePrincipalName is required. I don't mark it as mandatory in the parameter definition so the function can prompt for it with guidance during interactive use. Feel free to add Mandatory=$true if desired.

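While it would be logical to use [securestring] for the servicePrincipalPassword parameter, the cmdlet we use downstream only supported regular strings when this was first written. As of AzureRM 5.0 it requires a securestring, so the function accepts a plain string and converts it just before the call.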

A quick note on interactive vs. non-interactive scripts

While the goal of automation should be running tasks headless, thus fully non-interactive, there are some scenarios where supporting both non-interactive and interactive runs can be very useful. In some cases when acting as a consultant I will allow for fully interactive running of automation scripts so the customer can walk through each option with guidance once and understand the context of the options available to them. At the end of script execution, I echo to the console what the equivalent command line would be if it were run entirely headless. This scenario does require a bit more control flow logic, mainly an additional parameter to specify that we're in a headless scenario so the script can error out quickly, prior to execution, when required input is missing. This also allows for use of code snippets (mainly functions) on a day-to-day basis as well as in a dedicated automation framework.
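As a rough illustration of that control flow, here is a minimal sketch. The function name Invoke-GuidedTask and its -Headless switch are hypothetical and not part of the script in this article; the point is only the pattern of failing fast when headless and echoing the equivalent command line at the end.

```powershell
# Hypothetical example of supporting both interactive and headless runs.
function Invoke-GuidedTask {
    param(
        [string]$Name,
        [switch]$Headless
    )
    if (-not $Name) {
        # Headless runs must never block on a prompt, so fail before doing any work.
        if ($Headless) {
            throw "Headless run requested but -Name was not supplied."
        }
        $Name = Read-Host "Name not specified; please enter one"
    }

    # ... the real work would happen here ...

    # Echo the command line that reproduces this run without any prompts.
    $equivalent = "Invoke-GuidedTask -Name '$Name' -Headless"
    Write-Host "Equivalent headless command: $equivalent"
    return $equivalent
}

Invoke-GuidedTask -Name 'demo' -Headless
```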

    # Set the regex for the input validation on the SPN
    $SPNNamingStandard='^[--z]{5,40}$'
    Write-Host "Provisioning AzureAD App/Service Principal"
    Write-Warning "The account operating this script MUST have the role Subscription Admin or Owner in the desired subscription"
    $ErrorActionPreference = "Stop" # Error handling is not yet sufficient, try/catch the stuff below!
    if (!$servicePrincipalName){
        do {
            Write-Host ""
            Write-Host "SPN naming standard is (in RegEx): $SPNNamingStandard"
            $servicePrincipalName=Read-Host "Service Principal Name not specified on startup; Please enter desired name or type GUID and press enter for a guid based random name"
            if ($servicePrincipalName -eq "GUID"){
                $guid=([guid]::NewGuid()).toString()
                $servicePrincipalName="SPN-$guid"
            }
        } until ($servicePrincipalName -match $SPNNamingStandard)
    }

$SPNNamingStandard and the logic that follows are there if you want to use a RegEx to validate interactive input. Change this expression to meet your needs, or remove the entire section if you don't want to support interactive running. Limited time bonus offer, here's a link to my favorite regular expression evaluation site!
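One thing worth knowing before adapting the expression: [--z] is a character range running from "-" to "z" in ASCII order, so besides letters and digits it also accepts punctuation such as ":", "@", and "\". If you want something tighter, a sketch along these lines may help. The pattern and the Test-SpnName helper are my own illustration, not part of the function above.

```powershell
# Hypothetical stricter standard: letters, digits, and hyphens only, 5 to 40 characters.
$strictNamingStandard = '^[A-Za-z0-9-]{5,40}$'

# Hypothetical helper that returns $true when a name meets the standard.
function Test-SpnName {
    param(
        [Parameter(Mandatory = $true)][AllowEmptyString()][string]$Name,
        [string]$Pattern = $strictNamingStandard
    )
    return ($Name -match $Pattern)
}

# Try a few candidates, including a GUID-based name like the one the function generates.
$candidates = 'web-deploy-01', 'ab', 'bad name', ('SPN-' + [guid]::NewGuid().ToString())
foreach ($candidate in $candidates) {
    '{0,-45} {1}' -f $candidate, (Test-SpnName -Name $candidate)
}
```

The GUID-based name is exactly 40 characters ("SPN-" plus a 36 character GUID), which is why the upper bound of 40 matters if you keep the GUID option.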

Note: As the script warns, the credential used to create the Application ID/Service Principal must have the role Subscription Admin or Owner to perform the provisioning actions.

    # handle command line specification of GUID 
    if ($servicePrincipalName -eq "GUID"){
        $guid=([guid]::NewGuid()).toString()
        $servicePrincipalName="SPN-$guid"
    }

This little trick allows the specification of "GUID" (i.e. Register-AzureServicePrincipal -servicePrincipalName GUID) to tell our function to generate a GUID for the name. As you can see, I'm prepending "SPN-" to the GUID to ensure the programmatically generated ApplicationIDs/SPNs stand out.

    # set URL and IdentifierUris
    $homePage = "http://" + $servicePrincipalName
    $identifierUri = $homePage

    Write-Host "Desired Service Principal Name is $servicePrincipalName `n"

Homepage and IdentifierURI settings for a service principal that we're using in the aforementioned capacity don't matter, but they do need to be set, so we base them on the SP name itself and move on.

Update 1/2018: AzureRM 5.0 cmdlets require a securestring for New-AzureRmADApplication whereas it was not supported previously.

    # Now we need to determine if 1> the Application exists and 2> if it has been registered as a service principal. This will guide our execution through the end of the function. 
    $appExists=Get-AzureRmADApplication -DisplayNameStartWith $servicePrincipalName -ErrorAction SilentlyContinue
    # check for SPN only if app exists. SPN can't exist without app so no reason to check if not. 
    if ($appExists){$spnExists=Get-AzureRmADServicePrincipal | Where-Object {$_.ApplicationId -eq $appExists.ApplicationId} -ErrorAction SilentlyContinue}

    # we only need a password if the app hasn't been created yet.
    if (!$appExists){
        # Generate a password if needed
        if (!$servicePrincipalPassword){
            $servicePrincipalPassword=New-RandomPassword -passwordLength 40
        }
        # NOTE! We had a convertto-securestring here but as it turns out new-azurermadapplication doesn't take a securestring, only a string
        # NOTEUPDATE! AzureRM 5.0 and higher requires a securestring (yay!) This has been updated but notes left here for reference.
        $servicePrincipalPassword=ConvertTo-SecureString $servicePrincipalPassword -AsPlainText -Force
    }
    # we set this to NULL as a "valid" return as the appID already exists and we can't lookup the password from here 
    else {$servicePrincipalPassword=$null}

This code allows us to insert this function into a workstream regardless of whether the Application ID and service principal already exist. If they do, we get all the information we can but set the password to $null since we can't look it up. As you'll see below we also add another noteproperty to inform the caller explicitly that the AppID/service principal already existed.

Note: This script block contains reference to another function that I have not provided, New-RandomPassword. You'll need to provide your own function that generates a password and call it here or specify the desired password when calling the function explicitly. Perhaps I'll write another post in the future to cover generating a random password in PowerShell.
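If you need a stand-in while you write your own, here is a minimal sketch of what such a function could look like. To be clear, this is not the author's New-RandomPassword; the character set, the length limits, and the implementation are all assumptions. It only matches the -passwordLength parameter name used in the call above.

```powershell
# Hypothetical stand-in for the missing New-RandomPassword function.
function New-RandomPassword {
    param(
        [ValidateRange(8, 128)]
        [int]$passwordLength = 40
    )
    # Assumed alphabet: look-alike characters (I, O, l, 0, 1) are left out.
    $alphabet = 'ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz23456789!#%+-_='
    # Reject byte values above this limit so every character is equally likely.
    $limit = 256 - (256 % $alphabet.Length)
    $rng = [System.Security.Cryptography.RandomNumberGenerator]::Create()
    $buffer = New-Object byte[] 1
    $builder = New-Object System.Text.StringBuilder
    try {
        while ($builder.Length -lt $passwordLength) {
            $rng.GetBytes($buffer)
            if ($buffer[0] -lt $limit) {
                [void]$builder.Append($alphabet[$buffer[0] % $alphabet.Length])
            }
        }
    }
    finally {
        $rng.Dispose()
    }
    return $builder.ToString()
}

# Example: generate a password and show only its length.
(New-RandomPassword -passwordLength 40).Length
```

The sketch draws from the cryptographic random number generator rather than Get-Random, since the result is used as a credential.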

    # Create the App if it wasn't already
    if (!$appExists){
        $azureADApplication=New-AzureRmADApplication -DisplayName $servicePrincipalName -HomePage $homePage -IdentifierUris $identifierUri -Password $servicePrincipalPassword
        Write-Host "Azure AAD Application creation completed successfully"
    }
    # if it already exists we'll just redirect the variable
    else{$azureADApplication=$appExists}
    $appID=$azureADApplication.ApplicationId

    # Create new SPN if needed
    if (!$spnExists){
        $spn=New-AzureRmADServicePrincipal -ApplicationId $appId
        Write-Host "SPN creation completed successfully"
    }
    else{$spn=$spnExists}
    $spnNames=$spn.ServicePrincipalNames

Now we do the actual creation of the Application ID and service principal if necessary. Notice that if they already existed we set the downstream variables to relay to the user. This section could be improved with additional error handling if desired.

    # Create object to store information. 
    $outputObject=New-Object -TypeName PSObject
    $outputObject | Add-Member -MemberType NoteProperty -Name ServicePrincipalName -Value $servicePrincipalName
    $outputObject | Add-Member -MemberType NoteProperty -Name ClientID -Value $appID
    $outputObject | Add-Member -MemberType NoteProperty -Name ServicePrincipalID -Value $spn.Id
    $outputObject | Add-Member -MemberType NoteProperty -Name SPNNames -Value $spnNames
    $outputObject | Add-Member -MemberType NoteProperty -Name ServicePrincipalPassword -Value $servicePrincipalPassword
    if ($appExists -and $spnexists){$outputObject | Add-Member -MemberType NoteProperty -Name ServicePrincipalAlreadyExists -Value $true}
    else {$outputObject | Add-Member -MemberType NoteProperty -Name ServicePrincipalAlreadyExists -Value $false}

    return $outputObject

Now we'll create our output object. This gives us everything we might want to know (and probably more) for our consuming processes. Here's what our object looks like:

$output.ServicePrincipalName=(Surprise!!!) The service principal name
$output.ClientID=The client ID (appID) is one of the critical pieces of info for downstream applications. This is what you'll specify when authenticating later (think of it as your user ID).
$output.ServicePrincipalID=The SP ID, though not used in any capacity directly that I've seen yet short of programmatically referencing it when deleting, etc.
$output.SPNNames=The reference names of the service principal. These would be used by third party apps, but in most cases I'm addressing with this article they'll go unused.
$output.ServicePrincipalPassword=Keep it secret! Keep it safe! This is the password associated with this service principal, returned as a securestring since the AzureRM 5.0 update. Obviously this is the other key piece of information you'll need as a takeaway. The prudent next step would be to check this into an Azure Key Vault or something similar, but that's for another article...
$output.ServicePrincipalAlreadyExists=$true or $false; this is also critical for downstream processing. If $false, you'll know that the principal is newly created and the password is contained in the object, meaning you'll need to capture and store/use it accordingly. If $true, it means you can look up the ID by the name if needed, but you'd better have the password stored somewhere else as we can't look it up now. Either way you have two clear courses of action. While we could have relied on the password being $null, I added this property to definitively set it one way or the other to account for any unknown circumstances due to upstream changes down the road.

Note: Make sure you store the password for the newly created service principal, as you won't be able to retrieve it in the future. Also, make sure your session or variables are cleared after running, as the password lingers in memory.
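To show how a calling script might branch on that object, here is a small sketch. Get-SpnNextStep is a hypothetical helper, and the sample object is hand-built to mimic the function's output so the snippet runs without touching Azure.

```powershell
# Hypothetical consumer of the output object from Register-AzureServicePrincipal.
function Get-SpnNextStep {
    param([Parameter(Mandatory = $true)]$SpnOutput)

    if ($SpnOutput.ServicePrincipalAlreadyExists) {
        # Nothing new was created, so the password has to come from wherever you stored it.
        return "Existing principal $($SpnOutput.ClientID): fetch the password from your own secret store."
    }
    if ($null -eq $SpnOutput.ServicePrincipalPassword) {
        # Covers the odd case where the app existed but the service principal did not.
        throw "Principal reported as new but no password was returned; refusing to continue."
    }
    return "New principal $($SpnOutput.ClientID): store the returned password now."
}

# Hand-built sample that mimics a freshly created principal (values are made up).
$sample = [pscustomobject]@{
    ServicePrincipalName          = 'SPN-demo'
    ClientID                      = '11111111-1111-1111-1111-111111111111'
    ServicePrincipalPassword      = (ConvertTo-SecureString 'not-a-real-password' -AsPlainText -Force)
    ServicePrincipalAlreadyExists = $false
}
Get-SpnNextStep -SpnOutput $sample
```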

Update/Note 2: By default, this password is only good for one year, and will expire after that time making it impossible to use the SPN. To manage the password on an existing object, you'll need to use the Get/Remove/New-AzureAdApplicationPasswordCredential cmdlets in the AzureAD module (not AzureRM).

Bringing it Home

Now that we've created the function, let's use it to create a principal and give it access to a resource group:

$spnOutput=Register-AzureServicePrincipal -servicePrincipalName <myPrincipalName> -servicePrincipalPassword <myPassword>
New-AzureRmRoleAssignment -ObjectId ($spnOutput.ServicePrincipalID.toString()) -RoleDefinitionName Contributor -ResourceGroupName <myResourceGroup>

This will use our function to create a service principal and give it contributor level access to <myResourceGroup>. If you have multiple environments in your subscription you should create a principal for each and restrict access to the resource group(s) associated with each environment. I also recommend splitting production into a separate subscription; the reasons behind that are a story for another article...

Did it Work?

If you would like to manually check your work, you can find it in the Azure portal by navigating to the hamburger menu -> "More Services" -> "Azure Active Directory" -> "App Registrations".

There you should see your newly created app.

You can also check the role based access controls on your resource group or whatever object you applied the permissions to.

Only the chosen shall access Nachos Deathstar.

In Conclusion

The Azure model of auth/auth management is sound, but adherence to long standing security design principles requires a bit of effort. Hopefully this article will assist you in doing so. Please leave any comments/criticism/coffee donations below.

References

Microsoft: Application and service principal objects in Azure Active Directory (Great article on the differences between an AppID vs a service principal)

Microsoft: Role based access in Azure

Sunday, February 5, 2017

HyperV Live Migration Changes in Windows Server 2016

After upgrading my lab servers to Windows Server 2016, I had an “interesting” (ask a Minnesotan what that means) weekend troubleshooting Hyper-V Live Migration, finally finding that there has been a major change in the way virtual machine migration works, and a couple gotchas. In an effort to save others the same trouble, I’ll discuss them here.

Image From Polarstein on Flickr

Kerberos Constrained Delegation, 0x8009030E, and You(r Network Service Account)

“No credentials are available in the security package”, Event ID 20306. Under previous circumstances, this would have indicated that you didn’t have constrained delegation set up correctly as outlined in numerous other articles on the internet, but due to an underlying change the correct configuration is now different.

Previously, failover would be set up as outlined in articles such as this, with each HyperV host set up to allow constrained delegation over the Kerberos protocol only.

Starting in server 2016, the delegation must be set up to allow delegation over any protocol as displayed here:

The reason for this is that 2016 has changed the WMI provider used to a new version, which relies on WinRM to execute remote procedures rather than DCOM. WinRM, running as the Network Service, cannot access the Kerberos service ticket obtained to perform the action. By allowing any protocol, a “S4U” logon is sufficient to authenticate the request. While this setting is somewhat less secure, the point is made by the Team PM (published a few days ago, link below) that sensitive (privileged) accounts in any domain should have the “Account is sensitive and cannot be delegated” flag enabled to mitigate delegation risk.
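If you would rather script the delegation than click through the dialog, the sketch below shows one possible shape. The helper Get-LiveMigrationDelegationSpn and the host and domain names are hypothetical; the two service classes (Microsoft Virtual System Migration Service and cifs) are the ones normally delegated for live migration. The Active Directory calls are left as comments because they need the ActiveDirectory module and appropriate rights, and you should verify them in a lab first.

```powershell
# Hypothetical helper: lists the SPNs a source host would be allowed to delegate to.
function Get-LiveMigrationDelegationSpn {
    param(
        [Parameter(Mandatory = $true)][string[]]$TargetHost,
        [Parameter(Mandatory = $true)][string]$DnsDomain
    )
    foreach ($name in $TargetHost) {
        foreach ($service in 'Microsoft Virtual System Migration Service', 'cifs') {
            # Emit both the short name and the fully qualified name for each service.
            "$service/$name"
            "$service/$name.$DnsDomain"
        }
    }
}

# Made-up names: hv01 is the source host, hv02 and hv03 are migration targets.
$spns = Get-LiveMigrationDelegationSpn -TargetHost 'hv02', 'hv03' -DnsDomain 'lab.local'
$spns

# With the ActiveDirectory module loaded, the change on hv01 might then look like this (assumption, untested here):
# Set-ADComputer -Identity 'hv01' -Add @{ 'msDS-AllowedToDelegateTo' = $spns }
# Set-ADAccountControl -Identity (Get-ADComputer 'hv01') -TrustedToAuthForDelegation $true   # "use any authentication protocol"
```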

NIC Teaming, 0x8007274C/0x80072741, and You(r Service Startup Problem)

This may impact 2012/R2 as well, though for some reason it only bit me on 2016. If using NIC teaming on your host for your failover network, the interface may not be available when the Virtual Machine Management Service (VMMS) attempts to start on bootup. This condition will result in the service not opening the port (6600) on the server, which makes it impossible to failover virtual machines. To fix this, change the service startup type from “Automatic” to “Automatic(Delayed Start)”. With PowerShell as our weapon of choice (hey nano server!) this is a two-step process:

Set-Service -Name vmms -StartupType "Automatic"
Set-ItemProperty -Path "Registry::HKLM\System\CurrentControlSet\Services\vmms" -Name "DelayedAutoStart" -Value 1 -Type DWORD

The service type should already be automatic, but we’ll re-assert that here to be sure. This will only delay service (and thus VM) startup by a small bit, but ensure that the adapter is available when it does.
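After a reboot, a quick way to confirm the fix is to check that the host is actually listening on port 6600. The following is a small sketch using the .NET TcpClient class; Test-TcpPort is a hypothetical helper name and the host name in the example is made up.

```powershell
# Hypothetical helper: returns $true if a TCP connection to the port succeeds within the timeout.
function Test-TcpPort {
    param(
        [Parameter(Mandatory = $true)][string]$ComputerName,
        [int]$Port = 6600,
        [int]$TimeoutMs = 3000
    )
    $client = New-Object System.Net.Sockets.TcpClient
    try {
        $task = $client.ConnectAsync($ComputerName, $Port)
        # Wait returns $false on timeout and throws if the connection is refused.
        return ($task.Wait($TimeoutMs) -and $client.Connected)
    }
    catch {
        return $false
    }
    finally {
        $client.Close()
    }
}

# Example with a made-up host name:
# Test-TcpPort -ComputerName 'hv01' -Port 6600
```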

EventID 21024, Failed at Migration Source, and You(r Crazy, Still Unexplained Error)

This is an odd one I can’t fully explain, but I’m including it in the hopes it may save others some time. On 2 of the 3 hosts, I had the following error preventing live migration after full 2016 setup:

Virtual machine migration operation for 'VMNAME' failed at migration source 'VMHOST'. (Virtual machine ID GUID-GOES-HERE)

This error message was not accompanied by any supporting information whatsoever. After numerous network captures and log combing, I found evidence of something slightly off with domain membership. In both cases the host was able to process group policy for the computer object, but never for any logged on users. This led me to attempt leaving and re-joining the domain, which in all cases remediated the problem. Note that when doing so you will need to delete the computer account prior to re-joining, then set up the constrained delegation as outlined above for each host again.

I wish I had more information about the root cause of this issue, but with it fixed I’m moving on.

In Closing

The upgrades to my lab didn't go as smoothly as I would have liked, but I'm glad to have these issues out of the way to make for smoother production upgrades. Hopefully this information will help you as well!

Additional References

Microsoft Virtualization Blog: Live Migration via Constrained Delegation with Kerberos in Windows Server 2016

Microsoft GTCS Romania EPS: Shared Nothing Migration Fails

Canberra PFE Team Blog: Kerberos Troubleshooting

Nyan Cat: 10 Hours 4k UHD For Endless Kerberos Packet Caps and Analysis!

Thursday, October 27, 2016

Tunnel to the Cloud: Azure Site to Site IPsec Connection

Do you have multi-continent datacenters with gobs of bandwidth, IOps, and processing capability? No? I can help get you part of the way there... a network presence in one.

A tunnel.

A Site to Site Connection?

It's easier to think of this as an extension of your network into another datacenter over the internet. Using IPsec we can provide a relatively (comments at the end) secure, direct connection between an on-premises datacenter and Azure hosted resources by encrypting the traffic that flows between the two. What do I mean by:

Secure: IPsec tunnels all your traffic so it is encrypted over the internet; in reality, this is really "more secure" rather than definitively secure, as the effective security depends highly on implementation specifics.
Direct: Your router (played by pfSense in this case) will recognize the Azure site as another routable network within the boundaries of your own, enabling you to talk to Azure resources as if they were in your own datacenter.

While I'll be using pfSense for the initiator side as it exposes the options in the clearest way I've found, this article will also be useful for non-pfSense devices since we discuss the details of the IPsec tunnel; the information here should be applicable to any IPsec solution. Update 1/2017: I've personally tested on various Cisco, Sonicwall, and pfSense equipment, and Microsoft has added some great documentation about overall device support here.

Note: this works for Amazon Web Services (AWS) as well but is slightly more complex. Fortunately pfSense includes a wizard that works, but takes a lot of the fun out of it as it strips you of understanding how it works. In addition the wizard is necessary because of how Amazon does VPC routing, whereas Azure is a bit more straightforward.

With that, let's get to it!

Pre-Requisites

pfSense firewall(s): The steps in this article were performed on a pair of HA SG-4860 firewalls running pfSense 2.3.2-p1.
Microsoft Azure account with adequate permissions: We'll be performing our actions using the "new" portal based on Azure Resource Manager (ARM or AzureRM).
AzureRM PowerShell Cmdlets installed: On Win10/Server 2016 this can be accomplished with Install-Module AzureRM; for more info see this post.

Configure Azure IPSec Endpoint

Before we set up and initiate the connection from pfSense, we need to set up our endpoint in Azure. To do so, we'll create the following objects:

A Resource Group
A Public IP Address
A Virtual Network
A Gateway Subnet
A Virtual Network Gateway
A Local Network Gateway with a Connection

Resource Group

A Resource Group is a logical grouping of Azure Resources. This logical group allows for easy organization and clearer billing reports. We won't get too much into concepts and naming standards here other than to say groups should be logically tied with similar lifecycle expectancies and you should be consistent. For more information, see this Azure article.

Note: We'll be doing most of our steps in the web portal, but this whole process is much more efficient with PowerShell.

Let's go!

Open and log into the ARM Azure Portal at portal.azure.com; ensure you're working with the subscription you intend to use.
Navigate to "Resource Groups".
Click "Add", type a name for your resource group, and select the subscription and resource group location. Note: The resource group location has no bearing on where you'll be connecting to, as it's just the location where the metadata is stored.
Click "Save".

Public IP Address

While we could do the IP at the time we make the Virtual Network Gateway, we'll take care of it now to ensure it's provisioned prior to getting to that step and to discuss the IP details.

Navigate to "Public IP addresses".
Click "Add" and populate the following:

Name: Select a name leveraging consistent naming standards.
IP address assignment: Select "Dynamic". I know what you're saying.. you're saying "but Toby, I'm not saying anything", and what I'm saying is it seems this should be static. Unfortunately if we make a static IP we'll be greeted later with the following:

Why Azure, WHY?!

Now I've been running a tunnel straight for almost a month thus far and my dynamic IP has not shifted on me; I suspect it will behave the same as IPs for other resources and stay static so long as it is used. If this does change, you'll need to change the info in the Phase 1 and 2 setup of the tunnel on the pfSense side as outlined below. For the record, as of the writing of this article the pricing of IPs in Azure is a bit odd; dynamic IPs and static IPs beyond the first 5 in any region are charged the same (pretty trivial), while the first 5 static in a region are free. See here for more info.
Idle Timeout: The default of 4 minutes should be fine here.
DNS Name Label: Optionally, specify a DNS alias here, though we will not reference it again in this guide as I'm not addressing DNS issues associated with IPsec at this time.
Subscription: Select your subscription.
Resource Group: Click the resource group we created in the last step.
Location: The IP is our IPsec target, so select a location close to your local network connection. The Azure Speed Test comes in quite handy here.

Click "Create". This provisioning will take a few minutes as Azure re-arranges its SDN infrastructure to give you an IP.

What a successful deployment looks like!

Virtual Network/Subnet/Gateway Subnet

A "Virtual Network" is a network space within Azure that you can carve up and protect (firewall) to suit your needs. We're required to make one subnet, and we'll create our "gateway subnet" (landing point) as well. If this is your first foray into virtual networks on Azure you may want to take a step back and consider your design before proceeding. Oh, you're back already? Let's go.

Navigate to "Virtual Networks".

Click "Add" and supply the following:

Name: Type a name for your Virtual Network; you should follow the naming standards as discussed above.
Address Space: This is the overall space for your logical network within Azure. You can create more granular subnets within this space at any time, so erring on the side of a large address space would be wise. If you're unsure, use 10.1.0.0/16.
Subnet Name: You're required to create one subnet within your virtual network off the bat. You need to name it here and ensure you use a consistent and meaningful naming standard.
Subnet Address Range: Specify a subnet range within your virtual network. This won't be used by our IPsec connection directly, but we will use it later as a target for testing. If unsure, use 10.1.10.0/24.
Subscription: Select your desired subscription.
Resource Group: Select "Use Existing" and select the resource group we created earlier.
Location: Select the same location used for the IP above.

After the Virtual Network has been created (use the refresh key if necessary), click it to navigate to the next pane, and then click "Subnets".

On the next pane, click "+ Gateway Subnet" and specify a subnet in "Address Range". This subnet needs to be different from the one we created earlier and should not be used for non-network resources, but rather as an ingress point to your Virtual Network. If unsure, use 10.1.0.0/24.
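If you'd like to sanity-check the address plan before clicking through the portal, a few lines of shell can confirm that each subnet sits inside the virtual network's address space and that the two subnets don't collide. This is only a sketch: the helper names (ip_to_int, cidr_contains, cidr_overlaps) are made up for illustration, nothing here talks to Azure, and the addresses are the example values from this article.

```shell
#!/bin/sh
# Hypothetical helpers for checking an IPv4 address plan; not part of any Azure tooling.

# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Succeeds when CIDR block $2 lies entirely inside CIDR block $1.
# Assumes 64-bit shell arithmetic, as in bash and dash on current systems.
cidr_contains() (
  outer_ip=${1%/*}; outer_len=${1#*/}
  inner_ip=${2%/*}; inner_len=${2#*/}
  [ "$inner_len" -ge "$outer_len" ] || exit 1
  mask=$(( (4294967295 << (32 - outer_len)) & 4294967295 ))
  [ $(( $(ip_to_int "$outer_ip") & mask )) -eq $(( $(ip_to_int "$inner_ip") & mask )) ]
)

# Succeeds when the two CIDR blocks share any address
# (two CIDR blocks overlap only if one contains the other).
cidr_overlaps() {
  cidr_contains "$1" "$2" || cidr_contains "$2" "$1"
}

vnet=10.1.0.0/16
for subnet in 10.1.10.0/24 10.1.0.0/24; do
  if cidr_contains "$vnet" "$subnet"; then
    echo "$subnet fits inside $vnet"
  else
    echo "$subnet is OUTSIDE $vnet"
  fi
done

if cidr_overlaps 10.1.10.0/24 10.1.0.0/24; then
  echo "subnets overlap"
else
  echo "subnets do not overlap"
fi
```

The same cidr_overlaps check is handy for confirming that your local network (192.168.1.0/24 in the later example) does not overlap the Azure address space.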

Virtual Network Gateway

The "Virtual Network Gateway" is our configuration element that facilitates the IPsec tunnel. Microsoft refers to this as a "VPN" gateway (as opposed to Express Route). There are three different VPN gateway SKUs; we'll be doing the "Standard" offering (of Basic, Standard, High-Performance). It's worth having a read about the differences here.

Navigate to "Virtual Network Gateways".

Click "Add" and supply the following:

Name: Again, follow consistent naming standards.
Gateway Type: Select "VPN".
VPN type: Select "Route-Based" (packets routed by routing table) in this case; it would be advisable to familiarize yourself with the difference between route and policy here. Note that policy requires IKEv1, so if you need to use it note the settings will be quite a bit different.
SKU: Update 1/2018: The SKU selection is now VpnGw<x> or Basic. Note you cannot change a basic VNG to the higher tier (VpnGwX) or vice versa at a later time. For more information see this article.
Virtual Network: Select the virtual network we created in the last step.
Public IP address: Select the IP address we created earlier.

Click "Create".

Note: This step may take up to 45 minutes to complete provisioning. I've tracked 8 of these and it's averaging almost 40 minutes each, regardless of whether you pre-provision the IP or not. You may want to consider skipping ahead to the pfSense section for a bit and coming back here.

Local Network Gateway/Connection

A Local Network Gateway is the specification of our local IP and networks you would like to route over the tunnel.

This actually works fine with a dynamic IP if that is your scenario, but we'll cover the details of that later.

Navigate to "Local Network Gateways".

Click "Add" and supply the following:

Name: Naming? Standards? Consistent? Yeah!
IP Address: Enter the public IP address of your device that will initiate the tunnel.
Address Space: This is where you enter the CIDR notation of the local networks you would like to route over the tunnel... for example, if you would like to route 192.168.1.x over the tunnel, then enter "192.168.1.0/24"
Subscription: Enter your desired subscription.
Resource Group: Select our resource group we created above.
Location: For consistency, select the same location as you have selected above.
Update 1/2018: You can configure BGP settings here now as well, cool eh?

Click "Create".
After provisioning (you may need to hit "refresh"), click your newly created Local Network Gateway and click "Connections".

On the newly expanded pane, click "Add" and supply the following:

Name: You know the drill by now.
Connection type: This should be fixed to "Site to Site (IPsec)"
Virtual Network Gateway: Select the Virtual Network Gateway we created in the step above.
Shared Key: Specify a unique, randomly generated passphrase comprised of alphanumeric characters. Some devices have issues with special characters, hence the omission of them. I recommend using at least 30 characters; since it has no impact on tunnel performance I personally use at least 60 characters for each key. You'll need to specify this key on your local side as well.
Subscription: This should be hard coded to the same subscription as the LNG.
Resource Group: This also should be locked to the same resource group as the LNG.
Location: Locked to that of the LNG.

Click "OK".
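For the shared key, one way to produce a random alphanumeric passphrase is to filter the kernel's random source down to letters and digits. A minimal sketch, assuming a Linux, BSD, or macOS shell with /dev/urandom available; generate_psk is a hypothetical helper name, and the default of 60 characters simply follows the recommendation above.

```shell
#!/bin/sh
# Print a random alphanumeric key; the length defaults to 60 characters.
generate_psk() {
  # LC_ALL=C makes tr treat the random input as plain bytes.
  LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c "${1:-60}"
}

generate_psk 60
echo
```

Paste the result into both the Azure connection and the pfSense Phase 1 definition, and keep it out of your shell history and tickets.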

Configure pfSense

Now we'll set up the IPsec initiator connection on your pfSense firewall(s).

Phase 1 Setup

Log in to the firewall and navigate to "VPN->IPsec"
Click "Add" and specify the following:

Key Exchange Version: Auto
Internet Protocol: IPv4
Interface: Select the WAN interface from which you would like to initiate the connection
Remote Gateway: Enter the Azure public IP address created in the "Public IP Address" section above
Description: Whatever you would like; maybe troll your firewall team with a message here for fun times.
Authentication Method: Mutual PSK
Negotiation Mode: Main. Note: Do not use "Aggressive" mode as the hash of the PSK is sent over the internet in clear text.
My Identifier: If the WAN interface selected above holds your public IP address, you can select "My IP Address". If that interface lies behind another edge device that holds the public IP, you'll need to select "IP Address" and specify your external IP.
Peer Identifier: Peer IP Address
Pre-Shared Key: Enter the same Pre-Shared Key used in the Azure connection specification above.
Encryption Algorithm: The strongest available in Azure is AES 256 bit, so preferably specify that. For more information on supported features in Azure, see the References section below.
Hashing Algorithm: The best we can do here is SHA256, so let's go with that.
DH Group: 2(1024 bit)
Lifetime (seconds): 10800
Disable Rekey: Unchecked
Responder Only: Unchecked
NAT Traversal: Auto (Even in NAT scenarios Auto usually works)
Dead Peer Detection: Checked
Delay: 10
Max Failures: 5

Click "Save" to return to the "VPN->IPsec" menu.
Since we don't want to use this yet, click "Disable" in front of the new tunnel definition and then "Apply Changes".

Phase 2 Setup

Under our newly created tunnel definition, click "Show Phase 2 Entries"
Click "Add P2" and supply the following information:

Disabled: Unchecked
Mode: Tunnel IPv4
Local Network: Select "Network" and specify the same network range(s) that you specified during the setup of the local network gateway on Azure using CIDR notation, e.g. 192.168.1.0/24. This specifies which local network(s) you would like to route through the tunnel.
NAT/BINAT translation: None. Note: even in scenarios where your pfSense device is using NAT behind an upstream router, this should not be necessary. NAT-T will take care of that scenario.
Remote Network: Select "Network" and specify the same network range(s) that you specified during the setup of the target virtual network in Azure using CIDR notation, e.g. 10.1.0.0/16. This specifies the remote network(s) present in Azure.
Description: Put something here to help you remember what all this fun stuff is about.
Protocol: ESP
Encryption Algorithms: Check only "AES" and "256 bits".
Hash Algorithms: Unfortunately Azure only supports "SHA1" at this time. Update 1/2017: SHA256 supported now, use that!
PFS Key Group: Azure documentation states that PFS groups are only supported when Azure acts as responder, and in this case it is being set up as the initiator. Oddly, I've actually had luck specifying DH Group 14, but there is no guarantee that will work. I'm going to stick with it but for this by the book exercise you'll need to select "off". Note: Because of this setting and the prior Hash Algorithm setting, I do not consider this tunnel secure against state-level or similarly equipped actors. If that is a concern you may wish to investigate alternatives. In less extreme cases, however, this can be considered relatively secure. Update 1/2017: The compatibility has been improved here as well; match your Encryption/Auth with the right group using the table here.
Lifetime: 3600
Automatically ping host: blank

Click "Save".

Note: Depending on your configuration it may be necessary to navigate to "VPN->IPsec->Advanced Settings" and check "Enable Maximum MSS", then specify 1350. If you get packet loss with large packets this setting may be needed.

Firewall Rules

Now that our tunnel is set up we have to create local firewall rules that allow traffic to pass. First we'll create a network alias for the Azure side network and then we'll make a rule to allow our Azure-based traffic to pass.

Navigate to "Firewall->Aliases"
Click "Add" and supply the following:

Name: Supply something that explains this network is to represent the Azure side of the tunnel; only alphanumeric and "_" are allowed.
Description: Enter full description here; there are no special character limitations.
Type: "Network(s)"
Network(s): Enter the CIDR notation of the network you created for your Virtual Network in Azure. If you followed the example addresses in this article, that would be 10.1.0.0/16.

Click "Save" and then "Apply Changes".
Navigate to "Firewall->Rules->IPsec".
Click "Add -^" and supply the following:

Action: "Pass"
Disable this rule: Unchecked
Interface: "IPsec"
Address Family: "IPv4"
Protocol: "Any" Discussion: Feel free to limit the traffic that goes through the tunnel if you like. In this example I'm allowing all traffic through.
Source: "Single host or alias" and then specify the Azure network alias you created in step 2.
Destination: This needs to be the local network(s) to which you would like to allow traffic. You can either do "network" with a CIDR notation or specify the entire network represented by an interface on the firewall. Note: if you have multiple networks you'll need a rule for each, so repeat the last couple steps for each.
Log: Unchecked; keep in mind that should you need to troubleshoot, temporarily logging traffic using this rule can be very useful.
Description: It's a description, so let's do that!
No advanced options are necessary unless you would like to set them.

Click "Save" and then "Apply Changes".

As long as you have a blanket egress traffic rule we should now be able to route traffic over the tunnel. If you do not, I expect you are aware of how to make a more specific rule to suit your needs.

A Note on NAT-T and Upstream Routers

If your pfSense device is behind another upstream router, you may need some changes to facilitate the port switchover after initialization. If this matches your configuration, consider that you may need the following on the upstream router:

A firewall rule that allows UDP port 4500 into your pfSense device(s).
A NAT port mapping rule that forwards UDP port 4500 to your pfSense device(s).

Try without first; some devices are aware enough of the switch to 4500 to perform the transition without rules, but if it does not work consult the documentation for the device in question.

Enable and Test

There are several ways to test our connection; in this case I'll be pinging a VM host in Azure assigned to the same virtual network that this tunnel is connecting to. We won't go through the provisioning of that; should you need to, refer to this basic guide and ensure you place the VM in your target virtual network and initially created subnet (10.1.10.x in the example above).

We're about to go into a tunnel... a long one.

Preparing Your Target

Ensure your VM is up and provisioned in the correct target virtual network.
Since you can't put anything in the gateway subnet (correctly) this would be a good opportunity to put the VM in the subnet you were forced to create when creating the virtual network in the first place. Check/change VM->Network interfaces->Details->Settings->IP Configurations
Get the private IP address from VM->Network interfaces. It should be 10.1.10.x if you're following the example addresses in this article.
Make sure your VM is pingable! If you have instituted Network Security Groups that would inhibit access you'll need to modify them, though this should work by default since we're tunneled in. Make sure the firewall on the VM allows for incoming ICMP requests as well; on Windows Server 2012 and higher, running Set-NetFirewallRule -DisplayName "File and Printer Sharing (Echo Request - ICMPv4-In)" -Enabled True will take care of you.

Bring Up The Tunnel

In the pfSense interface, navigate to VPN->IPsec.
In front of our new tunnel, click "Enable" then "Apply" toward the top.
Check tunnel status under Status->IPsec. The tunnel should come up automatically in about a minute. If there is trouble you can check the Status->System Logs->IPsec section for more details.

Check Tunnel Status in Azure & Ping Dat VM!

For this portion we'll use PowerShell; ensure you have the Azure ARM cmdlets installed. If not, give install-module AzureRM a shot from an elevated PowerShell prompt.

Login to your account: Login-AzureRmAccount
Look at your subscriptions and grab the name of the target sub: Get-AzureRmSubscription
Change to your correct subscription: Select-AzureRmSubscription -SubscriptionName <subscription name>
Check the status: Get-AzureRmVirtualNetworkGatewayConnection -name <Local Gateway Connection> -ResourceGroupName <Name of Resourcegroup to which it belongs>
On the output pane, check the "ConnectionStatus" property. It should be "Connected".

The Get-AzureRmVirtualNetworkGatewayConnection output has a series of other interesting properties as well, including EgressBytesTransferred and IngressBytesTransferred.

Now proceed to ping your VM by the private IP listed in Azure. As long as everything is configured correctly you should receive a response!

Cost

VPN Tunnels are subject to costs from a few different categories:

VPN Gateway Pricing: This is an hourly cost incurred while the tunnel is available, not necessarily used. This means once it's provisioned you will incur charges at the hourly rate. As of this writing the standard performance level that we'll be using is billed at $0.19/hr in the US. If you have multiple Virtual Networks you will also be subject to a fee for outgoing traffic destined for another VNet. This rate depends on the zone and varies between $.035 and $.16 per GB. Data outbound to your site is charged at the standard data transfer rates (below) and inbound data is free. Update 1/2017: Updated pricing for the new tiers can be found here.
Data Transfer Rates: This depends on your level of utilization. The first 5GB of outgoing/month is free and the prices are set on a curve thereafter.
Virtual Network Pricing: VNets are free; you can have up to 50 VNets per subscription across all regions.
IP Addresses: You'll be using at least one IP address. The first 5 static IPs in a given region are free; additional static IPs and all dynamic IPs are charged at a rate of $.004/hr.

Overall the cost of a "standard" class data tunnel each month for a single IP address, no additional support, and without including outgoing bandwidth, is about $140/month.
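The "about $140" figure can be reproduced from the hourly rates quoted above. A rough sketch, assuming an average of 730 hours in a month (that number is an assumption for the arithmetic, not an Azure billing rule) and using the 2016 rates from this article; monthly_cost is a hypothetical helper name.

```shell
#!/bin/sh
# Monthly cost = (gateway hourly rate + IP hourly rate) * hours in the month.
# Arguments: gateway rate, IP rate, optional hours per month (assumed 730).
monthly_cost() {
  LC_ALL=C awk -v gw="$1" -v ip="$2" -v hours="${3:-730}" \
    'BEGIN { printf "%.2f\n", (gw + ip) * hours }'
}

# Standard gateway at $0.19/hr plus one dynamic IP at $0.004/hr.
monthly_cost 0.19 0.004   # prints 141.62
```

Outgoing bandwidth beyond the free allowance would be added on top of this.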

Note: Costs as of 10/26/2016, subject to change. Up to date pricing information is available here.

Dynamic IP? Changing Your IP Address

There is no reason that this IPsec tunnel will not work with a dynamic IP, but each time the IP changes you'll need to take a series of steps to restore tunnel functionality. These are:

In Azure, the local network gateway specifies your IP; change it under "Local network gateway-><NAME>->Configuration->IP address".
In pfSense, the Phase 1 tunnel definition under "My identifier" needs to reflect your current external IP address.
If there are any implications on upstream routers you'll need to handle that as well.

After taking care of these you will need to restart the tunnel. This could all be automated with PowerShell and SSH if you like, but I won't be covering that here.
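Full automation is out of scope, but the first building block is simply noticing that the address has changed. Here is a minimal sketch of that piece only, assuming you already have some way to learn your current public IP (that lookup is left to you) and a writable state file; ip_changed is a hypothetical helper name, and 203.0.113.10 is a documentation-only example address.

```shell
#!/bin/sh
# Compare the current public IP against the last one recorded in a state file.
# Usage: ip_changed <current-ip> <state-file>
# Exit status 0 means the IP changed (and the file was updated); 1 means no change.
ip_changed() (
  current=$1
  state_file=$2
  previous=""
  if [ -f "$state_file" ]; then
    previous=$(cat "$state_file")
  fi
  if [ "$current" = "$previous" ]; then
    exit 1
  fi
  printf '%s\n' "$current" > "$state_file"
  exit 0
)

# Example (state file path is an assumption):
#   if ip_changed 203.0.113.10 /var/tmp/tunnel-public-ip; then
#     echo "Public IP changed; update the Azure local network gateway and pfSense Phase 1"
#   fi
```

Run from cron, a check like this tells you when the three manual steps above are due.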

Update 1/2017: FWIW, over the last year none of my clients have had their IP rotate for an active tunnel.

A Note on Effective Security

As mentioned earlier, this setup does have a couple of security issues, the impact of which I would like to discuss briefly. Without an optimal security configuration, including the support of Perfect Forward Secrecy, this tunnel may not be strong enough to stand up to attacks of a state-sponsored actor over a long period of time. Because of that I cannot recommend this solution if your traffic may be subject to that level of attack, for example traffic facilitating substantial financial transaction activity. Update 1/2017: As noted above, this situation has improved with the support of PFS and SHA256 for authentication.

With that said, this tunnel is still (for better or worse) more secure than the configuration I have seen at many clients, and should be suitable for most traffic. It also performs very well; added latency between my modestly equipped pfSense devices and Azure is trivial.

For science!

References

Tuesday, April 19, 2016

Installing Chef on A Raspberry Pi 2/3

Introduction

So you’ve got 1,103 Raspberry Pis that you need to manage. Two things:

Why?
Wanna hang out Saturday? No? You’re busy managing all your mini computers manually? I can help you with that!

Scope

In this article we’ll cover installing and configuring the Chef version 12 client on a Raspberry Pi. This has been tested on Raspberry Pi versions 2 and 3; in theory it should work on a 1 as well, albeit slowly.

Assumptions

Raspberry Pi with Raspbian, Hypriot, or similar build with connections to interwebs
Chef server/org you would like to point clients to
Chef workstation capable of bootstrapping clients; if needed see this excellent article by Digital Ocean.

Execution

The main point of this article is really the installation of Ruby, which is the foundation on which Chef is based. Because the Ruby package in the Raspbian repositories is out of date (2.1 as of this writing) we need to compile our own from source. Chef 12 requires Ruby 2.0 or greater, but Rack, which is installed with Chef, requires 2.2.2. UPDATE 7/9/2017: Ruby 2.4 or newer is now needed to continue successfully. Thanks Mike (from comments below)!

Step 1: Install Ruby

Clearly this should be scripted for optimal efficiency, but for learning purposes we’ll do it step by step to see exactly what is going on first hand. Log onto your Raspi via SSH and execute the following:

Most commands we’ll be executing require root, so let’s elevate our session:
```
sudo su
```
Where sudo = "superuser do" and su = "substitute user"; with no username given, su switches to root, so we're using root privs to assume root identity.
Pull the newest package lists from configured repositories to ensure we get the newest packages:
```
apt-get update
```
Install pre-requisites for Ruby: gcc, make, and libssl-dev with their dependencies.
```
apt-get install gcc make libssl-dev
```
Note: On some distros, such as Raspbian, gcc and make are installed by default. It won't hurt to include them in the command line nonetheless, and including them here will cover most distros.
Download the Ruby source to the /usr/src directory:
```
cd /usr/src
wget https://cache.ruby-lang.org/pub/ruby/2.2/ruby-2.2.5.tar.gz
```
Note: 2.2.5 is the newest 2.2 version at the time of this writing, but you should make sure there isn't a newer version available. Check the Ruby page here.
Extract the source & navigate to the directory (Make sure you update filenames/directory names for differing versions):
```
tar -xvzf ruby-2.2.5.tar.gz
cd ruby-2.2.5
```
Prepare to compile with configure, omitting unnecessary components:
```
./configure --enable-shared --disable-install-doc --disable-install-rdoc --disable-install-capi
```
Note: This will take between 2 and 10 minutes depending on which Pi and the speed of your SD card.
Compile it using make!
```
make -j4 && make install
```
Note: "make -j4" runs up to four compile jobs in parallel, enough to keep each of the four cores on a Pi 2 or 3 busy. This will take between 15 and 30 minutes depending on your Pi and SD card.
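Once you've walked through it by hand, the steps above can be collected into a script. This is a sketch rather than a tested installer: the function names are hypothetical, the download URL pattern is inferred from the single URL used above (so verify it for other versions), and the -y flag on apt-get is added so the script does not stop to ask for confirmation.

```shell
#!/bin/sh
# Sketch: the manual Ruby build steps from this article gathered into one script.
RUBY_VERSION="${RUBY_VERSION:-2.2.5}"

# Build the source URL for a version, e.g. 2.2.5 lives under the 2.2 directory.
# The pattern is an assumption based on the URL used in this article.
ruby_source_url() {
  series=${1%.*}
  echo "https://cache.ruby-lang.org/pub/ruby/${series}/ruby-${1}.tar.gz"
}

# Run every step as root; the subshell plus "set -e" stops at the first failure.
build_ruby() (
  set -e
  apt-get update
  apt-get install -y gcc make libssl-dev
  cd /usr/src
  wget "$(ruby_source_url "$RUBY_VERSION")"
  tar -xzf "ruby-${RUBY_VERSION}.tar.gz"
  cd "ruby-${RUBY_VERSION}"
  ./configure --enable-shared --disable-install-doc --disable-install-rdoc --disable-install-capi
  make -j4
  make install
)

# Only build when explicitly asked, e.g.: sudo sh build-ruby.sh build
if [ "${1:-}" = "build" ]; then
  build_ruby
fi
```

Setting RUBY_VERSION in the environment (for example RUBY_VERSION=2.4.1) switches the version without editing the script.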

Step 2: Install Chef

Now we’ll use the gem install command to get Chef

Execute gem install chef as root (sudo if not).
```
gem install chef
```
Note: This will take between 5 and 25 minutes depending on which Pi, SD card, and network connection.
Relinquish root privileges as they are no longer needed. This should only exit the root session and not the SSH session itself. If you’re logged in directly as root ignore this, but don’t do that next time!
```
exit
```
Test the install to ensure it worked
```
chef-client --version
```

Step 3: Configure Chef

For this step move to your Chef workstation and logon using your account that is configured to manage your organization.

Use the knife command to bootstrap the newly installed client
```
knife bootstrap rasdock02.truckchase.lan -N rasdock02.truckchase.lan -x {user} -P {password}
```
Where: {user} is a user on the target platform with root privs and {password} is the password for that account.

Note: It is normal to see errors in the first portion of the bootstrap since the Chef ARM client will not be found in the Chef repo, but the second phase should work utilizing the client we just installed.

That’s it! For further verification you can check against the Chef server using your workstation (knife client show {node name}) or even better yet, use Chef Manage if you have it available.

References

Chef Documentation

Ruby Documentation