
Deploy Vault HA in Azure - PART 2

20 May 2022

UPDATES:
-2022-05-28: added the auto-unseal with keyvault alternative. The full code should be shared through a public github repository in a few weeks.
-2022-07-22: added a DNS config to use the cluster without an application gateway (see Conclusion). Also removed the Azure AD App Registration part, to be published back with a future article on Azure AD auth (oidc).


1- Introduction

Last time, we had a high-level look at how we can leverage Azure to deploy a Vault HA cluster, and we took a detour to detail a few tips and features we need in the project. In this part 2, we get to the deployment of the cluster itself.

Below is what we aim for:

[Image: target Vault HA architecture]

2- Prerequisites

To be able to go through this demo, you need to have a few things:

-an Azure subscription, with an existing vnet and a subnet for the VMs
-an existing keyvault, holding the TLS certificate described below
-an existing storage account for the VMs boot diagnostics
-a private DNS zone managed in your subscription
-terraform installed locally

At the time of writing, this project is tested with terraform 1.1.9, azurerm provider 3.7.0 and azuread provider 2.22.0.
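If you want to pin those versions, a minimal terraform block could look like this (a sketch; adjust the constraints to your own policy):

terraform {
  required_version = ">= 1.1.9"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.7.0"
    }
    azuread = {
      source  = "hashicorp/azuread"
      version = "~> 2.22.0"
    }
  }
}

provider "azurerm" {
  features {}
}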

Your settings

First, let’s set up your input variables and the json template for the vm module. Pay attention to the comments on top of the certname variable.

Your variables.tf should look like this:

# Private DNS zone managed in your Azure Subscription
variable "privDomain" {
  default = "priv.mydomain.com"
}

# certificate object name in your keyvault
# This certificate MUST contain your cluster fqdn as well as your 3 nodes FQDN
# For example: 
# - vault.priv.mydomain.com, 
# - vault1.priv.mydomain.com,
# - vault2.priv.mydomain.com,
# - vault3.priv.mydomain.com
variable "certname" {
  default = "vaultpriv" 
}

# app roles for app registration
variable "appRoles" {
  default = [
    "User",
    "Admin"
  ]
}

# path to json file containing vms specs
variable "configFileName" {
  default = "vm.json"
}

# resource group name for your project
variable "rgName" {
  default = "vault"
}

# provide existing keyvault name
variable "keyVault" {
  default = "" 
}

# provide existing keyvault resource group name
variable "keyVaultRg" {
  default = ""       
}

# provide existing storage account name
variable "vmDiagSta" {
  default = "" 
}

# provide existing storage account resource group name 
variable "rgVmDiagSta" {
  default = ""          
}

# provide existing vnet name
variable "vnetName" {
  default = ""
}

# provide existing vnet resource group name
variable "vnetRg" {
  default = ""
}
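
By the way, if you don’t have such a certificate in your keyvault yet, you can generate a self-signed one for testing with the Azure CLI (the policy format below follows az keyvault certificate get-default-policy; for production, use a certificate issued by your CA). First, a policy.json with the cluster and node FQDNs as SANs:

{
  "issuerParameters": { "name": "Self" },
  "keyProperties": { "exportable": true, "keySize": 2048, "keyType": "RSA", "reuseKey": false },
  "secretProperties": { "contentType": "application/x-pkcs12" },
  "x509CertificateProperties": {
    "subject": "CN=vault.priv.mydomain.com",
    "subjectAlternativeNames": {
      "dnsNames": [
        "vault.priv.mydomain.com",
        "vault1.priv.mydomain.com",
        "vault2.priv.mydomain.com",
        "vault3.priv.mydomain.com"
      ]
    },
    "validityInMonths": 12
  }
}

Then create the certificate object (named vaultpriv to match the certname default):

az keyvault certificate create --vault-name <your-keyvault> --name vaultpriv --policy @policy.json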

Then, since we use our reusable module, you need to prepare a json file like the one below. Since the configFileName variable above is set to vm.json, you have to name your file the same and leave it in the current folder. We use Ubuntu, but the bootstrap script should work on Debian without too much tweaking.

As you can see, we spread the 3 nodes across different availability zones to ensure better resilience.

{
  "vault1":{
    "size":"Standard_D2s_v3",
    "vmAdminName":"localadm",
    "publisher":"Canonical",
    "offer":"0001-com-ubuntu-server-focal",
    "sku":"20_04-lts-gen2",
    "version":"latest",
    "subnet":"coreHubSubnet",
    "zone":"1",
    "osDiskSize":"64",
    "disks":[
        {
          "lunId":"1",
          "size":"64"
        }
    ]
  },
  "vault2":{
    "size":"Standard_D2s_v3",
    "vmAdminName":"localadm",
    "publisher":"Canonical",
    "offer":"0001-com-ubuntu-server-focal",
    "sku":"20_04-lts-gen2",
    "version":"latest",
    "subnet":"coreHubSubnet",
    "zone":"2",
    "osDiskSize":"64",
    "disks":[
      {
        "lunId":"1",
        "size":"64"
      }
    ]
  },
  "vault3":{
    "size":"Standard_D2s_v3",
    "vmAdminName":"localadm",
    "publisher":"Canonical",
    "offer":"0001-com-ubuntu-server-focal",
    "sku":"20_04-lts-gen2",
    "version":"latest",
    "subnet":"coreHubSubnet",
    "zone":"3",
    "osDiskSize":"64",
    "disks":[
      {
        "lunId":"1",
        "size":"64"
      }
    ]
  }      
}

3- The main Infrastructure Code

Now that we have our input parameters, we can work on the main.tf file which will provide us with the deployment plan.

Data Sources

First, we have a few data sources to load.


data "azurerm_client_config" "current" {}
data "azurerm_key_vault" "kvCore" {
  name                = var.keyVault
  resource_group_name = var.keyVaultRg
}

Provision the VMs for the cluster

Next step is to provision the VMs. We’ll be using our module from my personal github repository.

# 3 VMs will be created as specified in the `vm.json` file
module "vault" {
  source         = "github.com/nfrappart/azTerraVmLinuxAvZoneJsonPool?ref=v1.0.4"
  configFileName = var.configFileName
  rgName         = var.rgName
  keyVault       = var.keyVault
  keyVaultRg     = var.keyVaultRg
  vmDiagSta      = var.vmDiagSta
  rgVmDiagSta    = var.rgVmDiagSta
  vnetName       = var.vnetName
  vnetRg         = var.vnetRg
}

# Add access policy for each VM identity so they can access the certificate and the app reg client secret
resource "azurerm_key_vault_access_policy" "vaultAccess" {
  for_each     = module.vault.vmIdentity
  key_vault_id = data.azurerm_key_vault.kvCore.id
  tenant_id    = data.azurerm_client_config.current.tenant_id
  object_id    = each.value

  key_permissions = [
    "Get", "Decrypt", "List", "UnwrapKey", "WrapKey"
  ]

  secret_permissions = [
    "Get", "List",
  ]

  certificate_permissions = [
    "Get", "GetIssuers", "List", "ListIssuers",
  ]
}

That’s it for the compute. Very simple, since we are reusing a module we already wrote.

2022-05-28 UPDATE:
If you decide to go the auto-unseal route, you will have to provision a few more resources; see below.

# Create key for Vault
resource "azurerm_key_vault_key" "vault" {
  name         = "vaultunseal"
  key_vault_id = data.azurerm_key_vault.kvCore.id
  key_type     = "RSA"
  key_size     = 4096

  key_opts = [
    "unwrapKey",
    "wrapKey",
  ]
}

# add some permissions to access keys in keyvault
resource "azurerm_key_vault_access_policy" "vaultAccess" {
  for_each     = module.vault.vmIdentity
  key_vault_id = data.azurerm_key_vault.kvCore.id
  tenant_id    = data.azurerm_client_config.current.tenant_id
  object_id    = each.value

  key_permissions = [
    "Get","List",
    "WrapKey","UnwrapKey",
  ]

  secret_permissions = [
    "Get", "List",
  ]

  certificate_permissions = [
    "Get", "GetIssuers", "List", "ListIssuers",
  ]
}

Note that this vaultAccess policy replaces the one from the previous section (same terraform resource name); the key permissions are the ones needed by the azurekeyvault seal.

Custom script extension

Since the whole purpose is to automate the deployment as much as possible, we will be using a custom script extension to install and bootstrap vault. There will still remain a few manual tasks at the end of this demo, but we can do a whole lot of things with terraform.

resource "azurerm_virtual_machine_extension" "bootstrapvault" {
  for_each             = module.vault.vmId
  name                 = "vaultInstall"
  virtual_machine_id   = each.value
  publisher            = "Microsoft.Azure.Extensions"
  type                 = "CustomScript"
  type_handler_version = "2.0"

  settings = jsonencode({
    "script" = base64encode(templatefile("bootstrap.sh.tpl", {
      domain      = var.privDomain,
      node        = each.key,
      kv          = data.azurerm_key_vault.kvCore.name,
      certificate = var.certname
      }
    ))
  })
  depends_on = [
    azurerm_key_vault_access_policy.vaultAccess,
    module.vault
  ]
  
  lifecycle {
    ignore_changes = [
      settings # we add a lifecycle to avoid redeploying the extension in case we make changes to the bash script
    ]
  }
}

2022-05-28 UPDATE:
If you decide to go the auto-unseal route, pass a few more variables to the templatefile via the custom script extension.

resource "azurerm_virtual_machine_extension" "bootstrapvault" {
  for_each             = module.vault.vmId
  name                 = "vaultInstall"
  virtual_machine_id   = each.value
  publisher            = "Microsoft.Azure.Extensions"
  type                 = "CustomScript"
  type_handler_version = "2.0"

  settings = jsonencode({
    "script" = base64encode(templatefile("bootstrap.sh.tpl", {
      domain      = var.privDomain,
      node        = each.key,
      kv          = data.azurerm_key_vault.kvCore.name,
      vaultKey    = azurerm_key_vault_key.vault.name,
      tenantId    = data.azurerm_client_config.current.tenant_id,
      certificate = var.certname
      }
    ))
  })
  depends_on = [
    azurerm_key_vault_access_policy.vaultAccess,
    azurerm_key_vault_key.vault,
    module.vault,
  ]
  lifecycle {
    ignore_changes = [
      settings
    ]
  }
}

Templated bash script

Once again, we try to stay as “DRY” as possible. So instead of writing a script for each vm, we use the native terraform function templatefile() with some variables (see the settings block in the previous section). You will need a file named bootstrap.sh.tpl. If you want to change the name, remember to modify the azurerm_virtual_machine_extension settings accordingly. You will however have to keep the .tpl extension, if I remember correctly.

So what does it do?

#!/bin/bash
#
# Bootstrap script to install vault
# Install Hashicorp repository for Vault
curl -fsSL https://apt.releases.hashicorp.com/gpg | sudo apt-key add -
sudo apt-add-repository "deb [arch=amd64] https://apt.releases.hashicorp.com $(lsb_release -cs) main"
sudo apt-get update && sudo apt-get install -y vault
sudo apt-get install -y jq

# Initialize datadisk
sudo parted /dev/sdc --script mklabel gpt mkpart xfspart xfs 0% 100%
sudo mkfs.xfs /dev/sdc1
sudo partprobe /dev/sdc1
sudo mkdir /vault
sudo mount /dev/sdc1 /vault
vault_uuid=$(blkid -o value -s UUID /dev/sdc1)
echo "UUID=$vault_uuid   /vault   xfs   defaults,nofail   1   2" | sudo tee -a /etc/fstab

# Create user without login for vault service
sudo adduser vault --shell=/bin/false --no-create-home --disabled-password --gecos GECOS

# Add config.hcl to destination folder
sudo tee -a /etc/vault.d/config.hcl > /dev/null <<EOT
listener "tcp" {
address = "0.0.0.0:8200"
cluster_address  = "0.0.0.0:8201"
tls_cert_file = "/vault/config/vault.cer"
tls_key_file = "/vault/config/vault.key"
}
storage "raft" {
path = "/vault/raft"
node_id = "${node}"
}
api_addr = "https://vault.${domain}:8200"
cluster_addr = "https://${node}.${domain}:8201"
#seal "azurekeyvault" {
#}
ui = true
disable_mlock = true
EOT
sudo chmod 755 /etc/vault.d/config.hcl
sudo chown vault:vault /etc/vault.d/config.hcl

# add fqdn for localhost
echo "127.0.0.1 ${node}.${domain}" | sudo tee -a /etc/hosts > /dev/null

# Retrieve Managed Identity access token
azure_token=$(curl 'http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fvault.azure.net' -H Metadata:true |  jq -r '.access_token')

# Add folder for storage backend
sudo mkdir /vault/raft

# Add folder for cert files
sudo mkdir /vault/config

sudo curl https://${kv}.vault.azure.net/secrets/${certificate}/?api-version=2016-10-01 -H "Authorization: Bearer $azure_token" | 
jq -r ".value" | base64 -d | openssl pkcs12 -nocerts -out ./vaultpkcs.key -nodes -passin pass:
openssl rsa -in vaultpkcs.key -out vault.key

sudo cp ./vault.key /vault/config/

sudo curl https://${kv}.vault.azure.net/secrets/${certificate}/?api-version=2016-10-01 -H "Authorization: Bearer $azure_token" | 
jq -r ".value" | base64 -d | openssl pkcs12 -nokeys -out ./vault.cer -passin pass:

sudo cp ./vault.cer /vault/config/

sudo chown -R vault /vault
sudo chgrp -R vault /vault
sudo chmod -R 755 /vault
sudo chmod 600 /vault/config/vault.key

# Create unit file for vault service
sudo tee -a /etc/systemd/system/vault.service > /dev/null <<EOT
[Unit]
Description=Vault
Documentation=https://www.vault.io/
Requires=network-online.target
After=network-online.target
ConditionFileNotEmpty=/etc/vault.d/config.hcl 

[Service]
User=vault
Group=vault
ProtectSystem=full
ProtectHome=read-only
PrivateTmp=yes
PrivateDevices=yes
SecureBits=keep-caps
AmbientCapabilities=CAP_IPC_LOCK
Capabilities=CAP_IPC_LOCK+ep
CapabilityBoundingSet=CAP_SYSLOG CAP_IPC_LOCK
NoNewPrivileges=yes
ExecStart=/usr/bin/vault server -config=/etc/vault.d/config.hcl
ExecReload=/bin/kill --signal HUP \$MAINPID
KillMode=process 
KillSignal=SIGINT 
Restart=on-failure 
RestartSec=5
TimeoutStopSec=30
StartLimitInterval=60
StartLimitBurst=3
LimitNOFILE=65536
LimitMEMLOCK=infinity

[Install]
WantedBy=multi-user.target
EOT

# Enable and start the service
sudo systemctl daemon-reload
sudo systemctl start vault.service
sudo systemctl enable vault.service
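
If a node doesn’t come up as expected after the extension runs, the standard systemd tooling will tell you what went wrong:

# check the service state and the latest log entries
sudo systemctl status vault.service
sudo journalctl -u vault.service -e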

2022-05-28 UPDATE:
If you decide to go the auto-unseal route, you will have to amend the above bootstrap script. Just modify the config.hcl creation part with the following.

sudo tee -a /etc/vault.d/config.hcl > /dev/null <<EOT
listener "tcp" {
address = "0.0.0.0:8200"
cluster_address  = "0.0.0.0:8201"
tls_cert_file = "/vault/config/vault.cer"
tls_key_file = "/vault/config/vault.key"
}
storage "raft" {
path = "/vault/raft"
node_id = "${node}"
}
api_addr = "https://vault.${domain}:8200"
cluster_addr = "https://${node}.${domain}:8201"

seal "azurekeyvault" {
  tenant_id      = "${tenantId}"
  vault_name     = "${kv}"
  key_name       = "${vaultKey}"
}

ui = true
disable_mlock = true
EOT
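
With variables.tf, vm.json, main.tf and bootstrap.sh.tpl in place, the deployment itself is the usual terraform workflow (assuming you are already authenticated, with az login for example):

# review the plan, then apply it
terraform init
terraform plan -out=vault.tfplan
terraform apply vault.tfplan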

4- Setup HA Cluster

Now that’s great, but for now you have 3 standalone Vault servers, all of them sealed and uninitialized. If you didn’t change the json file, you should have something like this:

[Image: the three vault VMs in the Azure portal]

For the following steps, there is no workaround to manual configuration. Reason #1 being that Vault will generate its master key and the key shards, as well as the root token.

You can either use an Azure Bastion if you have one, or the serial console on the VMs to save some bucks. To log into the VMs, the default user is localadm as defined in the vm.json file, unless you changed it. As for the passwords, the module has registered them in the provided keyvault:

[Image: VM passwords stored as secrets in the keyvault]
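
For example, assuming the module stores each password in a secret named after the VM (check the exact secret names in your keyvault), you could retrieve it with the Azure CLI:

az keyvault secret show --vault-name <your-keyvault> --name vault1 --query value -o tsv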

Initialize master node

Once you’re on the first node, run the commands below to initialize vault.

# setup required variables
myprivdomain="priv.mydomain.com"
export VAULT_ADDR=https://${HOSTNAME}.${myprivdomain}:8200

# initialize vault
vault operator init

Then, vault will output something like this:

localadm@vault1:~$ vault operator init
Unseal Key 1: FTmncRQigxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Unseal Key 2: KmPQpirL7xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Unseal Key 3: y+FsAIThJxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Unseal Key 4: Kz5CPZsdQxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Unseal Key 5: mPJifrXwmxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Initial Root Token: hvs.7JToxxxxxxxxxxxxxxxxxxxx

Vault initialized with 5 key shares and a key threshold of 3. Please securely
distribute the key shares printed above. When the Vault is re-sealed,
restarted, or stopped, you must supply at least 3 of these keys to unseal it
before it can start servicing requests.

Vault does not store the generated root key. Without at least 3 keys to
reconstruct the root key, Vault will remain permanently sealed!

It is possible to generate new unseal keys, provided you have a quorum of
existing unseal keys shares. See "vault operator rekey" for more information.
localadm@vault1:~$ 

These unseal keys and the root token are extremely sensitive information. Actually, the root token should be revoked after you finish setting up your users and policies, because the root policy basically allows you to destroy everything. So keep it away from bad people!

Now you can unseal the cluster. Because we didn’t specify otherwise, vault initialized with 5 key shares and requires 3 of them to unseal. You have to repeat the command below 3 times and provide a different unseal key each time. Again, keep your keys secure. You should distribute them to reliable teammates who will have to keep them safe. This way, as explained in the official documentation, you need at least 3 people to unseal or rekey Vault.
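
For the record, both the number of key shares and the threshold can be customized at initialization time with the standard CLI flags, for example:

# initialize with a custom number of shares and threshold
vault operator init -key-shares=7 -key-threshold=5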

# Proceed to unseal
vault operator unseal

You can now check the status:

# Login with root token
vault login

# check status
vault status

# you should see something similar
localadm@vault1:~$ vault status
Key                     Value
---                     -----
Seal Type               shamir
Initialized             true
Sealed                  false
Total Shares            5
Threshold               3
Version                 1.10.3
Storage Type            raft
Cluster Name            vault-cluster-dad214b8
Cluster ID              771ffcb1-47bc-16a2-25ea-ab3c6a68b71a
HA Enabled              true
HA Cluster              https://vault1.priv.ryzhom.com:8201
HA Mode                 active
Active Since            2022-05-20T08:29:50.912682857Z
Raft Committed Index    42
Raft Applied Index      42
localadm@vault1:~$ 

# then check the raft cluster
vault operator raft list-peers

# only one node will appear
Node      Address                        State       Voter
----      -------                        -----       -----
vault1    vault1.priv.ryzhom.com:8201    leader      true
localadm@vault1:~$ 

Join nodes 2 & 3 to the raft cluster

Now connect to node #2, set up your shell and join the first node’s raft cluster.

# setup required variables
myprivdomain="priv.mydomain.com"
export VAULT_ADDR=https://${HOSTNAME}.${myprivdomain}:8200

# join raft
vault operator raft join https://vault1.${myprivdomain}:8200

Now unseal it with the unseal keys generated on the first node. Just like for node #1, you will be asked for 3 key shares.

vault operator unseal

Now log in and check the raft cluster:

# login with root token
vault login

# Check raft, it should show both nodes
vault operator raft list-peers

Node      Address                        State       Voter
----      -------                        -----       -----
vault1    vault1.priv.ryzhom.com:8201    leader      true
vault2    vault2.priv.ryzhom.com:8201    follower    true
localadm@vault2:~$ 

Repeat these steps for node #3 and you’re done :)
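
After node #3 has joined and been unsealed, vault operator raft list-peers should show something like this:

Node      Address                        State       Voter
----      -------                        -----       -----
vault1    vault1.priv.ryzhom.com:8201    leader      true
vault2    vault2.priv.ryzhom.com:8201    follower    true
vault3    vault3.priv.ryzhom.com:8201    follower    true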

5- Conclusion

You now have a zone-redundant Vault HA cluster. The missing part is the Azure Application Gateway to act as load balancer, which we will cover in another post. However, if you want, you can set up a DNS record vault.priv.mydomain.com in your Azure Private DNS zone with all 3 nodes’ IPs and use DNS round robin instead.* Vault is actually designed to work like that. However, this implies that the nodes are directly reachable from the client (either user or machine). If your network is segmented in a way that your vault subnet can’t be reached directly without a load balancer, then DNS round robin will be of no use to you. Check the official documentation.

This is getting exciting :). Next time, we will be able to start playing with Vault internals. We will show you the oidc auth method to leverage Azure AD for user access, but also the azure auth method to authenticate VMs with their managed identity, and of course use the secrets engines to help you understand how Vault can secure your workloads and application delivery.

*If you want to use DNS round robin, just add the following resource to your code.

resource "azurerm_private_dns_a_record" "vault" {
  name                = "vault"
  zone_name           = data.azurerm_private_dns_zone.dns.name
  resource_group_name = data.azurerm_private_dns_zone.dns.resource_group_name
  ttl                 = 300
  records             = [ for k,v in module.vault.vmName : v ]
}
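
Note that this resource references a data source we didn’t declare in the Data Sources section. A minimal sketch, assuming your private DNS zone is named after privDomain (replace the resource group name with your own):

data "azurerm_private_dns_zone" "dns" {
  name                = var.privDomain
  resource_group_name = "<your-dns-zone-rg>"
}

Also double check that the module output you feed into records resolves to the nodes’ private IP addresses, since this is an A record.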