Key management servers (KMS), VMware vCenter, vSphere & vSAN - each of these plays an important part in vSAN encryption. What are the KMS requirements, the host requirements, and so on? How is vSAN encryption set up and managed? How do everyday tasks change (or do they)? Does encryption impact the performance of vSAN? Each of these items will be covered at length, so you can become a Captain of vSAN Encryption (wink, wink).
In vSAN 6.6 and vSAN 6.7, VMware introduced another option for native data-at-rest encryption - vSAN Encryption.
vSAN Encryption is the industry’s first native HCI encryption solution; it is built right into the vSAN software. With a couple of clicks, it can be enabled or disabled for all items on the vSAN datastore, with no additional steps.
Because it runs at the hypervisor level and not in the context of the virtual machine, it is virtual machine agnostic, like VM Encryption. While vSAN Encryption and VM Encryption meet similar requirements, they do so a bit differently, each with use cases they excel at. Most importantly, they give customers a choice when deciding how to provide data-at-rest encryption for their vSphere workloads.
And because vSAN Encryption is hardware agnostic, there is no requirement to use specialized and more expensive Self-Encrypting Drives (SEDs), unlike the other HCI solutions that offer encryption.
When you enable encryption, vSAN encrypts everything in the vSAN datastore. All files are encrypted, so all virtual machines and their corresponding data are protected. Only administrators with encryption privileges can perform encryption and decryption tasks.
vSAN uses encryption keys as follows:
1. vCenter Server requests an AES-256 Key Encryption Key (KEK) from the KMS. vCenter Server stores only the ID of the KEK, but not the key itself.
2. The ESXi host encrypts disk data using the industry standard AES-256 XTS mode. Each disk has a different randomly generated Data Encryption Key (DEK).
3. Each ESXi host uses the KEK to encrypt its DEKs, and stores the encrypted DEKs on disk. The host does not store the KEK on disk. If a host reboots, it requests the KEK with the corresponding ID from the KMS. The host can then decrypt its DEKs as needed.
4. A host key is used to encrypt core dumps, not data. All hosts in the same cluster use the same host key. When collecting support bundles, a random key is generated to re-encrypt the core dumps. You can specify a password to encrypt the random key.
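The KEK/DEK scheme above is a classic example of envelope encryption. Here is a tiny Python sketch of the idea - note that it swaps the real AES-256 key wrap and XTS ciphers for a toy HMAC-based keystream, so it is a conceptual illustration only, not how vSAN actually implements it:

```python
import hashlib
import hmac
import secrets

def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with an HMAC-SHA256 keystream.
    Stands in for AES-256 here; NOT suitable for real-world use."""
    out = bytearray()
    counter = 0
    while len(out) < len(data):
        block = hmac.new(key, nonce + counter.to_bytes(8, "big"),
                         hashlib.sha256).digest()
        out.extend(block)
        counter += 1
    return bytes(x ^ y for x, y in zip(data, out))

# KMS side: one KEK per cluster (vCenter stores only its ID, never the key).
kek = secrets.token_bytes(32)

# Host side: a different random DEK for each disk.
dek = secrets.token_bytes(32)

# The host wraps (encrypts) the DEK with the KEK and stores only the
# wrapped copy on disk; the KEK itself is never written to disk.
nonce = secrets.token_bytes(16)
wrapped_dek = keystream_xor(kek, nonce, dek)

# After a reboot, the host fetches the KEK from the KMS by its ID
# and unwraps its DEKs as needed.
unwrapped = keystream_xor(kek, nonce, wrapped_dek)
assert unwrapped == dek
```

The important takeaway is that only the wrapped DEK ever touches the disk; without the KEK from the KMS, the DEKs (and the data they protect) are unrecoverable.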
When a host reboots, it does not mount its disk groups until it receives the KEK. This process can take several minutes or longer to complete. This can be monitored under Physical disks > Software state health.
The first question we need to tackle is “What is encryption anyway?”
Encryption is the process of transforming information in such a way that an unauthorized third party cannot read it; a trusted party, however, can decrypt the data and access it in its original form. There are many popular encryption/decryption methods, but the key to security is not a proprietary algorithm. The most important thing is keeping the encryption key (password) a secret so that only trusted parties know it.
It is important to distinguish encoding from encryption. Encoding also transforms information, but it’s typically performed for the convenience of storage or transmission, not keeping secrets. Widely known encoding methods are Morse code and binary encoding for computer storage.
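A quick way to see the difference: Base64 encoding can be reversed by anyone, with no secret involved, while even a toy cipher is opaque without its key. (The single-byte XOR below is purely illustrative - never use it as real encryption.)

```python
import base64

secret = b"attack at dawn"

# Encoding: reversible by anyone; no key involved at all.
encoded = base64.b64encode(secret)
assert base64.b64decode(encoded) == secret

# "Encryption" (toy single-byte XOR, for illustration only):
key = 0x5A
ciphertext = bytes(b ^ key for b in secret)
assert ciphertext != secret

# Without the key the ciphertext is opaque; with it, decryption is trivial.
assert bytes(b ^ key for b in ciphertext) == secret
```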
Now comes the second important question - why should we encrypt our data? Please allow me to quickly run you through three simple reasons why you should be interested in encrypting your data.
- Privacy - Encryption helps protect privacy by turning personal information into “for your eyes only” messages intended only for the parties that need them — and no one else. You should make sure that your emails are sent over an encrypted connection, or that you are encrypting each message. The same applies to data at rest: encryption allows you to secure the data stored on the datastore (VMs) from being accessed by an unauthorized party. This can help secure your data even if the physical media (i.e. disks) are stolen or copied without permission.
- Security - Hackers aren’t just bored kids in a basement anymore. They’re big business, and in some cases, they’re multinational outfits. Large-scale data breaches that you may have heard about in the news demonstrate that people are out to steal personal information to fill their pockets. Again, encryption allows you to set an additional layer of anti-hacking protection which in today’s world of GDPR, Internet and user-awareness can be a “life or death” situation.
- Law & Regulations - Depending on the type of data you are processing or storing, there are certain government-level policies which more or less force you to invest in encryption. For example, healthcare providers are required by the Health Insurance Portability and Accountability Act (HIPAA) to implement security features that protect patients’ sensitive health information. Institutions of higher learning must take similar steps under the Family Educational Rights and Privacy Act (FERPA), while retailers must contend with the Fair Credit Practices Act (FCPA) and similar laws. Encryption helps businesses stay compliant as well as protect the valuable data of their customers.
Quick KMS Summary
Key management refers to management of cryptographic keys in a cryptosystem. This includes dealing with the generation, exchange, storage, use, crypto-shredding (destruction) and replacement of keys. It includes cryptographic protocol design, key servers, user procedures, and other relevant protocols.
Key management concerns keys at the user level, either between users or systems. This is in contrast to key scheduling, which typically refers to the internal handling of keys within the operation of a cipher.
Successful key management is critical to the security of a cryptosystem. It is the more challenging side of cryptography in the sense that it involves aspects of social engineering such as system policy, user training, organizational and departmental interactions, and coordination between all of these elements, in contrast to pure mathematical practices that can be automated.
Cryptographic systems may use different types of keys, with some systems using more than one. These may include symmetric keys or asymmetric keys. In a symmetric key algorithm the keys involved are identical for both encrypting and decrypting a message. Keys must be chosen carefully, and distributed and stored securely. Asymmetric keys, also known as public keys, in contrast are two distinct keys that are mathematically linked. They are typically used together to communicate. Public key infrastructure (PKI) is the term most often used to describe the implementation of public key cryptography. PKI requires an organization to establish an infrastructure to create and manage public and private key pairs along with digital certificates.
The starting point in any certificate and private key management strategy is to create a comprehensive inventory of all certificates, their locations and responsible parties. This is not a trivial matter because certificates from a variety of sources are deployed in a variety of locations by different individuals and teams - it's simply not possible to rely on a list from a single certificate authority. Certificates that are not renewed and replaced before they expire can cause serious downtime and outages.
Once keys are inventoried, key management typically consists of three steps: exchange, storage and use.
- Key exchange - Prior to any secured communication, users must set up the details of the cryptography. In some instances this may require exchanging identical keys (in the case of a symmetric key system). In others it may require possessing the other party's public key. While public keys can be openly exchanged (their corresponding private key is kept secret), symmetric keys must be exchanged over a secure communication channel.
In more modern systems, a session key for a symmetric key algorithm is distributed encrypted by an asymmetric key algorithm. This approach avoids even the necessity for using a key exchange protocol like Diffie-Hellman key exchange.
Another method of key exchange involves encapsulating one key within another. Typically a master key is generated and exchanged using some secure method. Once the master key has been securely exchanged, it can then be used to securely exchange subsequent keys with ease. This technique is usually termed key wrap. A common technique uses block ciphers and cryptographic hash functions.
- Key storage - However they are distributed, keys must be stored securely to maintain communications security. Security is a big concern, and various techniques are in use to achieve it. Likely the most common is that an encryption application manages keys for the user and depends on an access password to control use of the key. Likewise, smartphone keyless-access platforms keep all identifying door information off mobile phones and servers and encrypt all data; just as with low-tech keys, users give codes only to those they trust.
- Key use - The major issue is length of time a key is to be used, and therefore frequency of replacement. Because it increases any attacker's required effort, keys should be frequently changed. This also limits loss of information, as the number of stored encrypted messages which will become readable when a key is found will decrease as the frequency of key change increases. Historically, symmetric keys have been used for long periods in situations in which key exchange was very difficult or only possible intermittently. Ideally, the symmetric key should change with each message or interaction, so that only that message will become readable if the key is learned (e.g., stolen, cryptanalyzed, or social engineered).
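The Diffie-Hellman exchange mentioned under key exchange above can be sketched in a few lines of Python. The 127-bit Mersenne prime below is far too small for real use (production groups are 2048 bits or larger) and is only meant to show the mechanics:

```python
import secrets

# Toy Diffie-Hellman parameters (illustrative only; real-world groups
# use primes of at least 2048 bits).
p = 2**127 - 1   # a Mersenne prime, small enough to read but mathematically valid
g = 3

# Each party picks a private exponent and publishes g^x mod p.
a = secrets.randbelow(p - 2) + 2
b = secrets.randbelow(p - 2) + 2
A = pow(g, a, p)   # Alice's public value
B = pow(g, b, p)   # Bob's public value

# Both sides derive the same shared secret without ever transmitting it;
# an eavesdropper seeing only p, g, A and B cannot feasibly recover it.
shared_alice = pow(B, a, p)
shared_bob = pow(A, b, p)
assert shared_alice == shared_bob
```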
A Key Management Server (KMS) cluster provides the keys that you can use to encrypt the vSAN datastore.
Before you can encrypt the vSAN datastore, you must set up a KMS cluster to support encryption. That task includes adding the KMS to vCenter Server and establishing trust with the KMS. vCenter Server provisions encryption keys from the KMS cluster.
The KMS must support the Key Management Interoperability Protocol (KMIP) 1.1 standard.
The biggest requirement of key management is availability. The analogy we can use here is DNS - nobody runs their big production environment with a single DNS server (I hope!). You have multiple replicating DNS servers “just in case” something goes wrong. Maybe in a single site you’ll have at least two, and maybe three or four. Maybe you even have a DNS server or two running in a cloud, or servers at another site - again, “just in case”. Why? Because if DNS is down, everything is essentially dead in the water. All roads lead to DNS, and all roads can end if there’s no DNS.
The same holds true for key management. If the key management infrastructure is down, we can’t encrypt new VMs or re-key existing VMs! Even more importantly, we DON’T want that single point of failure. If you have just one KMS and something bad happens to it and you can’t recover the keys then you have some serious issues to attend to! There are no back doors to decrypt a VM. If you lose the keys, you’ve lost the data unless you’ve backed it up (that backup should also be encrypted, but that’s something for another day…).
This is why Key Management will become the next (critical) datacenter infrastructure requirement, just like DNS and NTP have become. This isn’t a “I need a KMS for vCenter” discussion as much as it is a “I need a KMS for the business” discussion.
After all, you don’t install DNS to make it easier to run just the datacenter. You run DNS because without it the business won’t run. Today you may only need a KMS for vSphere but going forward the business may need it for a whole host of things. Encrypted VMs on an encrypted vSAN might be your first need for a KMS but it won’t be your last.
Note how I didn’t list other requirements like HSMs (Hardware Security Modules). vSphere is just a KMIP client; these functions are handled by the Key Manager you choose. If you want HSMs, then the Key Manager will talk to them and vSphere will talk to the KMS. Honestly, this lessens complexity while giving you the best choice to meet your needs.
Here are some good examples of KMS systems which are officially supported by VMware:
- Dell CloudLink 6
- EntIT ESKM 5.0.6
- Fornetix Key Orchestrator 2.1
- Fujitsu ETERNUS SF KM v3.0
- IBM Security Key Lifecycle Manager v3.0
- IBM KMIP for VMware on IBM Cloud 1.0
- QuintessenceLabs qCrypt 200V 1.6.3
Consider these guidelines when working with vSAN encryption:
- Do not deploy your KMS server on the same vSAN datastore that you plan to encrypt.
- Encryption is CPU intensive. AES-NI significantly improves encryption performance. Enable AES-NI in your BIOS.
- The witness host in a stretched cluster does not participate in vSAN encryption. Only metadata is stored on the witness host.
- Establish a policy regarding core dumps. Core dumps are encrypted because they can contain sensitive information such as keys. If you decrypt a core dump, carefully handle its sensitive information. ESXi core dumps might contain keys for the ESXi host and for the data on it.
- Always use a password when you collect a vm-support bundle. You can specify the password when you generate the support bundle from the vSphere Client or using the vm-support command.
- The password recrypts core dumps that use internal keys to use keys that are based on the password. You can later use the password to decrypt any encrypted core dumps that might be included in the support bundle. Unencrypted core dumps or logs are not affected.
- The password that you specify during vm-support bundle creation is not persisted in vSphere components. You are responsible for keeping track of passwords for support bundles.
Setting up vSAN Encryption
Key Managers today are usually set up in a way that they replicate keys to one another. If I have three instances of a key manager, KMA, KMB and KMC, they replicate the keys between them. If I create a key on KMA it will show up in KMB & KMC at some point. You’ll have to check with your key manager product to see how quickly the replication happens.
Using the example above, in vCenter I would create a key manager cluster/alias. In my example I’ll call it “3KMS” and add KMA, KMB and KMC into that 3KMS cluster. I would then establish a trust with each of the key managers. There are multiple ways to establish trust, and most of the time you’ll need to follow the procedure officially recommended for your KMS system.
Using the example above, if KMA is unavailable then vCenter will try KMB (next KMS in order). If KMA is up but the KMS service, for some reason, doesn’t respond, then vCenter will wait 60 seconds for a response. If after that 60 seconds there is no response, vCenter will try KMB. If KMA doesn’t respond (e.g. no IP connection at all) then vCenter won’t wait the 60 seconds and will try KMB immediately.
The maximum vCenter will wait for a KMS to respond is 60 seconds. This is not something that can be configured.
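That failover behavior can be sketched as a simple client loop. To be clear, `get_key` below is a hypothetical stand-in for a real KMIP call, not a VMware API - but the ordering and timeout logic mirror what was just described:

```python
import socket

KMS_TIMEOUT_SECONDS = 60  # fixed in vCenter and not configurable

def fetch_kek(kms_hosts, key_id, get_key):
    """Try each KMS in the configured order until one returns the key.

    `get_key(host, key_id, timeout)` is a hypothetical KMIP call. It may raise
    ConnectionRefusedError (no IP connectivity -> fail over immediately) or
    socket.timeout (service hung -> fail over after waiting the full timeout).
    """
    for host in kms_hosts:
        try:
            return get_key(host, key_id, timeout=KMS_TIMEOUT_SECONDS)
        except ConnectionRefusedError:
            continue  # host unreachable: try the next KMS right away
        except socket.timeout:
            continue  # service unresponsive after 60s: try the next KMS
    raise RuntimeError("no KMS in the cluster could provide the key")

# Simulated cluster: KMA is down, KMB answers.
def fake_get_key(host, key_id, timeout):
    if host == "KMA":
        raise ConnectionRefusedError
    return f"KEK-for-{key_id}-from-{host}"

print(fetch_kek(["KMA", "KMB", "KMC"], "kek-42", fake_get_key))
```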
Setting up the encryption itself is quite easy and can be summarized in these seven steps:
Adding a KMS to vCenter Server
- Add Key Management Server (KMS) to your vCenter Server system from the vSphere Client
- Set the Default KMS Cluster
- Complete the Trust Setup
- Navigate to an existing cluster
- Click the Configure tab.
- Under vSAN, select Services and click the Encryption Edit button.
- On the vSAN Services dialog, enable Encryption and select a KMS cluster.
- If you generate a new KEK, all hosts in the vSAN cluster receive the new KEK from the KMS. Each host's DEK is re-encrypted with the new KEK.
- If you choose to re-encrypt all data using new keys, a new KEK and new DEKs are generated. A rolling disk reformat is required to re-encrypt data.
- If your vSAN cluster has limited resources, select the Allow Reduced Redundancy check box. If you allow reduced redundancy, your data might be at risk during the disk reformat operation.
Encryption vs Performance
When an I/O goes through the filter, there are several actions an application can take on it, such as failing, passing, completing or deferring it. The action taken depends on the application’s use case: a replication application may defer the I/O to another device, while a caching application may already have a read request cached, so it would complete the request instead of sending it on to the storage device. With encryption, the filter would presumably defer the I/O to the encryption engine to be encrypted before it is written to its final destination storage device.
So there are definitely a few more steps that must be taken before encrypted data is written to disk - how will that impact performance? VMware did some testing and published a paper on the performance impact of using VM encryption. Encryption is mostly CPU intensive, as it involves complicated math, but the type of storage the I/O is written to plays a factor as well - and not in the way you might think. Conventional spinning disk actually sees less performance impact from encryption than faster disk types like SSDs and NVMe. The reason is that the faster data is written to disk, the harder the CPU has to work to keep up with the higher I/O throughput.
The configuration VMware tested was Dell PowerEdge R720 servers with two 8-core CPUs, 128GB of memory, and both Intel SSD (36K IOPS Write/75K IOPS Read) and Samsung NVMe (120K IOPS Write/750K IOPS Read) storage. Testing was done with Iometer using both sequential and random workloads. Below is a summary of the results:
- 512KB sequential write results for SSD – little impact on storage throughput and latency, significant impact on CPU
- 512KB sequential read results for SSD – little impact on storage throughput and latency, significant impact on CPU
- 4KB random write results for SSD – little impact on storage throughput and latency, medium impact on CPU
- 4KB random read results for SSD – little impact on storage throughput and latency, medium impact on CPU
- 512KB sequential write results for NVMe – significant impact on storage throughput and latency, significant impact on CPU
- 512KB sequential read results for NVMe – significant impact on storage throughput and latency, significant impact on CPU
- 4KB random write results for NVMe – significant impact on storage throughput and latency, medium impact on CPU
- 4KB random read results for NVMe – significant impact on storage throughput and latency, medium impact on CPU
As you can see, there isn’t much impact on SSD throughput and latency, but with the more typical 4KB workloads the CPU overhead is moderate (30-40%), with slightly more overhead on reads than on writes. With NVMe storage there is a lot of impact to storage throughput and latency overall (60-70%), with moderate CPU overhead (50%) on 4KB random workloads. The results varied a little based on the number of workers (vCPUs) used.
They also tested vSAN using a hybrid configuration consisting of 1 SSD and 4 10K drives; below is a summary of those results:
- 512KB sequential read results for vSAN – slight-small impact on storage throughput and latency, small-medium impact on CPU
- 512KB sequential write results for vSAN – slight-small impact on storage throughput and latency, small-medium impact on CPU
- 4KB sequential read results for vSAN – slight-small impact on storage throughput and latency, slight impact on CPU
- 4KB sequential write results for vSAN – slight-small impact on storage throughput and latency, small-medium impact on CPU
The results with vSAN varied a bit based on the number of workers used: with fewer workers (1) there was only a slight impact, and as you added more workers there was more impact to throughput, latency and CPU overhead.
Overall though the impact was more reasonable at 10-20%.
Now your mileage will of course vary based on many factors, such as your workload characteristics and hardware configuration, but overall the faster your storage and the slower (and fewer) your CPUs, the bigger the performance penalty you can expect to encounter. If you need encryption and the extra security it provides, that’s just the price you pay; how you use it is up to you. With whole VMs capable of slipping out of your data center over a wire or in someone’s pocket, encryption is invaluable protection for your sensitive data.
Is encryption something you'd like to set up on your vSAN? I can answer any questions about vSAN encryption or vSAN itself - get in touch @wilk_it_wizard or #sshguru