Rapidly Clone Whole Test/Dev/Lab Environments with vRA and NSX

Have you ever wanted an easy way to clone complex, multi-VM application environments for upgrade testing, code progression, or training?  I’ve sunk a lot of my professional life into helping build and refresh countless test/dev/build/stage environments. Even with some of the best scripting gurus around, I’ve found that these environments can still take a ton of time and coordination to build, as multiple teams must each install their pieces (OS, DBs, middleware, apps, etc.) in sequence.

As such, I’ve looked into various cloning techniques over the years as a way to help speed up this process. But what always seemed to prevent cloning from being truly effective was the fact that I would still have to re-IP and rename the cloned VMs to prevent network conflicts. And that re-IP and re-name would inevitably break several of the applications’ components. In this article, I’ll show you how to overcome that hurdle and clone your application VMs, 100% intact, without changing any IP addresses or names. Better still, since we’ll be using vRealize Automation (vRA) and VMware NSX to automate the cloning process, your customers/users will be able to request these cloned environments whenever they need them via a self-service portal, and have them ready-to-go within minutes.

The way we’re going to accomplish this feat is by creating a blueprint in vRA with the following elements:

  • VM Templates for each application VM to be cloned.
  • An on-demand NAT network.
    • This will allow each instance of the cloned application VMs to retain their original IP addresses, without conflicting with one another. vRA will instruct NSX to spin up a new NAT network each time the blueprint is requested. It will consist of an NSX Logical Switch (VXLAN), and an NSX Edge (router) configured to perform the NAT’ing.
  • A jump box VM. We’ll use this to RDP into the NAT’d network.
  • Optional: a stand-alone AD/DNS domain controller.

Now let’s get to it!

Create your VM templates

1. Jump Box

Since each blueprint instance is going to have the same, overlapping IP range, we can’t simply RDP into any VM on that network. When vRA creates the NSX Edge for our on-demand NAT network, it will set up a Source NAT rule which routes all the outgoing traffic from VMs in the blueprint instance.  However, it will not route any incoming traffic (unless we set up Many-To-Many NAT, but that’s a lot more complicated than this scenario requires). In order for us to access these VMs, we’ll need to include a Windows jump box in the blueprint that we can RDP into. This jump box will provide us with full network access to the other VMs in the blueprint instance.

So how do we get access to this jump box? While we could try to add a Destination NAT rule to pass incoming RDP traffic to the jump box, doing that in an automated fashion would require creating a custom vRO workflow. Instead, it’s actually much easier to configure our blueprint to connect the jump box to both the internal NAT network and an external (non-NAT) network. Then we can RDP to the jump box’s external interface. I’ll show you how to set this up a little later, when we create our blueprint.
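To make the one-to-many (Source) NAT behavior concrete, here’s a small conceptual sketch in Python. This is purely an illustration of the idea, not how NSX actually implements it, and all of the IP addresses below are made up: each deployment’s Edge rewrites outbound source addresses to its own unique external IP, so identical internal ranges never collide on the outside network.

```python
# Conceptual sketch of one-to-many Source NAT (SNAT), as performed by the
# NSX Edge that vRA deploys for each blueprint instance. IPs are hypothetical.

def snat(edge_external_ip, packet):
    """Rewrite the source address of an outbound packet to the Edge's external IP."""
    return {**packet, "src": edge_external_ip}

# Two blueprint instances with IDENTICAL internal addressing...
pkt = {"src": "192.168.100.10", "dst": "10.20.0.5", "dport": 443}

# ...but each instance's Edge has a unique external interface IP,
# so the outside network sees two distinct, non-conflicting sources.
out_a = snat("10.20.0.101", pkt)  # Edge for deployment A
out_b = snat("10.20.0.102", pkt)  # Edge for deployment B

print(out_a["src"])  # 10.20.0.101
print(out_b["src"])  # 10.20.0.102
```

Note there is no rule mapping traffic in the other direction, which is exactly why we need the dual-homed jump box.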

For now, we just need to make sure we have a vanilla Windows VM template and corresponding Customization Spec in vCenter. I used a Windows 2012 template in my lab. We’ll use this to create the jump box in our blueprints. For bonus points, you can also add some of your favorite admin tools to the template, such as Putty, WinSCP, Notepad++, or PowerCLI.

2. Application VMs

With the Jump Box out of the way, let’s move on to something more interesting: the application VMs we want to clone. The idea here is that we’ll leave these VMs intact, with all their software fully loaded, all IP addresses intact, etc. We’ll simply clone or convert them to templates, and then include the templates in our blueprint. Side note: since we want to make sure that cloning does not wipe out IP addresses and whatnot, we will NOT be applying a Customization Spec like we normally would when cloning a template.

Sounds simple, eh? But like so many things in life, there are, of course, a few caveats to keep in mind:

  • Use a dedicated subnet for your application VMs.
    • Or at least use a subnet that doesn’t contain anything the VMs will need to talk to once the blueprint clones them and puts them behind NAT. Since each blueprint instance will have a NAT’d network that overlaps with the original IP range, the original subnet/network won’t be reachable from the NAT’d network. So ideally, we’d put the whole application deployment on its own dedicated subnet, and then make templates of all the VMs on that subnet. Make sense?
    • In my lab, I used as the subnet for my application VM templates.
  • AD-joined VMs will need a stand-alone AD domain and corresponding DNS zone.
    • If your application includes Windows VMs, cloning them for each blueprint instance will absolutely break their relationship/membership with Active Directory. Not to mention the fact that the NAT’ing will impede the network communication with AD. To avoid both of these issues, if you need to clone AD-joined VMs then it’s best to create a stand-alone AD Domain and Domain Controller on your dedicated subnet to include in your blueprint.
    • Also, since AD requires owning its own DNS zone, be sure to use a DNS zone for AD that isn’t already in use on your corporate network.
  • Clear any NSX Logical Switch memberships from VMs before converting them to templates.
    • If you have any Templates whose virtual network adapter was last connected to an NSX Logical Switch, vCenter will remember this and prevent you from deleting that Logical Switch in the future. This can be extremely frustrating when you are cleaning up old/unneeded Logical Switches, and vCenter is giving you cryptic error messages about a Logical Switch still being in use. Even though you can see that no VMs are attached to the switch, you won’t be able to see templates that might be attached. Save yourself this mystery and the associated frustration by making sure your VMs aren’t connected to an NSX Logical Switch before converting them to templates.
    • For example: before converting, change the template’s network adapter from the NSX Logical Switch port group to a standard (non-NSX) port group.
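When picking that dedicated subnet, it’s worth double-checking that it doesn’t overlap with anything the cloned VMs will still need to reach from behind NAT. Here’s a quick sketch using Python’s standard ipaddress module; the subnets shown are placeholders, not values from my lab:

```python
import ipaddress

def overlaps_any(candidate, existing):
    """Return the list of existing networks that the candidate subnet overlaps."""
    cand = ipaddress.ip_network(candidate)
    return [n for n in existing if cand.overlaps(ipaddress.ip_network(n))]

# Hypothetical corporate ranges the cloned VMs must still reach via NAT.
corporate = ["10.0.0.0/16", "172.16.10.0/24"]

print(overlaps_any("192.168.100.0/24", corporate))  # [] -> safe choice
print(overlaps_any("10.0.50.0/24", corporate))      # ['10.0.0.0/16'] -> conflict
```

Any non-empty result means the candidate subnet would be unreachable (or ambiguous) from behind the NAT, so pick another range.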


When creating a stand-alone AD Domain Controller template for inclusion in a NAT blueprint, don’t forget to include DNS, which AD requires. For reference, here is how I set up Windows’ Roles for the AD-DNS template in my lab:

You can ignore the “File And Storage Services” role – I needed that for a different scenario that I was also testing in my lab. As I mentioned before, be sure to set up a DNS Zone for your Active Directory that does not conflict with an existing DNS Zone on your corporate network. That way, your AD-DNS server can be authoritative for its zone, but forward lookups for other zones out to a corporate DNS server. Here are a few quick steps you can follow to set up the DNS forwarder in your AD-DNS server: https://technet.microsoft.com/en-us/library/cc754941(v=ws.11).aspx.


Connect vRA to NSX

This blog article assumes that you already have working installations of vRA 7+ and NSX 6.2+ in your lab. If you still need to configure vRA to talk to NSX, here is a short little article that will walk you through it: http://virtualinsanity.com/index.php/2016/11/23/start-using-nsx-in-your-vra-blueprints/.


Define Your Networks in vRA

Now that vRA knows how to talk to NSX, we can specify which IP ranges vRA will use in its NAT blueprints. We’ll specify these by creating two new Network Profiles in vRA.

1. External Network Profile

The first will be an External (to NSX) Network Profile. It will refer to a Distributed Port Group and IP Range that already exists on your regular network, outside of NSX. We’ll use this External Network Profile for our blueprint’s Jump Box, which will not only have an interface on the on-demand NAT network, but also an External interface for us to RDP into. Additionally, each instance of our blueprint will spin up an on-demand NSX Edge to route its NAT’d traffic. That NSX edge will need an External interface, which will also come from this External Network Profile.

In vRA, browse to the Infrastructure tab > Reservations > Network Profiles, and create a new External Network Profile. Here is what mine looks like:

In retrospect, I think that including the word “NAT” in the profile name, “Jump Boxes and NAT,” might be a bit confusing. To clarify, I was referring to the external NAT interfaces, whose IP addresses will come from this profile. So perhaps a more accurate name would have been “Jump Boxes and External NAT interfaces.”

In the DNS/WINS section, I set Primary DNS to the IP address of the AD-DNS VM, which will be included in our example NAT blueprint.

2. NAT Network Profile

The second Network Profile will be for the network in which all the blueprint’s VMs will reside. It will define the range of IP addresses that will be duplicated behind NAT for each blueprint instance. The NAT Network Profile also needs to refer to the External Network Profile, so that the corresponding NSX Edge can obtain an External IP for routing the NAT’d traffic. Here’s what mine looks like:

For the NAT Type, be sure to choose “One-to-Many,” which lets vRA know that it should create Source NAT rules within the NSX Edge that it spins up for each instance of the blueprint. For the gateway, I specified an address from the same subnet as its IP Range. When vRA creates the NSX Edge, it will use this address for its internal NAT interface.

Also, in the DNS/WINS section, I set Primary DNS to the IP address of the AD-DNS VM, just like I did in the External Network Profile. This will allow the cloned VMs within an instance of the blueprint to resolve names for each other and Active Directory.
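As a sanity check on those NAT profile values, the rule of thumb is: the gateway must sit inside the profile’s subnet, and the static IP range must fall within that subnet too. Here’s a small sketch with Python’s standard ipaddress module (the addresses are placeholders, not my lab’s actual values):

```python
import ipaddress

def validate_nat_profile(subnet, gateway, range_start, range_end):
    """Check that a NAT network profile's gateway and IP range fit its subnet."""
    net = ipaddress.ip_network(subnet)
    gw = ipaddress.ip_address(gateway)
    lo, hi = ipaddress.ip_address(range_start), ipaddress.ip_address(range_end)
    problems = []
    if gw not in net:
        problems.append("gateway outside subnet")
    if lo not in net or hi not in net:
        problems.append("IP range outside subnet")
    if lo > hi:
        problems.append("range start after range end")
    return problems  # an empty list means the profile is internally consistent

print(validate_nat_profile("192.168.100.0/24", "192.168.100.1",
                           "192.168.100.10", "192.168.100.200"))  # []
```

vRA performs similar validation in the form, but catching a mismatched gateway or range before you click Save spares you a confusing provisioning failure later.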

Here’s what you should see when you’ve finished adding both Network Profiles:

Finally, we need to update our Reservations to include our new External Network Profile:


Create your Blueprint

At this point, all of our prerequisite steps should be complete, and we’re ready to start creating our blueprint. Here’s what it will look like when we’re done:

  1. First, create a new Blueprint. In its Properties, click the NSX Settings tab and add your NSX Transport Zone:

  2. Drag-and-drop an On-Demand NAT Network onto the blueprint canvas:

  3. Configure it to reference our “Internal Lab Network” NAT Network Profile. vRA will auto-populate the rest of the form from the Network Profile:

  4. Drag-and-drop an Existing Network onto the canvas:

  5. Configure it to reference our “Jump Boxes and NAT” External Network Profile. vRA will auto-populate the rest of the form from the Network Profile:
  6. Next, drag-and-drop a vSphere Machine onto the canvas for our Jump Box. Since we’re using a vanilla Windows template for this VM, be sure to enter a Customization Spec. The Customization Spec is important because vRA will use it to inject all the network information into the OS:

  7. On the Network tab of the Jump Box, be sure that the “Jump Boxes and NAT” External Network is listed first, with ID 0. We do this to ensure that its gateway becomes the default gateway for Windows. If Windows’ default gateway isn’t on the External Network, then we won’t be able to RDP into the Jump Box:

  8. Drag-and-drop another vSphere Machine onto the canvas for our AD-DNS VM. Since we want our AD-DNS VM to keep its existing IP address, we won’t provide a Customization Spec:

  9. On the Network tab of the AD-DNS VM, I went ahead and entered the original IP address. Since we didn’t enter a Customization Spec, vRA won’t actually be able to apply this IP address to the OS (it’s already baked into the AD-DNS template anyway). The purpose of entering the IP address here is so that vRA knows that it’s allocated, and won’t try to assign it to another VM that does use a Customization Spec (such as the Jump Box):

  10. Drag-and-drop additional vSphere Machines onto the canvas for all of your Application VM templates. Like AD-DNS, you’ll probably want them all to keep their existing IP addresses, so don’t provide Customization Specs:

  11. Also like AD-DNS, we’ll want to enter the IP addresses for the Application VMs, so that vRA knows their IP addresses are allocated, and doesn’t try to give them out to other VMs we might add to the blueprint later:

  12. If everything has gone well, our end-result should look like this:


A Few More Tricks Up Our Sleeve

Before we go ahead and request our blueprint, here are a few more tricks you might need, depending on your application’s requirements:

  • Keeping the original MAC Address.
    • Some applications generate configuration info or license keys based on MAC address. In that case, cloning the application requires the clones to retain the original MAC address. To do this, there are 2 custom properties you’ll need to add to the VM in your blueprint.
  • Adding an ISO-based appliance.
    • One use case for cloning test/dev/lab environments is to try out integrations with new VM appliances. Often, these appliances are provided as an ISO, which you install into a new VM. In order to do this, add a VM to your blueprint, and on its Build tab set Action to “Create” (instead of Clone). Then add the following custom properties to configure the VM and mount the ISO from a datastore. In this example, which was an OpenFiler appliance I was testing with my app, the datastore was called MoC-Management:
    • VMware.VirtualCenter.OperatingSystem is a required property for “Create” VMs.
    • The Image.ISO properties only work with “Create” VMs, not “Clone” VMs. For “Clone” VMs that need an ISO mounted, the easiest way is to add the ISO to the template. Additionally, vRO comes with “Add CD-ROM” and “Mount CD-ROM” workflows, which you could combine to create a vRA Day-2 Action for mounting ISOs on the fly.
    • If your blueprint customers/users will want to unmount the ISO after the installation is done, there is a vRO workflow called “Disconnect all detachable devices from a running virtual machine”, which is ready to be published as a vRA Day-2 Action.
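For reference, the ISO-related custom properties mentioned above look like the following in vRA 7.x. The property names are from the product’s custom-property reference; the values are examples only (the “MoC-Management” datastore is from my lab, and the ISO filename is a placeholder you’d replace with your own):

```
# vRA custom properties for a "Create"-action VM that mounts an ISO at build time.
# Values are examples -- substitute your own guest OS ID, datastore, and ISO path.
VMware.VirtualCenter.OperatingSystem = otherGuest64
Image.ISO.Location = MoC-Management
Image.ISO.Name = /ISO/your-appliance.iso
```

VMware.VirtualCenter.OperatingSystem takes a vCenter guest OS identifier, Image.ISO.Location names the datastore, and Image.ISO.Name is the path to the ISO within that datastore.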


Time for the Magic!

OK, let’s request the blueprint, and then take a look at what vRA creates for us!

First, let’s check out the provisioned blueprint instance in our Items tab.

As you can see, vRA created:

  • The 3 VMs from our blueprint.
  • An NSX Edge “Edge-LabFoundation-33f787c7…”
  • An NSX Logical Switch “InternlLabNetwork-532bdc…”
  • A reference to the External Network.

If we go and look in NSX, we can see the Logical Switch that our blueprint just created, “InternlLabNetwork-532bdc…”

We can also see the edge that our blueprint just created, “Edge-LabFoundation-33f787c7…”

And if we drill into the edge, we can see the expected interfaces and NAT rules:




Despite all of this fantastic cloning wizardry, you’ll still need to use your old build process occasionally – even if only to create a source environment to clone. However, if you can switch to cloning for the majority of your organization’s test/dev/lab builds, you’ll end up saving yourself a TON of time and hassle!

On a side note, there is a related use case that we really didn’t dive into here: cloning whole vSphere environments. This can be awesome for vSphere admins who frequently need to test out their upgrades and migrations ahead of time, or test out new VMware-related products and scenarios. While you could clone such an environment using the steps in this article, the nested ESXi VMs will need some additional TLC that I didn’t cover here. My good friend and colleague, Jeremiah Megie, has just written a great article on the subject: Building a Nested ESXi Lab For Your Company. In an upcoming article, we’ll show you how to pull it all together with a blueprint for cloning whole vSphere environments, including the TLC for nested ESXi, and all the needed VMs (vCenter, ESXi, etc.).

UPDATE!  Our “upcoming article” is now published: Build Your Very Own VMware Hands-On-Labs! (with vRA and NSX).