DR is not the same as Backup
A backup is a point-in-time snapshot of a system: the operating system, all the applications, and the files, stored as they were at the time of the backup. Backups have come a long way in the last several years, most notably by adding deduplication (dedupe). EMC’s Data Domain is a backup appliance with dedupe and encryption algorithms built in. Veeam is backup software that works with the Hyper-V or VMware hypervisor to take snapshots of the whole virtual machine, and you can restore the whole VM or just individual files. The newest Veeam allows you to fail over to a VM backup image wherever you placed it.
However, backup isn’t the same as Disaster Recovery. Yes, the newest backup software allows you to fail over in an emergency, but failing back to your original system is generally not easy or pretty, and depending on the size of the VM and the available bandwidth, it could take hours or days to return to normal. The failover process itself is often fairly complicated, requiring quite a bit of setup and configuration before the initial failover can happen. Failover may also require identical hardware and hypervisor software at the failover point. This can get costly, and it limits your options because you can’t use cloud services to recover.
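To put rough numbers on "hours or days", here is a minimal sketch of the arithmetic. The function name, the decimal units, and the 0.8 link-efficiency factor are my own assumptions for illustration; real transfers also depend on dedupe, compression, and competing traffic.

```python
def transfer_hours(vm_size_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate the hours needed to copy a VM image over a network link.

    vm_size_gb  -- image size in gigabytes (decimal: 1 GB = 8000 megabits)
    link_mbps   -- nominal link speed in megabits per second
    efficiency  -- assumed fraction of the link that is actually usable
    """
    if vm_size_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, speed > 0, efficiency in (0, 1]")
    size_megabits = vm_size_gb * 8 * 1000
    seconds = size_megabits / (link_mbps * efficiency)
    return seconds / 3600
```

Under these assumptions, transfer_hours(2000, 100) comes out to roughly 55.6 hours, so a 2 TB VM over a 100 Mbps line is more than two days of copying before you are back to normal.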
A true DR system can fail back easily and seamlessly, and is hardware and hypervisor agnostic. Since it works at the hypervisor level, it can extract the virtual machine and allow you to recover in a cloud environment or a target virtual environment, even allowing you to move a VM from Microsoft Hyper-V to VMware.
You can use DR during the Day for important IT work
One of the things IT people should be doing regularly is actually testing backups and DR scenarios. In fact, I’d go as far as to say you are only as good as your last backup or DR test. Typical backups are fairly difficult to recover from. Sometimes the differential or incremental backups have to be merged into the original full backup before the backup image can be launched to recover a file or a whole VM. This can take time. You may also need to do some creative work within your hypervisor to do a full VM recovery. Most often this process takes hours to perform.
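Why the merge step takes time is easier to see in a toy model. The sketch below is an illustration only, not how any particular backup product stores its data: it assumes a backup is simply a map of block number to block contents, and the function names are hypothetical.

```python
from typing import Dict, List

Blocks = Dict[int, bytes]  # block number -> block contents (toy model)


def restore_from_incrementals(full: Blocks, incrementals: List[Blocks]) -> Blocks:
    """Rebuild an image from a full backup plus every incremental, oldest first."""
    image = dict(full)
    for inc in incrementals:
        # Each incremental only holds blocks changed since the previous backup,
        # so all of them must be replayed in order.
        image.update(inc)
    return image


def restore_from_differential(full: Blocks, differential: Blocks) -> Blocks:
    """Rebuild an image from a full backup plus the single latest differential."""
    image = dict(full)
    # A differential holds everything changed since the full backup,
    # so only the most recent one is needed.
    image.update(differential)
    return image
```

The longer the chain of incrementals, the more passes the restore has to make before the image can even be launched, which is where the hours go.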
A good DR solution allows you to test a backup or a DR site during the day, without any impact on the users, and in a fraction of the time of a typical backup recovery. Since it handles all of the hypervisor work, the process is easy enough for junior admins to perform as part of their daily or weekly activities. They can even use the DR system to do file-level restores, similar to what backups can typically do.
With a good DR solution, testing backups and DR is easy. In fact, it makes it simple to fail over to a test environment and use that test VM to try Windows Updates or an upgrade. Since the test VM is separate from the rest of the system, you can see whether an upgrade or update causes errors or the blue screen of death before you try it in the live environment. If an upgrade goes poorly, you can kill the test VM, start again, and work with tech support to solve the issue before you touch the live environment.
Fail Over is Fast
Traditional backup failover is slow. As mentioned before, because of the way backups typically handle incrementals, the backup software needs to piece the units together in order to bring the VM up. DR is engineered to turn up quickly. If you get hit with a virus, malware, or something more predictable like a hurricane, you can quickly turn on the VM from any of multiple points in time. Although equipment and bandwidth are part of the speed equation, a typical DR failover can happen in a minute or two. Failover in minutes is what makes high availability possible.
DR must be done at the Hypervisor Layer
The two main hypervisors are VMware and Microsoft Hyper-V. Most backup software works below the hypervisor layer. For example, Veeam and Vizioncore use snapshots but work under the hypervisor. Double-Take and InMage use guest software that runs on top of the VMs themselves. EMC, NetApp, IBM, HP and others use enterprise-level array-based replication, which is proprietary. All of these options make DR difficult, time consuming, and complex. The more complexity there is, the less likely IT will take the time to test backups and/or DR. In order to recover, you have to have a like environment, meaning your hypervisor needs to be set up just right, and your hardware often needs to be identical to the source hardware. In the case of hardware replication, you are stuck using the same hardware manufacturer and equipment. In a DR scenario, this can cost far more in both time and money, and poses a real challenge to recovery.
A plethora of other issues can occur when you have inflexible architectures. What if you merge with another company? Or say you open a small office: you should not have to go out and buy a whole EMC VNX to support a mere 10 people, just so you can interface with your replication solution.
Another key reason to do the DR at the hypervisor layer is that you can separate the VM from the hypervisor. This allows you to move the VM to cloud solutions, or to another infrastructure that might use a different set of hardware, or even a different hypervisor altogether.
The Cloud Grid DR software allows you to replicate at the hypervisor level and gives you flexibility at the target recovery point, both in the virtualization software and in the hardware it sits on. And since it is cloud aware, it can also recover to cloud infrastructures, lowering IT complexity and infrastructure costs.
Fail Back is just as important as Fail Over
DR without fail back is not DR at all. OK, maybe that is a bit of a stretch, but let’s look at the scenario. You have a hurricane coming, or a virus hits one of your main servers, so you need to turn up the VM in your alternate datacenter. While you are running there, your users are adding new and changed data to that system. If you intend to keep that valuable data, you need to fail this new VM back to your original server set once the hurricane or disaster has passed. This is where DR shines brighter than any backup solution on the planet. Although a fail back typically takes longer than a ready-to-go failover, a DR solution can use block changes to make sure the fail back is fully in sync before it transfers the services of the live server to the original location.
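The idea of failing back on block changes can be sketched in a few lines. This is a simplified illustration, not the mechanism of any specific DR product: it assumes a disk is a list of fixed-size blocks and finds the changes by comparing hashes, whereas real products usually track changed blocks as they are written. All names here are hypothetical.

```python
import hashlib
from typing import List


def block_digest(block: bytes) -> str:
    """Fingerprint one block so blocks can be compared cheaply."""
    return hashlib.sha256(block).hexdigest()


def changed_blocks(dr_disk: List[bytes], original_disk: List[bytes]) -> List[int]:
    """Return the indexes of blocks on the DR copy that differ from the original."""
    changed = []
    for i, block in enumerate(dr_disk):
        if i >= len(original_disk) or block_digest(block) != block_digest(original_disk[i]):
            changed.append(i)
    return changed


def fail_back(dr_disk: List[bytes], original_disk: List[bytes]) -> List[int]:
    """Copy only the changed blocks back to the original disk (modified in place)."""
    changed = changed_blocks(dr_disk, original_disk)
    for i in changed:
        if i < len(original_disk):
            original_disk[i] = dr_disk[i]
        else:
            original_disk.append(dr_disk[i])  # disk grew while running at the DR site
    del original_disk[len(dr_disk):]  # drop blocks that no longer exist on the DR copy
    return changed
```

Services would only be switched back once changed_blocks returns an empty list, which is the "fully in sync" condition described above.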
DR does not have to be expensive
When I considered DR solutions years ago, before cloud solutions and hypervisor-aware software, the cost of DR was considerable. Now, however, DR can be done at a fraction of the cost. In fact, compared with the Symantec solution, I found this new DR solution is around 35% of the price.
It is easy to estimate the value of a highly available disaster recovery solution: multiply the average hourly rate of your employees by the total number of employees. This gives you an hourly cost of downtime. Although insurance may exist for some downtime, policies are often difficult to collect on and have a threshold of hours or days you have to pass before a payment is made. Avoiding that loss with proper DR design is an investment that can pay for itself with only one incident.
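As a quick worked example of that multiplication, here is a small sketch. The function names and the figures (50 employees at an average of $40 per hour) are made up for illustration, and the estimate ignores lost sales and recovery labor.

```python
def hourly_downtime_cost(avg_hourly_rate: float, employees: int) -> float:
    """Cost of one hour of downtime: average hourly rate times headcount."""
    return avg_hourly_rate * employees


def outage_cost(avg_hourly_rate: float, employees: int, hours_down: float) -> float:
    """Total payroll cost of an outage lasting hours_down hours."""
    return hourly_downtime_cost(avg_hourly_rate, employees) * hours_down


# Hypothetical company: 50 employees averaging $40 per hour.
per_hour = hourly_downtime_cost(40.0, 50)   # 2000.0 dollars per hour
one_day = outage_cost(40.0, 50, 8)          # 16000.0 dollars for an 8-hour outage
```

Even in this small example, a single lost working day costs $16,000 in idle payroll alone, which is the number to weigh against the price of the DR solution.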
DR can Recover nearly any application on nearly any operating system
If it works on VMware or on Hyper-V, it will work fine in a DR scenario. Almost all new apps and most older apps run fine in a virtual environment. Some things may have to be configured to make sure email or voice calls flow over to the new infrastructure. This is a question of ROUTING. It is part of DR, but it should be looked at as a separate piece of the puzzle. Take, for instance, a line-of-business application such as Microsoft Exchange. We know that it will run fine in a DR scenario just as it does in the live scenario. However, the routing must be set up so that when services move to the new environment the mail will flow (noticing that Server A is down, it will move gracefully over to Server B at the DR site). This is part of the initial design and implementation of the DR system. Once it is set up, however, transferring services to DR during an event is simple.
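The "Server A is down, move to Server B" decision can be sketched as a simple health check over an ordered list of servers. This is only an illustration of the logic under my own assumptions; in practice the switch is usually made in DNS, MX records, or a load balancer rather than in a script, and the function names and host names below are hypothetical.

```python
import socket
from typing import Callable, Optional, Sequence, Tuple

Endpoint = Tuple[str, int]  # (host, port)


def tcp_is_up(endpoint: Endpoint, timeout: float = 2.0) -> bool:
    """Treat a server as up if a TCP connection to it succeeds within the timeout."""
    try:
        with socket.create_connection(endpoint, timeout=timeout):
            return True
    except OSError:
        return False


def pick_active(
    endpoints: Sequence[Endpoint],
    is_up: Callable[[Endpoint], bool] = tcp_is_up,
) -> Optional[Endpoint]:
    """Return the first healthy endpoint in priority order, or None if all are down."""
    for endpoint in endpoints:
        if is_up(endpoint):
            return endpoint
    return None


# Hypothetical priority list: live mail server first, DR copy second.
MAIL_SERVERS = [("mail-a.example.com", 25), ("mail-b.example.com", 25)]
```

Calling pick_active(MAIL_SERVERS) would return Server A while it answers and fall through to Server B when it does not. The health check is passed in as a parameter so the routing logic can be tested without touching the network.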
Some things can be challenges. For example, if you have a VM that requires a locally attached dongle (typically for licensing), this can pose a challenge during failover unless you can get an identical dongle for the target side. And testing needs to be done on each and every server and application to see how the server would react in a DR scenario. But it is safe to say that most everything will run just fine.
DR works Hybrid – in your Org, Your Datacenter, or in the Cloud
One of the drawbacks of DR in the past has been that you had to duplicate everything in order to get the DR scenario to work. Sometimes this meant that you were forced to get a datacenter, which can be costly. The new DR solution is flexible. You can deploy it within your organization, across to your own datacenter, or in rented space in a cloud environment. This changes the game completely. No longer are you forced to relegate DR to the old, unmaintained hardware that you mothballed when you upgraded your server room equipment. Now you can use low-cost cloud services to host that data so that you have high-speed resources when you need them, as you need them.
DR is hardware agnostic
The new DR does not care what type of hardware you are running at the core. If you have Dell EqualLogic SANs and Dell servers in your office server room, you can easily replicate your DR over to EMC SANs and Cisco servers at your favorite cloud provider.
Many backup solutions are not the same here. Even though they may have bare metal restore options, often the capacity must be the same and other hardware requirements must be met before the restore can work effectively. DR is quickly becoming a way to outstrip what traditional and even modern backups are capable of. Although I might not recommend dumping your backup solution, a DR solution is definitely something you should not go without.