Even those who are not hands-on technically, or who work in management, will have heard horror stories about data loss causing major outages. I’m sure you thought to yourself “Ooh, I wouldn’t want to be an admin at Heathrow airport right now” when all the planes at one of Europe’s major airports were grounded for over a day because of an IT issue…

Whether these outages are caused by disasters such as fire, flooding or a power outage, or by human error, isn’t relevant at the end of the day. When it comes to disaster recovery and data loss, we should be 100% focused on returning IT services to a running state, not on finger pointing. That is easier said than done, but it can be made significantly simpler, faster and less error prone by preparing a disaster recovery plan and by ensuring that data backups are available and safe at all times, following the 3-2-1 backup strategy.

The explosion of data

It is no surprise that the last ten years are referred to as the data decade. The amount of data produced and stored around the world has exploded over that period. It’s sufficient to look at the trend shown in an IDC study sponsored by Seagate. The 2018 study predicted that 175 ZB would be generated every year by 2025, and it looks like we are right on track to hit the mark.

Backup and Disaster Recovery

Part of this data increase is due to the growing number of cloud services and cloud providers, and to how easy it now is to run workloads that generate data of some type. While cloud services, helped by sovereign cloud initiatives, are taking more and more market share from on-premise environments, a large number of organizations and public entities will always require solid private SDDCs (Software Defined Data Centers). This means that data stored on-premise and data stored in the cloud must be protected equally. This is where BDRSuite can add value by backing up workloads in mixed environments.

Backups vs Disaster Recovery

While this will be second nature for many who are immersed in infrastructure on a daily basis, the terms and concepts of data protection may not be so obvious to those just getting into this space or those who deal with it indirectly. Hence, it is important to clearly define what is what, to avoid any confusion and to be able to ask the right questions when the day comes.

What are Backups?

First of all, you are probably familiar with corporate backups (i.e. backups of the organization’s data). Backups provide a copy of your data that should be stored following the famous 3-2-1 rule: keep 3 different copies of your data, on 2 different media types, with at least 1 copy stored offsite (at a different site or in the cloud). This rule was recently extended to 3-2-1-1 following the increase in ransomware attacks, adding the requirement that 1 copy of the data be immutable.
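To make the rule concrete, here is a minimal Python sketch that checks a list of copies against 3-2-1-1. The `BackupCopy` class, its fields and the sample copies are hypothetical names made up for illustration, and the sketch assumes the production data counts as one of the 3 copies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    """One copy of the data (hypothetical model, for illustration only)."""
    name: str
    media: str            # e.g. "disk", "tape", "object-storage"
    offsite: bool         # stored at a different site or in the cloud
    immutable: bool = False


def check_3_2_1_1(copies: list[BackupCopy]) -> dict[str, bool]:
    """Report which parts of the 3-2-1-1 rule the given copies satisfy."""
    return {
        "three_copies": len(copies) >= 3,
        "two_media_types": len({c.media for c in copies}) >= 2,
        "one_offsite": any(c.offsite for c in copies),
        "one_immutable": any(c.immutable for c in copies),
    }


# Assumption: the production data itself is counted as the first copy.
copies = [
    BackupCopy("production data", media="disk", offsite=False),
    BackupCopy("local backup repository", media="disk", offsite=False),
    BackupCopy("cloud copy", media="object-storage", offsite=True),
]

result = check_3_2_1_1(copies)
for rule, satisfied in result.items():
    print(f"{rule}: {'OK' if satisfied else 'MISSING'}")
```

In this made-up setup the classic 3-2-1 rule is met, but the extra "1" is not, because none of the copies is flagged as immutable.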

When something happens, it may be necessary to restore data from a previous backup. Various scenarios can trigger the need for a restore operation such as:

  • Corruption of a set of data, rendering it unreadable
  • A ransomware attack that encrypted a disk or a datastore
  • Accidental deletion of a virtual machine by an administrator
  • Accidental deletion of a file by a user on a server
  • A failed software upgrade or test that requires going back to an earlier version (although it can be argued that this should be done with snapshots)
  • … Hundreds more use cases – leave yours in the comments!

Speaking of snapshots, I want to point out, for those starting out, that they are not backups and should be kept for 72 hours at most. A common beginner mistake is to treat snapshots as backups and end up with massive snapshots, which can cause performance issues and do damage when they are deleted.
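A simple way to keep an eye on this is to regularly list the snapshots that have outlived the 72-hour guideline. The sketch below assumes you have already exported the snapshot list from your hypervisor as `(vm, snapshot name, creation time)` tuples; the function name and the sample data are hypothetical.

```python
from datetime import datetime, timedelta, timezone

MAX_SNAPSHOT_AGE = timedelta(hours=72)  # guideline mentioned in the article


def stale_snapshots(snapshots, now=None, max_age=MAX_SNAPSHOT_AGE):
    """Return (vm, snapshot_name, age) for snapshots older than max_age, oldest first."""
    now = now or datetime.now(timezone.utc)
    stale = [
        (vm, name, now - created)
        for vm, name, created in snapshots
        if now - created > max_age
    ]
    return sorted(stale, key=lambda item: item[2], reverse=True)


# Hypothetical inventory, as it might be exported from the hypervisor.
now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
snapshots = [
    ("web01", "pre-upgrade", now - timedelta(hours=5)),
    ("db01", "before-patch", now - timedelta(days=9)),
    ("app02", "test", now - timedelta(hours=80)),
]

for vm, name, age in stale_snapshots(snapshots, now=now):
    hours = age.total_seconds() / 3600
    print(f"{vm}: snapshot '{name}' is {hours:.0f} hours old, consider deleting it")
```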

What is Disaster Recovery?

Backups let you restore your data or a virtual machine to a previous state, usually after data loss that is unrelated to the state of the SDDC, which itself didn’t sustain any issue. Disaster recovery plans (DRP), as the name suggests, are about recovering the workloads to a running state following an outage (failover). When talking about outages, the common example is the one where the whole datacenter is down; in other words, a worst-case scenario. However, a number of other reasons can trigger a disaster recovery plan:

  • Failed storage array which is replicated to a secondary site
  • Failure of a cluster or outage on part of the datacenter
  • Failure of the primary site’s ISP, resulting in external services being inaccessible
  • Planned migrations (Although this doesn’t really fit in the DR section, flexible BDR solutions like BDRSuite can be used to migrate workloads across datacenters in an orchestrated fashion)
  • … Hundreds more use cases – leave yours in the comments!

Because backups are often stored on slower storage with basic orchestration capabilities, recovering a large number of virtual machines from them following a datacenter outage would take a frustratingly long time and would be incredibly inefficient. For that reason, popular solutions like BDRSuite offer disaster recovery capabilities which let you recover your workloads in record time, provided you prepared for it, obviously.

In order to speed up recovery time, the DR software replicates the VMs to a second site where they are stored in production grade storage and registered in the environment, ready to be spun up. That way, virtual machines can be started on the second site following an outage instead of having to copy the data from the backup repository to the production storage.
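To get a feel for why copying data back from the backup repository is the slow part, here is a back-of-the-envelope estimate in Python. The 50 TB and 500 MB/s figures are purely hypothetical, and the sketch ignores parallel restore streams, deduplication, compression and network contention.

```python
def restore_hours(total_tb: float, throughput_mb_s: float) -> float:
    """Hours needed to copy total_tb terabytes at a sustained throughput in MB/s."""
    total_mb = total_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return total_mb / throughput_mb_s / 3600


# Hypothetical environment: 50 TB of VMs, 500 MB/s sustained restore throughput.
hours = restore_hours(total_tb=50, throughput_mb_s=500)
print(f"Full restore from backup: about {hours:.1f} hours")  # about 27.8 hours
```

With replicated VMs already sitting on production storage at the second site, that copy step disappears from the recovery path, and the remaining time is mostly spent powering on the VMs and validating the applications.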

Now, it sounds way too simple when I put it this way. A number of hurdles exist along the way that need to be addressed such as:

  • IP subnets: this one is probably the trickiest because you can’t have the same network gateway in two sites for a single subnet! Spanning VLANs across sites is not recommended (please don’t do it!). Do you use SDN such as NSX-T to create overlays across sites and avoid re-IP? It simplifies DRPs but adds complexity to the environment. Or do you bite the bullet and include re-IP in your disaster recovery plan? That is much cheaper, but how do you test that all your in-house applications support it?
  • Documentation: how do you document the plan and how do you keep it up to date? There needs to be a solid procedure that can be executed by anyone in case the admins are off or have left, or if it happens while you’re on duty on New Year’s Eve (the other worst-case scenario)
  • Testing is very important. A disaster recovery plan (DRP) must be tested at least once a year (more if possible) to ensure that it will actually work the day you need it. This means you need to agree with internal and external clients on how to proceed so that the test doesn’t breach the agreed SLA
  • Where do you replicate the workloads? There needs to be some sort of secondary site where you can replicate the virtual machines. Do you replicate all virtual machines or only the most critical ones? How do you deal with “unused capacity”, which can be seen as “sitting there doing nothing” by your management?
  • … Again, many more hurdles exist. Leave yours in the comments, please!
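If you do go down the re-IP route, the mapping itself is usually the easy part: keep the host portion of the address and swap the subnet. Here is a minimal sketch using Python’s standard `ipaddress` module. The subnets are hypothetical, and real DR products typically handle this through their own re-IP rules rather than a script like this one.

```python
import ipaddress


def re_ip(address: str, source_subnet: str, target_subnet: str) -> str:
    """Map an address from the primary site subnet to the same host offset in the DR subnet."""
    src = ipaddress.ip_network(source_subnet)
    dst = ipaddress.ip_network(target_subnet)
    ip = ipaddress.ip_address(address)

    if ip not in src:
        raise ValueError(f"{address} is not in {source_subnet}")
    if src.prefixlen != dst.prefixlen:
        raise ValueError("source and target subnets must have the same prefix length")

    # Keep the host part, replace the network part.
    offset = int(ip) - int(src.network_address)
    return str(dst.network_address + offset)


# Hypothetical subnets: 10.10.20.0/24 at the primary site, 10.20.20.0/24 at the DR site.
print(re_ip("10.10.20.15", "10.10.20.0/24", "10.20.20.0/24"))  # 10.20.20.15
```

The hard part remains everything around it: DNS records, firewall rules and applications with hard-coded addresses, which is exactly why the plan has to be tested.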

The importance of Backups and Disaster recovery

I only skimmed the very surface of the topic here, as there are literally entire books on both subjects. Vembu actually has a decent library of content about it that you can find on vembu.com. The idea here isn’t to give you all the tools to start your own Disaster Recovery Plan (DRP) or draw the architecture of your backup infrastructure. The point is to get your foot in the door, get acquainted with both concepts and understand their importance in the IT landscape.

Having or not having readily available backups and a disaster recovery plan can make or break a business. Companies have, unfortunately, gone under following an outage because they had no backups or because their backups were completely obsolete. It happens with small businesses that hire a contractor to set up a simple backup infrastructure but have no one to look after it following the initial setup (I’ve witnessed it first hand with a small customer). In that case, newly created VMs may not be backed up, credentials on a NAS expire, snapshots fail, you name it; there are hundreds of reasons why backups fail every day. While it may not be a full-time job in SMBs, it definitely needs some attention now and again to make sure everything runs as expected.
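Even a very small periodic check goes a long way here. The sketch below compares a VM inventory with the time of each VM’s last successful backup and flags both VMs that were never backed up and VMs whose last backup is too old. The function, the 24-hour threshold and the sample data are hypothetical; in practice both inputs would come from your hypervisor and your backup software’s reports.

```python
from datetime import datetime, timedelta, timezone


def backup_gaps(inventory, last_backup, now, max_age=timedelta(hours=24)):
    """Return (never_backed_up, stale) lists of VM names."""
    never_backed_up = sorted(vm for vm in inventory if vm not in last_backup)
    stale = sorted(
        vm for vm in inventory
        if vm in last_backup and now - last_backup[vm] > max_age
    )
    return never_backed_up, stale


# Hypothetical data: "new-vm" was created after the backup jobs were set up.
now = datetime(2024, 1, 10, 8, 0, tzinfo=timezone.utc)
inventory = ["web01", "db01", "app02", "new-vm"]
last_backup = {
    "web01": now - timedelta(hours=3),
    "db01": now - timedelta(days=4),   # job silently failing for days
    "app02": now - timedelta(hours=20),
}

missing, stale = backup_gaps(inventory, last_backup, now)
print("Never backed up:", missing)
print("Last backup too old:", stale)
```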

The Ransomware threat

Building on the previous section, I also want to touch on the topic of ransomware. Unless you’ve been living under a rock for the past two years, you will know that ransomware is a threat looming over everyone’s head these days. Reports of organizations being hit by ransomware are increasing in number, and bad actors are proving more and more resourceful as backup and security companies mitigate vulnerabilities.

Increasing security to protect a company against ransomware attacks is great, but there is only so much you can do about it. Unless you spend millions tightening every single bolt (which will make everyone’s work so much harder), you must accept the possibility of being hit as well (it doesn’t only happen to others). In that case, your last line of defense is the backups, which are also targeted by bad actors and must be protected at all costs. There are a number of recommendations to protect your backups and backup infrastructure against ransomware, such as the immutable backups offered by BDRSuite.

Conclusion

I purposefully kept this article fairly high level (with the odd technical term thrown in) so that everyone can grasp the concepts and their importance. While backup and DRP may seem like a chore to many admins, it can also be a very fun topic to work on. Architecting optimized VM replication, offsite copies or array replication will push you to learn a lot about infrastructure technologies and about your own environment as a whole.
Once you realize the consequences a disaster can have on the data stored in the SDDC or with your cloud providers, tying up all the loose ends and generating reports following disaster recovery plan tests is incredibly gratifying and a great source of peace of mind when you are responsible for an organization’s IT environment.
