- Date published:
- Author: AIS Team
Our experts have answered these questions to
help demystify Disaster Recovery
Expand each item in the list below by clicking the arrow to the left of the question
[expand title=”What is disaster recovery?”]
Disaster recovery is the set of policies, tools, guidelines, procedures, people, and technology that come together to ensure the availability of IT services in accordance with the requirements of the business. These activities are carried out before, during, and/or after a disaster to eliminate or reduce the effects of [simple_tooltip content=’Examples: Hazardous material spills, infrastructure failure (e.g. UPS failure or loss of utility-provided power), bio-terrorism, critical IT bugs, or failed change procedures.’]man-made disasters[/simple_tooltip] or [simple_tooltip content=’Examples: Avalanche, Earthquake, Flood, or Hurricane.’]natural disasters[/simple_tooltip].[/expand]
[expand title=”Why is IT disaster recovery important?”]
Disaster recovery is important because it allows different [simple_tooltip content=’Line of Business’]LoB[/simple_tooltip] applications or services to continue to operate within established [simple_tooltip content=’Recovery Point Objective’]RPO[/simple_tooltip] and [simple_tooltip content=’Recovery Time Objective’]RTO[/simple_tooltip] metrics.[/expand]
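As a rough illustration of how these two metrics can be checked after an incident, the sketch below compares one outage against a pair of targets. The class name, function name, timestamps, and the 15-minute/4-hour targets are all hypothetical values chosen for the example, not recommendations.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryObjectives:
    """Targets for one line-of-business application (illustrative values only)."""
    rpo: timedelta  # maximum tolerable window of lost data
    rto: timedelta  # maximum tolerable downtime


def meets_objectives(objectives, last_good_copy, outage_start, service_restored):
    """Return (rpo_met, rto_met) for a single outage."""
    # Data written after the last good copy and before the outage is lost.
    data_loss_window = outage_start - last_good_copy
    # Downtime runs from the start of the outage until service is back.
    downtime = service_restored - outage_start
    return data_loss_window <= objectives.rpo, downtime <= objectives.rto


targets = RecoveryObjectives(rpo=timedelta(minutes=15), rto=timedelta(hours=4))
result = meets_objectives(
    targets,
    last_good_copy=datetime(2024, 1, 1, 9, 50),
    outage_start=datetime(2024, 1, 1, 10, 0),
    service_restored=datetime(2024, 1, 1, 13, 0),
)
print(result)  # (True, True)
```

Here the last good copy is 10 minutes old and service returns after 3 hours, so both targets are met.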
[expand title=”What should a disaster recovery plan include?”]
[simple_tooltip content=’Disaster Recovery Plans’]DRPs[/simple_tooltip] protect organizations in the event of an IT service outage as well as a partial/full data loss. The objective of a DRP is to have pre-created policies and procedures that will be followed to recover as gracefully as possible from various disaster scenarios.
There are three different approaches to disasters:
- Preventative Measures:
Attempts to mitigate or prevent disaster events from happening. Examples include deploying infrastructure with multiple redundancies, distributing it across geographies, performing routine hardware inspections (rounds), backing up data off-site, etc.
- Detective Measures:
Efforts to discover the presence of any unwanted anomalies within the IT infrastructure. These routine checks or probes are used to uncover new or unknown potential threats before they become larger issues. Examples of this are anti-malware agents, proactive monitoring with trend analysis, employee training sessions, fire alarms, etc.
- Corrective Measures:
Actions taken to restore an environment, or attempt to mitigate negative repercussions, after disaster strikes. These could take the form of detailed guides on how to restore old copies of data or rebuild servers. These measures may even include contracting/contacting public relations firms or securing/executing proper insurance policies.
DRPs should be easy to follow and updated frequently to keep up with ever-changing business requirements. More importantly, disaster scenarios should be regularly acted out through [simple_tooltip content=’Regular DR testing to ensure backups are done properly and restoration is completed successfully’]validation[/simple_tooltip], as a backup is not a backup until it’s properly restored. There is no set format for a disaster recovery plan, as no two organizations are the same. With that in mind, all plans should include a clear outline of the different stakeholders or actors, communication and escalation paths, clear step-by-step instructions or flow charts, and the identified [simple_tooltip content=’Line of Business’]LoB[/simple_tooltip] applications with their criticality, [simple_tooltip content=’Recovery Point Objective’]RPO[/simple_tooltip]/[simple_tooltip content=’Recovery Time Objective’]RTO[/simple_tooltip], and interdependencies.[/expand]
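One simple way to automate part of the validation step is to compare a checksum of the restored data with a checksum of the source. The sketch below is a minimal example under stated assumptions: the file names are hypothetical, and plain file copies stand in for the real backup and restore jobs. In practice the reference checksum would usually be recorded when the backup is taken, since the original may no longer exist after a disaster.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_is_valid(source, restored):
    """A restore passes only if its content matches the source byte for byte."""
    return sha256_of(source) == sha256_of(restored)


with tempfile.TemporaryDirectory() as workdir:
    original = Path(workdir) / "orders.db"             # hypothetical data file
    original.write_bytes(b"sample business data")
    backup = Path(workdir) / "orders.db.bak"
    shutil.copyfile(original, backup)                  # stand-in for the backup job
    restored = Path(workdir) / "orders.restored.db"
    shutil.copyfile(backup, restored)                  # stand-in for the restore procedure
    restore_ok = restore_is_valid(original, restored)

    restored.write_bytes(b"corrupted")                 # simulate a bad restore
    corrupted_ok = restore_is_valid(original, restored)

print(restore_ok, corrupted_ok)
```

A matching checksum only shows that the bytes came back intact; a full DR test would also confirm that the application starts and serves requests from the restored data.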
[expand title=”What is the importance of failback in disaster recovery?”]
Whether the choice is driven by location, resources, size, or convenience, IT departments take great care in choosing the primary site for their IT resources. Therefore, after a successful failover to a secondary site, graceful restoration of services at the primary site is equally important.
Many solutions handle the initial failover easily, as conditions are assumed to be ideal at the destination site. However, failback, the restoration of services at the primary site, sometimes proves difficult and turns out to be a highly manual process.[/expand]
[expand title=”What are some of the best practices for disaster recovery?”]
Properly identify and align disaster recovery policies with the needs of the business. It’s paramount to understand these requirements before any technical solutions or approaches are discussed. These requirements are commonly unearthed by performing a [simple_tooltip content=’Business Impact Analysis’]BIA[/simple_tooltip] and a [simple_tooltip content=’Threat Risk Analysis’]TRA[/simple_tooltip]. A BIA will identify urgent vs non-urgent functions and activities and allow the business to assign [simple_tooltip content=’Recovery Point Objective’]RPO[/simple_tooltip] and [simple_tooltip content=’Recovery Time Objective’]RTO[/simple_tooltip] metrics to each [simple_tooltip content=’Line of Business’]LoB[/simple_tooltip] application. The TRA will then identify a list of potential threats to each [simple_tooltip content=’Line of Business’]LoB[/simple_tooltip] application (hardware failure, power loss, data corruption, software update, etc.).
With this data in hand, business stakeholders have the information they need to start building a DR plan that addresses the identified threats while meeting the RPO/RTO targets set by the BIA.
At this point, IT personnel can begin the search for technical solutions that will provide the uptime dictated by the needs of the business.[/expand]
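The output of a BIA and TRA can be captured as a small register of applications, each with its objectives and known threats. The sketch below is one possible shape for that data; the application names, objective values, and the choice to rank urgency by shortest RTO are assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import timedelta


@dataclass
class LobApplication:
    """One line-of-business application as described by the BIA and TRA."""
    name: str
    rpo: timedelta                                # from the BIA
    rto: timedelta                                # from the BIA
    threats: list = field(default_factory=list)   # from the TRA


def recovery_order(apps):
    """Most urgent first: shortest RTO, ties broken by shortest RPO."""
    return sorted(apps, key=lambda app: (app.rto, app.rpo))


# Hypothetical register; real entries come out of the BIA/TRA workshops.
register = [
    LobApplication("payroll", timedelta(hours=24), timedelta(hours=48), ["data corruption"]),
    LobApplication("order-entry", timedelta(minutes=5), timedelta(hours=1), ["power loss", "hardware failure"]),
    LobApplication("intranet-wiki", timedelta(hours=24), timedelta(hours=72), ["software update"]),
]

ordered_names = [app.name for app in recovery_order(register)]
print(ordered_names)  # ['order-entry', 'payroll', 'intranet-wiki']
```

A register like this gives IT personnel a concrete list of targets to evaluate candidate solutions against.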
[expand title=”What are the advantages of Disaster Recovery as a Service?”]
It is very common for IT environments to operate on a thin budget. This is especially true for companies whose IT department acts as a cost center, where capital and operational expenditures are just enough to keep the “lights on”. Additionally, a well-oiled IT department with few or no outages lacks the “pain” factor that, in some cases, drives businesses to increase Disaster Recovery spend. This dries up the IT budget and could lead to an increased risk of failure due to reduced manpower, lost know-how, aging equipment, etc.
Conversely, in businesses where IT acts as a profit-center, IT departments tend to over-invest in “beefing-up” primary environments to increase efficiency, transaction speed, and/or application feature set, which in turn drive down costs, increase customer satisfaction, etc.
At the end of the day, it is very common for Disaster Recovery resources to be last on the list of budgeted items. Deploying a mirror of the primary environment at a geographically diverse site is expensive and hard to justify, especially when all it may end up doing is collecting dust if disaster never strikes.
Selecting a [simple_tooltip content=’Disaster Recovery as a Service’]DRaaS[/simple_tooltip] partner provides a much lower TCO profile due to economies of scale and pay-as-you-go billing models. DRaaS companies will build, maintain, support, and update the DR resources so they are ready for the client when they are needed. Additionally, a DRaaS partner may have more experience, approaches, architectures, tips, and overall solutions to help achieve the [simple_tooltip content=’Recovery Point Objectives’]RPOs[/simple_tooltip]/[simple_tooltip content=’Recovery Time Objectives’]RTOs[/simple_tooltip] required by the business.[/expand]
[expand title=”Which is the best tool for IT disaster recovery and back up?”]
There is currently no “best tool” that covers all customer requirements, workloads, and scenarios. However, based on the needs of the business, AIS leverages best-of-breed technologies like Zerto, Veeam, and others, along with frameworks, to provide the desired application uptime. Often, the best approach is to use different tools for different application tiers.[/expand]
[expand title=”What is high availability?”]
High availability allows for a certain service, component, or application to operate continuously. Normally, this is achieved by operating redundant components simultaneously in an [simple_tooltip content=’Concurrently active nodes where traffic from the failed node is passed to another node or load balanced across remaining nodes.’]active/active[/simple_tooltip] or [simple_tooltip content=’Fully redundant nodes where the failover instance is brought online only when its primary counterpart has failed.’]active/passive[/simple_tooltip] manner.[/expand]
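The difference between the two modes can be sketched as two routing functions. This is a simplified illustration, not how any particular product works: the node names, the health set, and the round-robin rule are all hypothetical.

```python
def route_active_passive(nodes, healthy):
    """Send traffic to the first healthy node in priority order.

    The standby only receives traffic when every node ahead of it has failed.
    """
    for node in nodes:
        if node in healthy:
            return node
    raise RuntimeError("no healthy node available")


def route_active_active(nodes, healthy, request_id):
    """Spread requests across all healthy nodes (simple round-robin)."""
    live = [node for node in nodes if node in healthy]
    if not live:
        raise RuntimeError("no healthy node available")
    return live[request_id % len(live)]


# Active/passive: the standby takes over only after the primary fails.
normal = route_active_passive(["primary", "standby"], {"primary", "standby"})
failed_over = route_active_passive(["primary", "standby"], {"standby"})

# Active/active: node "b" has failed, so its share goes to "a" and "c".
spread = [route_active_active(["a", "b", "c"], {"a", "c"}, i) for i in range(4)]
print(normal, failed_over, spread)
```

In both modes the service stays up as long as at least one node is healthy. The trade-off is that active/passive leaves standby capacity idle, while active/active needs the surviving nodes to absorb the failed node's load.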
[expand title=”Why do companies store data in a data center?”]
IT equipment is environmentally sensitive, and ideal conditions are necessary to extend the service life of equipment and provide the infrastructure for continuous operation. Data centers are purpose-built from the ground up to host equipment optimally. Building, maintaining, and running a data center is a very expensive proposition, as power, cooling, network, and physical security need to be delivered in a reliable and resilient manner in a facility designed to provide these services.
To achieve this at an affordable price, economies of scale must be reached. Data center operators take on these tremendous capital expenses and provide affordable options as monthly service fees. This enables companies to expand their resources when necessary. If a company chose to build a data center on its own, the firm would either have to invest the capital necessary to anticipate growth or undertake additional data center builds when capacity is exhausted.[/expand]
For more about our DR services, visit our Disaster Recovery page.