Bringing Sexy Back to Disaster Recovery

Always be prepared.
Failure to plan is planning to fail.
There are those who have and those who will.
So what’s your excuse for not having a Disaster Recovery plan in place — and exercising that plan periodically to ensure that you can recover as quickly as needed?
Summary article by David Weldon in FierceCIO, original article by Jack Bailey in Data Center Journal.
Emphasis in red added by me.
Brian Wood, VP Marketing

Downtime’s most damning moments of 2013

Downtime is one of the most destructive forces in IT, costing billions annually in lost time, productivity and sales. Increasingly, downtime is being caused by outside security threats, but there are several scenarios to worry about.
Two recent reports highlight the need for aggressive threat monitoring solutions and disaster recovery programs.
As noted in a recent Data Center Journal article, 91 percent of respondents to the 2013 Ponemon Institute study on downtime reported having had unplanned downtime in the last two years. Most troubling, says the report, “an estimated 30 percent of organizations that experience a severe outage never actually recover.”
The largest outages in 2013 were those reported by tech giants that aren’t going anywhere soon. But the impact of the outages was gigantic in terms of lost service, lost consumer confidence and negative publicity.
“It is near impossible to determine the exact cost of downtime, since so much depends on the organization, the industry, the number of people impacted, etc.,” notes the security company NeverFail in its recent research study, “Downtime Report: Top Ten Outages in 2013.”
NeverFail shared its findings with FierceCIO, including the list of the top outages last year, the duration of the outage and the result to each organization.
Three criteria were used in the NeverFail list: the expansive reach of the outage, damage to the reputation of the organization and revenue lost.
According to the report, “2013 has seen some massive outages: and given our heavy reliance on technology today, there is more at stake than ever before. Outages affect not only internal users, but a company’s customers and partners–and impact revenue, credibility, trust, reputation and productivity.”
In ranked order, the Downtime Report cites the following as the top ten outages of 2013:
1. Microsoft’s Windows Azure
Date: October 30, 2013
Duration: Over 20 hours
Failure: A sub-component of the system failed worldwide
Impact: Every single Azure region was affected (including West U.S., West Europe, Southeast Asia, South Central U.S., North Europe, North Central U.S., East Asia and East U.S.)
2. Google
Date: August 16, 2013
Duration: less than 5 minutes
Failure: All of its services went down.
Fallout: The volume of global Internet traffic plunged by about 40 percent.
3. Amazon Web Services
Date: Sept. 13, 2013
Duration: Under 3 hours
Failure: Connectivity issues affected a single availability zone, disrupting a notable portion of Internet activity.
Reminder: If you rely heavily on the cloud for your infrastructure, have a failover plan.
4. NASDAQ
Date: August 22, 2013
Duration: 3 hours
Failure: A software bug, followed by inadequate built-in redundancy capabilities, triggered a massive trading halt in the U.S.
Impact: With all the exchanges dependent on one another, this outage had impact rippling across the globe.
5. OTC Markets Group Inc.
Date: November 7, 2013
Duration: over 5 hours
Failure: A network failure due to a “lack of current quotation information” prompted a complete shutdown in trading of over-the-counter stocks in the U.S.
Impact: The shutdown happened on one of the biggest trading sessions this year as Twitter Inc.’s shares debuted. While the disruption only paused less significant equities such as Fannie Mae and Freddie Mac, it tested investors’ nerves following a series of technical mishaps since August and exacerbated concerns about problems in the electronic infrastructure underpinning U.S. exchanges.
6. HealthCare.gov
Date: October 27-28, 2013
Duration: 16+ hours
Failure: A service outage at a Verizon Terremark data center caused downtime for HealthCare.gov, the trouble-plagued online insurance marketplace created by the Affordable Care Act.
Impact: With all of America watching the progress of the marketplace, a data center outage only added more fuel to the fire and perhaps made the public question where to point the finger of blame.
7. Amazon.com
Date: January 31, 2013
Duration: 49 minutes
Failure: Internal issues caused the home page to go down, displaying an error message.
Impact: The outage demonstrated the extremely high value of uptime to services such as Amazon. Analysts calculated that one hour of interrupted service may have translated to $5 million in lost revenue.
8. Microsoft – Hotmail and Outlook.com
Date: March 13, 2013
Duration: nearly 16 hours
Failure: A firmware update caused the company’s servers to overheat; Hotmail and Outlook.com both suffered a loss of service.
Impact: Microsoft admitted that it required some human intervention to bring the services back online, thus delaying the restoration attempt further. Microsoft’s online service reputation took a big hit.
9. Google Drive
Date: March 18-20, 2013
Duration: 17 hours total
Failure: A glitch in the company’s network control software caused latency and recovery problems. Users faced slow load times or full-on timeouts while trying to access their Drive documents and files.
Impact: As much as one-third of the customer base was impacted, leading to a virtual hue-and-cry across the Internet.
10. Google’s Gmail
Date: September 23, 2013
Duration: 12 hours
Failure: Prolonged slow download times were triggered by a dual network failure.
Impact: The outage affected 29 percent of users. For 1.5 percent of Gmail messages, the delay in downloading large attachments was up to two hours. While its impact may not have been catastrophic, the outage at Gmail is a potential cause for concern, especially as businesses are turning to Google and other providers to run cloud-based email and SaaS.
While our index measures through the end of November, the very recent Yahoo Mail outage deserves a considerable honorable mention.
11. Yahoo Mail
Date: December 9-13, 2013
Duration: almost 4 days
Failure: A specific hardware problem in one of the company’s storage systems caused the prolonged partial email outage for users.
Impact: The multiday email outage impacted countless individuals and the many small businesses that rely on the service. Not only did the outage cast a dark shadow over the once-mighty Internet player, but the company was also heavily criticized for the way it handled its damage control, particularly its negligence in informing its users about the problems.

Five Reasons Not to Cut Disaster Recovery from Your Budget

When was the last time you experienced unplanned downtime? How much did it affect your organization?
Ninety-one percent of respondents in a 2013 Ponemon Institute study reported experiencing unplanned downtime in the last two years. Although that’s not a shocking statistic to those in the cloud arena, what is alarming is the fact that an estimated 30 percent of organizations that experience a severe outage never actually recover.
Business leaders often think a disaster is something that happens to someone else. When most people think of disasters, they immediately think about hurricanes or earthquakes. But a disaster doesn’t have to be a naturally occurring one. It could be human error or a cyber attack. Therefore, it’s important to have a plan B to protect your mission-critical applications and data. It’s also important to know that it’s more than a loss in revenue or five minutes of downtime—an IT disaster can wreak havoc on your overall brand and customer loyalty.
Here are five reasons not to cut disaster recovery (DR) from your budget.

1. Disaster Recovery Is Cost Effective

On average it takes organizations two days to recover from an IT disaster, according to a 2012 Ponemon Institute study on disaster recovery. The same study found that this duration equates to $366,363 in costs a year. There are some hidden costs to experiencing downtime, however, such as lost revenue and damage to the brand. For example, when a major airline’s reservation system goes down for eight hours straight, it leaves customers stranded, scrambling to make other arrangements and thinking twice the next time they book a flight.
Organizations that use disaster-recovery-as-a-service (DRaaS) providers reported cost savings as the leading benefit of using the public cloud for disaster recovery, according to a study by the Aberdeen Group—a research firm helping businesses understand the implications and results of technology deployments. You don’t have to worry about a large capital investment; you can trade that in by contracting DRaaS.
Costs to work with a DRaaS provider vary depending on how many virtual machines an organization needs to replicate and the size of the data. Costs can range from $60 to $120 per month per virtual machine and can vary depending on factors such as recovery-point objective (RPO), recovery-time objective (RTO), storage and so on.
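To make the pricing arithmetic above concrete, here is a minimal Python sketch that turns the quoted $60 to $120 per-VM monthly range into a rough budget band. The function name and the 25-VM example are hypothetical, and the sketch deliberately ignores the RPO, RTO and storage factors that would move a real quote.

```python
def draas_monthly_cost_range(vm_count, low_per_vm=60, high_per_vm=120):
    """Return (low, high) estimated monthly cost in dollars.

    The default per-VM rates are the range quoted in the article;
    actual pricing depends on RPO, RTO, storage and other factors.
    """
    if vm_count < 0:
        raise ValueError("vm_count must be non-negative")
    return vm_count * low_per_vm, vm_count * high_per_vm


# Hypothetical example: an organization replicating 25 virtual machines.
low, high = draas_monthly_cost_range(25)
print(f"25 VMs: ${low:,} to ${high:,} per month")
```

For 25 VMs this gives a band of $1,500 to $3,000 per month, which is a starting point for a budget conversation rather than a quote.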

2. DR Is Easy to Implement

Disaster recovery in the cloud is now more attainable for businesses of all sizes than it was five years ago. Before virtualization, disaster recovery would cost at least three times as much because an organization needed to have multiple data centers, specialized software and large network connections. To do this in the physical world is extremely costly. That’s why only the largest of enterprises were able to do it. Now, virtualization makes disaster recovery easier by encapsulating virtual machines (VMs) into a few files, making the data portable and in turn reducing costs.
DR solutions also give users the flexibility to take a look at their applications and define how they want them to be recovered. Do they want to protect the entire infrastructure? Do they want to protect just Tier 1 applications? Do they need a variable recovery time and variable recovery point from Tier 1 down to Tier 3 applications? Gone are the days of having to build a secondary site identical to a primary site and incur all the additional management costs and operational challenges.
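One way to picture the tiered approach described above is as a small policy table that maps each application to a tier with its own recovery point and recovery time objectives. The Python sketch below is illustrative only: the tier values and application names are made-up assumptions, not figures from the article or from any DR product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryTier:
    name: str
    rpo_minutes: int  # maximum tolerable data loss
    rto_minutes: int  # maximum tolerable time to restore service


# Hypothetical objectives; each organization would set its own.
TIERS = {
    1: RecoveryTier("Tier 1", rpo_minutes=15, rto_minutes=60),
    2: RecoveryTier("Tier 2", rpo_minutes=240, rto_minutes=480),
    3: RecoveryTier("Tier 3", rpo_minutes=1440, rto_minutes=2880),
}

# Hypothetical application inventory mapped to tiers.
APP_TIERS = {"order-entry": 1, "intranet-wiki": 3, "reporting": 2}


def recovery_order(app_tiers):
    """Return application names sorted so the tightest RTO comes first."""
    return sorted(app_tiers, key=lambda app: (TIERS[app_tiers[app]].rto_minutes, app))


print(recovery_order(APP_TIERS))
```

Writing the policy down this way forces the question the paragraph raises: which applications truly need Tier 1 protection, and which can wait.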

3. DR Reduces Data Loss

The risk of business interruption, loss of business-critical data and the length of time to recover that data are three leading pressures driving organizations’ use of the public cloud for disaster recovery, according to a study by the Aberdeen Group. DRaaS users are able to recover three times faster and drive up the percentage of data they’re able to recover twofold.
Consider this example: Having your company in Delray Beach, Fla. is great…until a hurricane hits. That’s what happened to Fleet Lease Disposal when Hurricane Wilma struck in October 2005 and took down the company’s main office and primary data center. Although it successfully recovered and operated for four months from its backup office in New Jersey, four years later, Fleet Lease discovered that it lacked adequate hardware and bandwidth to support data recovery at its secondary site.
In working with a DRaaS provider, Fleet Lease Disposal was able to restore its IT infrastructure and data in less than one hour, save thousands of dollars and man-hours a year, and achieve an 18-month return on investment. With replication and highly available cloud-computing resources, the company now knows it can count on quick access to its applications and data—especially during hurricane season.

4. DR Restores Applications and Operations Quickly and Effectively

According to a survey conducted by the Aberdeen Group, businesses that use DRaaS reported faster recovery time from downtime incidents as the second leading benefit of relying on the public cloud.
Many believe backing up data on tapes is the equivalent of DR. Protecting data is important, but the ability to recover applications efficiently and quickly is essential in restoring operations. Having data on tapes or using data storage without virtual resources and the ability to easily test isn’t disaster recovery; it’s just off-site backup and not true business continuity.
For example, one major global biotechnology firm dramatically simplified its recovery process by closing one of its data centers and moving from an environment that employed multiple technologies to support the replication of systems and data to a single solution that automates the recovery of all applications.

5. Testing 1, 2, 3…

Businesses can put their DR plan to the test anytime throughout the year without bringing down production. It’s extremely important to ensure applications and data or IT environments come up on another site and no data is lost. Businesses can conduct planned or unplanned outages in the first few months of replication to ensure their DR plan works.
In a 2012 study, Forrester reported that only about half of companies conduct full tests once a year. Although organizations cite limited employee resources as the biggest stumbling block, cloud DR tests are now more automated and require less manual intervention.
When Verdande Technology needed a way to extend its infrastructure without making a significant investment, it turned to a DRaaS provider to provide a flexible, cost-effective means to extend its testing environment and eliminate the hardware and operational headaches at the same time.
“We have the flexibility and control we need,” said Peter Varlien, IT systems engineer at Verdande Technology. “We can remotely configure a test bed according to testing requirements, and our DRaaS provider bills us only for what we’ve used. We don’t have to worry about when, where or how we need to test anymore!”
The value of DR testing is to ensure all systems you want to replicate are recovered and that you have access to them. Once you are in the middle of an actual DR event, it’s too late.
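A DR test is only useful if its results are checked against an explicit objective. As a minimal sketch of that check, the Python function below compares how long each system took to come up in a test against a recovery time objective. The function name, the system names and the 60-minute objective are all hypothetical.

```python
def evaluate_dr_test(results, rto_minutes):
    """Return a list of (system, reason) pairs for systems that failed the test.

    `results` maps each system name to the minutes it took to recover,
    or None if the system never came up at the recovery site.
    """
    failures = []
    for system, minutes in sorted(results.items()):
        if minutes is None:
            failures.append((system, "did not recover"))
        elif minutes > rto_minutes:
            failures.append(
                (system, f"recovered in {minutes} min, over the {rto_minutes} min objective")
            )
    return failures


# Hypothetical test run measured against a 60-minute objective.
test_results = {"database": 42, "web": 75, "mail": None}
for system, reason in evaluate_dr_test(test_results, rto_minutes=60):
    print(f"{system}: {reason}")
```

An empty list means every system you intended to replicate came up within the objective; anything else is a gap to close before a real event.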