Cloud Outage – Recovery Management in Real-Time

In cloud computing, disasters are inevitable, and so downtime is a reality. It is not all bad news for cloud users, though: customers can take certain steps to make sure that they survive an outage.

Nothing in the world is constant. Inevitably, even the strongest systems may break down. What can be avoided is the massive amount of downtime that such events cause.

One such incident was the catastrophic failure of Microsoft's Azure cloud. A violent thunderstorm knocked out a major data center. People commonly refer to that incident as the day the Azure cloud fell from the sky. Customers were offline for over two days, making it one of the worst cloud-based catastrophes. Microsoft has since addressed the issue in its entirety, but it is not an incident that IT professionals will forget soon.

There is good news as well. You can survive a cloud outage by implementing disaster recovery and/or high-availability provisions with real-time data replication. Customers who do this beforehand can expect little or no downtime and no data loss, even during a major catastrophe.

We have come up with four options for disaster recovery and high availability, in both pure Azure and hybrid configurations:

  1.  SQL Server Always On Availability Groups
  2.  The Azure Site Recovery (ASR) Service
  3.  SQL Server Failover Cluster Instances with Storage Spaces Direct
  4.  Third-party Failover Clustering Software

RPO and RTO 101

Before selecting any of the four options above, it is essential to understand the two metrics used to assess the effectiveness of high-availability (HA) and disaster recovery (DR) provisions: the Recovery Point Objective (RPO) and the Recovery Time Objective (RTO).

RTO is the maximum tolerable duration of an outage. E-commerce portals that provide online transaction services generally have very low RTOs, and some mission-critical processes have an RTO of just a few seconds. RPO, on the other hand, is the maximum amount of data loss that can be tolerated, expressed as the period of time before the outage whose data may be lost. If a process has no tolerance for data loss, its RPO is zero.
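To make the two metrics concrete, here is a minimal Python sketch that checks a single outage against a pair of objectives. The class and function names (`Objectives`, `Outage`, `check`) and the timestamps are hypothetical and chosen only for illustration; they are not part of any Azure or SQL Server API. The sketch assumes the data-loss window can be approximated as the time between the last successful replication and the failure.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Objectives:
    """Hypothetical container for the two targets."""
    rpo: timedelta  # maximum tolerable data-loss window
    rto: timedelta  # maximum tolerable downtime


@dataclass
class Outage:
    """Hypothetical record of one outage."""
    last_replicated_at: datetime  # last point at which data was safely copied
    failed_at: datetime           # moment the service went down
    restored_at: datetime         # moment the service came back


def data_loss_window(outage: Outage) -> timedelta:
    # Assumption: anything written after the last replication is lost.
    return outage.failed_at - outage.last_replicated_at


def downtime(outage: Outage) -> timedelta:
    return outage.restored_at - outage.failed_at


def check(outage: Outage, objectives: Objectives) -> dict:
    """Report whether the outage stayed within each objective."""
    return {
        "rpo_met": data_loss_window(outage) <= objectives.rpo,
        "rto_met": downtime(outage) <= objectives.rto,
    }


# A strict target: no data loss, at most 30 seconds of downtime.
strict = Objectives(rpo=timedelta(0), rto=timedelta(seconds=30))

t0 = datetime(2024, 1, 1, 12, 0, 0)  # illustrative failure time

# Real-time replication with a fast failover: both objectives are met.
replicated = Outage(last_replicated_at=t0, failed_at=t0,
                    restored_at=t0 + timedelta(seconds=20))

# Five-minute-old copy and a two-hour restore: both objectives are missed.
backup_only = Outage(last_replicated_at=t0 - timedelta(minutes=5), failed_at=t0,
                     restored_at=t0 + timedelta(hours=2))

print(check(replicated, strict))
print(check(backup_only, strict))
```

In this sketch, an RPO of zero can only be met when the last replicated point coincides with the failure, which is why real-time replication matters for processes with no tolerance for data loss.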

The best course of action for cloud outage management can be selected according to an organization's RPO and RTO requirements. The four options, operating in concert or separately, can play different roles in making HA and DR protection affordable and effective for an organization. A combination of them can cater to an organization's needs by minimizing downtime and the cost associated with it.

If you are looking for IT Support, Managed IT Services, or Business IT Support in the Manhattan area, get in touch with us.
