Ninety-nine point what?

A Simpler Approach to Eliciting Availability Requirements

In all the years I’ve spent doing Requirements Analysis, there has been one area that I’ve always found particularly challenging: Availability and Disaster Recovery (DR). High Availability (HA) design is a bit of a dark art, whispered about at Business Continuity (BC) forums where back-room IT folk congregate to compare storage media, Dungeons & Dragons fortunes and moon tans. It’s not a topic for the faint-hearted.

Given that this (rather quirky) audience is the one that has to implement the Non-Functional Requirements (NFRs) that you capture, and given that your business stakeholders are usually fairly senior (and not particularly interested in the optimum configuration of a load balancer), it’s probably of little surprise that facilitating Availability and DR workshops is a bit of a chore. In my experience I’ve often spent more time performing interventions than I have doing actual facilitation.

So, having just spent the last 12 months doing NFRs for a significant banking program, I figured that now would be a good time for me to reflect on what I’ve learned. In the series of blog posts to follow, I’m going to attempt to develop a model for communicating and capturing Availability / DR NFRs by:

  1. Providing a list of key Availability / DR requirements terms;
  2. Providing some information about solutions to Availability / DR requirements;
  3. Defining a set of User Personae describing the stakeholders in Availability / DR Requirements Analysis; and,
  4. Developing an interactive Visual Metaphor for capturing requirements.

We’ll kick off with the list of key Availability / DR requirements terms, because you’ll need everyone to be on the same page when you’re talking about these things. The following section should come with a health warning: you’re at serious risk of excitement overload.

Availability

Usually expressed as a percentage (e.g. 99.5%), Availability defines the design objective for the up-time of the system. Availability requirements are realised through the use of redundancy in hardware, software and networking components and by the choice of hardware / software platforms.

Availability may be different during peak and off-peak periods and there may be maintenance periods during which system activity is not expected and maintenance is permitted.
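A percentage on its own rarely means much to a business stakeholder, so it can help to translate it into a downtime budget. The snippet below is a minimal sketch of that arithmetic; the function name and the choice of a 365-day year are my own assumptions, and you can pass a smaller period if agreed maintenance windows are excluded from the measurement.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, assuming a non-leap year


def downtime_budget_hours(availability_pct, period_hours=HOURS_PER_YEAR):
    """Return the hours of downtime a given availability target allows in a period.

    period_hours is the measured period; reduce it if scheduled maintenance
    windows do not count against the target (an assumption to agree up front).
    """
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    return period_hours * (1 - availability_pct / 100)


if __name__ == "__main__":
    for target in (99.0, 99.5, 99.9, 99.99):
        hours = downtime_budget_hours(target)
        print(f"{target}% allows roughly {hours:.1f} hours of downtime per year")
```

On these assumptions 99.5% works out at roughly 43.8 hours a year, which is usually a more productive conversation starter than the percentage itself.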

Recovery Point Objective (RPO)

Defines the point in time to which the solution’s data can be recovered, expressed as the maximum acceptable age of the most recent recoverable data. For instance, if the system went down at 2pm, an RPO of 1 hour would mean that the system could recover all data up to 1pm at the earliest, so no more than 1 hour of data would be lost. This feeds into the backup strategy (e.g. snapshots of data to be taken at least hourly).
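To make the link between RPO and backup schedule concrete, here is a small sketch. It assumes (unrealistically) that snapshots are instantaneous, always succeed and run at a fixed interval; the function names and the example times are hypothetical.

```python
from datetime import datetime, timedelta


def last_recovery_point(failure_time, first_backup, interval):
    """Return the time of the most recent snapshot taken at or before the failure."""
    if failure_time < first_backup:
        raise ValueError("no backup exists before the failure")
    completed = (failure_time - first_backup) // interval  # whole intervals elapsed
    return first_backup + completed * interval


def meets_rpo(interval, rpo):
    """Worst-case data loss is one full backup interval, so it must not exceed the RPO."""
    return interval <= rpo


if __name__ == "__main__":
    failure = datetime(2024, 1, 1, 14, 0)       # system goes down at 2pm
    first = datetime(2024, 1, 1, 0, 15)         # snapshots at quarter past each hour
    hourly = timedelta(hours=1)

    point = last_recovery_point(failure, first, hourly)
    print("Recover to:", point, "| data lost:", failure - point)
    print("Hourly snapshots meet a 1 hour RPO:", meets_rpo(hourly, timedelta(hours=1)))
```

In this example the last snapshot is at 1:15pm, so 45 minutes of data is lost, which sits inside the 1 hour RPO. A real schedule would also need to allow for the time a backup takes to complete.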

Recovery Time Objective (RTO)

Defines the target period for recovery from disaster, including bringing the system back up and restoring data to the point required by the RPO.

Maximum Acceptable Outage (MAO)

Derived from an assessment of the impact to the business in financial, regulatory or reputational terms, this defines the maximum outage that the business can cope with. For mission-critical systems this might be less than 6 hours, whereas systems supporting non-critical processes may have longer MAOs.

The RTO is the target, whereas the MAO is the imperative. There is some debate in the BC community regarding the validity of the MAO (also known as the Maximum Tolerable Period of Disruption – MTPD).
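One way to picture the target-versus-imperative distinction is to classify an outage against both numbers. This is only an illustrative sketch; the function name, the labels and the rule that the RTO should not exceed the MAO are my own assumptions rather than anything taken from a standard.

```python
from datetime import timedelta


def classify_outage(outage, rto, mao):
    """Classify an outage duration against the RTO (target) and MAO (imperative)."""
    if rto > mao:
        # Assumption: a target looser than the business limit makes no sense.
        raise ValueError("RTO should not exceed MAO")
    if outage <= rto:
        return "within RTO"
    if outage <= mao:
        return "RTO missed, within MAO"
    return "MAO breached"


if __name__ == "__main__":
    rto = timedelta(hours=4)
    mao = timedelta(hours=6)
    for hours in (3, 5, 8):
        print(f"{hours}h outage:", classify_outage(timedelta(hours=hours), rto, mao))
```

The gap between the two values is effectively the contingency the recovery team has before the business starts to suffer the impacts used to derive the MAO.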

Single Point of Failure (SPOF) / Single Point of Recovery (SPOR)

A Single Point of Failure is a component (hardware, software or network) of the solution that would bring the entire system down if it failed.

A system has a Single Point of Recovery if there is only one point that it can be recovered from in the event of failure. This might be a data point (e.g. if a rolling history of backups is not maintained), or a hardware/software component (e.g. a single hardware interface through which the recovery process can be started).
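A rough sketch of how redundancy and SPOFs relate to the availability figure is shown below. It uses the textbook simplification that failures are independent, that every component group is needed for the system to be up, and that a redundant group is up if any one instance is up. The component names and availability figures are invented for illustration.

```python
from math import prod


def group_availability(instance_availability, instances):
    """Availability of a redundant group that is up if at least one instance is up."""
    return 1 - (1 - instance_availability) ** instances


def system_availability(components):
    """Availability of a system that needs every component group to be up."""
    return prod(group_availability(a, n) for a, n in components.values())


def single_points_of_failure(components):
    """Names of component groups that have no redundancy at all."""
    return sorted(name for name, (_, n) in components.items() if n == 1)


if __name__ == "__main__":
    # Hypothetical inventory: name -> (availability of one instance, instance count)
    components = {
        "load balancer": (0.999, 2),
        "app server": (0.99, 3),
        "database": (0.999, 1),
    }
    print("SPOFs:", single_points_of_failure(components))
    print(f"Estimated system availability: {system_availability(components):.4%}")
```

Under these assumptions the non-redundant database caps the whole system at just under its own 99.9%, however much redundancy is added elsewhere.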

It’s worth noting here that Availability/DR Requirements are Design Objectives. They should not be considered the same as Service Level Agreements (SLAs), although they may provide a starting point for SLA negotiation. Although infrastructure and software architects will design to an availability target, it is rare that an SLA will bind a third party to meeting the design objective.

That’s probably enough excitement for now. In my next post I’ll start to explore some solutions to Availability/DR requirements, because in this space you kind of need to know where the road leads in order to correctly direct traffic.
