Resiliency and High Availability in the Cloud: some key principles

Resiliency in the Cloud is composed of many areas including:

  • Failover (instances, servers)
  • High Availability (multi-zones, regions)
  • Backup
  • Disaster Recovery (pilot light, cold restarts)
  • Monitoring and Telemetry (Observability)
  • Automation in the above
  • Security (zero trust, zero touch or limited touch, IAM, RBAC)

All of the above can be combined into patterns. Microsoft offers a good set of examples of such Resiliency Patterns.

A Centralised Model can be useful

The virtualisation and API integration offered by the Cloud help us build a Centralised Model of control, where an enterprise can bring all data-related operations into a single platform. A centralised system breaks down silos and integrates data management functions automatically, without needing constant IT oversight. This convergence of management into one interface, no matter where the data is located, makes data protection, security and, above all, resiliency far less resource intensive.

As more organisations migrate their business processes to multi-cloud environments, they must be able to protect all of their data within this centralised platform. A key aspect of resiliency and control is to set up a ‘Control Plane’ which can support one or more cloud platforms and deployments. This helps in establishing the ‘Data Planes’ and data environments using company-approved patterns for access, security, rate throttling, infrastructure setup, data management and monitoring. Self-service functionality also helps IT teams delegate responsibility to certain application owners while retaining centralised control.
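As a rough illustration of how a control plane might enforce company-approved patterns before a data plane is provisioned, the sketch below validates a request against a small policy baseline. Everything in it is an assumption made for illustration: the `DataPlaneRequest` fields, the approved regions and the rate limit ceiling are hypothetical and do not come from any particular cloud platform.

```python
from dataclasses import dataclass

# Hypothetical company-approved baseline; names and values are illustrative only.
APPROVED_REGIONS = {"eu-west", "eu-central"}
MAX_REQUESTS_PER_SECOND = 500


@dataclass
class DataPlaneRequest:
    """A request from an application owner for a new data environment."""
    owner: str
    cloud: str
    region: str
    encrypted_at_rest: bool
    monitoring_enabled: bool
    rate_limit_rps: int


def validate(request: DataPlaneRequest) -> list[str]:
    """Return the policy violations found; an empty list means approved."""
    violations = []
    if request.region not in APPROVED_REGIONS:
        violations.append(f"region '{request.region}' is not approved")
    if not request.encrypted_at_rest:
        violations.append("encryption at rest is required")
    if not request.monitoring_enabled:
        violations.append("monitoring must be enabled")
    if not 0 < request.rate_limit_rps <= MAX_REQUESTS_PER_SECOND:
        violations.append("rate limit is outside the approved range")
    return violations
```

Because the same check runs regardless of the `cloud` value, one set of rules can apply across every platform the control plane manages.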

While a multi-cloud control plane provides central oversight, organisational data must still be protected. As cyber threats and ransomware attacks continue to advance at an alarming rate, organisations increasingly need not only to prevent threats to their data but also to bounce back with minimal disruption when the inevitable attack occurs.

A Zero Trust Approach is often deployed as a security pattern to ensure that the different layers of the security stack work independently of each other. This mindset is about always assuming that the layers above and below any given layer are already compromised. It’s about never trusting, and always verifying.

Zero trust is based on a verification process which should treat every access attempt as if it originates from an untrusted source. Access to an organisation’s network should only be granted after an identity has been authenticated through single sign-on (SSO) and/or multi-factor authentication (MFA). This is critical because it will help prevent bad actors, whether internal or external, from being able to gain access to an organisation’s network.
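The verification rule described above can be sketched as a small access decision. This is a minimal illustration under stated assumptions: the `AccessAttempt` type and its fields are hypothetical, and the sketch takes the stricter reading of "SSO and/or MFA" by requiring both.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessAttempt:
    """One attempt to reach the network (hypothetical structure)."""
    user: str
    sso_authenticated: bool
    mfa_verified: bool
    from_internal_network: bool


def is_access_granted(attempt: AccessAttempt) -> bool:
    """Grant access only to a verified identity.

    Network location is deliberately ignored: an internal origin is
    treated as untrusted, exactly like an external one.
    """
    return attempt.sso_authenticated and attempt.mfa_verified
```

An attempt from inside the network without MFA is rejected, while a fully verified attempt from outside is accepted, which is the "never trust, always verify" behaviour in miniature.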

A data resiliency cloud is fully Autonomous

Automated capabilities can detect threats and keep a business’s protection environment up to date with security updates, security patches and best practices without depending on people. This eliminates daily management and allows IT teams to focus on higher-priority tasks that add greater value to the business. In addition, an autonomous engine can provide insights into unusual activity, which helps businesses to rapidly detect, fight and mitigate internal and external threats.

In addition, a cloud-native solution that enables cyber and operational resilience backs up data regularly, while keeping it air-gapped and immutable, and always available to restore. It also helps organisations orchestrate their recovery from an attack with minimal downtime and disruption.
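To make the idea of immutable backups more concrete, here is a minimal in-memory sketch, assuming a fixed retention lock and a checksum verified on restore. The `ImmutableBackupStore` class is hypothetical; a real service would keep copies in isolated storage rather than in a Python dictionary.

```python
import hashlib
from datetime import datetime, timedelta, timezone


class ImmutableBackupStore:
    """Toy write-once store: no overwrites, no deletes during retention."""

    def __init__(self, retention: timedelta):
        self._retention = retention
        self._copies = {}  # backup_id -> (data, sha256 digest, locked_until)

    def write(self, backup_id: str, data: bytes, now: datetime) -> None:
        if backup_id in self._copies:
            raise PermissionError("existing backups cannot be overwritten")
        digest = hashlib.sha256(data).hexdigest()
        self._copies[backup_id] = (data, digest, now + self._retention)

    def delete(self, backup_id: str, now: datetime) -> None:
        _, _, locked_until = self._copies[backup_id]
        if now < locked_until:
            raise PermissionError("backup is still under retention lock")
        del self._copies[backup_id]

    def restore(self, backup_id: str) -> bytes:
        data, digest, _ = self._copies[backup_id]
        # Verify integrity before handing the copy back for recovery.
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError("backup failed integrity check")
        return data
```

The point of the design is that even a caller with valid credentials cannot alter or remove a copy until the lock expires, so a compromised account cannot destroy the recovery point.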

A true cloud experience provides organisations with a level of self-service. This means that when IT teams need to perform daily tasks such as creating a backup copy or running a restore, they do not need to wait for a central team and the task can be completed almost instantaneously. Of course, with self-service there should always be some form of central oversight to provide a safety net for the organisation and its data.
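One way to picture self-service with a central safety net is a portal that runs delegated tasks immediately but records every request centrally. The sketch below is an assumption-laden illustration: `SelfServicePortal`, its `allowed_actions` mapping and the audit log format are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SelfServicePortal:
    """Runs delegated tasks at once while keeping a central audit trail."""
    allowed_actions: dict[str, set[str]]  # team -> actions delegated to it
    audit_log: list[tuple[str, str, str]] = field(default_factory=list)

    def run(self, team: str, action: str, task: Callable[[], str]) -> str:
        if action not in self.allowed_actions.get(team, set()):
            # Refused requests are logged too, so oversight sees every attempt.
            self.audit_log.append((team, action, "denied"))
            raise PermissionError(f"{team} is not allowed to run {action}")
        result = task()
        self.audit_log.append((team, action, "completed"))
        return result
```

The team does not wait for a central group to act, yet the central group can still review what was done and by whom.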

To summarise, there are some important elements in providing a resilient application environment built around failover. These include:

  1. A Centralised governance model,
  2. A comprehensive Control Plane, which may need to run across multiple cloud platforms,
  3. Zero-Trust Approach, which assumes a multi-layered security model,
  4. Fully Automated Operations,
  5. and a Cloud-Native Target model.
