Disaster Recovery for your Data Integration on AWS


Data powers business. It is therefore imperative to prepare your data processing architecture for any disaster. For data integration, this means keeping your data sources, data integration infrastructure, ETL mappings, and logs protected against any disaster.

Let’s look at some of the ways in which you can make your data and the data integration infrastructure on Amazon Web Services (AWS) ready for any disaster in a cost-effective manner, meeting your recovery time objective*.

A typical big data customer uses the following services on AWS:

Data Integration Platform

Informatica Big Data Management (BDM)**

Hadoop Cluster

Amazon EMR: Managed Hadoop service (another popular option: Cloudera on EC2)

Data Storage

S3: Object-based Storage for storing files

RDS: Managed Relational Databases

DynamoDB: NoSQL Database Service

Redshift: Managed petabyte-scale Data Warehouse

 

There are two steps in any disaster recovery plan – Prepare and Recover. Let’s look at each for a holistic data integration architecture using the above services.

Prepare

In this step, create a backup strategy for each component of your architecture, along with the frequency at which backups should be taken. Choose services that minimize redundant infrastructure and, in turn, costs, while still allowing you to recover your architecture within the acceptable time.

Informatica BDM

Informatica BDM needs an EC2 instance to host the Informatica server along with two database instances to host Informatica Domain and Model Repository Service (MRS).

To prepare for any disaster, install and configure the BDM server on an Amazon Elastic Block Store (EBS) volume. Once the setup is complete, take a snapshot of the EBS volume. This snapshot is stored in S3, Amazon’s highly durable storage service. Alternatively, you can schedule snapshots using CloudWatch Events (Scheduled Snapshots).
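As a sketch, such a snapshot can be taken with the AWS CLI; the volume ID below is a hypothetical placeholder:

```shell
# Snapshot the EBS volume that hosts the Informatica BDM installation.
# vol-0123456789abcdef0 is a placeholder; substitute your own volume ID.
aws ec2 create-snapshot \
    --volume-id vol-0123456789abcdef0 \
    --description "Informatica BDM server backup"
```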

Use Amazon’s Relational Database Service (RDS) for the two database instances that BDM needs, and deploy them in a Multi-AZ configuration. Amazon will then maintain a synchronous standby replica of each database in another Availability Zone. Also, take snapshots of the database instances so that you can quickly migrate them from one region to another.
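A minimal sketch of both steps with the AWS CLI, assuming placeholder instance and snapshot identifiers:

```shell
# Convert an existing RDS instance to a Multi-AZ deployment.
aws rds modify-db-instance \
    --db-instance-identifier bdm-domain-db \
    --multi-az \
    --apply-immediately

# Take a manual snapshot that can later be copied to another region.
aws rds create-db-snapshot \
    --db-instance-identifier bdm-domain-db \
    --db-snapshot-identifier bdm-domain-db-snapshot-1
```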

Hadoop Cluster

Use the Hadoop cluster only for large-scale distributed processing, not for persisting data on HDFS. Once the processing is done, move the data to S3, RDS, Redshift, or any other service through Informatica workflows or custom scripts.
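On EMR, one common form such a custom script takes is an s3-dist-cp run on the master node; the HDFS path and bucket below are hypothetical:

```shell
# Copy processed output from HDFS to S3 once the workflow finishes.
s3-dist-cp \
    --src hdfs:///output/daily-run \
    --dest s3://my-dr-bucket/output/daily-run
```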

Also, use an external database (RDS) for the Hive metastore (External Hive Metastore). Then create snapshots and/or enable a Multi-AZ deployment to prepare your Hive metadata for any disaster.
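A sketch of pointing EMR’s hive-site at an external RDS metastore, with a placeholder endpoint and credentials:

```shell
# Write a "hive-site" configuration classification that EMR consumes at
# cluster launch (passed via --configurations file://hiveConfiguration.json).
cat > hiveConfiguration.json <<'EOF'
[
  {
    "Classification": "hive-site",
    "Properties": {
      "javax.jdo.option.ConnectionURL": "jdbc:mysql://hive-metastore.example.us-east-1.rds.amazonaws.com:3306/hive?createDatabaseIfNotExist=true",
      "javax.jdo.option.ConnectionUserName": "hiveuser",
      "javax.jdo.option.ConnectionPassword": "hivepassword"
    }
  }
]
EOF
```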

Data Storage

Amazon itself takes care of preparing the data stores for disaster through automated backups and snapshots. Let’s look at the popular services:

S3 – Amazon takes care of durably storing your data on multiple devices and across multiple facilities in a region.

If you store 10,000 objects with Amazon S3, you can on average expect to incur a loss of a single object once every 10,000,000 years. – Amazon Web Services

RDS – You can back up RDS in two ways – automated backups and DB snapshots. With automated backups, RDS automatically performs daily snapshots and captures transaction logs as the database is updated. Use DB snapshots to take point-in-time backups at a time and frequency of your choosing.

DynamoDB – Either maintain a backup in a secondary DynamoDB table in another region or take point-in-time backups at regular intervals.
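If on-demand backups are available in your region, a point-in-time backup can be created with the AWS CLI; the table and backup names are placeholders:

```shell
# Create an on-demand backup of a DynamoDB table.
aws dynamodb create-backup \
    --table-name orders \
    --backup-name orders-backup-1
```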

Redshift – Take point-in-time snapshots of the Redshift data warehouse, just as you did for RDS and DynamoDB.
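For example, with the AWS CLI (cluster and snapshot identifiers are placeholders):

```shell
# Take a manual snapshot of a Redshift cluster.
aws redshift create-cluster-snapshot \
    --cluster-identifier dw-cluster \
    --snapshot-identifier dw-cluster-snapshot-1
```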

Recover

If you ‘prepared well’, you can easily recover every component of your architecture from any disaster or system failure. Let’s look at each of the components individually:

Informatica BDM

Launch a new EC2 instance from a pre-configured AMI (Amazon Machine Image) or your own AMI.

Create a new EBS volume from the latest snapshot taken in the preparation phase above, and attach it to the newly launched EC2 instance (or the desired pre-existing instance). Additionally, restore your database instances from the latest RDS snapshots taken in the preparation phase.
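These restore steps can be sketched with the AWS CLI; all identifiers below are placeholders:

```shell
# Recreate the BDM volume from the latest EBS snapshot, then attach it
# to the target EC2 instance.
aws ec2 create-volume \
    --snapshot-id snap-0123456789abcdef0 \
    --availability-zone us-east-1a
aws ec2 attach-volume \
    --volume-id vol-0123456789abcdef0 \
    --instance-id i-0123456789abcdef0 \
    --device /dev/sdf

# Restore a BDM database instance from its latest RDS snapshot.
aws rds restore-db-instance-from-db-snapshot \
    --db-instance-identifier bdm-domain-db-restored \
    --db-snapshot-identifier bdm-domain-db-snapshot-1
```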

Update the gateway node (More Info) to restore the Informatica domain to its last saved state.

Hadoop Cluster

Restore the Hive metastore database from the last available snapshot. Spin up a new EMR cluster and use the restored database as the external database for the Hive metastore (External Hive Metastore).
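As a sketch, the replacement cluster can be launched with the AWS CLI; the cluster name, key, instance sizing, and release label below are placeholders, and hiveConfiguration.json is assumed to hold a "hive-site" classification whose connection URL points at the restored RDS endpoint:

```shell
# Launch a new EMR cluster wired to the restored external Hive metastore.
aws emr create-cluster \
    --name "recovered-hadoop-cluster" \
    --release-label emr-5.12.0 \
    --applications Name=Hive Name=Spark \
    --instance-type m4.xlarge \
    --instance-count 3 \
    --use-default-roles \
    --ec2-attributes KeyName=my-key \
    --configurations file://hiveConfiguration.json
```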

Data Storage

S3 – Your objects in S3 should survive the disaster, as S3 is designed to sustain two simultaneous facility failures and offers 99.999999999% durability.

RDS – Restore the database using the latest available snapshot.

DynamoDB – Restore the NoSQL database using the latest available snapshot.

Redshift – Restore the data warehouse using the latest available snapshot.
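The DynamoDB and Redshift restores can likewise be sketched with the AWS CLI; the backup ARN and identifiers below are hypothetical:

```shell
# Restore a DynamoDB table from an on-demand backup.
aws dynamodb restore-table-from-backup \
    --target-table-name orders-restored \
    --backup-arn arn:aws:dynamodb:us-east-1:111122223333:table/orders/backup/01234567890123-abcdefgh

# Restore a Redshift cluster from its latest snapshot.
aws redshift restore-from-cluster-snapshot \
    --cluster-identifier dw-cluster-restored \
    --snapshot-identifier dw-cluster-snapshot-1
```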

With this, the whole architecture is recovered to the last known state before the disaster.


Disaster Recovery – Informatica BDM on AWS

Smart businesses ensure business continuity. The key component of this is a smart disaster recovery plan, with emphasis on the ‘Prepare’ phase. Smart preparation provides the following advantages:

  • Business continuity by recovering from any region or system failure.
  • Ability to replicate the whole setup within minutes in a new geographic region.
  • Ability to debug issues by quickly replicating the production environment for QA/Dev.

 


* Recovery Time Objective (RTO): the acceptable time to recover from any disaster and bring the system back to an acceptable state; typically 8–12 hours for data integration jobs.

** Disclaimer: I am a Big Data and Cloud Specialist (Professional Services) in Informatica in the Big Data Team. The views and opinions expressed here are my own (based on my experience) and do not necessarily reflect the official policy or position of Informatica LLC.
