AWS Migration: Phase 3: Data Migration Phase

In this phase, enterprise architects should understand these important aspects of a migration:

  1. Usage of, and differences between, block, file and object storage within AWS.
  2. RDS managed DB instances versus deploying your own database platform on EC2.
  3. How to stage, load and migrate data to AWS in real time (this calls for DMS, the AWS Database Migration Service).

 

When building on the AWS platform there are trade-offs among various dimensions – cost, durability, query-ability, availability, latency, performance (response time), relational (SQL joins), size of object stored (large, small), accessibility, read heavy vs. write heavy, update frequency, cache-ability, consistency (strict, eventual) and transience (short-lived). Weigh your trade-offs carefully, and decide which ones are right for your application.

Data Storage options

 

Amazon S3 + CloudFront

Good for: Storing large write-once, read-many types of objects, static content distribution

Use Cases: Media files (audio, video, images), backups, archives, versioning

Not Good for: Querying, Searching

Not used for: Database

 

EC2 Ephemeral Store:

Good for: Storing non-persistent objects, transient updates

Use Cases: Config data, scratch files, TempDb

Not good for: Storing database logs, customer data

Not used for:  Shared drives, sensitive data

 

Amazon RDS:

Good for: Relational data, Querying, Indexing, Structured data

Use Cases: Complex apps, Web apps, OLTP

Not good for: Clusters

Not used for: Clustered DB, Simple lookups

 

Amazon EBS or SSD:

Good for: Off-instance persistent storage

Use Cases: Clusters, Boot data, Logs, RDBMS data

Not good for: Static data, web facing content, key-value data

Not used for: Content Distribution

 

SimpleDB:

Good for: Lightweight queries

Use Cases: Querying, Indexing, Tagging, Metadata, Logs

Not good for: Complex joins, BLOBs, Relational data, Typed data

Not used for: OLTP, DW, OLAP

(Data Storage Options in AWS cloud)

 

We can also add Redshift, a managed data warehousing solution for heavy analytical workloads, to the above table. Redshift is ideal as a platform for business intelligence, predictive analytics, or complex querying of large data sources. It can also be used as a staging platform to clean, transform and move data into a SaaS data model such as Salesforce.com: data from many different source databases is imported, filtered and organized within Redshift, and then exported to the target platform.

 

Migrate your Fileserver systems, Backups and Tape Drives to Amazon S3

 

Post-it Note: If your existing infrastructure consists of Fileservers, Log servers, Storage Area Networks (SANs) and systems that are backing up the data using tape drives on a periodic basis, you should consider storing this data in Amazon S3.

 

Existing applications can utilize Amazon S3 without a major change. If your system is generating data every day, the recommended migration flow is to point your “pipe” to Amazon S3 so that new data is stored in the cloud right away. Then, you can have an independent batch process to move old data to Amazon S3. Most enterprises take advantage of their existing encryption tools (256-bit AES for data at-rest, 128-bit SSL for data in-transit) to encrypt the data before storing it on Amazon S3.
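The independent batch process for old data can be sketched roughly as follows. This is a minimal illustration, not a prescribed tool: the function names, the `archive` key prefix and the cutoff-by-modification-time rule are all assumptions made for the example, and the actual upload is passed in as a callable so the planning logic can be tried without touching AWS.

```python
from pathlib import Path
from typing import Callable, List, Tuple


def s3_key_for(path: Path, root: Path, prefix: str = "archive") -> str:
    """Map a local file to an object key that mirrors its path relative to root.

    The "archive" prefix is a hypothetical naming choice for this sketch.
    """
    return f"{prefix}/{path.relative_to(root).as_posix()}"


def plan_backlog(root: Path, cutoff_epoch: float) -> List[Tuple[Path, str]]:
    """List files last modified before the cutoff, oldest first, with target keys.

    Files modified at or after the cutoff are assumed to be handled by the
    "pipe" that already writes new data straight to S3.
    """
    old_files = [
        p for p in root.rglob("*")
        if p.is_file() and p.stat().st_mtime < cutoff_epoch
    ]
    old_files.sort(key=lambda p: p.stat().st_mtime)
    return [(p, s3_key_for(p, root)) for p in old_files]


def run_backlog(plan: List[Tuple[Path, str]],
                upload: Callable[[Path, str], None]) -> int:
    """Send every planned file through the supplied upload callable.

    Returns the number of files handed to the callable.
    """
    count = 0
    for path, key in plan:
        upload(path, key)
        count += 1
    return count
```

With boto3 installed, the `upload` callable could wrap `boto3.client("s3").upload_file(str(path), bucket, key)`, where `bucket` is whatever bucket you have designated; any encryption step you already use on premises would run before that call.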

 

Migrate your MySQL Databases to Amazon RDS

If you use a standard deployment of MySQL, moving to Amazon RDS will be a trivial task. Using all the standard tools, you will be able to move and restore all the data into an Amazon RDS DB instance. After you move the data to a DB instance, make sure you are monitoring the key metrics of usage and load. It is also highly recommended that you set your retention period so AWS can automatically create periodic backups.
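As a rough sketch of the steps above, the snippet below builds the command lines for a dump-and-restore and the request parameters for setting a backup retention period. It assumes `mysqldump` and `mysql` are the "standard tools" in question; the function names, host names and user names are hypothetical, and nothing here is executed against a real server.

```python
from typing import Dict, List, Union


def dump_command(database: str, host: str, user: str) -> List[str]:
    """Build a mysqldump command line for the source server.

    --single-transaction takes a consistent snapshot of InnoDB tables
    without locking them; -p makes the tool prompt for the password.
    """
    return [
        "mysqldump", "--single-transaction", "--routines", "--triggers",
        "-h", host, "-u", user, "-p",
        "--databases", database,
    ]


def restore_command(rds_endpoint: str, user: str) -> List[str]:
    """Build the mysql client command that replays the dump on the RDS instance."""
    return ["mysql", "-h", rds_endpoint, "-u", user, "-p"]


def backup_retention_request(instance_id: str,
                             days: int) -> Dict[str, Union[str, int, bool]]:
    """Build keyword arguments for the RDS ModifyDBInstance call.

    RDS accepts a retention period of 0 to 35 days; 0 turns automated
    backups off, so this sketch rejects it as well as out-of-range values.
    """
    if not 1 <= days <= 35:
        raise ValueError("retention period must be between 1 and 35 days")
    return {
        "DBInstanceIdentifier": instance_id,
        "BackupRetentionPeriod": days,
        "ApplyImmediately": True,
    }
```

With boto3 installed, the retention setting could then be applied with `boto3.client("rds").modify_db_instance(**backup_retention_request("mydb", 7))`, where `mydb` stands in for your own instance identifier.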

 

Migrate your Commercial Databases to Amazon EC2 using Relational DB AMIs

If you require transactional semantics (commit, rollback) and are running an OLTP system, simply use traditional migration tools available with Oracle, MS SQL Server, DB2 and Informix. All of the major databases are available as Amazon Machine Images and are supported in the cloud by the vendors. Migrating your data from an on-premises installation to an Amazon EC2 cloud instance is no different from migrating data from one machine to another.

 

Move Large Amounts of Data using Amazon Import/Export Service

When transferring data across the Internet becomes cost or time prohibitive, you may want to consider the AWS Import/Export service. With AWS Import/Export, you load your data on USB 2.0 or eSATA storage devices and ship them via a carrier to AWS. AWS then uploads the data into your designated buckets in Amazon S3.
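To judge when an Internet transfer becomes time prohibitive, a back-of-the-envelope estimate helps. The sketch below assumes decimal units (1 TB = 10^12 bytes) and that only a fraction of the nominal link speed is usable in practice; the 80% default is an illustrative assumption, not a measured figure.

```python
def transfer_days(size_tb: float, link_mbps: float, utilization: float = 0.8) -> float:
    """Estimate the days needed to push size_tb terabytes over a link.

    size_tb     -- data volume in decimal terabytes (10**12 bytes)
    link_mbps   -- nominal link speed in megabits per second
    utilization -- assumed usable fraction of the link (hypothetical default)
    """
    if size_tb < 0 or link_mbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("size must be >= 0, link speed > 0, utilization in (0, 1]")
    total_bits = size_tb * 1e12 * 8
    usable_bits_per_second = link_mbps * 1e6 * utilization
    return total_bits / usable_bits_per_second / 86_400


# Example: 10 TB over a 100 Mbps link at 80% utilization
print(f"{transfer_days(10, 100):.1f} days")  # prints "11.6 days"
```

If the estimate runs to weeks while a carrier can deliver a device in a few days, shipping the data is worth considering.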

For example, if you have multiple terabytes of log files that need to be analyzed, you can copy the files to a supported device and ship the device to AWS. AWS will restore all the log files in your designated bucket in Amazon S3, which can then be fetched by your cloud-hosted business intelligence application or Amazon Elastic MapReduce services for analysis.

If you have a 100TB Oracle database in your data center, with 50GB of changes per day, that you would like to migrate to AWS, you might consider taking a full backup of the database to disk, then copying the backup to USB 2.0 devices and shipping them. Until you are ready to switch the production DBMS to AWS, you keep taking incremental backups. The full backup is restored by the import service, and your incremental backups are transferred over the Internet and applied to the DB instance in the cloud. Once the last incremental backup is applied, you can begin using the new database server.
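The catch-up side of this scenario can be estimated with a little arithmetic. The sketch below assumes each daily backup carries roughly the 50GB of changed data, that the devices spend a given number of days in transit and restore (a hypothetical parameter), and that about 80% of the nominal link speed is usable; all names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class CutoverEstimate:
    backlog_gb: float              # change data accumulated while the devices are in transit
    hours_per_daily_backup: float  # time to send one day's changes over the link
    backlog_hours: float           # time to send the whole accumulated backlog
    link_keeps_up: bool            # True if one day's changes transfer in under 24 hours


def estimate_cutover(daily_change_gb: float, days_in_transit: int,
                     link_mbps: float, utilization: float = 0.8) -> CutoverEstimate:
    """Estimate how much change data piles up and how long it takes to send."""
    usable_bits_per_second = link_mbps * 1e6 * utilization

    def hours_for(gigabytes: float) -> float:
        # Decimal gigabytes (10**9 bytes) converted to bits, then to hours on the link.
        return gigabytes * 1e9 * 8 / usable_bits_per_second / 3600

    backlog_gb = daily_change_gb * days_in_transit
    per_day = hours_for(daily_change_gb)
    return CutoverEstimate(
        backlog_gb=backlog_gb,
        hours_per_daily_backup=per_day,
        backlog_hours=hours_for(backlog_gb),
        link_keeps_up=per_day < 24,
    )


# 50 GB of changes per day, 7 days in transit, 100 Mbps link
estimate = estimate_cutover(50, 7, 100)
print(estimate)
```

If `link_keeps_up` is false, the backups arrive more slowly than they are produced and the cloud copy never catches up, so a faster link or a shorter transit window is needed before cutover.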
