A useful architecture for moving data from on-premises to AWS is to use Amazon S3 on Outposts and transfer the data directly over AWS Direct Connect to S3 in AWS.
This can be useful for migrating historical data, seeding data in the cloud, or moving batch-processed data.
We can run S3 on Outposts on premises and connect it to a raw S3 bucket in AWS, in a landing zone or secured VPC.
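
As a rough illustration of the landing step, the sketch below uploads one batch file from an on-premises host into a raw bucket. It assumes the boto3 SDK is installed and credentials are configured. The bucket name, the `raw/<dataset>/dt=<date>/` key layout, and the helper names are hypothetical and not part of the architecture described above. Routing the traffic over Direct Connect is a network configuration concern and is not visible in the code, and the S3 on Outposts side is not shown.

```python
from datetime import date


def raw_object_key(dataset: str, batch_date: date, filename: str) -> str:
    """Build a date-partitioned key for the raw bucket (hypothetical layout)."""
    return f"raw/{dataset}/dt={batch_date.isoformat()}/{filename}"


def upload_batch_file(local_path: str, bucket: str, key: str) -> None:
    """Upload one local file to S3 using boto3's managed transfer."""
    # Imported here so the key helper above works without boto3 installed.
    import boto3

    s3 = boto3.client("s3")
    s3.upload_file(local_path, bucket, key)
```

A call might look like `upload_batch_file("/data/sales.csv", "example-raw-bucket", raw_object_key("sales", date(2024, 1, 15), "sales.csv"))`, where the path and bucket are placeholders.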
e.g. Batching
- Batch processing is commonly used when immediate real-time analytics results are not required
- Suitable for scenarios where data can be collected over time and processed in batches, such as aggregating daily sales data or generating reports
- Batch processing often involves processing data in large blocks or batches that have been collected over a specific time period
- It is a method of analyzing and manipulating significant volumes of data at once, rather than processing it in real time as it arrives
- The frequency varies: batches can run daily, weekly, monthly, or on a combination of all three
- To implement batch processing on AWS, we can use services such as Amazon EMR, Amazon Redshift, and AWS Glue
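
To make the daily sales example above concrete, here is a toy sketch in plain Python that sums the sales collected in one batch per calendar day. The record fields (`timestamp`, `amount`) and the function name are assumptions made for illustration. At real volumes this kind of aggregation would more likely run as a job in one of the services listed above.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_daily_sales(records):
    """Sum sale amounts per calendar day for one collected batch."""
    totals = defaultdict(float)
    for rec in records:
        # Reduce each ISO timestamp to its date so sales group by day.
        day = datetime.fromisoformat(rec["timestamp"]).date().isoformat()
        totals[day] += rec["amount"]
    return dict(totals)


# A small batch collected over two days (made-up values).
batch = [
    {"timestamp": "2024-01-15T09:30:00", "amount": 20.0},
    {"timestamp": "2024-01-15T17:45:00", "amount": 5.5},
    {"timestamp": "2024-01-16T08:00:00", "amount": 12.0},
]
daily_totals = aggregate_daily_sales(batch)
print(daily_totals)
```

The whole batch is processed in one pass after collection, which is the difference from handling each sale as it arrives.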