In the current business environment, most organizations face challenges during data migration and real-time data replication. Maintaining consistent replicas of a database increases its fault tolerance. Enterprise information systems are nowadays commonly structured as multi-tier architectures and are invariably built on top of database management systems responsible for storage. Under these constraints, often only partial replication of databases can be achieved.
These issues are especially pronounced when multiple databases are migrated, as may be observed in scenarios such as moving data from an on-premises database to a heterogeneous cloud database. In effect, handling the complexity of a heterogeneous database migration becomes demanding, dealing with ongoing replication is difficult, and finding a cost-effective method to achieve both is a further challenge.
How can these challenges be overcome?
The challenges stated above can be overcome by leveraging AWS Database Migration Service (AWS DMS). AWS DMS is a cloud service that makes it easy to migrate relational databases, data warehouses, NoSQL databases, and other types of data stores. AWS DMS can be used to migrate data into the AWS Cloud, between on-premises instances, or between combinations of cloud and on-premises setups.
How does AWS DMS help address the challenges?
AWS Database Migration Service (AWS DMS) provides a non-intrusive, fast-to-deploy method for moving data in real time with zero downtime. AWS DMS is a web service that migrates data from a source data store to a target data store, each defined as an endpoint. The two endpoints can use the same database engine (a homogeneous migration) or different engines (a heterogeneous migration). AWS DMS requires that at least one of the endpoints be on an AWS service, as in an on-premises-to-cloud migration.
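As a sketch, the source and target endpoints described above map to parameters of the AWS DMS `CreateEndpoint` API. The identifiers, host names, and credentials below are illustrative placeholders, not values from the original migration; in practice the dicts would be passed to `boto3.client("dms").create_endpoint(**params)`.

```python
# Sketch: build CreateEndpoint parameter dicts for an Oracle source and a
# Redshift target. All identifiers, hosts, and credentials are placeholders.

def oracle_source_endpoint(server: str, username: str, password: str) -> dict:
    return {
        "EndpointIdentifier": "onprem-oracle-source",  # placeholder name
        "EndpointType": "source",
        "EngineName": "oracle",
        "ServerName": server,
        "Port": 1521,                                  # default Oracle listener port
        "DatabaseName": "ORCL",                        # placeholder SID/service name
        "Username": username,
        "Password": password,
    }

def redshift_target_endpoint(server: str, username: str, password: str) -> dict:
    return {
        "EndpointIdentifier": "redshift-target",       # placeholder name
        "EndpointType": "target",
        "EngineName": "redshift",
        "ServerName": server,
        "Port": 5439,                                  # default Redshift port
        "DatabaseName": "dev",                         # placeholder database
        "Username": username,
        "Password": password,
    }

src = oracle_source_endpoint("oracle.example.internal", "dms_user", "****")
tgt = redshift_target_endpoint("cluster.example.redshift.amazonaws.com", "awsuser", "****")
```

The same parameter shape covers both homogeneous and heterogeneous migrations; only `EngineName` and the connection details change per endpoint.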
DMS supports both full-load migration and ongoing real-time continuous replication. It captures changes on the source database and applies them to the target in a consistent way. For instance, with Oracle as the source database, change data capture works as follows: LogMiner is configured and enabled on the source Oracle database, and AWS DMS reads ongoing changes from the redo logs based on the system change number (SCN).
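The full-load-plus-CDC behaviour corresponds to a replication task with `MigrationType` set to `full-load-and-cdc`; an SCN can be supplied as the CDC start position. The sketch below only builds the parameter dict that would go to `boto3.client("dms").create_replication_task(**task_params)`; the ARNs, rule name, schema, and SCN value are placeholders.

```python
import json

# Sketch: parameters for a full-load-plus-CDC replication task.
# ARNs, identifiers, the schema name, and the SCN are all placeholders.
table_mappings = {
    "rules": [{
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": "include-sales-schema",  # hypothetical rule name
        "object-locator": {"schema-name": "SALES", "table-name": "%"},
        "rule-action": "include",
    }]
}

task_params = {
    "ReplicationTaskIdentifier": "oracle-to-redshift-cdc",     # placeholder
    "SourceEndpointArn": "arn:aws:dms:region:acct:endpoint:SOURCE",   # placeholder
    "TargetEndpointArn": "arn:aws:dms:region:acct:endpoint:TARGET",   # placeholder
    "ReplicationInstanceArn": "arn:aws:dms:region:acct:rep:INSTANCE", # placeholder
    "MigrationType": "full-load-and-cdc",  # full load, then ongoing replication
    "TableMappings": json.dumps(table_mappings),
    # Start reading ongoing changes from a specific Oracle system change
    # number (SCN); the value here is illustrative only.
    "CdcStartPosition": "1234567890",
}
```

Setting `MigrationType` to `"full-load"` or `"cdc"` instead would run only one of the two phases.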
How is Change Data Capture (CDC) achieved using LogMiner?
Initially, LogMiner is configured in the source Oracle database. Four main objects are involved in LogMiner: the source database, the mining database, the LogMiner dictionary, and the redo log files; together they make CDC possible. LogMiner uses the dictionary to translate internal object identifiers and data types to object names and external data formats.
Without a dictionary, LogMiner returns internal object IDs and presents data as binary data. To generate log files that LogMiner can analyze, supplemental logging must be enabled. When supplemental logging is enabled, additional information is captured in the redo stream that is needed to make the information in the redo log files usable. As a result, all operations carried out on the source database, including inserts, updates, and deletes, are captured in the redo logs.
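The supplemental-logging prerequisite boils down to a couple of Oracle DDL statements: database-level minimal supplemental logging, plus primary-key column logging on the replicated tables. The helper below is a sketch that merely returns those statements as strings; the `SALES.ORDERS` table is a hypothetical example, and in practice a DBA would run the statements on the source database (or they would be issued through a driver such as python-oracledb).

```python
# Sketch: the Oracle supplemental-logging statements needed before
# LogMiner-based CDC can capture inserts, updates, and deletes.
# Returned as strings only; nothing is executed here.

def supplemental_logging_statements(schema: str, table: str) -> list:
    return [
        # Database-level minimal supplemental logging.
        "ALTER DATABASE ADD SUPPLEMENTAL LOG DATA",
        # Primary-key logging so changed rows can be matched on the target.
        f"ALTER TABLE {schema}.{table} ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS",
    ]

stmts = supplemental_logging_statements("SALES", "ORDERS")  # hypothetical table
```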
How is security enforced in AWS Database Migration Service?
AWS Database Migration Service (AWS DMS) uses several mechanisms to secure data during migration. The service encrypts the storage used by the replication instance and the endpoint connection information with an AWS Key Management Service (AWS KMS) key that is unique to your AWS account. Secure Sockets Layer (SSL) connections are supported, and accessing the service as an AWS Identity and Access Management (IAM) user requires the appropriate permissions. The security features include sign-in credentials, access management, cluster security groups, cluster encryption, VPC security, load data encryption, and SSL connections.
The replication instance runs in a virtual private cloud based on the Amazon Virtual Private Cloud (Amazon VPC) service and must be associated with a security group whose rules allow all traffic on all ports to leave (egress) the VPC. This approach allows communication from the replication instance to the source and target database endpoints.
For AWS DMS source endpoints, the Oracle engine can use Secure Sockets Layer (SSL) to protect credentials, relying on authentication and encryption to ensure that data is secure and available only to authorized users. With AWS DMS, connections for both source and target endpoints can be encrypted using SSL. To do so, the AWS DMS Management Console can be used to assign a certificate to an endpoint and to manage certificates.
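Certificate assignment can also be done through the API: a certificate is imported, then attached to the endpoint together with an SSL mode. The sketch below only builds the two parameter dicts; the certificate body and ARNs are placeholders, and with boto3 the calls would be `dms.import_certificate(**import_cert_params)` followed by `dms.modify_endpoint(**modify_endpoint_params)`.

```python
# Sketch: assign an imported CA certificate to a DMS endpoint so that
# connections use SSL. Certificate body and ARNs are placeholders.

import_cert_params = {
    "CertificateIdentifier": "oracle-ca-cert",  # placeholder name
    "CertificatePem": (
        "-----BEGIN CERTIFICATE-----\n"
        "placeholder-certificate-body\n"
        "-----END CERTIFICATE-----"
    ),
}

modify_endpoint_params = {
    "EndpointArn": "arn:aws:dms:region:acct:endpoint:SOURCE",  # placeholder
    # SSL modes range from "none" through "require" and "verify-ca"
    # to "verify-full"; verify-ca checks the server certificate chain.
    "SslMode": "verify-ca",
    "CertificateArn": "arn:aws:dms:region:acct:cert:CERT",     # placeholder
}
```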
The security group (firewall) for the Amazon Redshift cluster can be configured to allow connections only from the public IP address range of the on-premises Oracle server; that public IP address must be permitted to connect to Amazon Redshift. It is also possible to make the Virtual Private Cloud (VPC) accessible only from the on-premises server network.
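Restricting the cluster to the on-premises address translates into a narrow ingress rule on the cluster's security group. The sketch below builds such a rule; the CIDR uses a documentation-range IP as a placeholder, and with boto3 the rule would be passed to `ec2.authorize_security_group_ingress(GroupId=..., IpPermissions=[rule])`.

```python
# Sketch: an ingress rule limiting Redshift access to the on-premises
# Oracle server's public IP. The CIDR below is a placeholder.

def redshift_ingress_rule(onprem_cidr: str) -> dict:
    return {
        "IpProtocol": "tcp",
        "FromPort": 5439,   # default Redshift port
        "ToPort": 5439,
        "IpRanges": [{
            "CidrIp": onprem_cidr,
            "Description": "on-premises Oracle server",
        }],
    }

rule = redshift_ingress_rule("203.0.113.10/32")  # documentation-range IP
```

A `/32` CIDR pins the rule to a single host, which matches the single-server scenario described above.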
In addition, connections need to be routed from the VPC to the on-premises data center, and network connectivity must be enabled between the DMS instance and the source and target database systems.
What are the effort and cost involved?
The ability to use AWS DMS across all three scenarios (data migration, replication, and real-time data integration) reduces licensing costs and the learning curve, thereby reducing the overall cost of ownership. Only a minimal level of configuration is needed to achieve this.
Why this option?
- 100% real-time replication can be achieved
- Low cost
- Minimal effort to accomplish the migration
- Zero downtime and high performance
At Saksoft, we addressed the immense challenges faced by a leading logistics company in the Asia-Pacific region, achieving data migration and real-time replication in a cloud environment through our data migration solution. The solution moved data from an on-premises Oracle server to the AWS Cloud using AWS Database Migration Service (AWS DMS), including real-time data replication with high availability and zero downtime.