Modern business applications rely on timely and accurate data, with increasing demand for real-time analytics. This creates a growing need for efficient and scalable data storage solutions. Data is often stored in separate datasets and must be consolidated before meaningful, complete insights can be drawn. This is where replication tools help: they move data from source to target systems in real time, transforming it as necessary to support consolidation.

In this post, we provide a step-by-step guide for installing and configuring Oracle GoldenGate for streaming data from relational databases to Amazon Simple Storage Service (Amazon S3) for real-time analytics, using the Oracle GoldenGate S3 handler.

Oracle GoldenGate for Oracle Database and Big Data adapters

Oracle GoldenGate is a real-time data integration and replication tool used for disaster recovery, data migrations, and high availability. It captures and applies transactional changes in real time, minimizing latency and keeping target systems synchronized with source databases. It supports data transformation, allowing modifications during replication, and works with various database systems, including SQL Server, MySQL, and PostgreSQL. GoldenGate supports flexible replication topologies such as unidirectional, bidirectional, and multi-master configurations. Before using GoldenGate, make sure you have reviewed and adhere to the license agreement.

Oracle GoldenGate for Big Data provides adapters that facilitate real-time data integration from different sources to big data services like Hadoop, Apache Kafka, and Amazon S3. You can configure the adapters to control the data capture, transformation, and delivery process based on your specific requirements, supporting both batch-oriented and real-time streaming data integration patterns. GoldenGate provides S3 event handlers to integrate with Amazon S3 for data replication. These handlers allow GoldenGate to read from and write data to S3 buckets, so you can use Amazon S3 for GoldenGate deployments across on-premises, cloud, and hybrid environments.

Solution overview

The following diagram illustrates our solution architecture.

In this post, we walk you through the following high-level steps:

1. Install GoldenGate software on Amazon Elastic Compute Cloud (Amazon EC2).
2. Configure GoldenGate for Oracle Database and extract data from the Oracle database to trail files.
3. Replicate the data to Amazon S3 using the GoldenGate for Big Data S3 handler.

Prerequisites

You must have the following prerequisites in place (a quick way to check several of them from the command line follows this list):

- An Oracle Database 19c or later, either on Amazon Relational Database Service (Amazon RDS) for Oracle or Amazon EC2. Make sure you have completed the steps in Preparing the Database for Oracle GoldenGate.
- A Java development kit installed and ORACLE_HOME configured.
- An existing or new S3 bucket. To create a new S3 bucket, see Creating a bucket.
- An AWS Identity and Access Management (IAM) user. You can use temporary credentials; for more details, refer to Using temporary credentials with AWS resources.
- The right display settings, with xclock available. For more details, refer to How to enable X11 forwarding from Red Hat Enterprise Linux (RHEL), Amazon Linux, SUSE Linux, Ubuntu server to support GUI-based installations from Amazon EC2.
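The following is a minimal sketch for checking the Java, ORACLE_HOME, and X11 prerequisites from a shell on the EC2 instance; the checks are illustrative, and paths and versions depend on your environment:

#!/bin/bash
# Minimal prerequisite check before running the GoldenGate installer (illustrative).

# A Java development kit must be installed and on the PATH
command -v java >/dev/null || { echo "Java not found; install a JDK first"; exit 1; }
java -version

# ORACLE_HOME must be set; this aborts with a message if it is not
: "${ORACLE_HOME:?ORACLE_HOME is not set}"
echo "ORACLE_HOME=${ORACLE_HOME}"

# X11 forwarding must work for the GUI-based installer;
# running xclock should open a clock window on your local display
command -v xclock >/dev/null || echo "xclock not available; check X11 forwarding"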
Install GoldenGate software on Amazon EC2

You need to run GoldenGate on EC2 instances. The instances must have adequate CPU, memory, and storage to handle the anticipated replication volume. For more details, refer to Operating System Requirements. After you determine the CPU and memory requirements, select a current generation EC2 instance type for GoldenGate.

Use the following formula to estimate the required trail space:

trail disk space = transaction log volume in 1 hour x number of hours down x 0.4

For example, if the source database generates 10 GB of transaction log volume per hour and you want to tolerate 4 hours of target downtime, plan for 10 GB x 4 x 0.4 = 16 GB of trail space.

When the EC2 instance is up and running, download the following GoldenGate software from the Oracle GoldenGate Downloads page:

- GoldenGate 21.3.0.0
- GoldenGate for Big Data 21c

Use the following steps to upload the file from your local machine to the EC2 instance and extract it. Make sure that your IP address is allowed in the inbound rules of the security group of your EC2 instance before starting a session. For this use case, we install GoldenGate for Classic Architecture and Big Data. See the following code:

scp -i pem-key.pem 213000_fbo_ggs_Linux_x64_Oracle_shiphome.zip ec2-user@hostname:~/.
ssh -i pem-key.pem ec2-user@hostname
unzip 213000_fbo_ggs_Linux_x64_Oracle_shiphome.zip

Install GoldenGate 21.3.0.0

Complete the following steps to install GoldenGate 21.3 on an EC2 instance:

1. Create a home directory to install the GoldenGate software and change to the installer directory:

mkdir -p /u01/app/oracle/product/OGG_DB_ORACLE
cd fbo_ggs_Linux_x64_Oracle_shiphome/Disk1
ls -lrt
total 8
drwxr-xr-x.  4 oracle oinstall  187 Jul 29  2021 install
drwxr-xr-x. 12 oracle oinstall 4096 Jul 29  2021 stage
-rwxr-xr-x.  1 oracle oinstall  918 Jul 29  2021 runInstaller
drwxrwxr-x.  2 oracle oinstall   25 Jul 29  2021 response

2. Run runInstaller:

[oracle@hostname Disk1]$ ./runInstaller
Starting Oracle Universal Installer...
Checking Temp space: must be greater than 120 MB. Actual 193260 MB Passed
Checking swap space: must be greater than 150 MB. Actual 15624 MB Passed

A GUI window will pop up to install the software. Follow the instructions in the GUI to complete the installation process. Provide the directory path you created as the home directory for GoldenGate.

After the GoldenGate software installation is complete, you can create the GoldenGate processes that read the data from the source. First, you configure the extract.

3. Create an extract parameter file for the source Oracle database. The following is sample file content:

[oracle@hostname Disk1]$ vi eabc.prm

-- Extract group name
EXTRACT EABC
SETENV (TNS_ADMIN = "/u01/app/oracle/product/19.3.0/network/admin")
-- Extract database user login
USERID ggs_admin@mydb, PASSWORD "********"
-- Local trail on the remote host
EXTTRAIL /u01/app/oracle/product/OGG_DB_ORACLE/dirdat/ea
IGNOREREPLICATES
GETAPPLOPS
TRANLOGOPTIONS EXCLUDEUSER ggs_admin
TABLE scott.emp;

4. Add the extract at the GGSCI prompt by running the following command:

GGSCI> ADD EXTRACT EABC, TRANLOG, BEGIN NOW

5. After you add the extract, check the status of the running programs with the info all command. You will see the extract is in the STOPPED state, as shown in the following screenshot; this is expected.

6. Start the extract process as shown in the following figure. The status changes to RUNNING. The following are the different statuses (a sample GGSCI session follows this list):

- STARTING – The process is starting.
- RUNNING – The process has started and is running normally.
- STOPPED – The process has stopped either normally (in a controlled manner) or due to an error.
- ABENDED – The process has stopped in an uncontrolled manner. An abnormal end is known as ABEND.
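For reference, a GGSCI session to start and check the extract might look like the following sketch; the output is abbreviated and illustrative, and EABC is the group name from the parameter file above:

GGSCI> START EXTRACT EABC

Sending START request to MANAGER ...
EXTRACT EABC starting

GGSCI> INFO EXTRACT EABC

EXTRACT    EABC      Last Started 2024-08-08 10:15   Status RUNNING
Checkpoint Lag       00:00:00 (updated 00:00:01 ago)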
After the extract starts, a trail file is created in the location specified in the extract parameter file. You can verify this by using the command stats <group_name>, as shown in the following screenshot.

Install GoldenGate for Big Data 21c

In this step, we install GoldenGate for Big Data on the same EC2 instance where we installed the GoldenGate Classic Architecture.

1. Create a directory for the GoldenGate for Big Data software, copy the .zip file into it, and extract it:

mkdir /u01/app/oracle/product/OGG_BIG_DATA
unzip 214000_ggs_Linux_x64_BigData_64bit.zip
tar -xvf ggs_Linux_x64_BigData_64bit.tar

2. From GGSCI, create the subdirectories, then configure and start the manager:

GGSCI> CREATE SUBDIRS
GGSCI> EDIT PARAM MGR
PORT 7801
GGSCI> START MGR

This will start the MANAGER program.

3. Now you can install the dependencies required for the REPLICAT to run. Go to /u01/app/oracle/product/OGG_BIG_DATA/DependencyDownloader and run the aws.sh script with the latest version of the AWS SDK for Java (aws-java-sdk). This script downloads the AWS SDK, which provides client libraries for connectivity to the AWS Cloud.

[oracle@hostname DependencyDownloader]$ ./aws.sh 1.12.748

Configure the S3 handler

To configure a GoldenGate Replicat to send data to an S3 bucket, you need to set up a Replicat parameter file and a properties file that defines how data is handled and sent to Amazon S3.

AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are the access key and secret access key of your IAM user, respectively. Do not hardcode credentials or security keys in the parameter and properties files. There are several methods available to avoid this, such as the following:

#!/bin/bash
# Use environment variables that are already set in the OS
export AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID
export AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY
export AWS_REGION="your_aws_region"

You can set these environment variables in your shell configuration file (for example, .bashrc, .bash_profile, or .zshrc) or use a secure method to set them temporarily:

export AWS_ACCESS_KEY_ID="your_access_key_id"
export AWS_SECRET_ACCESS_KEY="your_secret_access_key"
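Another option is to keep the keys out of shell history and dotfiles entirely by storing them in AWS Secrets Manager and exporting them at startup. The following is a sketch of that approach; the secret name gg/s3-replication and its JSON key names are assumptions for illustration, and the jq utility must be installed:

#!/bin/bash
# Sketch: load AWS credentials from AWS Secrets Manager instead of hardcoding them.
# The secret name and JSON key names below are illustrative assumptions.
SECRET_JSON=$(aws secretsmanager get-secret-value \
  --secret-id gg/s3-replication \
  --query SecretString --output text)

# Parse the JSON secret with jq and export the credentials for the Replicat process
export AWS_ACCESS_KEY_ID=$(echo "$SECRET_JSON" | jq -r .accessKeyId)
export AWS_SECRET_ACCESS_KEY=$(echo "$SECRET_JSON" | jq -r .secretAccessKey)
export AWS_REGION="eu-west-1"   # match the region set in the S3 handler properties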
Configure the properties file

Create a properties file for the S3 handler. This file defines how GoldenGate interacts with your S3 bucket. Make sure that you have added the correct parameters, as shown in the following example of an S3 handler properties file (dirprm/reps3.properties):

[oracle@hostname dirprm]$ cat reps3.properties
gg.handlerlist=filewriter
gg.handler.filewriter.type=filewriter
gg.handler.filewriter.fileRollInterval=60s
gg.handler.filewriter.fileNameMappingTemplate=${tableName}${currentTimestamp}.json
gg.handler.filewriter.pathMappingTemplate=./dirout
gg.handler.filewriter.stateFileDirectory=./dirsta
gg.handler.filewriter.format=json
gg.handler.filewriter.finalizeAction=rename
gg.handler.filewriter.fileRenameMappingTemplate=${tableName}${currentTimestamp}.json
gg.handler.filewriter.eventHandler=s3
goldengate.userexit.writers=javawriter
#TODO Set S3 Event Handler- please update as needed
gg.eventhandler.s3.type=s3
gg.eventhandler.s3.region=eu-west-1
gg.eventhandler.s3.bucketMappingTemplate=s3bucketname
gg.eventhandler.s3.pathMappingTemplate=${tableName}_${currentTimestamp}
gg.eventhandler.s3.accessKeyId=$AWS_ACCESS_KEY_ID
gg.eventhandler.s3.secretKey=$AWS_SECRET_ACCESS_KEY
gg.classpath=/u01/app/oracle/product/OGG_BIG_DATA/dirprm/:/u01/app/oracle/product/OGG_BIG_DATA/DependencyDownloader/dependencies/aws_sdk_1.12.748/
gg.log=log4j
gg.log.level=DEBUG
#javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=.:ggjava/ggjava.jar -Daws.accessKeyId=my_access_key_id -Daws.secretKey=my_secret_key
javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=.:ggjava/ggjava.jar

Configure GoldenGate REPLICAT

Create the parameter file in dirprm in the GoldenGate for Big Data home. Note that it references the properties file created earlier:

[oracle@hostname dirprm]$ vi rps3.prm

REPLICAT rps3
-- Command to add REPLICAT
-- add replicat rps3, exttrail ./dirdat/tr/ea
SETENV(GGS_JAVAUSEREXIT_CONF = 'dirprm/reps3.properties')
TARGETDB LIBFILE libggjava.so SET property=dirprm/reps3.properties
REPORTCOUNT EVERY 1 MINUTES, RATE
MAP SCOTT.EMP, TARGET gg.handler.s3handler;

[oracle@hostname OGG_BIG_DATA]$ ./ggsci
GGSCI> add replicat rps3, exttrail ./dirdat/tr/ea
Replicat added.

GGSCI> info all
Program     Status      Group       Lag at Chkpt   Time Since Chkpt
MANAGER     RUNNING
REPLICAT    STOPPED     RPS3        00:00:00       00:00:39

GGSCI> start *
Sending START request to Manager ...
Replicat group RPS3 starting.

Now you have successfully started the Replicat. You can verify this by running the info and stats commands followed by the Replicat name, as shown in the following screenshot.

To confirm that the file has been replicated to the S3 bucket, open the Amazon S3 console and open the bucket you created. You can see that the table data has been replicated to Amazon S3 in JSON file format.
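You can also confirm this from the command line with the AWS CLI. The following is a minimal sketch, assuming your bucket is named s3bucketname as in the properties file above:

#!/bin/bash
# List the objects the S3 event handler has written.
# The bucket name must match gg.eventhandler.s3.bucketMappingTemplate.
aws s3 ls s3://s3bucketname/ --recursive --human-readable

# Optionally download one of the JSON files to inspect the replicated records;
# the object key here is a placeholder for a real key from the listing above.
# aws s3 cp s3://s3bucketname/EMP_<timestamp>/EMP<timestamp>.json .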
Best practices

Make sure that you follow best practices for performance, compression, and security.

Consider the following best practices for performance:

- Optimize trail file storage by using high-performance storage for improved read/write operations. Refer to Amazon EBS-optimized instance types for more information.
- Monitor and tune GoldenGate parameters related to trail file management, such as trail file size, number of trail files, and trail file rollover settings, based on your workload characteristics.
- Monitor GoldenGate processing to identify and address performance bottlenecks. You can also monitor GoldenGate logs by using Amazon CloudWatch.
- Consider using GoldenGate's different types of Replicats based on your requirements. For example, use a parallel Replicat to improve performance for heavy workloads.

The following are best practices for compression:

- Enable compression for trail files to reduce storage requirements and improve network transfer performance. Use GoldenGate's built-in compression capabilities or file system-level compression tools.
- Strike a balance between compression level and CPU overhead, because higher compression levels may impact performance.

Lastly, when implementing Oracle GoldenGate for streaming data to Amazon S3 for real-time analytics, it's crucial to address security considerations to protect your data and infrastructure. Follow the security best practices for Amazon S3 and the security options available for GoldenGate Classic Architecture.

Clean up

To avoid ongoing charges, delete the resources that you created as part of this post:

1. Remove the S3 bucket and trail files if they are no longer needed, and stop the GoldenGate processes on Amazon EC2.
2. Revert the changes that you made in the database (such as grants, supplemental logging, and archive log retention).
3. To delete the entire setup, stop your EC2 instance.

Conclusion

In this post, we provided a step-by-step guide for installing and configuring GoldenGate for Oracle Classic Architecture and Big Data for streaming data from relational databases to Amazon S3. With these instructions, you can set up an environment and take advantage of real-time analytics using the GoldenGate S3 handler, which we will explore further in an upcoming post.

If you have any comments or questions, leave them in the comments section.

About the Authors

Prasad Matkar is a Database Specialist Solutions Architect at AWS based in the EMEA region. With a focus on relational database engines, he provides technical assistance to customers migrating and modernizing their database workloads to AWS.

Arun Sankaranarayanan is a Database Specialist Solutions Architect based in London, UK. With a focus on purpose-built database engines, he assists customers in migrating and modernizing their database workloads to AWS.

Giorgio Bonzi is a Sr. Database Specialist Solutions Architect at AWS based in the EMEA region. With a focus on relational database engines, he provides technical assistance to customers migrating and modernizing their database workloads to AWS.