A Sample Program for Moving DB2 Changed Data from WebSphere MQ to Files with Q Replication

Introduction

This article introduces a sample program that retrieves DB2 changed data in a delimited values format from WebSphere MQ and saves it to files. The sample is based on the event publishing function found in the IBM InfoSphere Data Replication (IIDR) Q Replication technology. Event Publishing captures changed data from a database transaction log and makes it available for applications to consume. The consuming applications may range from in-house programs to off-the-shelf ETL tools, such as InfoSphere DataStage, that perform complex transformations on the data. This process is illustrated in the diagram below:

Q Replication offers event publishing to WebSphere MQ and to tables called CCDs. Some applications are not able to consume from these directly; they need another interface to changed data. Files are often a good option, and you can move changed data from WebSphere MQ to files with a simple event consumer. The sample discussed in this article is such an event consumer. It shows how more applications can consume changed data from Q Replication. Even though the article shows you how to invoke the sample program for DB2 changed data, its contents also apply to changed data from Oracle.

The Solution at a Glance

The three stages of the solution are shown in the diagram below:

In the first stage, you identify the tables for which changed data is to be captured. The event publishing function puts committed transactions into a queue, and WebSphere MQ sends the data to the system where you need it.

In the second stage, the sample program, called asnmsg2file, reads the queue and writes the data to the local file system. One file is created for each table. The program reads data from the queue for a user-defined amount of time, for example 5-10 minutes. At the end of that time, the program shuts down so that the files can be processed.

In the final stage, the consuming application reads the files. The data is in a delimited values format. The application or a script deletes the files once the application is finished with them. The asnmsg2file program can then be run again.

System 1 and system 2 in the picture above are logical servers that can reside on the same physical server. This is the case for the exercise in Part 3.

How This Article is Organized

The article starts with a description of the sample program's prerequisites, then moves on to the file format and processing, restrictions, program logic, and input parameters. The second part of the article includes instructions to install and configure the program to run on your system. The third part of the article ends with the tasks and commands necessary to run a simple end-to-end scenario.

Prerequisites

The sample program runs on Windows, UNIX (including UNIX System Services, USS), or Linux. Before you run it, a Q Replication environment must be up and running for DB2 or InfoSphere Warehouse. The Q Replication environment must be defined as follows:

• Provide a WebSphere MQ local queue on the system where you plan to run the asnmsg2file program. This queue will hold the incoming data. You also need the supporting MQ definitions to connect the source system to the target system. If you are not familiar with WebSphere MQ, you can set up a typical MQ configuration for event publishing using the instructions found in A Fast Way to Get Started with WebSphere MQ for DB2.

• License Q Replication for your DB2.
  ◦ On Windows, UNIX, or Linux, the Q Capture program in DB2 ESE and DB2 Workgroup will not start unless it has been licensed through one of the following products:
    ◦ The IBM Homogeneous Replication Feature for DB2
    ◦ InfoSphere Data Replication
    ◦ InfoSphere Replication Server
  ◦ On z/OS, the Q Capture program is licensed through one of the following products:
    ◦ InfoSphere Data Replication
    ◦ InfoSphere Replication Server

Processing Files Created by the Sample Program

For each run, the program creates a directory whose name is a timestamp. The directory is created as a subdirectory of a main directory provided through an input parameter. Each run's data files go into this subdirectory, with one data file created for each table with changed data.

Consuming applications look in the main directory for the presence of subdirectories to determine whether the asnmsg2file program ran, and how many times, since the last time the consuming application was run. If new subdirectories are found, the consuming application then looks for data files in each subdirectory. The consuming application should delete subdirectories after successfully processing their data files (a minimal sketch of such a consumer follows this section).

The output format of each data file closely follows the Data Event Publisher delimited message format. You cannot change the output format of the data file, but you can provide your own output delimiter (using an optional parameter). For example, you won't be able to change the date or timestamp format.

The picture below displays the contents of the EventPublisherFiles directory, the directory provided to the asnmsg2file program for the scenario described in Part 3. As you can see:
◦ The 1329960391105 directory was created when asnmsg2file first started and holds data for that run.
◦ Three files were created, corresponding to the three tables found with changed data in the MQ queue while the program was running.
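To make this concrete, here is a minimal sketch, in Java, of how a consuming application might scan the main output directory, process each run's data files, and delete the subdirectories afterwards. This is not part of the asnmsg2file sample: the DEPFiles path, the processDataFile method, and the line-by-line reading (which assumes the default 0A output record separator) are illustrative assumptions only.

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;

    public class SampleConsumer {
        public static void main(String[] args) throws IOException {
            // Main directory that was passed to asnmsg2file through -out (illustrative path).
            File mainDir = new File("C:\\mymsg2f\\DEPFiles");
            File[] runs = mainDir.listFiles();
            if (runs == null || runs.length == 0) {
                return; // asnmsg2file has not run since the last consumer run
            }
            for (File runDir : runs) {
                if (!runDir.isDirectory()) {
                    continue; // each run is a timestamp-named subdirectory
                }
                boolean processedOk = true;
                File[] dataFiles = runDir.listFiles();
                if (dataFiles != null) {
                    for (File dataFile : dataFiles) {
                        if (dataFile.getName().endsWith(".dat")) {
                            processedOk &= processDataFile(dataFile);
                        }
                    }
                }
                // Delete the subdirectory only after its data files were processed successfully.
                if (processedOk && dataFiles != null) {
                    for (File dataFile : dataFiles) {
                        dataFile.delete();
                    }
                    runDir.delete();
                }
            }
        }

        // Illustrative processing: print each delimited changed data record.
        // With the default output record separator (0A), one readLine() call returns one record.
        private static boolean processDataFile(File dataFile) throws IOException {
            BufferedReader in = new BufferedReader(new FileReader(dataFile));
            try {
                String record;
                while ((record = in.readLine()) != null) {
                    System.out.println(dataFile.getName() + ": " + record);
                }
            } finally {
                in.close();
            }
            return true;
        }
    }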
Restrictions and Error Messages

The asnmsg2file program does not currently perform comprehensive error checking, so review the configuration attributes and prerequisites to avoid unnecessary exceptions. If an unexpected error happens after a message has been read from the queue, the data is already available in the data files, so data read from the MQ queue is not lost.

There are two restrictions:
1) The program currently only runs on a system where a WebSphere MQ queue manager is running. It is a JMS application that uses JMS binding mode connections (not client mode), which means it cannot access a remote MQ queue manager.
2) The program currently only supports the comma ',' as the decimal delimiter.

The asnmsg2file Program Logic

The sample program runs as a batch job. You decide how often and how long the program runs; a cron job may be helpful to schedule these runs automatically. Once started, the sample program does the following (a condensed sketch follows the list):

• Verifies the input parameters. The three required inputs are the MQ queue manager name, the MQ publishing queue name, and the output directory where all files are created. One of the optional parameters is the duration of the run, with a default of five minutes.
• Connects to the queue manager.
• Creates a directory named with the current timestamp. This is created as a subdirectory of the one provided as an input parameter.
• Listens to the queue and processes messages for a user-specified amount of time. By default, this is five minutes, a good choice for many situations.
• For each message found in the queue:
  ◦ Parses the message and finds the table name(s) associated with the changed data. The MQ message may include changed data for multiple tables.
  ◦ Processes the message to format the output.
  ◦ Writes the data to files in the directory with the current timestamp. The program uses the file naming convention tablename.dat (the file name does not include the table's schema) and creates the file if it doesn't already exist. Note: if a large transaction spans multiple MQ messages, the data files are first created in a "stage" directory. After the transaction is processed in its entirety, the sample moves the contents of the stage directory to the timestamp directory.
• When the run time has completed, disconnects from the queue manager and stops.
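The sketch below illustrates this flow in a condensed form: a JMS application connects to the local queue manager in bindings mode, drains the publishing queue for a fixed period, and writes the message bodies into a timestamp-named subdirectory. It is a simplified stand-in for the real asnmsg2file source that ships in the zip file (asnmsg2file.java); the class name, the single messages.dat output file, and the hard-coded queue manager, queue, and run time are assumptions used for illustration.

    import java.io.File;
    import java.io.FileWriter;
    import javax.jms.Message;
    import javax.jms.Queue;
    import javax.jms.QueueConnection;
    import javax.jms.QueueReceiver;
    import javax.jms.QueueSession;
    import javax.jms.Session;
    import javax.jms.TextMessage;
    import com.ibm.mq.jms.MQQueueConnectionFactory;
    import com.ibm.msg.client.wmq.WMQConstants;

    public class MiniMsg2File {
        public static void main(String[] args) throws Exception {
            // Connect to the local queue manager in bindings mode (no client/TCP connection).
            MQQueueConnectionFactory factory = new MQQueueConnectionFactory();
            factory.setQueueManager("PUBSRC");                        // -qmgr
            factory.setTransportType(WMQConstants.WMQ_CM_BINDINGS);

            QueueConnection connection = factory.createQueueConnection();
            QueueSession session = connection.createQueueSession(false, Session.AUTO_ACKNOWLEDGE);
            Queue queue = session.createQueue("ASN.PUBSRC.DATA");     // -queue
            QueueReceiver receiver = session.createReceiver(queue);
            connection.start();

            // One subdirectory per run, named with the current timestamp, under the -out directory.
            File runDir = new File("DEPFiles", Long.toString(System.currentTimeMillis()));
            runDir.mkdirs();
            FileWriter out = new FileWriter(new File(runDir, "messages.dat"));

            long endTime = System.currentTimeMillis() + 60000L;       // -time (one minute here)
            try {
                while (System.currentTimeMillis() < endTime) {
                    Message message = receiver.receive(1000);         // poll, waiting up to 1 second
                    if (message instanceof TextMessage) {
                        // The real sample parses the delimited message, finds the table name(s),
                        // and writes each table's data to its own tablename.dat file.
                        out.write(((TextMessage) message).getText());
                        out.write('\n');
                    }
                }
            } finally {
                out.close();
                connection.close(); // disconnect from the queue manager when the run time is up
            }
        }
    }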
The asnmsg2file Parameters

The three required parameters are:

-qmgr: Name of the WebSphere MQ queue manager that manages the local queue to be read by the sample program. The queue manager must be running before you invoke asnmsg2file; otherwise you'll receive a JMS exception (queue manager not started).

-queue: Name of the local queue. If you want to process messages from a different queue, you will need to start another invocation of the program.

-out: Directory where the timestamped subdirectories (and their data files) are created. If it doesn't exist prior to invoking asnmsg2file, it will be created. This value should remain the same across scheduled invocations of asnmsg2file if the stage directory is populated with data files across the invocations.

You will need to invoke the program with additional (optional) input parameters if you:
• want to run the program for a different amount of time than the provided default (5 minutes)
• made some updates to the default Q Replication publication queue map properties
• want to change the output file structure (changed data record delimiter or column delimiter)

The six additional optional parameters are listed below (a short illustration of the hex delimiter values follows the list):

-time: Length of time the program should run, in milliseconds. Default is slightly less than 5 minutes (285000 ms).

-outRecord: Output changed data record separator. Default is 0A (ASCII new line).

-outCol: Output column delimiter. Default is 1E (ASCII record separator).

-inCol: Input column delimiter. Default is 2C (ASCII comma), the same as the Data Event Publisher hex ASCII column delimiter default.

-inString: Input character string delimiter. Default is 22 (ASCII double quotation mark), the same as the Data Event Publisher hex ASCII character string delimiter default.

-inRecord: Input changed data record separator. Default is 0A (ASCII new line), the same as the Data Event Publisher hex ASCII changed data delimiter default.
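As noted above, the delimiter defaults are given as two-digit hex ASCII values, matching the Data Event Publisher delimiter settings. The small snippet below shows one way such a value could be converted into the character actually used when writing or parsing data; it is an illustrative helper, not the sample's own argument parsing.

    public class DelimiterDefaults {
        // Illustrative helper: turn a two-digit hex ASCII argument such as "0A",
        // "1E", or "2C" into the corresponding delimiter character.
        static char delimiterFromHex(String hexValue) {
            return (char) Integer.parseInt(hexValue, 16);
        }

        public static void main(String[] args) {
            char outRecord = delimiterFromHex("0A"); // default -outRecord: new line
            char outCol    = delimiterFromHex("1E"); // default -outCol: ASCII record separator
            char inCol     = delimiterFromHex("2C"); // default -inCol: comma
            System.out.println("outRecord=" + (int) outRecord
                    + " outCol=" + (int) outCol
                    + " inCol=" + (int) inCol);
        }
    }

For example, an invocation that keeps all defaults except for a two-minute run and a tab (hex 09) output column delimiter would look like: java asnmsg2file -qmgr PUBSRC -queue ASN.PUBSRC.DATA -out DEPFiles -time 120000 -outCol 09.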
Part 2: Installing and Configuring asnmsg2file to Run on Your System

To run asnmsg2file in your environment, complete the following tasks:

• Launch a new DB2 Command Window and create a new directory where the program will run. Let's call this directory 'mymsg2f'.
  ◦ If you are running Windows 7, launch the new command window as Administrator to prevent any authorization errors running WebSphere MQ.
  ◦ For the exercise of Part 3, I used 'mymsg2f'.

• Switch to the 'mymsg2f' directory and unzip the asnmsg2file.zip file (in the Samples section of the Q+SQL Replication Forum on developerWorks) by running the command: unzip asnmsg2file.zip. You should see six files in your directory:
  ◦ EPcreatedb.bat: DB2 script to create a sample database
  ◦ EPmq.asnclp: ASNCLP script to create WebSphere MQ objects
  ◦ EPpub.asnclp: ASNCLP script to create event publishing objects
  ◦ asnmsg2file.jar: sample event consumer program to save messages to files
  ◦ asnmsg2file.java: sample source code
  ◦ README: a subset of this article describing the asnmsg2file event consumer program

• Switch to the 'mymsg2f' directory and run the following command to update your CLASSPATH variable and avoid a 'NoClassDefFoundError' exception:
  ◦ Windows systems: set CLASSPATH=%CLASSPATH%;C:\mymsg2f\asnmsg2file.jar
  ◦ UNIX/Linux systems: export CLASSPATH=$CLASSPATH:/u/../mymsg2f/asnmsg2file.jar

• Verify that the WebSphere MQ JMS jar files already exist in your CLASSPATH (they are part of the WebSphere MQ JMS install):
  ◦ Windows systems: ..\WebSphere MQ\Java\lib\com.ibm.mqjms.jar and ..\WebSphere MQ\Java\lib\jms.jar
  ◦ UNIX/Linux systems: ../lpp/mqm/java/lib/com.ibm.mqjms.jar and ../lpp/mqm/java/lib/jms.jar
  If they don't, include them in your CLASSPATH using the method listed above.

• The asnmsg2file.jar file is compatible with a JDK 6 VM. If you run in a pre-JDK 6 VM environment, compile the asnmsg2file.java source to create a new jar file by running the two commands below:
  ◦ javac -verbose asnmsg2file.java
  ◦ jar cvf asnmsg2file.jar asnmsg2file.class asnmsg2file$MQListener.class asnmsg2file$StopListen.class

You're now ready to run your scenario.

Part 3: Running a Simple Scenario

For the simple scenario included below, I used a Windows 7 system with a copy of WebSphere MQ V7 and a copy of the DB2 product at version 9.7 FP4. This section takes about 20 minutes to complete. It shows you how to configure three publications and run the sample program to create three data files with the changed data. The diagram below summarizes the configuration:

The scenario includes four steps:
1. Set up directories
2. Create a DB2 sample database PUBSRC and publication definitions for three of its tables
3. Start Q Capture (asnqcap) on your system
4. Update the DB2 source system and run asnmsg2file to review the output files

1. Set Up Directories

Before running the scenario, complete the two tasks below:
◦ Create a new directory for the output directory where all subdirectories and data files will be created. I used 'DEPFiles', which I created in the 'C:\mymsg2f\' directory.
◦ Launch another Command Window, switch to 'mymsg2f', and create another new directory: 'myQCap'.

2. Create the Sample Database and Publication Definitions

Run the four commands below, all from the 'mymsg2f' directory:
◦ EPcreatedb.bat
  The command creates a sample database called PUBSRC.
◦ asnclp -f EPmq.asnclp
  The command creates a shell script and a batch file, for UNIX/Linux and Windows systems respectively.
◦ qrepl.pubsrc.mq_windows.bat (Windows systems) or qrep.pubsrc.mq_aixlinux.sh (UNIX/Linux systems)
  The command creates a new queue manager PUBSRC, starts it, and creates all necessary queue objects for Data Event Publishing. The publishing queue is called ASN.PUBSRC.DATA.
◦ asnclp -f EPpub.asnclp
  The command creates the publications and their related objects for three source tables: EMPLOYEE, DEPARTMENT, and PROJECT.

3. Start Q Replication's Q Capture Program

Run the command below from the 'myQCap' directory:
◦ asnqcap capture_server=PUBSRC
You should see initialization messages for the EMPLOYEE, DEPARTMENT, and PROJECT tables.
4. Update the DB2 Source System and Run asnmsg2file

Run the five DB2 statements below, all from the 'mymsg2f' directory:
◦ db2 "update EMPLOYEE set job = 'MANAGER' where lastname = 'QUINTANA'"
◦ db2 "insert into DEPARTMENT values ('A01', 'TEST', null, 'A00', null)"
◦ db2 "insert into DEPARTMENT values ('A02', 'TEST2', null, 'A00', null)"
◦ db2 "insert into DEPARTMENT values ('A03', 'TEST3', null, 'A00', null)"
◦ db2 "update PROJECT set prendate = CURRENT DATE where projname = 'OPERATION SUPPORT'"

You should now see five new messages in the ASN.PUBSRC.DATA queue. Finally, run the Java program from the 'mymsg2f' directory to process the messages and save them to files:
◦ java asnmsg2file -qmgr PUBSRC -queue ASN.PUBSRC.DATA -time 60000 -out DEPFiles
  ▪ I'm using a one-minute run here so you don't have to wait long to see the results.
  ▪ You'll want to add optional parameters as discussed earlier and/or vary the db2 insert/update/delete statements to fully try this new program out.

You should have three files in a directory created under the DEPFiles output directory, as follows:

Stay Tuned

Expect an upcoming article and blog post with a real-world use case for this sample.