Tar PM Optimization - AEM 6.0

Tar PM optimization removes obsolete data from the CQ repository. You can improve the performance of the
optimization process by observing its behavior and adjusting performance properties.
Data stored in CQ is composed of blobs of content of varying sizes, such as a 10 MB video or a short
string. CQ divides the content into large and small objects using an arbitrary size threshold, 4 KB by
default. Objects larger than 4 KB go in the "data store" and smaller objects go into the repository, which
uses the tar persistence manager (tar-pm). Tar-pm uses an append-only mechanism to store data. If an
object is changed, a new copy of the object is appended to the repository file and the old object becomes
orphaned: it still occupies space but is no longer used. As more changes occur, more objects are appended
and more old data becomes obsolete. This old data can become a significant storage overhead, so tar-pm
optimization is used to clean up all orphaned content and create a repository that contains only content
that is currently in use.
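The append-only behavior and the clean-up step can be modeled with a short Python sketch. It is a toy
illustration under stated assumptions, not the TarPersistenceManager implementation: a hypothetical
AppendOnlyStore class keeps every written record in a list that stands in for a TAR file, and compact()
keeps only the records that are still current.

```python
from typing import Dict, List, Tuple

# Hypothetical toy model: a list of (key, value) records stands in for a TAR file.
Record = Tuple[str, bytes]


class AppendOnlyStore:
    def __init__(self) -> None:
        self.log: List[Record] = []       # every version ever written, in order
        self.latest: Dict[str, int] = {}  # key -> position of its current version

    def put(self, key: str, value: bytes) -> None:
        # A change never overwrites data: a new copy is appended and any
        # previous copy of the same key becomes orphaned.
        self.latest[key] = len(self.log)
        self.log.append((key, value))

    def orphaned_count(self) -> int:
        # Records that still take up space but are no longer referenced.
        return len(self.log) - len(self.latest)

    def compact(self) -> None:
        # Copy only the live records into a fresh log, much as the optimizer
        # copies active content into a new file and discards the old one.
        live = [self.log[i] for i in sorted(self.latest.values())]
        self.log = live
        self.latest = {key: i for i, (key, _) in enumerate(live)}


store = AppendOnlyStore()
store.put("page", b"v1")
store.put("page", b"v2")  # v1 is now orphaned
store.put("asset", b"x")
print(len(store.log), store.orphaned_count())  # 3 1
store.compact()
print(len(store.log), store.orphaned_count())  # 2 0
```

Keeping the position of each key's latest record makes both the orphan count and the compaction pass
simple; the real persistence manager works on files and offsets rather than an in-memory list.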
Tar-pm places content in a series of files, each approximately 256 MB in size. The repository can contain
many of these files, numbered sequentially in the order they were created. When a tar optimization process
runs, it takes the oldest file first and transfers all of its active content into a new file. The unused,
obsolete content remains in the old file.
After an old file has been processed and all of its active content has been written to a new output file,
the old file is removed. Because there may be dozens of files or more, and tar optimization is normally
scheduled to run in a fixed time window, a given run might not optimize all of the input files. Because the
optimizer always begins with the oldest remaining file, each run generally takes about the same amount of
time. This diagram shows how TAR files are processed and reprocessed in successive runs:
It is not necessarily a problem if the tar optimizer does not finish processing all of the files in the
CQ repository during a given run. For example, if the repository contains 24 files but TAR optimization
processed only 12 files in an overnight run, you can assume that the next run resumes where the previous
run stopped and processes the remaining 12 files. The effective throughput is then 12 files per day, and
with 24 files the average re-optimization interval is two days.
© 2012 Adobe Systems Incorporated.
All rights reserved.
Page 1
Created on 2014-12-16
MEASURING OPTIMIZATION PROGRESS
To estimate the average time required to optimize all TAR files, count the TAR files in the repository and
examine the error.log to see how many files are processed each night. TAR PM optimization processes files
that are located in three areas of the repository:
• /crx-quickstart/repository/workspaces/crx.default
• /crx-quickstart/repository/tarJournal
• /crx-quickstart/repository/version
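Counting the files can be scripted. The following Python sketch assumes the three directories sit under a
CRX quickstart root that you pass in, and that the files to count match data_*.tar, as in the crx.default
listing later in this section. The function name and the file-name pattern are illustrative assumptions;
adjust the pattern if your tarJournal or version directories use different names.

```python
from pathlib import Path
from typing import Dict

# Directories that TAR PM optimization processes, relative to crx-quickstart.
TAR_DIRS = (
    "repository/workspaces/crx.default",
    "repository/tarJournal",
    "repository/version",
)


def count_data_tars(quickstart: str, pattern: str = "data_*.tar") -> Dict[str, int]:
    """Return the number of TAR files matching `pattern` in each directory.

    Assumption: data files are named data_NNNNN.tar, so index*.tar files are
    not counted. A directory that does not exist is reported as 0.
    """
    root = Path(quickstart)
    counts: Dict[str, int] = {}
    for rel in TAR_DIRS:
        directory = root / rel
        counts[rel] = sum(1 for _ in directory.glob(pattern)) if directory.is_dir() else 0
    return counts
```

For example, count_data_tars("/crx-quickstart") returns a dictionary keyed by directory; dividing the
workspace total by the number of files processed per night gives the re-optimization interval.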
Typically, the workspaces TAR files require the most time for TAR optimization. From the log, you can see
which files are processed and the time required to optimize them. The following example data illustrates the
information that can be extracted from the log regarding TAR PM optimization activity:
Start     End       TAR file                    Time (s)
02:00:03  02:11:17  workspaces/data_01724.tar   674.2
02:11:17  02:22:29  workspaces/data_01725.tar   672.3
02:22:29  02:33:25  workspaces/data_01726.tar   655.5
02:33:25  02:44:18  workspaces/data_01727.tar   653.4
02:44:18  02:55:46  workspaces/data_01728.tar   687.9
02:55:46  03:06:46  workspaces/data_01729.tar   659.7
03:06:46  03:17:30  workspaces/data_01730.tar   644.4
03:17:30  03:27:53  workspaces/data_01731.tar   622.5
03:27:53  03:38:18  workspaces/data_01732.tar   625.3
03:38:18  03:49:32  workspaces/data_01733.tar   673.9
03:49:32  03:59:13  workspaces/data_01734.tar   580.6
03:59:13  04:08:54  workspaces/data_01735.tar   581.1
04:08:54  04:17:36  workspaces/data_01736.tar   522.2
04:17:36  04:28:30  workspaces/data_01737.tar   654.2
04:28:30  04:39:21  workspaces/data_01738.tar   650.9
04:39:21  04:48:30  workspaces/data_01739.tar   549.1
04:48:30  04:59:21  workspaces/data_01740.tar   651.5
04:59:21  05:00:00  workspaces/data_01741.tar   38.536
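The per-file times can be recomputed from the start and end timestamps. The following Python sketch
summarizes rows like the ones above. It assumes the rows have already been extracted from error.log into
(start, end, file) tuples, that the run does not cross midnight, and that a row ending exactly at the end
of the scheduled window was abandoned rather than completed. The function names are hypothetical.

```python
from datetime import datetime
from typing import List, Tuple

Row = Tuple[str, str, str]  # (start "HH:MM:SS", end "HH:MM:SS", TAR file)


def seconds_between(start: str, end: str) -> int:
    # Assumes start and end fall on the same day.
    fmt = "%H:%M:%S"
    return int((datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds())


def summarize(rows: List[Row], window_end: str) -> Tuple[int, float]:
    """Return (files completed, mean seconds per completed file)."""
    # A row that ends at the window boundary is treated as abandoned.
    done = [seconds_between(s, e) for s, e, _ in rows if e != window_end]
    if not done:
        return 0, 0.0
    return len(done), sum(done) / len(done)


rows = [
    ("02:00:03", "02:11:17", "workspaces/data_01724.tar"),
    ("02:11:17", "02:22:29", "workspaces/data_01725.tar"),
    ("04:59:21", "05:00:00", "workspaces/data_01741.tar"),
]
print(summarize(rows, "05:00:00"))  # (2, 673.0)
```

Timestamps in the log have one-second resolution, so the recomputed durations can differ from the logged
times by a fraction of a second.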
In this example, TAR PM optimization processed 17 files, each requiring roughly 520 to 690 seconds (about
10 to 11 minutes). The last file, data_01741.tar, was abandoned at 05:00, so TAR PM optimization did not
complete. To assess which files were not optimized, look in the workspace directory:
/crx-quickstart/repository/workspaces/crx.default
total 4973816
-rw-r--r-- 1 user1 user1 268438016 Sep 27 14:19 data_01741.tar
-rw-r--r-- 1 user1 user1 268438016 Sep 27 14:35 data_01742.tar
-rw-r--r-- 1 user1 user1 268437504 Sep 27 14:51 data_01743.tar
-rw-r--r-- 1 user1 user1 269306880 Sep 27 15:10 data_01744.tar
-rw-r--r-- 1 user1 user1 268588544 Sep 27 15:32 data_01745.tar
-rw-r--r-- 1 user1 user1 269104128 Sep 27 15:49 data_01746.tar
-rw-r--r-- 1 user1 user1 268961792 Sep 27 16:11 data_01747.tar
-rw-r--r-- 1 user1 user1 269267456 Sep 27 16:22 data_01748.tar
-rw-r--r-- 1 user1 user1 271622656 Sep 27 16:32 data_01749.tar
-rw-r--r-- 1 user1 user1 268437504 Sep 28 02:31 data_01750.tar
-rw-r--r-- 1 user1 user1 268438016 Sep 28 03:05 data_01751.tar
-rw-r--r-- 1 user1 user1 268438016 Sep 28 03:19 data_01752.tar
-rw-r--r-- 1 user1 user1 268438016 Sep 28 03:31 data_01753.tar
-rw-r--r-- 1 user1 user1 268438016 Sep 28 03:52 data_01754.tar
-rw-r--r-- 1 user1 user1 268438016 Sep 28 04:02 data_01755.tar
-rw-r--r-- 1 user1 user1 268438016 Sep 28 04:11 data_01756.tar
-rw-r--r-- 1 user1 user1 268437504 Sep 28 04:27 data_01757.tar
-rw-r--r-- 1 user1 user1 268438016 Sep 28 04:49 data_01758.tar
-rw-r--r-- 1 user1 user1 81231360 Sep 28 10:02 data_01759.tar
drwxr-xr-x 12 user1 user1 4096 Sep 28 10:02 index
-rw-r--r-- 1 user1 user1 135413760 Sep 28 05:00 index_1_426.tar
-rw-r--r-- 1 user1 user1 21861888 Sep 28 05:01 index_391_12.tar
-rw-r--r-- 1 user1 user1 0 Sep 27 16:01 locks
-rw-r--r-- 1 user1 user1 11436057 Sep 20 15:11 q
-rw-r--r-- 1 user1 user1 1872 Sep 27 15:58 workspace.xml
TAR PM optimization processed and deleted files data_01724.tar through data_01740.tar. The files created
between 02:00 and 05:00 (data_01750.tar through data_01758.tar) are the outputs of the optimization. In
this case, TAR optimization generated 9 output files after processing 17 input files, reducing space
requirements by roughly 2:1.
When the scheduling constraints stopped the optimization process at 05:00, the process abandoned
data_01741.tar, which became the oldest file in this directory. The files that TAR optimization did not
process are data_01741.tar through data_01749.tar, a total of 9 files. Of the 26 input files, it processed
17, or about 65%. The next day, the scheduled optimization would process those 9 files and around 8 more.
In this case, the average time between optimizations of any file is about a day and a half (26 files at
17 files per run), which is likely satisfactory.
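The figures in this example, and in the 24-file example earlier in this section, follow from simple
ratios. A minimal Python sketch, assuming one optimization run per day at a steady files-per-run rate; the
function names are hypothetical:

```python
def reduction_ratio(input_files: int, output_files: int) -> float:
    # Input files consumed per output file written.
    return input_files / output_files


def percent_processed(processed: int, total: int) -> float:
    return 100.0 * processed / total


def reoptimization_interval_days(total_files: int, files_per_run: int) -> float:
    # Days until every file has been optimized once, with one run per day.
    return total_files / files_per_run


print(round(reduction_ratio(17, 9), 2))                # 1.89, roughly 2:1
print(round(percent_processed(17, 26)))                # 65
print(round(reoptimization_interval_days(26, 17), 2))  # 1.53, about a day and a half
print(reoptimization_interval_days(24, 12))            # 2.0
```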
IMPROVING THE PERFORMANCE OF TAR OPTIMIZATION
The optimization throughput observed on the snokzlx14 data has been fairly consistent at about 600 seconds
per 256 MB tar file, or about 0.43 MB/sec. This is not a high I/O rate, and the CPU is visibly not heavily
utilized. In the case of the snokzlx14 data, the time spent in tar optimization is attributable to a
throttling mechanism that is designed to limit the impact of tar optimization on production transactions.
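The throughput figure is the file size divided by the time per file, and the same numbers indicate how
many files fit in a nightly window. A minimal Python sketch; the 02:00 to 05:00 window and the 633 second
value (roughly the average of the 17 completed files in the log example) come from the earlier example,
and the function names are hypothetical:

```python
def throughput_mb_per_s(file_mb: float, seconds: float) -> float:
    return file_mb / seconds


def files_per_window(window_seconds: float, seconds_per_file: float) -> int:
    # Only whole files count; a partially processed file is abandoned.
    return int(window_seconds // seconds_per_file)


print(round(throughput_mb_per_s(256, 600), 2))  # 0.43
print(files_per_window(3 * 3600, 633))          # 17
```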
CONFIGURING THE TAROPTIMIZATIONDELAY PROPERTY
TAR optimization delay is a throttling mechanism that ensures adequate system resources are available for
higher-priority production transactions. On systems with limited I/O capacity, TAR optimization can starve
other CQ operations of filesystem I/O bandwidth, affecting the performance or even the stability of the
system. To remove the delay, set TarOptimizationDelay to 0:
1. Open the CQ Web Console and click the JMX tab (http://localhost:4502/system/console/jmx).
2. Click the Repository MBean for the com.adobe.granite domain.
3. In the table, click the value of the TarOptimizationDelay attribute, change the value to 0, and click
Save.
CONFIGURING THE INDEXINMEMORY PROPERTY
File system I/O capacity may be the limiting factor for TAR optimization performance. In this case, you
can configure the indexInMemory property to reduce the I/O requirements. The following procedure enables
indexInMemory and increases the heap so that enough memory is available to hold the index. The heap
increase is doubled to allow for future growth.
Your system could benefit from this configuration if you observe high CPU usage and high disk usage during
Tar PM optimization. If enabling the indexInMemory property has a negligible effect, the original
configuration is adequate.
For more information, see Performance Tuning Tips.
1. Use a text editor to open the crx-quickstart/repository/workspaces/crx.default/workspace.xml and
crx-quickstart/repository/repository.xml files. Add the <param name="indexInMemory" value="true"/>
element to the PersistenceManager element, as in the following example:
<PersistenceManager class="com.day.crx.persistence.tar.TarPersistenceManager">
    <param name="indexInMemory" value="true" />
</PersistenceManager>
2. Calculate the total size, in MB, of the index*.tar files in the
crx-quickstart/repository/workspaces/crx.default directory.
3. Calculate the total size, in MB, of the index*.tar files in the crx-quickstart/repository/version
directory.
4. Double the sum of the two totals and add the result to the maximum heap size of the JVM:
Increase in heap size = (total from step 2 + total from step 3) x 2
For example, the startup script for a server uses the -Xmx2048m parameter to configure the heap size of
the JVM. For this server, step 4 yields a total of 1000 MB, so the heap must grow by at least 1000 MB;
rounding up to 3 GB gives -Xmx3072m as the new JVM parameter.
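Steps 2 to 4 can be automated. The following Python sketch sums the index*.tar files in the two
directories named above, doubles the total, and builds a new -Xmx value. The function names are
hypothetical, and it assumes 1 MB = 1024 x 1024 bytes and that the current -Xmx value is given in MB.

```python
import math
from pathlib import Path

MB = 1024 * 1024


def index_tar_mb(directory: str) -> float:
    """Total size of the index*.tar files in `directory`, in MB."""
    return sum(p.stat().st_size for p in Path(directory).glob("index*.tar")) / MB


def heap_increase_mb(workspace_dir: str, version_dir: str) -> float:
    # Steps 2 to 4: (workspace index total + version index total) x 2.
    return 2 * (index_tar_mb(workspace_dir) + index_tar_mb(version_dir))


def new_xmx(current_mb: int, increase_mb: float) -> str:
    # Round up to a whole MB so the new heap is at least as large as required.
    return f"-Xmx{current_mb + math.ceil(increase_mb)}m"


print(new_xmx(2048, 1000))  # -Xmx3048m
```

The sketch returns the minimum value; the example above rounds further up to -Xmx3072m, which is a
reasonable choice when heap sizes are kept at whole gigabytes.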
TAR PM OPTIMIZATION CASE STUDY
In this case study, iostat is used to monitor CPU and disk usage during the TAR optimization process. The
disk is between 6% and 7% busy and the CPU is between 4% and 5% busy. The disks are doing about 100
transfers per second, which is a considerable load but does not approach their capacity.
In this chart, the percentage of disk (red) and CPU (blue) utilization are plotted over time. The green line
represents the overall disk write throughput in MB/sec (right axis). The orange horizontal dashes represent
the processing of TAR files: the length of each dash represents the time required for processing, and its
vertical position represents the time taken in seconds.
The chart shows that the system was idle (0% utilization) before and after TAR optimization occurred.
During optimization, the disk throughput is about 20MB/sec, which iostat reports as about 6% of throughput
capacity. The TAR optimization rate averages about 24.5 seconds per file, or about 10 MB/sec of TAR file
content.
Even with the delay eliminated, the results above do not indicate that I/O capacity limits TAR
optimization. To validate that system I/O capacity is adequate, TAR optimization performance is measured
again, this time in the presence of a large amount of background disk I/O generated for testing purposes.
The generated background load consists of the following activities:
• Copy all of the TAR files from author/crx-quickstart/repository/workspaces/crx.default to a temporary
directory on the same physical filesystem.
• Repeatedly copy each of the files in succession, from the original filename to a temp file, using
dd if=$IN of=$OUT bs=512. Note that the block size is relatively small, at 512 bytes.
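A similar small-block load can be approximated in Python where dd is not convenient. This is a sketch under
assumptions, not the script used in the case study: the function names are hypothetical, and the source
files and scratch path are whatever copies you choose on the filesystem under test.

```python
from typing import Iterable


def copy_in_blocks(src: str, dst: str, block_size: int = 512) -> int:
    """Copy src to dst in block_size pieces, similar to dd bs=512.

    Unbuffered files (buffering=0) are used so that each read and write goes
    to the operating system in small blocks instead of being coalesced.
    Returns the number of bytes copied.
    """
    copied = 0
    with open(src, "rb", buffering=0) as fin, open(dst, "wb", buffering=0) as fout:
        while True:
            block = fin.read(block_size)
            if not block:
                break
            view = memoryview(block)
            while view:  # handle short writes
                written = fout.write(view)
                view = view[written:]
            copied += len(block)
    return copied


def background_load(paths: Iterable[str], scratch: str, passes: int) -> int:
    # Copy each file in succession to the same scratch file, `passes` times over.
    files = list(paths)
    total = 0
    for _ in range(passes):
        for path in files:
            total += copy_in_blocks(path, scratch)
    return total
```

Python adds interpreter overhead per block, so this loop will not necessarily reach the 220 MB/sec that dd
achieved in the case study; measure the actual rate with iostat before relying on it.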
While the background load is running, both iostat and the dd commands report throughput of about
220 MB/sec. TAR optimization is performed while this background I/O load is running.
The following chart shows the results using the same layout as above. Note that the maximum axis values in
this chart differ considerably from those in the previous chart.
As expected, the disk throughput is much higher, at over 200 MB/sec. The reported disk utilization is on
the order of 75%. The background load alone requires 5% CPU usage; the additional load of TAR optimization
increases CPU usage to 10%. The throughput of TAR optimization follows the same general pattern, with
individual files processed in around 25 seconds. The following chart shows a direct comparison of TAR PM
optimization throughput for the no-load and background-load cases, with the TarOptimizationDelay property
set to 0:
Although the times are comparable, the average per-file TAR optimization time is longer when the parallel
load is present: 29 s versus 25 s. The overall TAR optimization time is also slightly longer, at just over
10 minutes versus just over 8 minutes. The differences in timing are real and measurable, but not
significant. Substantial parallel disk activity appears to have only a small effect on TAR optimization
throughput.
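Expressed as a relative slowdown, the per-file averages give a sense of scale. A trivial Python sketch
(hypothetical function name) using the 25 s and 29 s averages quoted above:

```python
def slowdown_percent(baseline_s: float, loaded_s: float) -> float:
    # Relative increase in time caused by the background load.
    return 100 * (loaded_s - baseline_s) / baseline_s


print(slowdown_percent(25, 29))  # 16.0 (average seconds per file)
```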
Based on this analysis, it would be reasonable on this particular test system to use the
TarOptimizationDelay=0 option when the throughput of normal TAR optimization is insufficient. In this case,
run a scheduled TAR optimization on a weekly or bi-weekly basis, at a time when any interaction with the
normal application load due to high I/O use can be monitored and managed.