Operations Dashboard

INTRODUCTION

The all-new Operations Dashboard in AEM 6 helps system operators monitor AEM system health at a glance. It also provides auto-generated diagnostic information on relevant aspects of AEM, and lets you configure and run self-contained maintenance automation to reduce project operations effort and support cases. The Operations Dashboard can be extended with custom health checks and maintenance tasks. Further, Operations Dashboard data can be accessed from external monitoring tools via JMX.

The Operations Dashboard:
• Is a one-click system status to help operations departments gain efficiency
• Provides a system health overview in a single, centralized place
• Dramatically reduces the time needed to find, analyze and fix issues
• Provides self-contained maintenance automation that helps reduce project operations costs significantly

It can be accessed by going to Tools - Operations - Dashboard from the AEM Welcome screen.

NOTE
In order to access the Operations Dashboard, the logged-in user must be part of the "Operators" user group. For more info, see the documentation on User, Group and Access Right Administration.

HEALTH REPORTS

The Health Report system provides information on the health of an AEM instance through Sling Health Checks. This can be done via OSGi, JMX or HTTP requests (via JSON). It offers measurements and thresholds of certain configurable counters and, in some cases, information on how to resolve the issue. It has several features, described below.

Health Checks

The Health Reports are a system of cards indicating good or bad health with regard to a specific product area. These cards are visualizations of the Sling Health Checks, which aggregate information from JMX and other sources and expose the processed information again as MBeans.
The MBeans can also be inspected in the JMX web console, under the org.apache.sling.healthcheck domain.

The Health Report interface can be accessed by going to Tools - Operations - Dashboard - Console Health Reports from the AEM Welcome screen, or directly by accessing the following URL:

http://<serveraddress>:port/libs/granite/operations/content/healthreports.html

© 2012 Adobe Systems Incorporated. All rights reserved. Created on 2015-02-05

The card system has three status states: OK, WARN and CRITICAL. The states are the result of rules and thresholds that can be configured through the user interface by hovering the mouse over the card and then clicking the gear icon in the action bar.

Health Check Types

There are two types of health checks in AEM 6:
1. Individual Health Checks
2. Composite Health Checks

An Individual Health Check is a single health check that corresponds to a status card. Individual Health Checks can be configured with rules or thresholds, and they can provide one or more links to solve identified health issues. For example, if there are ERROR entries logged in the system, you will find them on the details page of the "Log Errors" health check, along with a link to the "Log Message" analyzer in the Diagnosis Tools section so that you can inspect the errors.

A Composite Health Check is a check that aggregates information from several individual checks. Composite health checks are configured with the aid of filter tags. In essence, all single checks that have the same filter tag will be grouped as a composite health check. A Composite Health Check will have an OK status only if all the single checks it aggregates have OK statuses as well.

How to create Health Checks

There are two types of Health Checks you can create for the Operations Dashboard: an individual health check and a composite health check.
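The grouping and status rule for composite checks described above can be sketched in plain Java. This is only an illustrative model: the `Check` and `Status` types are hypothetical and are not the Sling Health Check API, and the assumption that a composite reports the most severe status of its members goes beyond the text, which only states that the composite is OK when every aggregated check is OK.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative model only: these types are hypothetical, not the Sling Health Check API.
final class CompositeStatusSketch {

    // Declared in order of increasing severity, so compareTo() ranks them.
    enum Status { OK, WARN, CRITICAL }

    static final class Check {
        final String name;
        final Status status;
        final Set<String> tags;

        Check(String name, Status status, String... tags) {
            this.name = name;
            this.status = status;
            this.tags = new HashSet<>(Arrays.asList(tags));
        }
    }

    // Returns the most severe status among the checks carrying at least one filter tag.
    // Assumption: "worst member wins"; a composite with no matching checks stays OK.
    static Status aggregate(List<Check> checks, Set<String> filterTags) {
        Status worst = Status.OK;
        for (Check check : checks) {
            boolean matches = false;
            for (String tag : check.tags) {
                if (filterTags.contains(tag)) {
                    matches = true;
                    break;
                }
            }
            if (matches && check.status.compareTo(worst) > 0) {
                worst = check.status;
            }
        }
        return worst;
    }
}
```

With this model, a composite filtering on the "security" tag ignores checks tagged only "system", so a failing system check does not change the security composite's status.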
Creating an individual Health Check

Creating an individual Health Check involves two steps: creating a Sling Health Check and adding an entry for the Health Check in the Dashboard's configuration nodes.

1. In order to create a Sling Health Check, you need to create an OSGi component implementing the Sling Health Check interface. You will add this component inside a bundle. The properties of the component will fully identify the Health Check. Once the component is installed, a JMX MBean will automatically be created for the Health Check. See the Sling Health Check Documentation for more information.

Example of a Sling Health Check component:

    import org.apache.felix.scr.annotations.Component;
    import org.apache.felix.scr.annotations.Properties;
    import org.apache.felix.scr.annotations.Property;
    import org.apache.felix.scr.annotations.PropertyUnbounded;
    import org.apache.felix.scr.annotations.Service;
    import org.apache.sling.hc.api.HealthCheck;

    @Component(metatype=true, label="Example Health Check", description="This is an example Health Check.")
    @Properties({
        @Property(name=HealthCheck.NAME, value="Example", label="Name", description="Name of the health check."),
        @Property(name=HealthCheck.TAGS, unbounded=PropertyUnbounded.ARRAY, value={"example", "test"}, label="Tags", description="Tags for the health check."),
        @Property(name=HealthCheck.MBEAN_NAME, value="exampleHealthCheck", label="MBean Name", description="Name of the JMX mbean to register for this check.")
    })
    @Service(value=HealthCheck.class)
    public class ExampleHealthCheck implements HealthCheck {
        ...
    }

NOTE
The MBEAN_NAME property defines the name of the MBean that will be generated for this health check.

2. After creating a Health Check, you need to insert an entry for it in the Dashboard's configuration nodes in order to make it accessible in the web interface. For this step, you need to know the JMX MBean name of the Health Check (the MBEAN_NAME property).
To create a configuration for the Health Check, open CRX and add a new node (of type nt:unstructured) under this location:

/apps/granite/operations/config/hc

The name of the node is the name of the health check that will appear in the Operations Dashboard. It should have the following properties:
• Name: sling:resourceType
  • Type: String
  • Value: granite/operations/components/mbean
• Name: resource
  • Type: String
  • Value: /system/sling/monitoring/mbeans/org/apache/sling/healthcheck/HealthCheck/exampleHealthCheck

NOTE
The resource path above is created in the following way: if the MBean name of your Health Check is "test", add "test" to the end of this path:
/system/sling/monitoring/mbeans/org/apache/sling/healthcheck/HealthCheck
So the final path will be:
/system/sling/monitoring/mbeans/org/apache/sling/healthcheck/HealthCheck/test

Creating a Composite Health Check

A Composite Health Check's role is to aggregate a number of individual Health Checks sharing a set of common features. For instance, the Security Composite Health Check groups together all the individual health checks performing security checks. Creating a composite involves two parts: creating a Composite Health Check OSGi configuration and adding an entry for the Composite Health Check in the Dashboard's configuration nodes.

1. Go to the Web Configuration Manager in the OSGi Console. You can do this by accessing http://serveraddress:port/system/console/configMgr
2. Search for the entry called Apache Sling Composite Health Check. After you find it, notice that there are two configurations already available: one for the System Checks and another one for the Security Checks.
3. Create a new configuration by pressing the "+" button on the right hand side of the configuration. A new window will appear.
4. Create a configuration and save it.
An MBean will be created with the new configuration. The purpose of each configuration property is as follows:
• Name (hc.name): The name of the Composite Health Check. It can be anything, but a meaningful name is recommended.
• Tags (hc.tags): The tags for this Health Check. If this composite health check is intended to be a part of another composite health check (such as in a hierarchy of health checks), add the tags this composite is related to.
• MBean Name (hc.mbean.name): The name that will be given to the JMX MBean of this composite health check.
• Filter Tags (filter.tags): This is a property specific to composite health checks. These are the tags which the composite should aggregate. The composite health check will aggregate under its group all the health checks that have any tag matching any of the filter tags of this composite. For example, a composite health check having the filter tags test and check will aggregate all the individual and composite health checks that have either the test or the check tag in their tags property (hc.tags).

NOTE
A new JMX MBean is created for each new configuration of the Apache Sling Composite Health Check.

5. Finally, the entry of the composite health check that has just been created needs to be added in the Operations Dashboard configuration nodes. The procedure for this is the same as with individual health checks: a node of type nt:unstructured needs to be created under /apps/granite/operations/config/hc. The resource property of the node will be defined by the value of hc.mbean.name in the OSGi configuration.
If, for example, you created a configuration and set the hc.mbean.name value to diskusage, the configuration nodes will look like this:
• Name: Composite Health Check
  • Type: nt:unstructured
With the following properties:
• Name: sling:resourceType
  • Type: String
  • Value: granite/operations/components/mbean
• Name: resource
  • Type: String
  • Value: /system/sling/monitoring/mbeans/org/apache/sling/healthcheck/HealthCheck/diskusage

NOTE
If you create individual health checks that logically belong under a composite check that is already present in the Dashboard by default, they will be automatically captured and grouped under the respective composite check. Because of this, there is no need to create a new configuration node for those checks. For example, if you create an individual security health check, all you need to do is assign it the "security" tag. Once it is installed, it will automatically appear under the Security Checks composite in the Operations Dashboard.

Health Checks Provided with AEM 6.0

System Checks
  Filter tags for Composite Health Checks: system
  Description: A health check is generated automatically for all maintenance tasks. This will indicate if the last run of the task was not successful or if it has not yet run at all.

Security Checks
  Filter tags for Composite Health Checks: security
  Description: The Security Checks composite groups together individual health checks related to security. The individual health checks are related to the security checklist available at the Security Checklist documentation page.

CRXDE Support
  Tags: bundles, security, production
  Description: Health Check fails if the CRXDE Support bundle is active.

Default Login Accounts
  Tags: login, security, production
  Description: Health Check fails if the default login accounts have not been disabled.

Sling Get Servlet
  Tags: dos, sling, security, production
  Description: Health Check fails if the default Sling Get Servlet configuration is not following the security guidelines.

CQ Dispatcher Configuration
  Tags: dispatcher, production, security
  Description: Checks the basic configuration of the Dispatcher component.

CQ HTML Library Manager
  Tags: cq, security, production
  Description: Checks if the default CQ HTML Library Manager configuration follows the security guidelines.

Replication and Transport Users
  Tags: security, replication, cq
  Description: Health Check that checks the replication and transport users.

Sling Java Script Handler
  Tags: sling, security, production
  Description: Checks if the Sling Java Script Handler configuration follows the security guidelines.

Sling Jsp Script Handler
  Tags: sling, security, production
  Description: Checks if the Sling Jsp Script Handler configuration follows the security guidelines.

Sling Referrer Filter
  Tags: sling, security, production, csrf
  Description: Checks if the Sling Referrer Filter is configured in order to prevent CSRF attacks.

User Profile Default Access
  Tags: acl, security
  Description: This health check checks if everyone has read access to user profiles.

WCM Filter Config
  Tags: cq, security, production
  Description: Checks if the default WCM Filter configuration follows the security guidelines.

WebDav Access
  Tags: bundles, security, production
  Description: This health check checks if the crxdesupport bundle is active.

Web Server Configuration
  Tags: webserver, production, security, clickjacking
  Description: This checks if the web server sends the X-FRAME-OPTIONS HTTP header set to SAMEORIGIN. In order for it to function, this Health Check needs to be configured with the public server address that the dispatcher is configured with.

Replication Queue
  Description: Checks if replication is stuck. For example, if the first item of any replication queue has more retries than allowed by the configuration, then replication is considered to be stuck.
  Default threshold (can be configured): numberOfRetriesAllowed=3
  Recommended actions:
  • link to replication agents in order to inspect queues
  • link to log messages analyzer in the Diagnosis Tools section

Log Errors
  Description: This health check notifies if there are errors in the log messages.

Active Bundles
  Description: Health check that fails if there are inactive or unresolved bundles.

Response Performance
  Description: Health check responsible for request performance. If a percentage of the requests take longer than a critical or warning threshold value, the health check will be in a critical or a warning state. Otherwise, the health check is successful. The requests considered are the ones made in the last time frame (by default, 60 minutes). You can configure the requests percentage and the critical and warning thresholds in the configuration of the Requests Status Health Check. The configuration for the time frame can be changed in the Adobe Granite Timed Requests Logger.

Query Performance
  Description: Health check responsible for query performance. If the average query time over the last hour exceeds a critical or a warning threshold, the health check will be in a critical or a warning state. The critical and warning thresholds are configurable values, and they can be changed in the configuration of the Queries Health Check.
  Default threshold (can be configured): Warning Threshold = 10ms, Critical Threshold = 15ms

Production Ready Health Check
  Description: This Health Check verifies if the Production Ready Package is installed. For more info on this package, please see below.

The Production Ready Package

CAUTION
The Production Ready package is only for publish mode.
The Production Ready package can be installed in order to automatically perform most of the configuration steps in the Security Checklist and make the instance ready for production. The Production Ready Health Check checks if this package is installed. If it is not, it will display a WARN message advising you to install the package.

The Production Ready package covers these configuration areas:

Logging
• All loggers are set to the ERROR level with a default daily rolling policy.

Security
• Uninstalls example content and users
• Configures replication and transport users. The default user used for the replication is replication-default with an empty password
• Prevents Denial of Service (DoS) attacks by setting the JSON Max Results to 100
• Disables the CQ WCM Debug Filter
• Automatically adjusts settings for the OSGi services that might leak internal information if not configured correctly on publish instances. For more info, see this section of the Security Checklist.

Repository Garbage Collect Scheduler
• The Scheduler is configured to run weekly with the "Delete" flag set to true.

CAUTION
Please note that there is no way for the Garbage Collection Scheduler to determine if the Data Store is shared. For such situations, it is recommended that the settings be configured manually by an administrator in order to avoid accidental data loss.

You can install the Production Ready package by:
1. Going to Package Admin by going to Tools - Operations - Packaging - Packages from the AEM Welcome screen, or accessing the Package Admin directly at http://serveraddress:4502/crx/packmgr/index.jsp
2. Finding the package called productionready-config-pkg-1.0.0.zip
3. Clicking the Install button.

MONITORING WITH NAGIOS

The Health Check Dashboard can integrate with Nagios via the Granite JMX MBeans.
The example below illustrates how to add a check that shows used memory on the server running AEM.

1. Set up and install Nagios on the monitoring server.
2. Next, install the Nagios Remote Plugin Executor (NRPE).

NOTE
For more info on how to install Nagios and NRPE on your system, please consult the Nagios Documentation.

3. Add a host definition for the AEM server. This can be done via the Nagios XI Web Interface, by using the Configuration Manager:
   1. Open a browser and point to the Nagios server.
   2. Press the Configure button in the top menu.
   3. In the left pane, press the Core Config Manager under Advanced Configuration.
   4. Press the Hosts link under the Monitoring section.
   5. Add the host definition.

Below is an example of a host configuration file, in case you are using Nagios Core:

    define host {
        address 192.168.0.5
        max_check_attempts 3
        check_period 24x7
        check_command check-host-alive
        contacts admin
        notification_interval 60
        notification_period 24x7
    }

4. Install Nagios and NRPE on the AEM server.
5. Install the check_http_json plugin on both servers.
6. Define a generic JSON check command on both servers:

    define command{
        command_name check_http_json-int
        command_line /usr/lib/nagios/plugins/check_http_json --user "$ARG1$" --pass "$ARG2$" -u 'http://$HOSTNAME$:$ARG3$/$ARG4$' -e '$ARG5$' -w '$ARG6$' -c '$ARG7$'
    }

7. Add a service for used memory on the AEM server:

    define service {
        use generic-service
        host_name my.remote.host
        service_description AEM Author Used Memory
        check_command check_http_json-int!<cq-user>!<cq-password>!<cq-port>!system/sling/monitoring/mbeans/java/lang/Memory.infinity.json!{noname}.mbean:attributes.HeapMemoryUsage.mbean:attributes.used.mbean:value!<warn-threshold-in-bytes>!<critical-threshold-in-bytes>
    }

8. Check your Nagios dashboard for the newly created service.

DIAGNOSIS TOOLS

The Operations Dashboard also provides access to Diagnosis Tools that can help find and troubleshoot the root causes of the warnings coming from the Health Check Dashboard, as well as provide important debug information for system operators.

Amongst its most important features are:
• A log message analyzer
• The ability to access heap and thread dumps
• Requests and query performance analyzers

You can reach the Diagnosis Tools screen by going to Tools - Operations - Dashboard - Diagnosis from the AEM Welcome screen. You can also access the screen directly at the following URL:

http://serveraddress:port/libs/granite/operations/content/diagnosis.html

Log Messages

The log messages user interface displays all ERROR messages by default. If you want to have more log messages displayed, you need to configure a logger with the appropriate log level.

The log messages use an in-memory log appender and are therefore not related to the log files. Another consequence is that changing the log levels in this UI will not change the information that gets logged in the traditional log files. Adding and removing loggers in this UI will only affect the in-memory logger. Also, note that changes to the logger configurations only apply to future entries of the in-memory logger: entries that are already logged and are no longer relevant are not deleted, but similar entries will not be logged in the future.

You can configure what gets logged by providing logger configurations from the upper left gear button in the UI. There, you can add, remove or update logger configurations. A logger configuration is composed of a log level (WARN / INFO / DEBUG) and a filter name. The filter name filters the source of the log messages that get logged.
Alternatively, if a logger should capture all the log messages for the specified level, the filter name should be "root". Setting the level of a logger triggers the capture of all messages with a level equal to or higher than the one specified.

Examples:
• If you plan on capturing all the ERROR messages, no configuration is required. All the ERROR messages are captured by default.
• If you plan on capturing all the ERROR, WARN and INFO messages, the logger name should be set to "root" and the logger level to INFO.
• If you plan on capturing all the messages coming from a certain package (for example com.adobe.granite), the logger name should be set to "com.adobe.granite" and the logger level to DEBUG (this will capture all the ERROR, WARN, INFO and DEBUG messages).

NOTE
You cannot set a logger name to capture only ERROR messages via a specified filter. By default, all the ERROR messages are captured.

NOTE
The log messages user interface does not reflect the actual error log. Unless you are configuring other types of log messages in the UI, you will see ERROR messages only. For how to display specific log messages, see the instructions above.

NOTE
The settings in the diagnosis page do not influence what is logged to the log files and vice versa. So, while the error log might catch INFO messages, you might not see them in the log messages UI. Also, through the UI it is possible to catch DEBUG messages from certain packages without affecting the error log. For more information on how to configure the log files, see Logging.

Request performance

The Request Performance page allows the analysis of the slowest page requests processed. Only content requests will be registered on this page. More specifically, the following requests will be captured:
1. Requests accessing resources under /content
2. Requests accessing resources under /etc/design
3. Requests having the ".html" extension

The page displays:
• The time when the request was made
• The URL and the method of request
• The duration in milliseconds

By default, the slowest 20 page requests are captured, but the limit can be modified in the Configuration Manager. In order to modify the number of slowest requests captured, you need to:
1. Go to the Web Configuration Manager by accessing http://serveraddress:port/system/console/configMgr
2. Look for an entry called Adobe Granite Timed Requests Logger.
3. Click on the Edit button and modify the Longest requests history size property.
4. Save the changes.

Query Performance

The Query Performance page allows the analysis of the slowest queries performed by the system. This information is provided by the repository in a JMX MBean. In Jackrabbit, the com.adobe.granite.QueryStat JMX MBean provides this information, while in the Oak repository, it is offered by org.apache.jackrabbit.oak.QueryStats.

The page displays:
• The time when the query was made
• The language of the query
• The number of times the query was issued
• The statement of the query
• The duration in milliseconds

Download Status.zip

This will trigger the download of a zip containing useful information about the system status and configuration. The archive contains several text files containing information such as the active services, the log files and active configurations. This is particularly useful for taking a snapshot of the system status and sending it for analysis.

Download Thread Dump

This will trigger the download of a zip containing information about the threads present in the system. Information about each thread is provided, such as its status, the classloader and the stacktrace.
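To give a feel for the kind of per-thread information a thread dump carries, here is a minimal sketch using only the standard Java API. It is not how the Operations Dashboard produces its download; the class and method names are made up for illustration, and the output format is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of AEM: lists the live threads of the current JVM.
final class ThreadDumpSketch {

    // Builds one text block per live thread: name, state, context class loader, stack trace.
    static List<String> describeThreads() {
        List<String> blocks = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            StringBuilder block = new StringBuilder();
            block.append('"').append(thread.getName()).append('"');
            block.append(" state=").append(thread.getState());
            ClassLoader loader = thread.getContextClassLoader();
            block.append(" classLoader=").append(loader == null ? "none" : loader.getClass().getName());
            // One indented line per stack frame, innermost frame first.
            for (StackTraceElement frame : entry.getValue()) {
                block.append(System.lineSeparator()).append("    at ").append(frame);
            }
            blocks.add(block.toString());
        }
        return blocks;
    }
}
```

Printing each returned block gives a rough text equivalent of the status, classloader and stacktrace details mentioned above, limited to the JVM in which the code runs.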
Download Heap Dump

You also have the ability to download a snapshot of the heap, in order to analyze it at a later time. Take note that this will trigger the download of a large file, in the order of hundreds of megabytes.

AUTOMATED MAINTENANCE TASKS

The Automated Maintenance Tasks page is a place where you can view and track recommended maintenance tasks scheduled for periodic execution. The tasks are integrated with the Health Check system and their execution has minimal impact on system performance. The tasks can also be manually executed from the interface.

In order to get to the Maintenance page in the Operations Dashboard, you need to go to Tools - Operations - Dashboard - Maintenance from the AEM Welcome screen, or directly follow this link:

http://serveraddress:port/libs/granite/operations/content/maintenance.html

By default, there are two automated maintenance tasks available in the Operations Dashboard:
1. The Revision Clean Up, located under the Daily Maintenance Window menu
2. The Workflow purge, located under the Weekly Maintenance Window menu.

Revision Clean Up

As data is never overwritten in a tar file, the disk usage increases even when only updating existing data. To make up for the growing size of the repository, AEM employs a garbage collection mechanism called Tar Compaction. The mechanism reclaims disk space by removing obsolete data from the repository.

By default, tar file compaction is automatically run each night between 2 am and 5 am. Compaction can also be triggered manually in the Operations Dashboard via a maintenance job called Revision Clean Up.

To start Revision Clean Up, go to the Daily Maintenance Window page, hover over the Revision Clean Up window and press the Play button. The icon will turn orange to indicate that the Revision Clean Up job is running.
You can stop it at any time by hovering the mouse over the icon and pressing the Stop button.

NOTE
The Revision Clean Up can also be triggered via the JMX Console or run from an external tool. For more info, please see this page.

Workflow purge

Workflows can also be purged from the Maintenance Dashboard. In order to run the Workflow Purge task, you need to:
1. Click on the Weekly Maintenance Window page.
2. In the following page, click the Play button in the Workflow purge card.

CUSTOM MAINTENANCE TASKS

Custom maintenance tasks can be implemented as OSGi services. As the maintenance task infrastructure is based on Apache Sling's job handling, a maintenance task must implement the Java interface org.apache.sling.event.jobs.consumer.JobExecutor. In addition, it must declare several service registration properties to be detected as a maintenance task, as listed below:

granite.maintenance.isStoppable
  Description: Boolean attribute defining whether the task can be stopped by the user. If a task declares that it is stoppable, it must check during its execution whether it has been stopped and then act accordingly. The default is false.
  Example: true
  Type: Optional

granite.maintenance.mandatory
  Description: Boolean attribute defining whether a task is mandatory and must be run periodically. If a task is mandatory but currently not in any active schedule window, a Health Check will report this as an error. The default is false.
  Example: true
  Type: Optional

granite.maintenance.name
  Description: A unique name for the task. This is used to reference the task and is usually a simple name.
  Example: MyMaintenanceTask
  Type: Required

granite.maintenance.title
  Description: A title displayed for this task.
  Example: My Special Maintenance Task
  Type: Required

job.topics
  Description: This is a unique topic of the maintenance task. The Apache Sling job handling will start a job with exactly this topic to execute the maintenance task, and as the task is registered for this topic, it gets executed. The topic must start with com/adobe/granite/maintenance/job/
  Example: com/adobe/granite/maintenance/job/MyMaintenanceTask
  Type: Required

Apart from the above service properties, the process() method of the JobExecutor interface needs to be implemented by adding the code that should be executed for the maintenance task. The provided JobExecutionContext can be used to output status information, check if the job has been stopped by the user and create a result (success or failed).

For situations where a maintenance task should not be run on all installations (for example, run only on the publish instance), you can make the service require a configuration in order to be active by adding @Component(policy=ConfigurationPolicy.REQUIRE). You can then mark the corresponding configuration as being run mode dependent in the repository. For more information, see Configuring OSGi.

Below is an example of a custom maintenance task that deletes files from a configurable temporary directory which have been modified in the last 24 hours:

Once the service is deployed, it will be exposed to the Operations Dashboard UI and can be added to one of the available maintenance schedules. This will add a corresponding resource at /apps/granite/operations/config/maintenance/[schedule]/[taskname]. If the task is run mode dependent, the property granite.operations.conditions.runmode needs to be set on that node with the values of the run modes which need to be active for this maintenance task.

MAINTENANCE TASKS SHIPPED WITH AEM

AEM 6 ships with a default set of automated maintenance tasks. Below is a table describing the maintenance tasks and their availability for each of the storage elements present in AEM 6.
Version Purge
  Name: com.day.cq.wcm.core.impl.VersionPurgeTask
  CRX2: Yes
  Tar MK: Yes
  Mongo MK: Yes
  Maintenance Window: Configurable
  Remarks: Not enabled by default. It can be added manually by pressing the "+" button in the "Weekly Maintenance Window" section of the Operations Dashboard.

Workflow Purge
  Name: WorkflowPurgeTask
  CRX2: Yes
  Tar MK: Yes
  Mongo MK: Yes
  Maintenance Window: Weekly
  Remarks: A purge configuration must first be created in order for the task to run successfully.

DataStore Garbage Collection
  Name: DataStoreGarbageCollectionTask
  CRX2: Yes
  Tar MK: No
  Mongo MK: Yes
  Maintenance Window: Weekly

Tar Compaction/Optimization
  Name: TarOptimizeTask
  CRX2: Yes
  Tar MK: No
  Mongo MK: No
  Maintenance Window: Daily

Revision Clean Up
  Name: RevisionCleanupTask
  CRX2: No
  Tar MK: Yes
  Mongo MK: Yes
  Maintenance Window: Daily

Tar Index Merge
  Name: TarIndexMergeTask
  CRX2: Yes
  Tar MK: No
  Mongo MK: No
  Maintenance Window: Daily
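The CUSTOM MAINTENANCE TASKS section describes a task that deletes files from a temporary directory which have been modified in the last 24 hours. The core file handling of such a task might look like the sketch below. It deliberately uses only the standard Java API, so the Sling JobExecutor wiring, the service properties and the configurable directory are left out; the class name, method name and parameters are hypothetical, and the selection rule follows the wording of the text literally.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper: the kind of logic a JobExecutor's process() method could delegate to.
final class TempFileCleanupSketch {

    // Deletes regular files directly inside 'dir' whose last-modified time lies within
    // 'window' before 'now', and returns how many files were deleted.
    static int deleteModifiedWithin(Path dir, Duration window, Instant now) throws IOException {
        Instant cutoff = now.minus(window);
        int deleted = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                if (!Files.isRegularFile(entry)) {
                    continue; // subdirectories and special files are left alone
                }
                Instant modified = Files.getLastModifiedTime(entry).toInstant();
                if (!modified.isBefore(cutoff)) {
                    Files.delete(entry);
                    deleted++;
                }
            }
        }
        return deleted;
    }
}
```

In a real task, a stoppable implementation would also need to check between files whether the user has stopped the job, as required for tasks that set granite.maintenance.isStoppable to true.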