SIEM Best Practices: Correlation Rule and Engine

McAfee SIEM
Correlation Rules and Engine Debugging
Introduction
This document is intended to outline the basic rule creation, tuning and debugging for the McAfee C orrelation
Engine. The correlation engine can reside within an Event Receiver or as the Advance C orrelation Engine (AC E)
appliance. The AC E removes the correlation overhead from the Event Receiver, allowing it operate at its maximum
ratings.
An Event Receiver is an appliance (hardware or virtual) which houses just one of the 3 possible correlation
engines. This is:

Real-Time Rules-based C orrelation for Events.
An AC E is an appliance (hardware or virtual) which houses 3 correlation engines. These are:



Real-Time Rules-based C orrelation for both Events, Flows & deviation
Risk (Event Scoring) C orrelation
Historical C orrelation
This document assumes that the reader has an basic understand of the flow of events within the McAfee SIEM
architecture, terms associated with McAfee SIEM and how to create correlation rules. An intermediate level of GUI
navigation and Linux command line tools are also required.
Conventions used in this document
correlator.sh
Indicates a command, program, code sample, or path.
0.0ms
Highlights specific elements within XML or code examples
that need to be referenced or should take note of.
NOTE:
Indicates a call for important notes or caveats. This may be
in a standalone sentence or grayed box.
What doesn’t this document provide?
This document is intended to provide the reader with an insight into debugging correlation rules and the
associated correlation engine. As such, it provides a great deal of technical information.
In order to be succinct but still providing enough information, some context is not provided.
What does that mean? This means that without expert guidance, any adjustments you make as a result of this
document could cause unintended consequences. Use any tip or trick highlighted in this document carefully.
Table of Contents
Rules
Event Flow
4
Writing Rules
5
Rule Caveats
9
Debugging a Bad Rule
13
When a rule doesn’t do what you want it to.
Locations
14
Where the Correlation Engine stores its assets.
In-Bound Events
15
Looking at the inbound events.
CPU Utilization
16
How is the engine performing?
Correlation Engine Status
17
Complete dump of the engine's vital signs.
Additional Options
23
Useful options when things are proving difficult.
Conclusion
24
Some final items of interest.
Appendix A
27
Full XML of individual Correlation Rule.
Appendix B
correlator.sh arguments list and description.
29
Rules
The Event Flow
Before we move into the detail of rule writing, here's a quick primer on how event traffic moves around the McAfee
SIEM environment. While a C orrelation Engine can reside on an Event Receiver, this diagram assumes an Advance
C orrelation Engine (AC E) is in the environment.
The diagram below illustrates the event flow.
 Event traffic is collected by the Event Receiver.
 Events are processed (Collected, Parsed, Normalized, Enriched and Aggregated) and then based
on the poll time, are sent to the ESM.
 ESM then forwards these to the AC E for possible use in one or more C orrelated Rules. These are
forwarded at roughly the same cycle as the ESM to ERC poll.
 When AC E triggers an event, its queued and sent back to the ESM via the ESM to AC E poll.
Writing Rules
Now that you have the flow of the events understood, let’s talk about writing rules. While not a difficult process,
C orrelation rule writing has a few guidelines which, if not followed, could introduce unintended results or lack thereof.
The following section is designed to provide you with a brief summary of some do’s and don’ts on rule writing.
First, we'll cover the different components used in creating a C orrelation Rule. Each of the components below can be
used individually or together within a rule.
Match Component
Simple Filtering
Complex Filtering
This is the most frequently used component and it performs a criteria match based on the elements of an event that
are contained within it. One or more filters can be within a Match C omponent. Each Match C omponent within a rule
may match separate events in order to satisfy the rule.
Deviation Component
As the label describes, this component uses the traditional model for Standard Deviation and applies this deviation to
the filters contained within the component. In addition to traditional Deviation, we’ve added Percent from Average and
Fixed Value from Average as additional comparison operators. This provides more flexibility than regular standard
deviation.
For a quick primer on Standard Deviation, see this Wiki link: http://en.wikipedia.org/wiki/Standard_deviation.
Rule Component
This is used to leverage either an existing rule (Standard or C ustom) or a set of filters that are commonly needed for
multiple rules. Six (6) predefined rule components come out of the box but you can create additional as neede d.
Gates
AND
OR
SET
As illustrated above, there are three gate possibilities. These are:



AND – All the components within this gate will have to match.
OR – Any of the components within this gate will have to match.
SET – One or more of the components within this gate will have to match. You select how many of the
available Match C omponents have to match true in order for this gate to match. Examples would be 2 of 4 and
3 of 6.
NOTE: There is an additional option called Sequence that is used on the AND or SET logical element to require the
conditions of the rule to occur in the sequence you place them in the C orrelation Logic field for the rule to be
triggered.
Gates can be used alone or in nested groups. Multiple event records may satisfy the components within a gate. But as
we will describe shortly, the more nesting that is used, the more costly from a performance standpoint they are.
Next is the “How” to use these components most effectively. This section is not designed to provide every possible
option for creating C orrelation Rules. Rather its purpose is to provide guidance on efficient use of the components and
pitfalls you might encounter.
#1 – Reducing Match Components
Below are examples of what appears to be a similar rule. Notice that the filters in each of the match components are
looking for the same values. However, in Figure 1, there are three match components versus Figure 2, where there is
just one. While each of these rules will trigger on essentially the same criteria, the Figure 1 example will behave
substantially different than Figure 2.
Figure 1
Figure 2
Here’s why:
In Figure 1, the rule as a whole will potentially fire much more often because:



C omponent 1 will match on ANY Michigan event
C omponent 2 will match on ANY Authentication event ( non-Michigan events included)
C omponent 3 will match on ANY failure ( non-Authentication and/or non-Michigan events included)
An Example:

Any event from Michigan, matches component 1
It doesn’t have to be an authentication or any event failure

Any Authentication event, from anywhere matches component 2
It doesn’t have to be from Michigan or any event failure

Any failed matches component 3
It doesn’t have to be an authentication or from Michigan
Essentially 3 unrelated events now cause the rule to fire , and you’ve just fired on 3 events you probably don’t care
about.
Whereas the example in Figure 2 is the only way to ensure that the rule would fire when you get an Authentication
Failure from Michigan from a single event.
From a performance perspective, Figure 1 would use substantially more memory because the engine keeps state
(memory allocations) on each Match C omponent (orange block) and in this example, which could be an extensive
number of memory allocations since the engine is matching on individual, unrelated events.
Whereas in Figure 2, it use less memory because it won’t maintain state ( memory allocations) until all three criteria
are within a single event.
#2 – Nesting Components
Below are two examples of a nested rule. Figure 3 is a bad example. At a minimum , this could have been combined
into a single AND gate, and preferably, it would have been a single Match C omponent. Figure 4 is a better example of
how to use nesting effectively. In this example a single Match component is combined with 3 existing Attack rules,
only one of which will have to match.
NOTE: Without a SET gate, an OR gate implies a single match among the components in the gate.
Nesting rules can be extremely effective when looking for different types of events as part of a single rule. However,
nesting uses more resources (memory and CPU) because of the overhead associated with the additional logic require
to performance the matches. Please keep that in mind when crafted nested rules and reduce the number of
components where possible to maintain an efficient rule as possible.
Bad Example
Figure 3
Figure 4
Rule Caveats
Below are items that can have the potential to induce extra or excessive resource usage (memory and C PU) and which
should be considered when crafting custom C orrelation Rules. This doesn’t mean "don’t do the items listed below";
Rather, these are highlighted so that you can be aware of the impact these items have on the overall performance of
the C orrelation Engine so that you can craft an efficient rule as possible given your criteria.
OR gates mixed with AND gates result in a lot of extra processing
This is due to the additional processing required to perform state evaluation on the overall rule. The engine will have
to check the OR and AND conditions more frequently.
A rule that checks multiple large Watch Lists
Watch Lists are stored in the database and are read into short term memory at engine execution time to prevent
frequent reads to the McAfeeEDB (McAfeeEDB is the database in which the parsed events are stored on the ESM).
This adds to memory usage, so very large lists, those with entries in the thousands or more, could use excessive
memory and be the cause of slower engine performance. Also, should a list change, it requires the current state of the
correlation rule using the changed list to be ended, the new list read in and then state to be maintained again.
NOTE: A large watch list would be one that has more than 100K entries.
Referenced rules result in more processing
Using another rule (Referenced), within a rule uses additional memory and C PU than simply including the logic from
the referenced rule. This does not mean they shouldn’t be used as the upside of leveraging existing rules is
functionally very beneficial. But should the gate of the Referenced Rule be an AND and your new rule also contains an
AND(s), it may be more beneficial to just use the logic from the Referenced Rule. This is especially true if the
Referenced Rule is only referenced once.
Figure 5 is an example of what a Rule with a reference component (green) looks like.
Figure 5
The more gates, the more CPU and Memory will be used
Nesting gates will cause additional overhead (C PU and Memory) as additional match processing is required. Ensure the
rule requires nested gating or that it can be crafted in another fashion using fewer gates.
High match rate/low fire rate/high timeout
This is where a rule with multiple Match C omponents has one Match C omponent that has a substantially larger match
rate than other components and the overall rule triggers very infrequently. These match rates, among many other
settings and values, can be checked using a script that “dumps” details of what is going on within the engine. This is
specifically outlined in the script section starting on page 18.
One really slow rule can’t be fixed with load balancing
The C orrelation Engine has the ability to auto balance its self across its processors. However, if you have a rule which
uses a less efficient methodology, this auto balance capability may not help. You can check this using the c orrelator.sh
script and that is specifically outlined in the script section towards the end of this document.
Memory usage biggest reason for slowdown
The C orrelation Engine is constrained by the available resources. Keeping the items previously mention ed is one way
of managing this C PU and Memory use. Another is to determine if the customer environment actually needs all of the
standard rules provided (176 as the writing of this document). It’s possible that a few (or more) can be disabled or at
least tune to reduce their memory and/or C PU usage.
Watch List vs. Variable
If one were trying to squeeze every last ounce of performance out of the C orrelation Engine, consider whether a
Match C omponent should use a Watch List or Variable. The Pros and C ons o f each are below.
Watch Lists



Stored in the database
Read into short-term in-memory cache to reduce queries
Slowest of the filtering options but not “slow”
See previous note on large Watch Lists
Variables





Fastest between the two
Written right into the rule itself
Not updated automatically (requires policy rollout)
Only use for things that don’t change regularly
Has a 2000 byte limitation
Rule Attributes
Some final bits of advice to ensure that your custom C orrelation Rules will performance as efficiently as possible. Each
rule has a couple attributes that have a few caveats. Below, we outline some of them and items to consider when
creating a new correlation rule.
Group By


Event whose grouped IP is null (0.0.0.0) will be ignored
Same thing for any string/numeric fields that are unset (0)
NOTE: Grouping by multiple high cardinality fields (SrcIP, DestIP, etc) may cause high memory usage.
Time Window


Parent gate must have Time Window >= child
Time Window + Time order Tolerance tracked in memory
NOTE: Time order Tolerance is outlined on the following page.
Gate Logic


Sequence is somewhat more expensive
Setting high thresholds results in more memory usage
Time Order Tolerance
Even after the Time Window on a given rule has expired, some Meta data for that rule is kept in memory for a
period of time. That time period is called Time Order Tolerance and by default this is set to 60min. Time Order
Tolerance is designed to account for events that come in out of sequence ( or late, after a rules Time Window has
expired). This uses additional memory, and depending on how events are matched ( or not) and your time thresholds
(a number of factors come into play), this could use a lot of memory.
To prevent potential excess memory usage, you can re duce the Time Order Tolerance to something less than 60 min.
The upside is that in environments that are struggling with resources, they would benefit from added memory that is
freed up.
However, and this is a big however, you need to ensure or be comfortable that events for ALL data sources will not
arrive late. Ever. If they do and they arrive after the expiration of the Time Order Tolerance, then they won’t be
included in a previously expired C orrelation Rule.
Figure 6
NOTE: If you choose to change the tolerance, it’s recommended that you reduce it gradually. The initial setting should
not be lower than 30min to ensure that you are not missing events.
Debugging a Bad Rule
Once a rule has been written and is running, you might find that it doesn’t trigger, or all correlated events seem to be
getting to the ESM a bit more slowly after the new rule has been added, or no events appear at all. When these types
of behaviors appear, customers will typically call support and work through the issue.
However, the following pages will provide the reader with a workflow to determine what might be causing the issue
and allow them to resolve the issue on their own.
NOTE: The steps outlined on the following pages are designed to help the reader debug moderate correlation engine
issues. They are not intended to be a complete debugging guide. If you attempt to go beyond the scope of this
document, you may do more harm than intended. If you are unsure, do not proceed and seek additional support.
Important Locations
Before we debug, we need to know where everything is and to note where the C orrelation Engine keeps its important
directories and files. Everything that is important (or at least covered herein) is located within a single directory
regardless if this is an Event Receiver or AC E Appliance. That directory is:
/usr/local/ace/
From here, most things correlation related can be found, checked or investigated. As with any component within
McAfee SIEM there are some files or commands that can be useful. To keep this section simple, we’ll stick to the most
important items.
The Directories
bin
Contains the correlator.sh shell script. This script can be used for a variety of
tasks, most notably to check the efficiency of the engine and individual rules.
enrichment
Contains the enrichment rules
historical
Contains the event files used for historical correlation if historical correlation
is enabled.
incoming
Stores the incoming events sent to it from the ESM.
lib
No need to look in here
log
Contains the logs of the running correlation engines. The could be useful
during debugging. We have outlined some uses within this document.
properties
Not much to see here
rules
Contains the standard and user defined correlation rules in XML format. These
are the running rules. Other copies of the rules exist in case of a corruption.
The Files
While there are a number of important files within each of these directories, the ones outlined below are worth noting.
Log/correlator.log.x
Contains the log files for the Correlation Engine. There will only
ever be 5 logs (.0 through .4) with the most recent logs in .0. You
can use the tail command to view these in real time. Adding a grep
and looking for something like word exception may provide insight
into any issues within the engine.
tail –f
coorelator.log.o | grep Exception
In-Bound Events
C hecking the number of inbound events waiting to be processed
As previously noted, the ESM forwards events from the Event Receivers (ERC ) to the C orrelation Engine. The engine
stores these event files in the incoming directory to wait for further processing. Sometimes, due to bad rules, too
many events, engine being stalled/stopped or another issue, these event files can get stacked up and the C orrelation
Engine gets behind.
To see how many event files are waiting, if any are, perform the following command from the /usr/local/ace
directory:
McAfee ACE ~ # ls –l incoming | wc -l
Depending on the EPS of your environment, your results, an example of which is below, should be a single or small
double digit number. The ultimate goal is to have as few files waiting as possible and that this number is stable or
reducing over time.
If you have a large number of files waiting, or after running the above command a couple times, the number is
growing, this could be an indicator that there is an issue that will need to be investigated further.
Figure 7
CPU Utilization
The C orrelation Engine reserves a minimum amount of resources (memory and C PU) to operate. C hecking C PU
utilization is one way to see if the engine is performing as expected or if it is in distress. The engine uses a dynamic
calculation at runtime determined the number of cores to use. The calculation is the number of cores detected, minus
2.
To monitor the C PU you can use either top or htop. Both are useful tools but display the data in slightly different
ways. htop has a slight advantage as it displays activity in a bit more colorful manner. Below are a couple of screen
shots.
What to look for? In a moderately loaded AC E, you should see the C PU percentage (First of the two highlighted lines in
the screen shot below) as a number over 100%. Maybe 300%, possibly 700%, but definitely over 99-100 %. See
Figure 9. If you ever notice that the C PU utilization at 99-100% for an extended period of time, it could mean that a
single rule is hogging cycles. The other thing to look for is that you have utilization spread across 4 C PU’s. Usage does
not have to be uniform, just that activity is across all four. See Figure 8.
Figure 8
Figure 9
Correlation Engine Status
If, after inspecting the incoming directory, the C PU usage or events are simply not showing up, you believe there is a
problem, you can look at the internals of the C orrelation Engine. The engine keeps a wide variety of statistics and
values for the engine itself as well as for the individual rules and components within the rules.
To dump the current status of the engine, execute the following script.
McAfee ACE ~ # /usr/local/ace/bin/correlator.sh –status > status.log
The correlator.sh script performs a status dump of the engine at the moment in time it is executed.
NOTE: If this script takes more than 5 minutes to execute, this could be an indication that the engine is under
performing. The values found in the output file are as of the last time the engine was restarted.
Once you have the result file, you can search it for valuable information on how the engine is performing. The
following pages provide examples of sections of the output to review when debugging your correlation eng ine.
#1 – Memory Critical
C orrelation is a memory intensive operation. The act of maintaining state on rules matching Source IP, Destination IP
or User Name, each having high cardinality can be expensive in terms of memory used. So if your experiencing slow
event generation or you believe the engine is under stress, one of the first items to check would be the
memoryCritical property.
McAfee ACE ~ # grep –A 3 ‘memoryCritical’ status.log
NOTE: The –A 3 will grab the next three lines after the grep ma tch.
<property name="memoryCritical">
<value>false (5%)</value>
<description>Is memory at a critical level</description>
</property>
If the value highlighted above is more than 50%, this could indicate the engine is under stress and that certain rules
may need to be tuned to reduce the number of activeInstances.
McAfee ACE ~ # grep -A 1 -B 1 'activeInstances' status.log
NOTE: The –A 1 and –B 1 will grab the line before and the line after the grep match.
The grep will catch the activeInstances for each rule plus one for the Engine and one for Alarms. An example of the
output is on the following page.
-<status name="Ruleset: Policy - Porn Policy Events on a Local Host">
<property name="activeInstances">
<value>2</value>
-<status name="Ruleset: GTI - DNS Communication with Malicious Host - Event or Flow">
<property name="activeInstances">
<value>520</value>
-<status name="Ruleset: Attack - Possible Conficker Worm Activity">
<property name="activeInstances">
<value>10</value>
What you are looking for is one or more rules which have the highest activeInstances values. If you see one or
more rules that are much larger than the rest of the rules, use some of the steps found further in this document to
debug the reason for this large number of active instances.
NOTE: It is possible that due to high cardinality of a specific field (SourceIP, DestinationIP or Username as examples)
that any rule, even the default rules, may have to have additional match components added to limit their
activeInstances.
#2 – Processor Balance
McAfee ACE ~ # grep ‘Rules Processor’ status.log
Your results may be:
Example 1 (possibly bad)
<status
<status
<status
<status
name="Rules
name="Rules
name="Rules
name="Rules
Processor
Processor
Processor
Processor
1
2
3
4
(1 rules)">
(29 rules)">
(59 rules)">
(91 rules)">
Processor
Processor
Processor
Processor
1
2
3
4
(25
(42
(52
(61
- or Example 2 (good)
<status
<status
<status
<status
name="Rules
name="Rules
name="Rules
name="Rules
rules)">
rules)">
rules)">
rules)">
The difference between these examples is the noticeable imbalance among the processors. In the first example,
Processor 1 only has a single rule running against it. This could indicate that a single rule is inefficient and is 'hogging'
C PU and may be preventing other rules from executing in a timely manner.
The second example displays a better balance of the rules. While this is a good thing, there may still be an issue that
needs to be investigated.
NOTE: The balance of the rules across C PU’s does not have to be equal. As long as each C PU has multiple rules, the
engine has balanced them as it needs to.
#3 – Which Rule is on Which Processor
The “mate” to the previous grep is one looking for timeSpent. The example is:
McAfee ACE ~ # grep 'timeSpent=' status.log
<value>rulesProcessor1.timeSpent=48851.7ms</value>
<value>Correlation Engine-47-4000004.timeSpent=35425.8ms</value>
..
..
<value>Correlation Engine-47-4000013.timeSpent=13425.9ms</value>
<value>rulesProcessor2.timeSpent=9310.3ms</value>
<value>Correlation Engine-47-4000014.timeSpent=8765.1ms</value>
..
..
<value>Correlation Engine-47-4000023.timeSpent=545.2ms</value>
The C orrelation Engine prioritizes rules by their expense (processing time), so this particular element shows which
rule is taking the most processing time and which core it is using. This element i s ordered in processor sequence and
the C orrelation Engine will always put the most expense rule first within each processor group.
Thus the first rule on processor 1 will be the most expensive rule overall. Because of this, you should easily be able to
determine which rule is using the most processing time.
In the example above, note the time spent for each rule (red), the time spent for each processor (pink) and the
Signature ID (green).
NOTE: timeSpent is reset each time the rule set is rebalanced (auto or manually). This is done to be more sensitive to
performance changes in the engine. As an example, a rule that was slow last week, may not be slow this week. In
addition, when new rules are added timeSpent is reset because the rules need to be bala nced across processors based
on current processing time against the whole rule base.
NOTE: If you see an extremely large number and it is in E-Notation (scientific or exponential notation. This occurs
with numbers greater than 15 digits), then you can be pretty certain that this rule is very expensive and should be
reviewed.
Determining Rule Performance
Once you have identified the offending rule based on its C PU usage ( previous page), how can you determine how it’s
performing? Is the rule logic performing as intended? To find out, we need to go further into the output of the
correlator.sh script.
As mentioned in the Writing Rules section (page 5), the state for each Rule and each Match C omponent within each
Rule is maintained in memory and the C orrelation Engine keeps statistics on these. We can view these statistics in the
output of the correlator.sh script.
Let’s use the Bad Rule example from page 7. In Figure 10, you see three Match C omponents.
Figure 10
From the previous example of the grep, you are able to determine Time Spent and the Signature ID of the rule in
question. While grep is an excellent tool, sometimes just looking through the output is helpful as well. So for this
section, we’ll edit the output of the correlator.sh script. An example using vi is below.
McAfee ACE ~ # vi status.log
NOTE: To search for the Signature ID of the correlated event you are interest in use a slash (/) followed by the value
you are searching for. Once you’ve locate d the string you entered, use a lower case n to continue the search. During
the search, you may match the Signature ID a number of times.
For this example, you are looking for the <status name> element which will have the Rule Name in it. You may pass
<status name=> by a couple lines as your search is matching on Signature ID.
<status name="Ruleset: Account Sharing - Bad Rule">
..
..
<value>Correlation Engine-47-4000004.totalInstances=0</value>
Next, scroll down towards the end of this </status> element and until you locate the <status name="rule_1">
element. Once you find this section of code, you can examine how the rule components are matching the inbound
events and use this information to determine if one or more of the Match C omponents need adjustment. The next
page has an example of what we are looking for. Also, s ee Appendix A for a complete rule XML example.
Using our example, three are (3) Match C omponents. So in the output for this rule, we will have three <status
name=rule_ elements, each corresponding to a Match C omponent. This means that:
<status name="rule_1"> matches
<status name="rule_2"> matches
<status name="rule_3"> matches
Knowing this, we can see each components statistics and how it is performing. The XML Example:
<status name="rule_1">
<property name="matchAttempts">
<value>423454245</value>
<description>Number of match attempts</description>
</property>
<property name="matches">
<value>0</value>
<description>Number of matches</description>
</property>
</status>
<status name="rule_2">
<property name="matchAttempts">
<value>423454245</value>
<description>Number of match attempts</description>
</property>
<property name="matches">
<value>0</value>
<description>Number of matches</description>
</property>
</status>
<status name="rule_3">
<property name="matchAttempts">
<value>423454245</value>
<description>Number of match attempts</description>
</property>
<property name="matches">
<value>243489245</value>
<description>Number of matches</description>
</property>
</status>
What we see from our example on the previous page is that Rule 1 and Rule 2 have had 423 million ( + or -) match
attempts, but nothing has matched either component. Whereas Rule 3 has had the same number of attempts, but has
matched 243 million times. This tells us three things.
1.
Since these Match C omponents are individually defined versus all in a single Match C omponent, the engine will
be using a lot of processing (CPU) attempting to match each component individually.
2. When a match does occur in Rule 3, it’s maintaining state (memory) for almost half of the events it’s seeing.
This is aggressive and will be expensive in term of memory use.
And most importantly
3.
It may never trigger. This is because while Rule 1 and Rule 2 have seen 400M+ events, nothing has matched
even though Rule 3 has seen almost a 50% match rate. In other words, if Rule 1 and Rule 2 haven’t matched
by now, they may never. Thus Rule 3 is keeping memory state on over half of what it is seeing without any
change of a match (rule trigger)
The solution in this example is for this rule is to combine the Match C omponents into a single component. This will
reduce memory usage and improve C PU utilization thus ensuring the engine can run as efficiently as possible.
NOTE: This is just one example of rule tuning. Each rule will behave differently and may require different tuning. But
using the steps outline here, it’s straight forward to see where a rule has gone 'wrong'.
Additional Options
Sometimes even after you indentify the rule, disable, it and then roll policy out, the Engine is busy determining how to
catch up. Alternatively, your experience tells you that the engine has a corrupt Rule XML file, or you want to delete
the queued up events and start from scratch.
If these reasons apply in your environment, you can force the situation a bit. You do this by killing the java process
which is running the engine. Generally there is no harm in doing this as there is another process which will start the
engine immediately without user intervention.
To do this, first you need to determine what the Process ID is. You can use one of two commands. These are:
McAfee ACE ~ # ps –ef
– or –
McAfee ACE ~ # ps –ef | grep java
Figure 11 is the output of the first example with the Java process (red) and the Process ID (yellow) highlighted:
Figure 11
Now that you have the Process ID, all you have to do issue a kill command. However, there is a process to issuing the
kill. The steps are:
1.
2.
Disable the offending Rule via the GUI.
Kill the process. An example of the command is:
McAfee ACE ~ # kill -9 3337
3.
Rollout Policy to the C orrelation Engine
NOTE: Once the process terminates, the engine will automatically restart and begin processing any backlogged files
in the incoming directory. If you want to have a fresh start without processing the backlog of events, you could also
delete the contents of the incoming directory. Also Rolling out policy for the engine is recommended, but is not
required to have the engine continue processing.
Conclusion
Finally, here are a couple sections contained within the correlator.sh output.
Cores Used and Detected
As we’ve previously mentioned, the C orrelation Engine dynamically determines the number of cores to use. That
calculation is the number of cores detected minus 2. This section shows you what was detected and was is used.
<property name="coresUsed">
<value>2</value>
<description>The number of CPU cores being used</description>
</property>
<property name="totalCores">
<value>4</value>
<description>Total detected CPU cores</description>
</property>
Rules Correlation EPS
Like dsstatus on a Receiver, the C orrelation Engine also keeps track of its processing performance. This section
provides you with the live EPS and Total Events Processed as of the execution of the script. The EPS number here will
be what the EPS was at the moment the status was run. It could be low or high and should not be viewed as the EPS
of the engine. See page 25 for the processing record counts in the logs for more accurate EPS of the engine.
<status name="Correlation Thread Status">
<status name="Correlation Thread 0">
<property name="Processing Rate (EPS)">
<value>200</value>
<description>Processing Rate (EPS)</description>
</property>
<property name="Total Events Processed">
<value>8012314107</value>
<description>Total Events Processed</description>
</property>
..
..
</status>
</status>
As mentioned on page 14, the logs are located in /usr/local/ace/logs. These files (there are 6 of them) store
some very important information on the processing and to some extent, the performance of the engine. The next few
examples will allow you to weed through these to pick out some good information.
Grep for AddFile in the logs
On occasion, after you have done the investigation on the previous pages, everything looks OK but still nothing is
being correlated. While this could be normal and events are just not triggering the rules. It could mean that
something is wrong.
The command below will look through the logs to see if files are getting added to the Engine from the ESM:
McAfee ACE ~ # grep AddFile correlator.log.0
The results will look something like Figure 12. With a number of entries listed. This would mean that events are
getting to the engine, but for some reason the engine is ignoring them. One easy step to take is to Roll Policy out to
the engine to make sure it has the more recent rules.
Figure 12
Grep for files processed in the logs
Once you know that files are making it to the engine, this next command will check to see if the files are being
processed. The syntax is:
McAfee ACE ~ # grep 'record counts' correlator.log.0
The results are in Figure 13. If you have one or more entries in the log and they are recent events, you can be
assured that the engine is processing events. The event counts here are compressed. Multiplying these b y your
aggregation rate can provide an estimate of the event rate on the engine. McAfee uses a 10:1 default aggregation
rate, however you rate will be different.
Figure 13
Appendix A - Full Rule Element in XML
<status name="Ruleset: Account Sharing - Bad Rule">
<property name="activeInstances">
<value>0</value>
<description>Total active instances, first instance will be shown in status</description>
</property>
<property name="totalInstancesCreated">
<value>Correlation Engine-47-4000004.totalInstances=0</value>
<description>Total instances created</description>
</property>
<property name="deadInstancesRemoved">
<value>0</value>
<description>Dead instances removed</description>
</property>
<property name="totalTimeSpent">
<value>Correlation Engine-47-4000004.timeSpent=2.160236ms</value>
<description>Time spent in this ruleset</description>
</property>
<property name="timeSpentProcessing">
<value>Correlation Engine-47-4000004.timeSpentProcessing=0.0ms</value>
<description>Time spent in this rule within an instance</description>
</property>
<property name="timeSpentMatching">
<value>Correlation Engine-47-4000004.timeSpentMatching=0.287428ms</value>
<description>Time spent seeing if an events matches a new instance</description>
</property>
<property name="timeSpentCreatingInstances">
<value>Correlation Engine-47-4000004.timeSpentInstantiating=0.0ms</value>
<description>Time spent creating instances</description>
</property>
<property name="timeSpentOther">
<value>Correlation Engine-47-4000004.timeSpentOther=0.200094ms</value>
<description>Time spent doing everything else before match/processing</description>
</property>
<property name="firings">
<value>0</value>
<description>Number of times this ruleset fired</description>
</property>
<property name="upcomingTriggerChecks">
<value>0</value>
<description>upcomingTriggerChecks</description>
</property>
<property name="cooldown">
<value>0</value>
<description>cooldown</description>
</property>
<property name="immediateMode">
<value>false</value>
<description>immediateMode</description>
</property>
<status name="Default"/>
<status name="rule_1">
<property name="matchAttempts">
<value>423454245</value>
<description>Number of match attempts</description>
</property>
<property name="matches">
<value>0</value>
<description>Number of matches</description>
</property>
</status>
<status name="rule_2">
<property name="matchAttempts">
<value>423454245</value>
<description>Number of match attempts</description>
</property>
<property name="matches">
<value>0</value>
<description>Number of matches</description>
</property>
</status>
<status name="rule_3">
<property name="matchAttempts">
<value>423454245</value>
<description>Number of match attempts</description>
</property>
<property name="matches">
<value>243489245</value>
<description>Number of matches</description>
</property>
</status>
</status>
Appendix B - Arguments for the correlator .sh script
NOTE: C olumns which are grayed out are outlined here for informational purposes and are NOT intended for general use without support assistance.
-add <file | file uncompressed | file <type> | file <type> uncompressed>
A dd a n ew event file to be processed. On e could strip ou t a file from the i ncoming
di rectory and add it in. Typically used in debugging.
-clean <true|false>
Un u sed / Future use
-customFields <path>
Fl agged used at startup. Internal configuration file.
-eventTypes <eventTypePath>
-globals <path>
Fl agged used at startup. Internal configuration file.
-gui
Un u sed / Future use
-idleCheck
Un u sed / Future use
-incoming <path>
Tel l s the engine to u se a different incoming location for the events from the ESM.
Used for debugging on ly.
-managers <managers>
Fl agged used at startup. Internal configuration file.
-out <path>
Fl agged used at startup. Internal configuration file.
-port <port>
Tel l s Correlation Engine what port to listen for commands. In ternal use on ly
-processcfg <processcfg>
Fl agged used at startup. Internal configuration file.
-rebalance
Forces en gine to re-balance i ts u sage across the threads. This is supposed to be
performed as a normal process, however, should an imbalance occur, using this can
force th e engine to perform the balance immediately.
-reload
Used to rel oad certain con figuration or rules files.
-reset
Wi l l do a full engine reset. Not recommended for use.
-run
Fl agged used at startup. Internal configuration file.
-scoring <scoring>
Fl agged used at startup. Internal configuration file.
-shutdownStatus
Pai red with the shutdown command and i ts qu eries the correlation engine to see if
i t h as shutdown yet. It could take time to save i ts state file.
Appendix B – Arguments for the correlator.sh script
NOTE: C olumns which are grayed out are outlined here for informational purposes and are NOT intended for general use without support assistance.
-start
Starts the en gine i f it had stopped on its own or i f the stop augment had been used
prev i ously.
Du m ps the status of the correlation engine into a file to allow administrators to
i n v estigate how the engine and i ndividual rules are performing. Syntax is:
-status
# ./usr/local/ace/bin/correlator.sh –status > /tmp/status.log
See prev ious section on use and values of interest.
-stop
Stops th e correlation engine. Once it stops, it will automatically restart the engine.
Th i s i s similar to the kill outlined on page 22.
-testRules <path>
Wi l l test a Rules XML file. Also used during a pol icy push (rollout) in the GUI.
-thirdParty <path>
Fl agged used at startup. Internal configuration file.
-update <enrichmentFile>
Wi l l u pdate a Watch List or Enrichment framework manual. Not recommended for
u se.
-upgrade <path>
Un u sed