McAfee SIEM Correlation Rules and Engine Debugging

Introduction
This document outlines basic rule creation, tuning and debugging for the McAfee Correlation Engine. The correlation engine can reside within an Event Receiver or on the Advanced Correlation Engine (ACE) appliance. The ACE removes the correlation overhead from the Event Receiver, allowing the Receiver to operate at its maximum ratings.

An Event Receiver is an appliance (hardware or virtual) that houses just one of the three possible correlation engines:
- Real-Time Rules-based Correlation for Events

An ACE is an appliance (hardware or virtual) that houses all three correlation engines:
- Real-Time Rules-based Correlation for Events, Flows and deviation
- Risk (Event Scoring) Correlation
- Historical Correlation

This document assumes that the reader has a basic understanding of the flow of events within the McAfee SIEM architecture, the terms associated with McAfee SIEM, and how to create correlation rules. An intermediate level of skill with GUI navigation and Linux command-line tools is also required.

Conventions used in this document
correlator.sh: Indicates a command, program, code sample, or path.
0.0ms: Highlights specific elements within XML or code examples that are referenced in the text or should be noted.
NOTE: Indicates an important note or caveat. This may be a standalone sentence or a grayed box.

What doesn't this document provide?
This document is intended to give the reader insight into debugging correlation rules and the associated correlation engine. As such, it provides a great deal of technical information. To stay succinct while still providing enough information, some context is left out.

What does that mean?
It means that, without expert guidance, any adjustment you make as a result of this document could have unintended consequences. Use every tip and trick in this document carefully.
Table of Contents
Rules
  The Event Flow (page 4)
  Writing Rules (page 5)
  Rule Caveats (page 9)
Debugging a Bad Rule (page 13): When a rule doesn't do what you want it to.
  Locations (page 14): Where the Correlation Engine stores its assets.
  In-Bound Events (page 15): Looking at the inbound events.
  CPU Utilization (page 16): How is the engine performing?
  Correlation Engine Status (page 17): Complete dump of the engine's vital signs.
  Additional Options (page 23): Useful options when things are proving difficult.
Conclusion (page 24): Some final items of interest.
Appendix A (page 27): Full XML of an individual Correlation Rule.
Appendix B (page 29): correlator.sh arguments list and description.

Rules

The Event Flow
Before we move into the detail of rule writing, here is a quick primer on how event traffic moves around the McAfee SIEM environment. While a Correlation Engine can reside on an Event Receiver, this diagram assumes an Advanced Correlation Engine (ACE) is in the environment.

The diagram below illustrates the event flow.

Event traffic is collected by the Event Receiver. Events are processed (collected, parsed, normalized, enriched and aggregated) and then, based on the poll time, sent to the ESM. The ESM then forwards them to the ACE for possible use in one or more correlation rules, at roughly the same cycle as the ESM-to-ERC poll. When the ACE triggers an event, it is queued and sent back to the ESM via the ESM-to-ACE poll.

Writing Rules
Now that you understand the flow of events, let's talk about writing rules. While not a difficult process, correlation rule writing has a few guidelines which, if not followed, could produce unintended results or no results at all. The following section is a brief summary of some dos and don'ts of rule writing.

First, we'll cover the different components used to create a Correlation Rule. Each of the components below can be used individually or together within a rule.
Match Component (Simple Filtering, Complex Filtering)
This is the most frequently used component. It performs a criteria match based on the elements of an event that are contained within it. A Match Component can contain one or more filters, and each Match Component within a rule may match a separate event in order to satisfy the rule.

Deviation Component
As the label describes, this component uses the traditional model of standard deviation and applies it to the filters contained within the component. In addition to traditional deviation, we've added Percent from Average and Fixed Value from Average as comparison operators, which provide more flexibility than standard deviation alone. For a quick primer on standard deviation, see: http://en.wikipedia.org/wiki/Standard_deviation.

Rule Component
This is used to leverage either an existing rule (Standard or Custom) or a set of filters that are commonly needed by multiple rules. Six (6) predefined rule components come out of the box, but you can create more as needed.

Gates (AND, OR, SET)
As illustrated above, there are three gate types:
AND: All the components within this gate must match.
OR: Any one of the components within this gate must match.
SET: A chosen number of the components within this gate must match. You select how many of the available Match Components have to be true for the gate to match, for example 2 of 4 or 3 of 6.
NOTE: An additional option called Sequence can be used on the AND or SET logical element. It requires the conditions of the rule to occur in the order in which you place them in the Correlation Logic field before the rule triggers.

Gates can be used alone or in nested groups, and multiple event records may satisfy the components within a gate. But, as we describe shortly, the more nesting a rule uses, the more costly it is from a performance standpoint.
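The three gate types reduce to simple truth tests over which components matched. The bash sketch below models that logic with 1 (matched) and 0 (not matched) arguments. The function names are hypothetical, and this is a toy illustration of the AND/OR/SET definitions above, not how the Correlation Engine evaluates gates internally (it ignores Time Windows, Sequence and state).

```bash
#!/usr/bin/env bash
# Toy model of the gate definitions (hypothetical helpers, not engine code).
# Each argument is 1 (component matched) or 0 (component did not match).

# AND: every component must match.
gate_and() {
  local m
  for m in "$@"; do
    [ "$m" -eq 1 ] || return 1
  done
  return 0
}

# OR: at least one component must match.
gate_or() {
  local m
  for m in "$@"; do
    if [ "$m" -eq 1 ]; then return 0; fi
  done
  return 1
}

# SET: at least $1 of the remaining components must match (e.g. 2 of 4).
gate_set() {
  local need=$1 hits=0 m
  shift
  for m in "$@"; do
    if [ "$m" -eq 1 ]; then hits=$((hits + 1)); fi
  done
  [ "$hits" -ge "$need" ]
}

# Example: gate_set 2 1 0 1 0 succeeds, because 2 of the 4 components matched.
```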
Next is how to use these components most effectively. This section is not designed to cover every possible option for creating Correlation Rules. Rather, its purpose is to provide guidance on using the components efficiently and on pitfalls you might encounter.

#1 – Reducing Match Components
Below are two examples of what appear to be similar rules. Notice that the filters in the Match Components look for the same values. However, Figure 1 has three Match Components, whereas Figure 2 has just one. While both rules trigger on essentially the same criteria, the Figure 1 rule behaves substantially differently from the Figure 2 rule.

Figure 1
Figure 2

Here's why. In Figure 1, the rule as a whole will potentially fire much more often because:
Component 1 will match ANY Michigan event.
Component 2 will match ANY Authentication event (non-Michigan events included).
Component 3 will match ANY failure (non-Authentication and/or non-Michigan events included).

An example:
Any event from Michigan matches Component 1. It doesn't have to be an authentication event or a failure.
Any Authentication event, from anywhere, matches Component 2. It doesn't have to be from Michigan or a failure.
Any failure matches Component 3. It doesn't have to be an authentication event or from Michigan.

Essentially, three unrelated events can now cause the rule to fire, and you've just fired on three events you probably don't care about. The Figure 2 rule, on the other hand, is the only way to ensure that the rule fires only when a single event is an Authentication Failure from Michigan.

From a performance perspective, Figure 1 uses substantially more memory because the engine keeps state (memory allocations) for each Match Component (orange block). In this example that could mean an extensive number of memory allocations, since the engine is matching on individual, unrelated events.
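To see why the Figure 1 layout (three separate Match Components) collects so much more state than the Figure 2 layout (one Match Component), the bash sketch below counts matches over a handful of made-up events. The comma-separated event format (state,category,outcome), the sample events and the function names are assumptions for illustration only; real events carry many more fields.

```bash
#!/usr/bin/env bash
# Hypothetical events, one per line: state,category,outcome
events='Michigan,Authentication,failure
Michigan,Web,success
Ohio,Authentication,success
Ohio,Firewall,failure
Michigan,Authentication,success'

# Figure 1 style: three independent Match Components.
# Prints how many events each component would match on its own.
count_separate() {
  awk -F, '
    $1 == "Michigan"       { c1++ }
    $2 == "Authentication" { c2++ }
    $3 == "failure"        { c3++ }
    END { printf "%d %d %d\n", c1, c2, c3 }'
}

# Figure 2 style: one Match Component, all three filters on the same event.
count_combined() {
  awk -F, '$1 == "Michigan" && $2 == "Authentication" && $3 == "failure" { n++ }
           END { print n + 0 }'
}

printf '%s\n' "$events" | count_separate   # 3 3 2
printf '%s\n' "$events" | count_combined   # 1
```

In this toy data the three separate components record 8 matches between them (3 + 3 + 2), each a potential piece of state to track, while the combined component matches only the one event you actually care about.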
In Figure 2, by contrast, the engine uses less memory because it won't maintain state (memory allocations) until all three criteria are present within a single event.

#2 – Nesting Components
Below are two examples of a nested rule. Figure 3 is a bad example: at a minimum it could have been combined into a single AND gate and, preferably, into a single Match Component. Figure 4 is a better example of how to use nesting effectively. In it, a single Match Component is combined with three existing Attack rules, only one of which has to match.

NOTE: Without a SET gate, an OR gate requires only a single match among the components in the gate.

Nesting rules can be extremely effective when looking for different types of events as part of a single rule. However, nesting uses more resources (memory and CPU) because of the overhead of the additional logic required to perform the matches. Keep that in mind when crafting nested rules, and reduce the number of components where possible to keep the rule as efficient as possible.

Figure 3 (Bad Example)
Figure 4

Rule Caveats
Below are items that have the potential to cause extra or excessive resource usage (memory and CPU) and that should be considered when crafting custom Correlation Rules. This doesn't mean "don't do the items listed below"; rather, they are highlighted so that you are aware of their impact on the overall performance of the Correlation Engine and can craft rules that are as efficient as possible given your criteria.

OR gates mixed with AND gates result in a lot of extra processing
This is due to the additional processing required to evaluate the state of the overall rule: the engine has to check the OR and AND conditions more frequently.
A rule that checks multiple large Watch Lists
Watch Lists are stored in the database and are read into short-term memory at engine execution time to avoid frequent reads from the McAfeeEDB (the database on the ESM in which parsed events are stored). This adds to memory usage, so very large lists could use excessive memory and slow the engine down. Also, when a list changes, the current state of each correlation rule that uses it must be ended, the new list read in, and state then built up again.
NOTE: A large Watch List is one with more than 100K entries.

Referenced rules result in more processing
Using another rule (a Referenced Rule) within a rule uses more memory and CPU than simply including the logic of the referenced rule. This does not mean referenced rules shouldn't be used; leveraging existing rules is functionally very beneficial. But if the gate of the Referenced Rule is an AND and your new rule also contains one or more ANDs, it may be more efficient to copy in the logic from the Referenced Rule instead. This is especially true if the Referenced Rule is referenced only once. Figure 5 is an example of a rule with a reference component (green).
Figure 5

The more gates, the more CPU and memory will be used
Nesting gates causes additional overhead (CPU and memory) because additional match processing is required. Make sure the rule really requires nested gates, or craft it another way using fewer gates.

High match rate/low fire rate/high timeout
This is where a rule with multiple Match Components has one Match Component with a substantially higher match rate than the other components, and the overall rule triggers very infrequently. These match rates, among many other settings and values, can be checked using a script that "dumps" details of what is going on within the engine.
This script is outlined in the section starting on page 18.

One really slow rule can't be fixed with load balancing
The Correlation Engine can automatically balance itself across its processors. However, if a rule uses an inefficient approach, this auto-balancing may not help. You can check for this with the correlator.sh script, as outlined in the script section toward the end of this document.

Memory usage is the biggest reason for slowdowns
The Correlation Engine is constrained by the available resources. Keeping the items mentioned above in mind is one way of managing CPU and memory use. Another is to determine whether the customer environment actually needs all of the standard rules provided (176 at the time of writing). It's possible that a few (or more) can be disabled, or at least tuned to reduce their memory and/or CPU usage.

Watch List vs. Variable
If you are trying to squeeze every last ounce of performance out of the Correlation Engine, consider whether a Match Component should use a Watch List or a Variable. The pros and cons of each are below.
Watch Lists
- Stored in the database
- Read into a short-term, in-memory cache to reduce queries
- Slowest of the filtering options, but not "slow"
- See the previous note on large Watch Lists
Variables
- The faster of the two
- Written directly into the rule itself
- Not updated automatically (requires a policy rollout)
- Only use for values that don't change regularly
- Limited to 2000 bytes

Rule Attributes
Some final advice to ensure that your custom Correlation Rules perform as efficiently as possible: each rule has a few attributes that carry caveats. Below, we outline some of them and the items to consider when creating a new correlation rule.
Group By
- Events whose grouped IP is null (0.0.0.0) will be ignored.
- The same applies to any string/numeric fields that are unset (0).
NOTE: Grouping by multiple high-cardinality fields (SrcIP, DestIP, etc.) may cause high memory usage.

Time Window
- A parent gate must have a Time Window >= its child's Time Window.
- The Time Window plus the Time Order Tolerance is tracked in memory.
NOTE: Time Order Tolerance is outlined below.

Gate Logic
- Sequence is somewhat more expensive.
- Setting high thresholds results in more memory usage.

Time Order Tolerance
Even after the Time Window of a given rule has expired, some metadata for that rule is kept in memory for a period of time. That period is called the Time Order Tolerance, and by default it is set to 60 minutes. Time Order Tolerance is designed to account for events that arrive out of sequence (or late, after a rule's Time Window has expired). This uses additional memory and, depending on how events are matched (or not) and your time thresholds (a number of factors come into play), it could use a lot of memory.

To prevent potential excess memory usage, you can reduce the Time Order Tolerance to less than 60 minutes. The upside is that environments struggling with resources benefit from the memory that is freed up. However, and this is a big however, you need to be sure, or at least comfortable, that events from ALL data sources will never arrive late. Ever. If they do, and they arrive after the Time Order Tolerance has expired, they won't be included in the previously expired Correlation Rule.
Figure 6
NOTE: If you choose to change the tolerance, reduce it gradually. The initial setting should not be lower than 30 minutes, to ensure that you are not missing events.

Debugging a Bad Rule
Once a rule has been written and is running, you might find that it doesn't trigger, that all correlated events seem to reach the ESM more slowly after the new rule was added, or that no events appear at all.
When these behaviors appear, customers typically call support and work through the issue. The following pages, however, provide a workflow for determining what might be causing the issue so that you can resolve it on your own.

NOTE: The steps outlined on the following pages are designed to help the reader debug moderate correlation engine issues. They are not intended to be a complete debugging guide. If you go beyond the scope of this document, you may do more harm than intended. If you are unsure, do not proceed; seek additional support.

Important Locations
Before we debug, we need to know where the Correlation Engine keeps its important directories and files. Everything important (or at least everything covered here) is located within a single directory, regardless of whether this is an Event Receiver or an ACE appliance:
/usr/local/ace/
From here, most things related to correlation can be found, checked or investigated. As with any component within McAfee SIEM, there are files and commands that can be useful; to keep this section simple, we'll stick to the most important items.

The Directories
bin: Contains the correlator.sh shell script. This script can be used for a variety of tasks, most notably to check the efficiency of the engine and of individual rules.
enrichment: Contains the enrichment rules.
historical: Contains the event files used for historical correlation, if historical correlation is enabled.
incoming: Stores the incoming events sent from the ESM.
lib: No need to look in here.
log: Contains the logs of the running correlation engines. These can be useful during debugging; we outline some uses within this document.
properties: Not much to see here.
rules: Contains the standard and user-defined correlation rules in XML format. These are the running rules. Other copies of the rules exist in case of corruption.
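A quick way to confirm the layout above on a given appliance is to list each directory with its entry count. This bash sketch uses a hypothetical helper name and assumes the default /usr/local/ace path from this section (pass a different base as the first argument); it only reads directory listings and changes nothing.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report each directory listed above and how many entries it holds.
ace_inventory() {
  local base=${1:-/usr/local/ace} d n
  for d in bin enrichment historical incoming lib log properties rules; do
    if [ -d "$base/$d" ]; then
      # Count entries directly inside the directory (files and subdirectories).
      n=$(find "$base/$d" -mindepth 1 -maxdepth 1 | wc -l | tr -d ' ')
      printf '%-12s %s entries\n' "$d" "$n"
    else
      printf '%-12s missing\n' "$d"
    fi
  done
}

# Example: ace_inventory            (uses /usr/local/ace)
#          ace_inventory /some/copy (inspects another base directory)
```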
The Files
While there are a number of important files within each of these directories, the ones outlined below are worth noting.

log/correlator.log.x
Contains the log files for the Correlation Engine. There will only ever be 5 logs (.0 through .4), with the most recent entries in .0. You can use the tail command to view these in real time. Adding grep and looking for a word such as Exception may provide insight into issues within the engine (run from /usr/local/ace/log):
tail -f correlator.log.0 | grep Exception

In-Bound Events
Checking the number of inbound events waiting to be processed
As previously noted, the ESM forwards events from the Event Receivers (ERCs) to the Correlation Engine. The engine stores these event files in the incoming directory, where they wait for further processing. Sometimes, due to bad rules, too many events, a stalled or stopped engine, or another issue, these event files stack up and the Correlation Engine falls behind. To see how many event files, if any, are waiting, run the following command from the /usr/local/ace directory:
McAfee ACE ~ # ls -l incoming | wc -l
(The count includes the "total" line that ls -l prints first, so it is one higher than the number of files.)

Depending on the EPS of your environment, the result (an example is shown in Figure 7) should be a single-digit or low double-digit number. The goal is to have as few files waiting as possible, and for this number to be stable or decreasing over time. If a large number of files are waiting, or the number keeps growing when you run the command a few times, this could indicate an issue that needs further investigation.
Figure 7

CPU Utilization
The Correlation Engine reserves a minimum amount of resources (memory and CPU) to operate. Checking CPU utilization is one way to see whether the engine is performing as expected or is in distress. At runtime, the engine dynamically calculates the number of cores to use: the number of cores detected, minus 2. To monitor the CPU you can use either top or htop.
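The In-Bound Events and core-count checks above can be scripted as a quick snapshot. This bash sketch (hypothetical helper names) counts the regular files waiting in incoming without the extra total line that ls -l adds, samples that count a few times to see whether the backlog is growing, and computes the expected core count as detected cores minus 2. The sample count and interval defaults are arbitrary choices for this sketch, and nproc (GNU coreutils) is assumed to be available on the appliance.

```bash
#!/usr/bin/env bash
# Hypothetical helper: number of regular files waiting in the incoming directory.
incoming_count() {
  find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Sample the backlog several times and report whether it is growing.
# Usage: incoming_trend DIR [SAMPLES] [SECONDS_BETWEEN_SAMPLES]
incoming_trend() {
  local dir=$1 samples=${2:-3} interval=${3:-30} first last i
  first=$(incoming_count "$dir")
  last=$first
  for ((i = 1; i < samples; i++)); do
    sleep "$interval"
    last=$(incoming_count "$dir")
  done
  if [ "$last" -gt "$first" ]; then
    echo "GROWING: $first -> $last files"
  else
    echo "STABLE OR SHRINKING: $first -> $last files"
  fi
}

# Cores the engine is expected to use: detected cores minus 2.
expected_cores() {
  local detected=${1:-$(nproc)}
  echo $((detected - 2))
}

# Example: incoming_trend /usr/local/ace/incoming 5 60
```

On a 4-core appliance, expected_cores prints 2.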
Both top and htop are useful tools, but they display the data in slightly different ways; htop has a slight advantage because it displays activity in a more colorful manner. Below are a couple of screenshots.

What to look for?
In a moderately loaded ACE, the CPU percentage (the first of the two highlighted lines in the screenshot) should be a number over 100%: maybe 300%, possibly 700%, but definitely over 99-100%. See Figure 9. If you notice CPU utilization stuck at 99-100% for an extended period, it could mean that a single rule is hogging cycles.

The other thing to look for is that utilization is spread across 4 CPUs. Usage does not have to be uniform, just present across all four. See Figure 8.
Figure 8
Figure 9

Correlation Engine Status
If, after inspecting the incoming directory, the CPU usage, or events that are simply not showing up, you believe there is a problem, you can look at the internals of the Correlation Engine. The engine keeps a wide variety of statistics and values for itself as well as for individual rules and the components within them. To dump the current status of the engine, execute the following script:
McAfee ACE ~ # /usr/local/ace/bin/correlator.sh -status > status.log
The correlator.sh script dumps the status of the engine at the moment it is executed.
NOTE: If this script takes more than 5 minutes to execute, this could indicate that the engine is underperforming.
The values in the output file reflect activity since the engine was last restarted. Once you have the output file, you can search it for valuable information on how the engine is performing. The following pages provide examples of output sections to review when debugging your correlation engine.

#1 – Memory Critical
Correlation is a memory-intensive operation.
Maintaining state for rules that match on Source IP, Destination IP or User Name, each of which has high cardinality, can be expensive in terms of memory. So if you're experiencing slow event generation or believe the engine is under stress, one of the first items to check is the memoryCritical property:
McAfee ACE ~ # grep -A 3 'memoryCritical' status.log
NOTE: -A 3 prints the three lines after the grep match.
<property name="memoryCritical">
<value>false (5%)</value>
<description>Is memory at a critical level</description>
</property>
If the highlighted percentage is more than 50%, the engine may be under stress, and certain rules may need to be tuned to reduce their number of activeInstances:
McAfee ACE ~ # grep -A 1 -B 1 'activeInstances' status.log
NOTE: -A 1 and -B 1 print the line after and the line before the grep match.
The grep catches the activeInstances for each rule, plus one for the Engine and one for Alarms. An example of the output is below.
<status name="Ruleset: Policy - Porn Policy Events on a Local Host">
<property name="activeInstances">
<value>2</value>
<status name="Ruleset: GTI - DNS Communication with Malicious Host - Event or Flow">
<property name="activeInstances">
<value>520</value>
<status name="Ruleset: Attack - Possible Conficker Worm Activity">
<property name="activeInstances">
<value>10</value>
What you are looking for are the rules with the highest activeInstances values. If one or more rules have values much larger than the rest, use the steps later in this document to debug the reason for the large number of active instances.
NOTE: Because of the high cardinality of specific fields (SourceIP, DestinationIP or Username, for example), any rule, even a default rule, may need additional match components to limit its activeInstances.
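The status steps so far (running the dump, checking memoryCritical, ranking activeInstances) can be wrapped in a few bash functions. This is a sketch under assumptions: the function names are hypothetical; it assumes status.log has one XML element per line, as in the excerpts above, and that each Ruleset's activeInstances property directly follows its <status name="Ruleset: ..."> line; the 300-second threshold is the 5-minute guideline from this document. Compare the memory_pct result with the 50% guideline yourself.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a status dump command into a file and warn past 5 minutes.
# Usage: timed_dump status.log /usr/local/ace/bin/correlator.sh -status
timed_dump() {
  local out=$1 start elapsed
  shift
  start=$SECONDS
  "$@" > "$out"
  elapsed=$((SECONDS - start))
  echo "status dump took ${elapsed}s"
  if [ "$elapsed" -gt 300 ]; then
    echo "WARNING: dump exceeded 5 minutes; the engine may be under-performing"
  fi
}

# Print the percentage from memoryCritical, e.g. 5 from "false (5%)".
memory_pct() {
  awk '/name="memoryCritical"/ { want = 1; next }
       want && /<value>/ {
         if (match($0, /[0-9]+%/)) print substr($0, RSTART, RLENGTH - 1)
         exit
       }' "$1"
}

# List Ruleset activeInstances, highest first (default: top 10).
# Engine and Alarms counts are skipped because they are not inside a Ruleset.
top_active() {
  awk '/<status name="/ {
         rule = ""
         if ($0 ~ /<status name="Ruleset: /) {
           rule = $0
           sub(/.*<status name="Ruleset: /, "", rule)
           sub(/">.*/, "", rule)
         }
       }
       /name="activeInstances"/ { want = 1; next }
       want && /<value>/ {
         v = $0
         gsub(/[^0-9]/, "", v)
         if (rule != "") print v "\t" rule
         want = 0
       }' "$1" | sort -t "$(printf '\t')" -k1,1nr | head -n "${2:-10}"
}
```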
#2 – Processor Balance
McAfee ACE ~ # grep 'Rules Processor' status.log
Your results may look like one of the following:

Example 1 (possibly bad)
<status name="Rules Processor 1 (1 rules)">
<status name="Rules Processor 2 (29 rules)">
<status name="Rules Processor 3 (59 rules)">
<status name="Rules Processor 4 (91 rules)">

- or -

Example 2 (good)
<status name="Rules Processor 1 (25 rules)">
<status name="Rules Processor 2 (42 rules)">
<status name="Rules Processor 3 (52 rules)">
<status name="Rules Processor 4 (61 rules)">

The difference between these examples is the noticeable imbalance among the processors. In the first example, Processor 1 has only a single rule running on it. This could indicate that a single rule is inefficient, is 'hogging' the CPU, and may be preventing other rules from executing in a timely manner. The second example shows a better balance of rules. While this is a good sign, there may still be an issue that needs to be investigated.
NOTE: The balance of rules across CPUs does not have to be equal. As long as each CPU has multiple rules, the engine has balanced them as it needs to.

#3 – Which Rule Is on Which Processor
The "mate" to the previous grep is one that looks for timeSpent. For example:
McAfee ACE ~ # grep 'timeSpent=' status.log
<value>rulesProcessor1.timeSpent=48851.7ms</value>
<value>Correlation Engine-47-4000004.timeSpent=35425.8ms</value>
..
..
<value>Correlation Engine-47-4000013.timeSpent=13425.9ms</value>
<value>rulesProcessor2.timeSpent=9310.3ms</value>
<value>Correlation Engine-47-4000014.timeSpent=8765.1ms</value>
..
..
<value>Correlation Engine-47-4000023.timeSpent=545.2ms</value>
The Correlation Engine prioritizes rules by their expense (processing time), so this element shows which rule is taking the most processing time and which core it is using. The output is ordered by processor, and within each processor group the Correlation Engine always lists the most expensive rule first.
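The two greps above can be turned into a quick summary. This bash sketch (hypothetical helper names) prints each Rules Processor with its rule count and flags any processor carrying a single rule, as in Example 1, then ranks individual rules by timeSpent, skipping the rulesProcessorN totals and flagging any value written in E-notation, which indicates a very large timeSpent. It assumes the line formats shown in the excerpts above and a grep that supports -o.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rule count per Rules Processor; flag processors holding one rule.
processor_balance() {
  grep -o 'Rules Processor [0-9]* ([0-9]* rules)' "$1" |
    awk '{ gsub(/[()]/, "")            # now "Rules Processor 1 1 rules"
           flag = ($4 + 0 <= 1) ? "  <-- check: single rule" : ""
           printf "processor %s: %s rules%s\n", $3, $4, flag }'
}

# Hypothetical helper: individual rules ranked by timeSpent (ms), most expensive first.
# printf "%.1f" expands E-notation so that sort -n orders it correctly.
rank_time_spent() {
  grep -o '<value>[^<]*\.timeSpent=[^<]*ms</value>' "$1" |
    sed -e 's/<value>//' -e 's/ms<\/value>//' |
    awk -F '[.]timeSpent=' '$1 !~ /^rulesProcessor/ {
        flag = ($2 ~ /[eE]/) ? "  <-- E-notation" : ""
        printf "%.1f\t%s%s\n", $2, $1, flag
      }' |
    sort -t "$(printf '\t')" -k1,1nr
}

# Example: processor_balance status.log; rank_time_spent status.log | head
```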
Because the most expensive rule is listed first in each processor group, the first rule under processor 1 is the most expensive rule overall, which makes it easy to see which rule is using the most processing time. In the grep output shown earlier, note the time spent for each rule (red), the time spent for each processor (pink) and the Signature ID (green).
NOTE: timeSpent is reset each time the rule set is rebalanced (automatically or manually). This makes it more sensitive to performance changes in the engine; for example, a rule that was slow last week may not be slow this week. In addition, timeSpent is reset when new rules are added, because the rules need to be balanced across processors based on current processing time across the whole rule base.
NOTE: If you see an extremely large number in E-notation (scientific or exponential notation, which is used for numbers greater than 15 digits), you can be fairly certain that the rule is very expensive and should be reviewed.

Determining Rule Performance
Once you have identified the offending rule based on its CPU usage (see above), how can you determine how it's performing? Is the rule logic working as intended? To find out, we need to go further into the output of the correlator.sh script. As mentioned in the Writing Rules section (page 5), the state for each rule, and for each Match Component within each rule, is maintained in memory, and the Correlation Engine keeps statistics on these. We can view these statistics in the output of the correlator.sh script.

Let's use the Bad Rule example from page 7. In Figure 10, you see three Match Components.
Figure 10
From the previous grep example, you can determine the time spent and the Signature ID of the rule in question. While grep is an excellent tool, sometimes just looking through the output is helpful as well, so for this section we'll open the output of the correlator.sh script in an editor. An example using vi is below.
McAfee ACE ~ # vi status.log
NOTE: To search for the Signature ID of the correlated event you are interested in, type a slash (/) followed by the value you are searching for. Once you've located the string, press a lowercase n to continue to the next match. During the search you may match the Signature ID a number of times. For this example, you are looking for the <status name> element that contains the rule name; because the search matches on the Signature ID, you may land a couple of lines past the <status name=> line.
<status name="Ruleset: Account Sharing - Bad Rule">
..
..
<value>Correlation Engine-47-4000004.totalInstances=0</value>
Next, scroll down toward the end of this </status> element until you locate the <status name="rule_1"> element. Once you find this section, you can examine how the rule components are matching the inbound events and use this information to determine whether one or more of the Match Components need adjustment. An example of what we are looking for follows; also see Appendix A for a complete rule XML example.

In our example there are three (3) Match Components, so the output for this rule has three <status name="rule_ elements, each corresponding to a Match Component. This means that:
<status name="rule_1"> corresponds to the first Match Component
<status name="rule_2"> corresponds to the second Match Component
<status name="rule_3"> corresponds to the third Match Component
Knowing this, we can see each component's statistics and how it is performing.
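As an alternative to the vi search above, one rule's section of status.log can be printed directly. This bash sketch (hypothetical helper name) assumes, as in the excerpts here, that each <status name="Ruleset: ..."> line starts a new rule's section and that sections follow one another; output runs until the next Ruleset line, or to the end of the file for the last rule. The rule name in the example comment is the one used in this section.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print one rule's section of status.log, from its
# <status name="Ruleset: NAME"> line up to (not including) the next Ruleset line.
ruleset_block() {
  local file=$1 name=$2
  awk -v want="Ruleset: $name" '
    index($0, "<status name=\"Ruleset: ") { inside = (index($0, "\"" want "\"") > 0) }
    inside { print }' "$file"
}

# Example: ruleset_block status.log "Account Sharing - Bad Rule" | less
```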
The XML Example:
<status name="rule_1">
<property name="matchAttempts">
<value>423454245</value>
<description>Number of match attempts</description>
</property>
<property name="matches">
<value>0</value>
<description>Number of matches</description>
</property>
</status>
<status name="rule_2">
<property name="matchAttempts">
<value>423454245</value>
<description>Number of match attempts</description>
</property>
<property name="matches">
<value>0</value>
<description>Number of matches</description>
</property>
</status>
<status name="rule_3">
<property name="matchAttempts">
<value>423454245</value>
<description>Number of match attempts</description>
</property>
<property name="matches">
<value>243489245</value>
<description>Number of matches</description>
</property>
</status>

What we see in this example is that rule_1 and rule_2 have each had about 423 million match attempts, but nothing has matched either component, whereas rule_3 has had the same number of attempts and has matched 243 million times. This tells us three things:
1. Because these Match Components are defined individually rather than as a single Match Component, the engine uses a lot of processing (CPU) attempting to match each component individually.
2. Each match in rule_3 makes the engine maintain state (memory), and rule_3 matches more than half (roughly 57%) of the events it sees. This is aggressive and will be expensive in terms of memory use.
And most importantly:
3. The rule may never trigger. Although rule_1 and rule_2 have each seen more than 400 million events, nothing has matched them, even while rule_3 matches roughly 57% of its events. In other words, if rule_1 and rule_2 haven't matched by now, they may never match. Thus rule_3 is keeping state in memory for more than half of what it sees without any chance of a match (rule trigger).
The solution in this example is to combine the Match Components into a single component.
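The attempts-versus-matches comparison above can be computed for every component at once. This bash sketch (hypothetical helper name) prints attempts, matches and the match rate for each rule_N block. It assumes one XML element per line, as in the excerpt above. Because every Ruleset has its own rule_1, rule_2 and so on, run it on a single rule's section rather than on the whole status.log; the example comment extracts that section with a sed range.

```bash
#!/usr/bin/env bash
# Hypothetical helper: attempts, matches and match rate per <status name="rule_N"> block.
match_rates() {
  awk '
    /<status name="rule_[0-9]+">/ {
      comp = $0
      sub(/.*<status name="/, "", comp)
      sub(/".*/, "", comp)
      attempts = ""; matches = ""
    }
    /name="matchAttempts"/ { want = "a"; next }
    /name="matches"/       { want = "m"; next }
    want != "" && /<value>/ {
      v = $0
      gsub(/[^0-9]/, "", v)
      if (want == "a") attempts = v; else matches = v
      want = ""
      if (comp != "" && attempts != "" && matches != "") {
        rate = (attempts + 0 > 0) ? 100 * matches / attempts : 0
        printf "%s attempts=%s matches=%s rate=%.1f%%\n", comp, attempts, matches, rate
        comp = ""
      }
    }' "$1"
}

# Example (one rule's section only):
#   match_rates <(sed -n '/Ruleset: Account Sharing - Bad Rule/,/<status name="Ruleset: /p' status.log)
```

For the XML above this reports a 0.0% rate for rule_1 and rule_2 and 57.5% for rule_3.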
Combining the Match Components reduces memory usage and improves CPU utilization, helping the engine run as efficiently as possible.
NOTE: This is just one example of rule tuning. Each rule behaves differently and may require different tuning, but using the steps outlined here, it's straightforward to see where a rule has gone 'wrong'.

Additional Options
Sometimes, even after you identify the rule, disable it and roll out policy, the engine is still busy trying to catch up. Alternatively, your experience tells you that the engine has a corrupt rule XML file, or you want to delete the queued-up events and start from scratch. If any of these applies in your environment, you can force the situation a bit by killing the Java process that runs the engine. Generally there is no harm in doing this, because another process restarts the engine immediately without user intervention.

To do this, first determine the Process ID using one of two commands:
McAfee ACE ~ # ps -ef
- or -
McAfee ACE ~ # ps -ef | grep java
Figure 11 shows the output of the first command, with the Java process (red) and the Process ID (yellow) highlighted.
Figure 11
Now that you have the Process ID, you issue a kill command. However, follow this order:
1. Disable the offending rule via the GUI.
2. Kill the process. An example of the command is:
McAfee ACE ~ # kill -9 3337
3. Roll out policy to the Correlation Engine.
NOTE: Once the process terminates, the engine automatically restarts and begins processing any backlogged files in the incoming directory. If you want a fresh start without processing the backlog of events, you can also delete the contents of the incoming directory. Rolling out policy for the engine is recommended, but it is not required for the engine to continue processing.
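ps -ef | grep java also lists the grep command itself, which is easy to mistake for the engine. This bash sketch (hypothetical helper name) reads ps -ef output on stdin and prints only the PIDs whose command is a java binary. It assumes the standard ps -ef column layout (PID in column 2, command in column 8) and that the engine runs as a java process, as in Figure 11. It only prints PIDs; follow the disable, kill, roll-out order above before killing anything.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print PIDs whose command (column 8 of ps -ef) is a java binary.
# Unlike "ps -ef | grep java", this skips the grep process itself.
java_pids() {
  awk 'NR > 1 && ($8 == "java" || $8 ~ /\/java$/) { print $2 }'
}

# Usage on the appliance:
#   ps -ef | java_pids
```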
Conclusion

Finally, here are a couple of sections contained within the correlator.sh output.

Cores Used and Detected

As previously mentioned, the Correlation Engine dynamically determines the number of cores to use: the number of cores detected minus 2. This section shows you how many cores were detected and how many are used.

<property name="coresUsed">
  <value>2</value>
  <description>The number of CPU cores being used</description>
</property>
<property name="totalCores">
  <value>4</value>
  <description>Total detected CPU cores</description>
</property>

Rules Correlation EPS

Like dsstatus on a Receiver, the Correlation Engine keeps track of its processing performance. This section provides the live EPS and the Total Events Processed at the moment the script was executed. Because the EPS value is an instantaneous snapshot, it could be low or high and should not be read as the overall EPS of the engine. For a more accurate picture of the engine's EPS, use the processing record counts in the logs (see "Grep for files processed in the logs" below).

<status name="Correlation Thread Status">
  <status name="Correlation Thread 0">
    <property name="Processing Rate (EPS)">
      <value>200</value>
      <description>Processing Rate (EPS)</description>
    </property>
    <property name="Total Events Processed">
      <value>8012314107</value>
      <description>Total Events Processed</description>
    </property>
    ..
    ..
  </status>
</status>

As mentioned in the Locations section, the logs are located in /usr/local/ace/logs. These six files store important information on the processing and, to some extent, the performance of the engine. The next few examples show how to pick useful information out of them.

Grep for AddFile in the logs

Occasionally, after you have done the investigation described in the previous sections, everything looks OK but still nothing is being correlated. This could be normal (events are simply not triggering the rules), or it could mean that something is wrong.
The command below searches the logs to see whether files are being added to the engine from the ESM:

McAfee ACE ~ # grep AddFile correlator.log.0

The results will look something like Figure 12, with a number of entries listed. Entries here mean that events are reaching the engine; if nothing is being correlated, the engine is, for some reason, not acting on them. One easy step is to roll out policy to the engine to make sure it has the most recent rules.

Figure 12

Grep for files processed in the logs

Once you know that files are reaching the engine, the next command checks whether those files are being processed:

McAfee ACE ~ # grep 'record counts' correlator.log.0

The results are shown in Figure 13. If there are one or more entries in the log and they are recent, you can be assured that the engine is processing events. The event counts here are compressed by aggregation, so multiplying them by your aggregation rate provides an estimate of the event rate on the engine. McAfee uses a 10:1 default aggregation rate; however, your rate may be different.
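The estimate described above is simple arithmetic. This Python sketch assumes you have read a series of counts off the 'record counts' entries and that those entries together cover a known time window; because the exact log line format is only shown in Figure 13, parsing is left out and the counts are passed in directly. The 10:1 default comes from the text; the function names, sample counts, and 60-second window are hypothetical.

```python
def estimate_raw_events(record_counts, aggregation_ratio=10):
    """Scale aggregated (compressed) record counts back to an approximate raw event count."""
    if aggregation_ratio <= 0:
        raise ValueError("aggregation_ratio must be positive")
    return sum(record_counts) * aggregation_ratio


def estimate_eps(record_counts, window_seconds, aggregation_ratio=10):
    """Approximate average raw events per second over the window the counts cover."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return estimate_raw_events(record_counts, aggregation_ratio) / window_seconds


# Hypothetical: three 'record counts' entries logged within a 60-second window.
counts = [1200, 1500, 1300]
print(estimate_raw_events(counts))      # 40000
print(round(estimate_eps(counts, 60)))  # 667
```

Averaging over a window smooths out the spikes and lulls that make the single EPS value in the status output unreliable.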
Figure 13

Appendix A - Full Rule Element in XML

<status name="Ruleset: Account Sharing - Bad Rule">
  <property name="activeInstances">
    <value>0</value>
    <description>Total active instances, first instance will be shown in status</description>
  </property>
  <property name="totalInstancesCreated">
    <value>Correlation Engine-47-4000004.totalInstances=0</value>
    <description>Total instances created</description>
  </property>
  <property name="deadInstancesRemoved">
    <value>0</value>
    <description>Dead instances removed</description>
  </property>
  <property name="totalTimeSpent">
    <value>Correlation Engine-47-4000004.timeSpent=2.160236ms</value>
    <description>Time spent in this ruleset</description>
  </property>
  <property name="timeSpentProcessing">
    <value>Correlation Engine-47-4000004.timeSpentProcessing=0.0ms</value>
    <description>Time spent in this rule within an instance</description>
  </property>
  <property name="timeSpentMatching">
    <value>Correlation Engine-47-4000004.timeSpentMatching=0.287428ms</value>
    <description>Time spent seeing if an events matches a new instance</description>
  </property>
  <property name="timeSpentCreatingInstances">
    <value>Correlation Engine-47-4000004.timeSpentInstantiating=0.0ms</value>
    <description>Time spent creating instances</description>
  </property>
  <property name="timeSpentOther">
    <value>Correlation Engine-47-4000004.timeSpentOther=0.200094ms</value>
    <description>Time spent doing everything else before match/processing</description>
  </property>
  <property name="firings">
    <value>0</value>
    <description>Number of times this ruleset fired</description>
  </property>
  <property name="upcomingTriggerChecks">
    <value>0</value>
    <description>upcomingTriggerChecks</description>
  </property>
  <property name="cooldown">
    <value>0</value>
    <description>cooldown</description>
  </property>
  <property name="immediateMode">
    <value>false</value>
    <description>immediateMode</description>
  </property>
  <status name="Default"/>
  <status name="rule_1">
    <property name="matchAttempts">
      <value>423454245</value>
      <description>Number of match attempts</description>
    </property>
    <property name="matches">
      <value>0</value>
      <description>Number of matches</description>
    </property>
  </status>
  <status name="rule_2">
    <property name="matchAttempts">
      <value>423454245</value>
      <description>Number of match attempts</description>
    </property>
    <property name="matches">
      <value>0</value>
      <description>Number of matches</description>
    </property>
  </status>
  <status name="rule_3">
    <property name="matchAttempts">
      <value>423454245</value>
      <description>Number of match attempts</description>
    </property>
    <property name="matches">
      <value>243489245</value>
      <description>Number of matches</description>
    </property>
  </status>
</status>

Appendix B - Arguments for the correlator.sh script

NOTE: Columns which are grayed out are outlined here for informational purposes and are NOT intended for general use without support assistance.

-add <file | file uncompressed | file <type> | file <type> uncompressed>
    Add a new event file to be processed. One could strip out a file from the incoming directory and add it back in. Typically used in debugging.
-clean <true|false>
    Unused / Future use
-customFields <path>
    Flag used at startup. Internal configuration file.
-eventTypes <eventTypePath>
-globals <path>
    Flag used at startup. Internal configuration file.
-gui
    Unused / Future use
-idleCheck
    Unused / Future use
-incoming <path>
    Tells the engine to use a different incoming location for the events from the ESM. Used for debugging only.
-managers <managers>
    Flag used at startup. Internal configuration file.
-out <path>
    Flag used at startup. Internal configuration file.
-port <port>
    Tells the Correlation Engine which port to listen on for commands. Internal use only.
-processcfg <processcfg>
    Flag used at startup. Internal configuration file.
-rebalance
    Forces the engine to re-balance its usage across the threads.
    This is normally performed as part of regular processing; however, should an imbalance occur, this option forces the engine to rebalance immediately.
-reload
    Used to reload certain configuration or rules files.
-reset
    Performs a full engine reset. Not recommended for use.
-run
    Flag used at startup. Internal configuration file.
-scoring <scoring>
    Flag used at startup. Internal configuration file.
-shutdownStatus
    Paired with the shutdown command; queries the correlation engine to see whether it has shut down yet. Shutdown could take time while the engine saves its state file.
-start
    Starts the engine if it stopped on its own or if the stop argument was used previously.
-status
    Dumps the status of the correlation engine into a file so administrators can investigate how the engine and individual rules are performing. Syntax is:
    # /usr/local/ace/bin/correlator.sh -status > /tmp/status.log
    See the Correlation Engine Status section for its use and the values of interest.
-stop
    Stops the correlation engine. Once it stops, the engine will automatically restart. This is similar to the kill outlined in Additional Options.
-testRules <path>
    Tests a Rules XML file. Also used during a policy push (rollout) in the GUI.
-thirdParty <path>
    Flag used at startup. Internal configuration file.
-update <enrichmentFile>
    Manually updates a Watch List or Enrichment framework. Not recommended for use.
-upgrade <path>
    Unused
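For repeated snapshots, the -status syntax can be wrapped in a small script. This Python sketch is equivalent to the shell redirect: it runs correlator.sh -status and writes whatever the script prints to a file. The script path and default output file are taken from the syntax example; the save_status name is hypothetical, and it assumes Python 3.7 or later is available wherever you run it.

```python
import subprocess
from pathlib import Path

# Path and default output file taken from the -status syntax example.
CORRELATOR = "/usr/local/ace/bin/correlator.sh"


def save_status(out_path="/tmp/status.log", script=CORRELATOR):
    """Run `<script> -status` and write its standard output to out_path.

    Raises subprocess.CalledProcessError if the script exits non-zero.
    """
    result = subprocess.run(
        [script, "-status"], capture_output=True, text=True, check=True
    )
    Path(out_path).write_text(result.stdout)
    return out_path
```

Saving one snapshot before and one after a policy rollout, under different file names, makes it easy to compare counters such as matchAttempts and matches over time.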
© Copyright 2024