TNT Software event log monitoring solutions

"How should I size my ELM Server?"


Because monitoring workloads are inherently dynamic, it is difficult to recommend specific hardware. The requirements depend on the number of Agents, the number and frequency of Monitor Items, the amount of data collected or received, and how often that data is collected.

Given those caveats, we offer the following guidelines, observations, and general recommendations for sizing an ELM Server.

Note: This guide covers only the sizing requirements for the ELM Server component. It does not include sizing for the ELM Server database or any other component, including the operating system. Generally, running the ELM Server on a multi-purpose computer is acceptable.

The following factors indicate that a dedicated server may be required:

  • How many systems will be monitored?
    If the number of systems to be monitored is more than 30, you should consider a dedicated server.
  • Is Monitoring Mission Critical?
    If the systems to be monitored are mission critical and the fastest possible notification of failures is required, you should consider a dedicated server.
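
The two questions above can be folded into a simple check. The sketch below is only an illustration: the function name and its parameters are hypothetical, and the 30-system threshold is the guideline stated above rather than a hard limit.

```python
def should_consider_dedicated_server(monitored_systems: int, mission_critical: bool) -> bool:
    """Return True when either guideline above suggests a dedicated ELM Server."""
    dedicated_threshold = 30  # guideline from this article, not a hard limit
    return monitored_systems > dedicated_threshold or mission_critical


print(should_consider_dedicated_server(12, False))  # False
print(should_consider_dedicated_server(45, False))  # True
print(should_consider_dedicated_server(12, True))   # True
```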

 

While there is no exact formula for sizing an ELM Server, there are guidelines for determining the general specifications it requires.

The following are the results of a stress test that was performed against an ELM Server. These tests are not meant to imply any specific configuration parameters, nor do they represent an ideal configuration. They are intended to provide a guideline for sizing your ELM Server. Ultimately, the resources required and consumed by your ELM Server will be determined by a number of factors, including:

  • Number and Type of Agents monitored - This is especially important when using Virtual Agents. All monitoring of, and collection from, Virtual Agents occurs in the ELM Server process. As the number of Virtual Agents and the amount of monitoring/collection done on them increases, the memory required by the ELM Server process increases.
  • Number of Monitor Items used and Frequency of Data from Monitor Items - The number of Monitor Items assigned to Agents, and the frequency at which those Monitor Items generate data (e.g., data collection or state change data), contribute to the memory requirements of the ELM Server.
  • Number of Views and Filters Used - Each time event-related data is received, the ELM Server must process the incoming data against all Event Filters and enabled Rules. A large amount of data combined with a large number of Event Filters and Rules will cause the ELM Server to consume more memory. Due to the infinite permutations of events, filters and rules, it is impossible to provide a usable formula to determine how much additional memory will be consumed.
  • Number of ELM Consoles connected to the ELM Server - Each ELM Console connection requires the ELM Server to create and maintain a persistent session with the ELM Console. The session transmits data from the ELM Server to the ELM Console for display and editing purposes. In addition to the overhead for maintaining the session, ELM Server overhead is used for the transmission of data. On average, an ELM Console session causes the ELM Server to consume approximately 10MB of memory to maintain the session. Any additional memory requirements on the ELM Console computer are determined by the amount of data transmitted to the ELM Console, and the number of ELM Consoles to which the data needs to be transmitted.
  • Previewing and running reports - The Report Engine is a component of the ELM Server process. When you preview or run a report, the ELM Server does all of the work: it queries the ELM Server Database, it receives the database query results, it formats the results, and it outputs the results to the specified output format. These operations are CPU and memory-intensive, particularly when a large number of records are returned, or when a large database is queried.

This list is not exhaustive. It illustrates the types of activities that cause the ELM Server to consume resources.

Lab Test Results

In our test lab, a single ELM Server running on moderate hardware has been shown to handle more than 40 events per second. This translates to approximately 3.5 million events per day.
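
The conversion between a sustained event rate and a daily volume is simple arithmetic, sketched below. The function names are illustrative only, and the calculation assumes the rate is sustained for a full 24 hours.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def to_events_per_day(events_per_second: float) -> int:
    """Daily event volume for a rate sustained around the clock."""
    return round(events_per_second * SECONDS_PER_DAY)


def to_events_per_second(events_per_day: float) -> float:
    """Average rate implied by a daily event volume."""
    return events_per_day / SECONDS_PER_DAY


print(to_events_per_day(40))                       # 3456000, i.e. about 3.5 million
print(round(to_events_per_second(1_000_000), 1))   # 11.6
```

Working in the other direction is often more useful when planning: a site that expects one million events per day needs to sustain an average of about 11.6 events per second, before allowing for bursts.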

Tables 1 and 2 below detail the specifications of this test server and the database server for comparison purposes:

Table 1 - Specifications for test ELM Server
CPU:               Pentium III S - 1.13GHz, Dual Processor
Memory:            2.25GB
Disk:              18GB Ultra 160 LVD SCSI, 10K rpm
Database:          (remote)
Network:           100Mbps Ethernet
Operating System:  Windows 2003 Enterprise SP1

Table 2 - Specifications for test Database Server
CPU:               Pentium III 450MHz, Single Processor
Memory:            256MB
Disk:              Two 10.2GB ATA-66 IDE, 7200 rpm
Database:          SQL Server 2000
Network:           100Mbps Ethernet
Operating System:  Windows 2003 Standard SP1

The ELM Server in this stress test demonstrated a sustained value of 41 events per second, with frequent spikes of 42 events per second. Average resource utilization on the ELM Server during the test is detailed in Table 3 below.

Table 3 - Average Resource Utilization by ELM Server
CPU Usage   Working Set   Virtual Memory   I/O Reads   I/O Writes   Handles   Threads
9.3%        24MB          59MB             0.05        0.00087      816       39

Performance was charted for a 24-hour period during which six Service Agents were used to collect a large number of events. In addition to transmitting an average of 7 events per second to the ELM Server (roughly 42 events per second across all six Agents), each Service Agent was executing additional Monitor Items and reporting any state or status changes for those items to the ELM Server. As shown in Chart 1 below, the ELM Server used an average of 9.3% of the overall CPU time, with a peak usage of just over 10%.

Chart 1 - ELM Server CPU Usage

The two most CPU-intensive tasks performed by the ELM Server during the test were Beep Notification Methods and storage to the ELM Server Database. Beep and Sound File Notification Methods in general have been shown to use extra CPU time because of the processor interrupts generated when sound-related Notification Methods are executed. Other Notification Methods, such as email and SNMP traps, are generally not CPU-intensive.

Chart 2 shows that the ELM Server consumed an average of 24MB of physical memory, with a maximum peak of 37MB of physical memory. Using Virtual Agents would have increased the ELM Server's working set by an average of 5-10MB per Virtual Agent.
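
The memory figures quoted in this guide can be combined into a rough working-set estimate. The sketch below is an assumption-laden illustration, not a formula from TNT Software: it treats the stress-test working set (24MB average, 37MB peak) as a baseline and assumes the per-Console (approximately 10MB) and per-Virtual Agent (5-10MB) figures simply add on top of it. The function and constant names are hypothetical.

```python
from typing import Tuple

BASELINE_AVG_MB = 24        # average working set observed in the stress test
BASELINE_PEAK_MB = 37       # peak working set observed in the stress test
CONSOLE_MB = 10             # approximate cost of one ELM Console session
VIRTUAL_AGENT_MB = (5, 10)  # average working-set increase per Virtual Agent


def estimate_working_set_mb(virtual_agents: int, consoles: int) -> Tuple[int, int]:
    """Return a (low, high) working-set estimate in MB.

    Assumes the contributions are additive, which the guide does not state.
    """
    console_mb = consoles * CONSOLE_MB
    low = BASELINE_AVG_MB + console_mb + virtual_agents * VIRTUAL_AGENT_MB[0]
    high = BASELINE_PEAK_MB + console_mb + virtual_agents * VIRTUAL_AGENT_MB[1]
    return low, high


print(estimate_working_set_mb(virtual_agents=20, consoles=3))  # (154, 267)
```

Filters, Rules, and report generation also consume memory and are not modeled here, so the result is best read as a lower bound for planning.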

Chart 2 - ELM Server Physical Memory (Working Set)

As shown in Chart 3, virtual memory (pagefile) usage by the ELM Server averaged 59MB with a peak of 65MB. The stress test was done using SQL Server for the ELM Server database. Had the database platform been Microsoft Access, the amount of virtual memory used by the ELM Server would have been higher, perhaps as much as double this value.

Chart 3 - ELM Server Virtual Memory (Pagefile)

As Table 3 above shows, the ELM Server itself is not I/O-intensive. What Table 3 does not show is that ELM Server performance can be affected by I/O-intensive operations from another process running on the same server. For example, if your ELM Server Database is on the same computer as your ELM Server, database I/O operations could have an impact on ELM Server performance. This is more evident when average-speed IDE disk drives are used instead of fast SCSI hard drives.

Other performance metrics were also collected and reviewed; see Table 4 below for details:

Table 4 - Miscellaneous Performance Details
Avg. Events/sec   Max. Events/sec   Avg. Page Faults/sec   Avg. Network Bytes/sec   Avg. Packets/sec
41                42                48                     216K                     308
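
The Table 4 averages can be turned into approximate per-event network figures, which may help when estimating link load for a different event rate. The sketch below rests on several assumptions: "216K" is read as 216,000 bytes, all measured traffic is attributed to events (the counters likely also include Monitor Item status traffic), and the test server's Ethernet link listed in Table 1 is taken to run at 100 megabits per second.

```python
AVG_EVENTS_PER_SEC = 41
AVG_NETWORK_BYTES_PER_SEC = 216_000  # "216K" read as 216,000 bytes (assumption)
AVG_PACKETS_PER_SEC = 308
LINK_BITS_PER_SEC = 100_000_000      # assumed 100 Mbps link

bytes_per_event = AVG_NETWORK_BYTES_PER_SEC / AVG_EVENTS_PER_SEC
packets_per_event = AVG_PACKETS_PER_SEC / AVG_EVENTS_PER_SEC
link_utilization_pct = AVG_NETWORK_BYTES_PER_SEC * 8 / LINK_BITS_PER_SEC * 100

print(f"{bytes_per_event:.0f} bytes per event")      # 5268 bytes per event
print(f"{packets_per_event:.1f} packets per event")  # 7.5 packets per event
print(f"{link_utilization_pct:.1f}% of the link")    # 1.7% of the link
```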

Summary

A single ELM Server running on very modest hardware can handle millions of events per day. The critical areas for server size are (in order of importance):

  • Memory
  • CPU
  • Disk
  • Network

 

Tables 5 and 6 below show some final guidelines and recommendations for ELM Server specifications, based on a variety of configuration elements:

Table 5 - General Server Sizing Based on Collected Events/Day
Events/Day               Server CPU(s)               Server Memory   Server Disk     Network
< 250,000                Single PII-233 or greater   128MB+          IDE or SCSI     10Mbps+
250,000 - 500,000        Single PII-400 or greater   128MB+          IDE or SCSI     10Mbps+
500,000 - 1,000,000      Dual PIII-500 or greater    192MB+          SCSI            10Mbps+
1,000,000 - 2,000,000    Dual PIII-633 or greater    256MB+          SCSI            10Mbps+
2,000,000 - 3,000,000    Dual PIII-800 or greater    384MB+          SCSI            10Mbps+
3,000,000 - 5,000,000    Dual P4-800 or greater      512MB+          SCSI            10Mbps+
5,000,000 - 7,000,000    Dual P4-1GHz or greater     768MB+          SCSI or Fibre   100Mbps+
7,000,000+               Quad PIII-633 or greater    1GB+            SCSI or Fibre   100Mbps+

Table 6 - General Server Sizing Based on Number of Agents
No. of Agents   Server CPU(s)               Server Memory   Server Disk     Network
< 25            Single PII-233 or greater   128MB+          IDE or SCSI     10Mbps+
25 - 50         Single PII-400 or greater   128MB+          IDE or SCSI     10Mbps+
50 - 100        Dual PIII-500 or greater    192MB+          SCSI            10Mbps+
100 - 200       Dual PIII-633 or greater    256MB+          SCSI            10Mbps+
200 - 300       Dual PIII-800 or greater    384MB+          SCSI            10Mbps+
300 - 400       Dual P4-800 or greater      512MB+          SCSI            10Mbps+
400 - 500       Dual P4-1GHz or greater     768MB+          SCSI or Fibre   100Mbps+
500+            Quad PIII-633 or greater    1GB+            SCSI or Fibre   100Mbps+
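
Because Tables 5 and 6 list the same eight hardware tiers, a lookup can consult both and return the more demanding result. The sketch below is one possible reading of the tables, not a TNT Software tool: the function name is hypothetical, taking the higher of the two tiers is an assumption, and a value that sits exactly on a shared boundary (for example, 50 Agents) is assigned to the upper tier.

```python
from bisect import bisect_right

# (CPU, memory, disk, network) rows shared by Tables 5 and 6, smallest tier first
TIERS = [
    ("Single PII-233 or greater", "128MB+", "IDE or SCSI", "10Mbps+"),
    ("Single PII-400 or greater", "128MB+", "IDE or SCSI", "10Mbps+"),
    ("Dual PIII-500 or greater", "192MB+", "SCSI", "10Mbps+"),
    ("Dual PIII-633 or greater", "256MB+", "SCSI", "10Mbps+"),
    ("Dual PIII-800 or greater", "384MB+", "SCSI", "10Mbps+"),
    ("Dual P4-800 or greater", "512MB+", "SCSI", "10Mbps+"),
    ("Dual P4-1GHz or greater", "768MB+", "SCSI or Fibre", "100Mbps+"),
    ("Quad PIII-633 or greater", "1GB+", "SCSI or Fibre", "100Mbps+"),
]

# Lower bound of every tier after the first
EVENTS_PER_DAY_BOUNDS = [250_000, 500_000, 1_000_000, 2_000_000,
                         3_000_000, 5_000_000, 7_000_000]
AGENT_BOUNDS = [25, 50, 100, 200, 300, 400, 500]


def recommended_tier(events_per_day: int, agents: int) -> tuple:
    """Return the more demanding of the Table 5 and Table 6 tiers."""
    by_events = bisect_right(EVENTS_PER_DAY_BOUNDS, events_per_day)
    by_agents = bisect_right(AGENT_BOUNDS, agents)
    return TIERS[max(by_events, by_agents)]


print(recommended_tier(events_per_day=1_500_000, agents=60))
# ('Dual PIII-633 or greater', '256MB+', 'SCSI', '10Mbps+')
```

In this example the event volume (Table 5) calls for a larger server than the Agent count (Table 6), so the event-based tier is returned.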

If you have any questions or comments about this guide, or if you would like assistance sizing your ELM Server or designing your ELM-based solution, please contact TNT Software's Product Support Group.