Optimizing security monitoring
Security issues can be a challenge; preventing them with properly configured monitoring can save considerable resources. However, as a network grows, the list of resources subject to monitoring may grow much faster.
A typical situation is a data center: when new hosts (servers) are added, multiple monitors of the same type are added with them (depending on server type: Web server, mail server and so on). In such a situation, it is necessary to keep the number of monitors as small as possible. Below we propose several approaches to optimize the monitoring of a large number of critical resources.
Properly select dependencies
In “Main parameters” of every monitor, it is possible to specify a dependency monitor: if the latter is in a certain state (for example, Down), the monitor depending on it is paused (no alerts will be generated until the dependency monitor leaves the specified state).
Assign dependencies “inward”, checking every device and/or service starting from the system where IPHost is installed. Every logical group of hosts and monitors should depend on as few monitors as possible.
For example, when monitoring a group of servers connected through a single network switch, make them depend on the monitor checking switch availability: if the switch goes offline, all the connected devices become unavailable. Instead of getting dozens of alerts from the affected servers’ monitors, a single alert from the network switch will be enough to attend to the issue.
Similarly, for all processes running on a server, the default PING dependency is enough to prevent a flood of alerts when the server itself becomes unreachable: a single alert from the PING check will do.
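To illustrate how dependencies cut down the number of alerts, here is a minimal Python sketch. It is a generic model and not the way IPHost implements dependencies internally: the `Monitor` class, the monitor names and the rule that a Down dependency anywhere up the chain pauses a monitor are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Monitor:
    name: str
    state: str = "OK"                  # "OK", "Warning" or "Down"
    depends_on: Optional[str] = None   # name of the dependency monitor, if any


def is_suppressed(monitor: Monitor, monitors: Dict[str, Monitor]) -> bool:
    """Return True if any monitor up the dependency chain is Down."""
    seen = set()
    parent = monitor.depends_on
    while parent is not None and parent not in seen:
        seen.add(parent)               # guards against circular dependencies
        dependency = monitors[parent]
        if dependency.state == "Down":
            return True
        parent = dependency.depends_on
    return False


def alerts_to_send(monitors: Dict[str, Monitor]) -> List[str]:
    """Names of Down monitors that are not paused by a Down dependency."""
    return [
        m.name
        for m in monitors.values()
        if m.state == "Down" and not is_suppressed(m, monitors)
    ]


def build_example() -> Dict[str, Monitor]:
    """Hypothetical setup: two servers behind one switch, everything Down."""
    items = [
        Monitor("switch", "Down"),
        Monitor("web1-ping", "Down", depends_on="switch"),
        Monitor("web1-http", "Down", depends_on="web1-ping"),
        Monitor("mail1-ping", "Down", depends_on="switch"),
    ]
    return {m.name: m for m in items}


if __name__ == "__main__":
    print(alerts_to_send(build_example()))
```

In this hypothetical setup four monitors are Down, but only the switch monitor raises an alert. Once the switch is back, the PING monitors of the servers become the ones that report, while the HTTP monitor stays quiet behind its PING dependency.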
Attend to primary possible failures first
When something isn’t functioning properly, that can result in multiple alarms that are not directly related to the true problem.
For example, if a site’s SSL certificate has expired, connections to the site may fail, so all the monitors using HTTP(S) on the host in question may fail as well. However, if the SSL certificate expiration monitor is set as the dependency monitor for all HTTP(S)/Web Transaction monitors for that site, the “secondary alerts” will not be generated.
For monitors of such critical services, we suggest using both Warning and Down alerts. Say, if the certificate is due for renewal in 14 days, its monitor can be set to Warning state; and if 3 or fewer days remain, Down state can be reported. That way, monitors will warn about imminent service failure long before it actually happens.
The same approach can work in all cases where resources can become scarce and thus cause multiple cascading failures (such as disk space, CPU usage, RAM usage and so on).
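The two thresholds can be expressed as a small function. The following Python sketch is only an illustration: the function name, the constants and the choice to count whole remaining days (rounded down) are our assumptions, and in practice the expiration date would come from the certificate itself.

```python
from datetime import datetime, timedelta, timezone

WARNING_DAYS = 14   # Warning when this many days or fewer remain
DOWN_DAYS = 3       # Down when this many days or fewer remain


def certificate_state(not_after: datetime, now: datetime) -> str:
    """Map the time left until certificate expiration to a monitor state."""
    # timedelta.days is the number of whole days, rounded down
    # (negative once the certificate has already expired).
    days_left = (not_after - now).days
    if days_left <= DOWN_DAYS:
        return "Down"
    if days_left <= WARNING_DAYS:
        return "Warning"
    return "OK"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    for days in (30, 10, 2):
        print(days, certificate_state(now + timedelta(days=days), now))
```

The Down check comes first, so a certificate with 3 or fewer days left is never reported as a mere Warning.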
Channel and aggregate monitoring data
The Syslog monitor is a universal tool for detecting a variety of problems on a Unix-like system.
However, as the number of servers grows, the number of possible Syslog alerts can become unmanageable. In such a case, we recommend using a single Unix-like server as an aggregation point: the Syslog protocol allows sending all the generated events to another host. By collecting events from multiple servers on a single recipient system, the number of alert sources is reduced, which makes it easier to track possible problems.
In the case of script-type monitors (including those running scripts or programs over SSH), it is possible to collect multiple metrics at once and report the overall state (i.e., 0 would mean “OK” state, whereas any positive number can indicate how many metrics require the administrator’s attention). In addition to separate monitors, an aggregate one can be run to provide a single alert when more than one resource has an issue.
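As an illustration of what the aggregation host makes possible, the Python sketch below counts error-level events per originating server from the collected messages. It assumes the messages are available in the traditional BSD syslog form with the priority value kept at the start of the line (`<PRI>Mmm dd hh:mm:ss host message`); many syslog daemons do not write the priority to log files by default, so treat the input format, the function name and the sample lines as assumptions.

```python
import re
from collections import Counter
from typing import Iterable

# Traditional BSD syslog line: "<PRI>Mmm dd hh:mm:ss host message"
LINE_RE = re.compile(
    r"^<(\d{1,3})>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+(\S+)\s+(.*)$"
)

ERROR_SEVERITY = 3  # severities: 0=emerg, 1=alert, 2=crit, 3=err


def count_errors_by_host(lines: Iterable[str]) -> Counter:
    """Count events of severity 'err' or worse for every sending host."""
    counts: Counter = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue                      # skip lines in an unexpected format
        priority = int(match.group(1))
        host = match.group(2)
        severity = priority % 8           # PRI = facility * 8 + severity
        if severity <= ERROR_SEVERITY:
            counts[host] += 1
    return counts


SAMPLE_LINES = [
    "<34>Oct 11 22:14:15 web1 su: 'su root' failed",      # crit
    "<86>Oct 11 22:14:16 web1 sshd[123]: session opened",  # info
    "<11>Oct 11 22:15:00 mail1 postfix: fatal error",      # err
    "<35>Oct  1 01:02:03 web1 sshd: error",                # err
    "not a syslog line",
]

if __name__ == "__main__":
    print(dict(count_errors_by_host(SAMPLE_LINES)))
```

A single monitor on the aggregation host can then evaluate such a summary instead of one Syslog monitor being kept per server.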
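A script for such an aggregate monitor could look like the Python sketch below, meant for a Unix-like host. The thresholds, the function names and the set of checks are hypothetical, and printing the count to standard output is an assumption: adjust it to however your script monitor reads its result.

```python
import os
import shutil
from typing import Callable, Dict

DISK_USED_LIMIT = 0.90     # hypothetical: more than 90% of the disk used
LOAD_PER_CPU_LIMIT = 2.0   # hypothetical: 1-minute load average per CPU


def disk_problem(path: str = "/") -> bool:
    """True if the file system holding `path` is nearly full."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total > DISK_USED_LIMIT


def load_problem() -> bool:
    """True if the 1-minute load average per CPU is too high (Unix only)."""
    load_1min = os.getloadavg()[0]
    return load_1min / (os.cpu_count() or 1) > LOAD_PER_CPU_LIMIT


def count_problems(checks: Dict[str, Callable[[], bool]]) -> int:
    """Run every check and return how many of them report a problem."""
    return sum(1 for check in checks.values() if check())


if __name__ == "__main__":
    checks = {"disk": disk_problem, "load": load_problem}
    # 0 means "OK"; a positive number is the count of metrics needing attention.
    print(count_problems(checks))
```

New metrics are added by putting one more function into the `checks` dictionary, so the aggregate monitor itself does not need to change.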
Use visual representation to notice possible problems
In a real-life use case, one of our customers uses a modified Web interface dashboard to display the states of the most important monitors in a clearly visible form.
When constructed properly, with access control set up, such dashboards can be used on a variety of devices, including mobile ones, to quickly draw attention to problems when there’s no regular access to other alert media (such as email).
If you need assistance with any of the monitoring setups mentioned above, just contact us. We can assist you with setting up aggregate monitors, or provide samples of what a custom Web interface dashboard can look like.
Do you know of any other useful tricks for saving time and resources when monitoring crucial resources? Please let us know, either by contacting us via the above link or by posting a comment below.