“Monitor everything” approach: did you miss anything?

Common pitfalls in setting up network monitoring

Monitoring stand

In most cases, when a significant new service (device, application) is brought online, it is not put under monitoring immediately. This is a common observation: resources often get monitored only after they “suddenly” stop working, and the failure is frequently discovered in the most inappropriate way (imagine learning that your online shop is down only from your customers’ emails).

Whether the cause is a management or a communication issue, the result is the same: an important resource becomes unavailable, and nobody is notified in time. The following is a quick checklist of how a monitoring setup should be maintained to avoid (or at least reduce) the need to fix problems around the clock.

The list below isn’t complete; there can be situations it doesn’t cover. If you think you can add to it, please let us know!

Scan your network periodically

Apart from the initial discovery of network devices, IPHost has a “rediscovery” feature: in the “Settings > Rediscovery” section you can set up periodic attempts to find new devices:

Rediscovery menu

The settings in the rediscovery section are identical to those in the network discovery wizard. Enable rediscovery to stay aware of all changes in the network neighborhood of the system running IPHost. You might wish to add only PING monitors during discovery attempts and add the rest of the monitors later.
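To illustrate what periodic rediscovery gives you, here is a minimal Python sketch that compares the addresses already under monitoring with the addresses found by the latest scan. It is not how IPHost implements rediscovery; the function name `find_new_devices` and the sample addresses are assumptions made for this example only.

```python
from ipaddress import ip_address


def find_new_devices(known, discovered):
    """Return addresses seen in the latest scan that are not monitored yet.

    Both arguments are iterables of IP address strings (hypothetical input;
    a real scan would produce them). The result is sorted numerically.
    """
    known_set = {ip_address(a) for a in known}
    fresh = {ip_address(a) for a in discovered} - known_set
    return [str(a) for a in sorted(fresh)]


# Illustrative data only: two hosts are monitored, the scan found four.
known_hosts = ["192.168.1.1", "192.168.1.10"]
latest_scan = ["192.168.1.10", "192.168.1.1", "192.168.1.42", "192.168.1.7"]
print(find_new_devices(known_hosts, latest_scan))
```

Every address returned this way is a candidate for at least a PING monitor, with the remaining monitors added once the device is documented.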

Add monitor for every resource

Any system resource can be exhausted. When you add a new device (say, a computer), make sure you put every resource under monitoring:

  • every physical resource: CPU, RAM, available disk space on every file system, network throughput for every adapter installed, available file handles and so on
  • for every important process, set up a monitor to notify you of its absence (or of too many instances)
  • if the system listens on publicly available network ports, monitor that those ports stay open
  • if Web (or other) applications are served from the device, create at least basic HTTP(S) monitors to check that they are running
  • depending on the operating system type, set up either an Event log or a Syslog monitor, to be aware of the most important system-wide events

The above is only a shortlist. For every system added to the monitored network, all important services and the resources they depend upon should be monitored.

It helps to document every device added to the network you monitor. The sooner all the details are recorded and the corresponding monitors added, the less likely it is that an important resource silently drops off the radar.
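Two of the checks from the shortlist (free disk space and open ports) can be sketched in a few lines of Python using only the standard library. This is an illustration of what such monitors verify, not IPHost's own implementation; the function names, the 10% threshold and the default host are assumptions chosen for the example.

```python
import shutil
import socket


def disk_free_percent(path="."):
    """Percentage of free space on the file system that contains `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def port_is_open(host, port, timeout=2.0):
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_host(path=".", ports=(), host="127.0.0.1", min_free_percent=10.0):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    free = disk_free_percent(path)
    if free < min_free_percent:
        problems.append(f"low disk space on {path}: {free:.1f}% free")
    for port in ports:
        if not port_is_open(host, port):
            problems.append(f"port {port} on {host} is closed")
    return problems


# Example run: checks only the current file system, no ports.
print(check_host(path=".", ports=[]))
```

A real setup would run such checks on a schedule for every file system and every published port, and would raise an alert whenever the returned list is not empty.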

Test everything

Human errors, even small ones such as typos, add to the overall probability of something important going down without proper notice.

When changing settings, make sure you check them in action. For example, after setting up or changing email notification settings (“Settings > Email Settings”), click “Send test email alert” and make sure the email is actually delivered.

The same goes for database backups: you can create a database backup at any time and restore any of the created backups to make sure they are valid. Note that if backup restoration fails, the previous state of the database is kept as the nms.fdb.old file and can be safely recovered.

When constructing alerts, perhaps the most important part of a monitoring setup, make sure you use the “Testing” tab in the “Alerting” section to test how the alerts will behave. That’s especially important if monitors are polled rarely and you need to be sure notifications will indeed be sent.

Whatever change you apply to the monitoring setup, make sure you test it as soon as possible.

Enable more regular reports

By default, IPHost sends daily reports on the monitoring setup. We recommend getting acquainted with the report messages so that you quickly notice important changes and trends.

At any moment, one can change the reporting parameters by going to “Settings > Reporting”:

Reporting menu

Please note that you can include reports on newly discovered devices and provide an auxiliary (Cc:) email address to send reports to.

By default, reports are sent to the $AdminMail address defined in “Settings > Email Settings”. Make sure that address actually exists and accepts incoming mail.

Use scheduled alerts

By default, every simple action in an alert uses the default “Always” schedule. However, you can define any number of other schedules in “Settings > Schedules”:

Schedules menu

After that, if you add several simple actions to the same alert, each on its own schedule, you can ensure that certain actions are executed during business hours, whereas others run only during other time intervals. For example, you can notify the daytime network administrator during business hours and notify entirely different people when problems strike after hours.
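The routing logic behind such schedules can be sketched as follows. This is only an illustration of the idea in Python, not the way IPHost evaluates schedules; the business hours (Monday to Friday, 09:00 to 18:00), the function names and the email addresses are all assumptions.

```python
from datetime import datetime, time

# Assumed business schedule: Monday (0) to Friday (4), 09:00 to 18:00.
BUSINESS_DAYS = range(0, 5)
BUSINESS_START = time(9, 0)
BUSINESS_END = time(18, 0)


def in_business_hours(moment):
    """True if `moment` (a datetime) falls within the assumed business schedule."""
    return (
        moment.weekday() in BUSINESS_DAYS
        and BUSINESS_START <= moment.time() < BUSINESS_END
    )


def recipients_for(moment, day_admin="admin@example.com", on_call="oncall@example.com"):
    """Pick who gets notified, depending on when the problem is detected."""
    return [day_admin] if in_business_hours(moment) else [on_call]


# Wednesday morning goes to the daytime administrator, Saturday to the on-call contact.
print(recipients_for(datetime(2024, 1, 3, 10, 0)))
print(recipients_for(datetime(2024, 1, 6, 10, 0)))
```

Keeping the two schedules mutually exclusive, as here, means every moment of the week has exactly one recipient group, so no alert goes unrouted and nobody is notified twice.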

Conclusion

The above list covers the most typical pitfalls you can run into while setting up monitoring. If you know another typical example and would like to share your experience with others, feel free to let us know!