Alerting and Actions
IPHost Network Monitor provides several ways to respond automatically to state changes of monitored resources. Prompt and informative notification of a resource failure lets you substantially reduce the downtime of important resources, which saves users’ work time and makes administrators more effective.
The following notification methods can be used:
- e-mail notification,
- SMS over GSM modem or e-mail,
- instant messaging (using any XMPP client),
- sending a message to another computer,
- playing a sound file, and
- displaying a message in a dialog or pop-up window.
The monitoring service can also do the following when a monitor changes its state or an event occurs:
- run a program or script on the local machine,
- run a program or script on a local or remote machine via SSH,
- set an SNMP value on a local or remote SNMP-enabled machine, and
- send an HTTP(S) request.
The default notification message body contains a short problem description and a problem timeframe (if available). Also, it is possible to specify a program to run as an alerting action. You can use these actions to implement complex methods to respond to a problem state: for example, to start/stop a given service on a remote machine, or to reboot a given remote machine.
To get a notification on a resource state change, assign an alerting rule to the resource monitor on the monitor’s Alerting tab. The alerting rule assigned to the resource monitor (or any other entity) consists of commands (or command sequences) that the monitoring system executes on a monitor state change. Each command (or command sequence) is called an alert. Each alert is a set of simple actions and the time schedules assigned to them. The monitoring system executes a simple action only within the timeframe of the schedule assigned to that simple action.
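The relationship between alerts, simple actions and schedules can be pictured with a short Python sketch. It is only an illustration, not the product’s internal data model: the Schedule and Alert classes, the worktime hours (Monday to Friday, 9:00 to 18:00) and the demo alert are assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, time


@dataclass
class Schedule:
    """Hypothetical schedule: active on the given weekdays between start and end."""
    name: str
    weekdays: frozenset  # 0 = Monday ... 6 = Sunday
    start: time
    end: time

    def is_active(self, moment: datetime) -> bool:
        return (moment.weekday() in self.weekdays
                and self.start <= moment.time() <= self.end)


# Assumed schedules: "always" and a Monday-Friday 9:00-18:00 worktime.
ALWAYS = Schedule("Always", frozenset(range(7)), time.min, time.max)
WORKTIME = Schedule("Worktime", frozenset(range(5)), time(9, 0), time(18, 0))


@dataclass
class Alert:
    """An alert: named list of (simple action name, schedule) pairs."""
    name: str
    actions: list = field(default_factory=list)

    def actions_to_run(self, moment: datetime) -> list:
        # A simple action runs only inside the timeframe of its own schedule.
        return [action for action, schedule in self.actions
                if schedule.is_active(moment)]


# Hypothetical alert mixing a worktime-only action with a 24/7 action.
demo_alert = Alert("Escalation demo", [
    ("e-mail to Senior Administrator", WORKTIME),
    ("pop-up", ALWAYS),
])

if __name__ == "__main__":
    print(demo_alert.actions_to_run(datetime(2024, 1, 3, 10, 0)))  # a Wednesday
    print(demo_alert.actions_to_run(datetime(2024, 1, 6, 10, 0)))  # a Saturday
```

In this sketch the same alert produces different actions depending on when it fires, which is the effect of assigning a separate schedule to each simple action.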
Three alerts and one alerting rule are predefined, so you do not have to create alerts or rules to start getting notifications; you can use the predefined ones.
Let’s consider an example. A monitor is set to check CPU usage on a certain host and configured to switch to Down state if the CPU load is too high. We need to notify a system administrator with an e-mail message and a pop-up window if the monitor switches to Down state and remains in this state for 10 seconds. We also need to notify the senior system administrator via e-mail if the problem still persists 30 minutes after it was encountered (i.e., if the CPU load is still high), but this escalation should happen on worktime only. Finally, we need to notify the administrator if the monitor switches to OK state, i.e. if the problem is resolved.
For this particular case we need to create four alerts on the Settings -> Alerts tab:
- Down (e-mail + pop-up): an alert executed 10 seconds after the problem occurs; both simple actions, e-mail and pop-up, are executed 24/7.
- Escalation (e-mail to Senior Administrator): an alert sent after 30 minutes if the problem still persists; this e-mail simple action is executed on worktime only.
- Recovery from down (e-mail): an alert sent after the problem is resolved; this e-mail simple action is executed 24/7.
- Recovery from escalated down (e-mail to Senior Administrator): an alert sent after a long-term problem is resolved, i.e. if the resource switches to OK state after a long (in this case 30-minute) Down state. This simple action is executed on worktime only.
Note: you are not required to create four different alerts in this case. You can create one alert (e.g., e-mail + pop-up) and use it for all four state changes. That is, you can create a separate alert for each of the four state changes:
- a problem occurs
- monitor recovers from the problem state
- monitor remains in the problem state for a long time (escalation)
- monitor recovers from the escalated problem state
but you can use one alert for all these state transitions as well.
The detailed description of how to create and modify an alert is here.
After the alerts are created, we need to construct an alerting rule on the Settings -> Alerting Rules tab by assigning our alerts to the monitor state changes. In the Down State Alerts section we select the Down (e-mail to Administrator + pop-up) alert from the drop-down list to be executed when the monitor enters Down state. We check the escalation checkbox (‘When monitor remains in Down state…’), set the escalation time limit (30 min) and frequency (repeat every 30 min), and select the Escalation (e-mail) alert. We check the recovery checkbox and select the Recovery from down (e-mail) alert to be executed upon recovery from Down state. There is no need to configure the Warning State Alerts and Event Alert sections, since we don’t need to inform anyone about these state changes in this example.
This is our final example alerting rule as it is constructed on the Settings -> Alerting Rules tab:
Now we can assign this alerting rule to our CPU usage monitor on the monitor Alerting tab. The detailed description of how to create and modify an alerting rule is here.
Note: all the alerts are reusable and can be used to construct more alerting rules.
Note: all the named alerting rules are reusable, too. The rule constructed in this example can be assigned to any tree node, e.g. to other monitors, not only to the monitor we use in this example.
An alerting rule specifies what the monitoring service should do when a monitor changes its state or an event takes place. I.e., each alerting rule is a reusable template that tells the monitoring service how to respond to monitor state changes and events. The rules assigned to a host or named rules simplify monitoring system configuration.
A named alerting rule can be assigned to any node in the host tree (a root, agent, host group, host or monitor) on the node’s Alerting tab. On the Settings -> Alerting Rules tab you can manage named alerting rules: create new alerting rules, and edit or delete existing ones. The bottom part of the page contains a short description of the selected rule. The image below shows the Settings -> Alerting Rules tab with an alerting rule selected:
Each alerting rule consists of three sections; any of them can be empty. The sections are:
- Alert(s) to execute on OK -> Down and Down -> OK transitions (Down State Alerts):
- Alert(s) to execute on OK -> Warning and Warning -> OK transitions (Warning State Alerts):
- Alert(s) to execute on an event (Event Alert):
The form with the above sections opens if you select an alerting rule on the Settings -> Alerting Rules tab and click the Edit button to the right of the rule list. The form allows you to modify an existing rule.
To create a new named alerting rule, click the New button on the Settings -> Alerting Rules tab. Only the rule name is required; all sections are optional, and the rule can intentionally be left empty. To add a section, click the “use custom” link on the right, select a named alert from the drop-down list, check the lines you need and set the time values (for Down and Warning State Alerts) for:
- how soon after a state change the alert will be executed;
- how soon the alert should be escalated;
- how soon the alert should be repeated if the monitor stays in a ‘bad’ state too long.
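The interplay of these three time values can be sketched in Python, using the numbers from the CPU usage example (10-second delay, escalation after 30 minutes, repeat every 30 minutes). The function and its parameter names are hypothetical, and two behaviours are assumptions rather than documented facts: that nothing is sent when the monitor recovers before the delay elapses, and that a recovery alert follows only if a Down alert was sent.

```python
def alert_timeline(down_seconds, delay=10, escalate_after=30 * 60, repeat_every=30 * 60):
    """Return (offset_in_seconds, alert_kind) pairs for one Down period.

    down_seconds   -- how long the monitor stayed in Down state
    delay          -- how soon after the state change the alert is executed
    escalate_after -- how soon the alert is escalated (None disables escalation)
    repeat_every   -- how often the escalation is repeated (0 or None: no repeat)
    """
    events = []
    if down_seconds < delay:
        # Assumption: the monitor recovered before the delay elapsed, nothing is sent.
        return events
    events.append((delay, "down"))

    escalated = False
    if escalate_after is not None:
        moment = escalate_after
        while moment <= down_seconds:
            events.append((moment, "escalation"))
            escalated = True
            if not repeat_every:
                break
            moment += repeat_every

    # Assumption: the recovery alert depends on whether escalation happened.
    recovery = "recovery from escalated down" if escalated else "recovery from down"
    events.append((down_seconds, recovery))
    return events


if __name__ == "__main__":
    for duration in (5, 60, 4000):
        print(duration, alert_timeline(duration))
```

A one-minute outage yields only the Down and recovery alerts, while an outage of 4000 seconds also yields escalation alerts at 1800 and 3600 seconds.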
For the escalation alert, you can select an alert from the drop-down list that appears: either the same alert as for the state change or another one. Select “<same alert: …>” to use the same alert for the state change and for the escalation.
You can disable escalation and recovery alerting by unchecking the corresponding lines. To completely disable alerting for a state transition or event, click the “do not report” link in the top right corner of the alerting section.
You can also create a new alert (as described here) directly from this form by clicking the ‘+’ icon on the right of the drop-down list, or edit the selected alert by clicking the pencil icon on the right of the drop-down list, as described here. Alerts and simple actions (the blocks used to construct an alert) are described in detail below.
In the sample below you can see the new alerting rule Down State Alerts section with an immediate alert, one escalation alert and two recovery alerts (Down->OK and long-term Down -> OK) configured. The Warning State Alerts section is not fully configured yet – the immediate alert is being selected from the drop-down list of named alerts:
After the changes are complete click the OK button to save the named alerting rule. Now the rule can be assigned to any tree node on the node’s Alerting tab.
There is a predefined alerting rule named Default Alerting Rule. This rule is assigned to the All Agents node by default after the first installation, and inherited by all the other tree nodes, e.g. monitors, by default:
Hence, once the installation is complete and monitors are created, you will receive an immediate notification if a problem occurs, without any additional configuration.
You can add and modify named alerting rules on the Settings -> Alerting Rules tab, and you can assign them to a monitor (or any other tree node) on the Alerting tab of the monitor (or other node) in the Parameters/Results pane. Each new monitor inherits its alerting rule from its parent (host or application) by default, so there is no need to specify a rule for each monitor separately. However, it is possible to set a custom rule for any monitor, host, host group or agent.
A node, for example a monitor, can use one of three kinds of alerting rules:
- The rule inherited from the monitor’s parent (host or application); this is the default,
- One of the named rules,
- A custom rule specific to this monitor.
In the sample below a monitor is configured to use an alerting rule inherited from its parent host:
The root node (All Agents) can use either a named or a custom rule. In the sample below the All Agents node is configured to use a named alerting rule:
To assign and configure an alerting rule for any tree node, for example for a monitor, use the node’s Alerting tab. On this tab you can change the rule inheritance mode for the node using the radio buttons, select a named alerting rule, or set a node-specific (custom) alerting rule.
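The way a node ends up with its effective rule can be sketched as a walk up the tree. This is a simplified model under stated assumptions, not the product’s implementation: the Node class, its field names and the host and rule labels other than All Agents and Default Alerting Rule are made up for the illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical tree node: root, agent, host group, host or monitor."""
    name: str
    parent: Optional["Node"] = None
    named_rule: Optional[str] = None   # name of a named rule assigned to this node
    custom_rule: Optional[str] = None  # label of a rule specific to this node


def effective_rule(node):
    """Walk up the tree until a node with its own rule is found.

    Returns (kind, rule, name_of_the_node_that_defines_it) or None.
    """
    current = node
    while current is not None:
        if current.custom_rule is not None:
            return ("custom", current.custom_rule, current.name)
        if current.named_rule is not None:
            return ("named", current.named_rule, current.name)
        current = current.parent  # no own rule: inherit from the parent
    return None


if __name__ == "__main__":
    root = Node("All Agents", named_rule="Default Alerting Rule")
    host = Node("host1", parent=root)        # hypothetical host
    monitor = Node("CPU usage", parent=host)
    print(effective_rule(monitor))
```

With nothing configured on the host or the monitor, the monitor resolves to the rule assigned to All Agents, which matches the default behaviour described above.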
Global and Custom Alerting Rules
Alerting rules are either named (global) or custom. The global ones have a name and can be used for any node in the tree view (a monitor, host, host group, agent or root). Select the menu Alerting > Alerting Rules (or press the Settings button on the toolbar), then select the Alerting Rules tab to open the form with the list of named (global) alerting rules.
By default, any new entity inherits its alerting rules from its parent. An inherited alerting rule cannot be modified directly from the entity’s Alerting tab; you need to either open the parent’s Alerting tab or modify the rule itself (if it is a named rule) from the Alerting > Alerting Rules menu.
A named alerting rule can be assigned directly to any entity and can be inherited by this entity’s children. In the sample below you can see a named alerting rule assigned to a monitor. The drop-down list to the right of the selected option shows the rule name, and the bottom part of the tab shows the Down State Alerts and Warning State Alerts sections of the rule:
A named alerting rule assigned to a node can be partially disabled or overridden in place. After a named rule is assigned to a node (e.g. to a monitor), each of the rule’s three sections (Down State Alerts, Warning State Alerts and Event Alerts) can be in one of three states:
- inherited: use section from the rule for this monitor;
- use custom: use specific section for this monitor instead of the section from the rule;
- do not report: do nothing on this state change.
You can switch a section between these states using the links at the top right corner of the section description. Note that the rule itself is not changed when you switch a section state; you only change the way the rule is used: either the entire rule is used for this node, or only a part of it is used to report the monitor state changes.
You can disable any section of a named rule by clicking the “do not report” link. For example, if you disable the Down State Alerts section, the service will not send any alert when the monitor switches to or from Down state:
Any disabled section can be added again by clicking the “use inherited” link, which restores the section inherited from the named rule, or the “use custom” link, which adds a custom section that overrides the inherited one:
Any active inherited section can be overridden in place by a custom section:
In the sample below the Warning State Alerts section of a named rule assigned to a monitor is overridden with a monitor-specific section. After the change is complete, if the monitor switches its state to Warning, the “Save logs” alert will be executed immediately:
This is the final modified alerting rule for the monitor: Down State Alerts section of “Local Hosts Alert” named rule is used, and custom Warning State Alerts section is used:
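The three section states can be modelled with a small Python sketch that mirrors the example above. The SectionMode enum and the effective_sections function are hypothetical, and the contents assumed for the “Local Hosts Alert” rule are invented for the illustration, since the source does not list them.

```python
from enum import Enum


class SectionMode(Enum):
    INHERITED = "inherited"
    CUSTOM = "use custom"
    DO_NOT_REPORT = "do not report"


SECTIONS = ("Down State Alerts", "Warning State Alerts", "Event Alerts")


def effective_sections(named_rule, modes, custom):
    """Combine a named rule with per-node section states.

    named_rule -- section name -> alert name, as defined by the named rule
    modes      -- section name -> SectionMode chosen on the node (default: inherited)
    custom     -- section name -> alert name for sections switched to "use custom"
    """
    result = {}
    for section in SECTIONS:
        mode = modes.get(section, SectionMode.INHERITED)
        if mode is SectionMode.INHERITED:
            result[section] = named_rule.get(section)
        elif mode is SectionMode.CUSTOM:
            result[section] = custom.get(section)
        else:
            result[section] = None  # do not report: nothing is executed
    return result


if __name__ == "__main__":
    # Assumed contents of the "Local Hosts Alert" named rule.
    local_hosts_alert = {
        "Down State Alerts": "Down (e-mail + pop-up)",
        "Warning State Alerts": "Warning (e-mail)",
    }
    modes = {"Warning State Alerts": SectionMode.CUSTOM}
    custom = {"Warning State Alerts": "Save logs"}
    print(effective_sections(local_hosts_alert, modes, custom))
```

The named rule dictionary is never modified, which reflects the point that switching a section state changes only how the rule is used on this node.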
A custom alerting rule can only be set for a given node (a monitor, host, host group, agent or root). The custom rule can be inherited by the node’s children, but can’t be assigned to another node directly. To create a custom alerting rule for the node, select “Set alerts independently” on its Alerting tab. The custom rule template is similar to the Edit Alerting Rule form. If the node is a monitor, its custom alerting rule template contains only the sections available for the monitor type. For example, an Event Alert section is not available for the “SSH Remote Program or Script” monitor type:
To save the changes click the Save button at the right-hand top corner of the pane.
An alert is a set of simple actions (a command or command sequence the monitoring system executes when a monitor state changes or an event occurs) together with the time schedules assigned to them. Alerts are used to construct alerting rules. All alerts and simple actions are provided in one common alert list that allows you to add a new alert or to review and modify an existing alert. Select the menu Alerting > Alerts (or press the Settings button on the toolbar), then select the Alerts tab to open the form with the alert list. Note that any change in an alert affects each alerting rule that uses the alert, and any change in a simple action affects each alert that uses the simple action. In the image below you can see the Alerts tab with the list of alerts. The bottom part of the page shows the simple actions that form the selected alert, and their time schedules:
On the Settings > Alerts > Alerts tab you can create a new alert and edit, copy or delete an existing alert. Select the alert you want to modify or delete and click the appropriate button.
The New Alert form opens when you click the New button. The only mandatory field is the alert name; the alert itself can be empty. Add any number of simple actions using the Add drop-down menu on the right side of the form. To add a new simple action, use Add > New Simple Action and then select the simple action type you need from the drop-down menu.
After a simple action is added, the New Simple Action form opens. The form allows you to configure the simple action: to set the recipient’s address, to modify the predefined message text (if available), and to set the simple action name.
After the New Simple Action form is closed with the OK button, a line with the newly created simple action appears in the New Alert list, and you can assign a time schedule to it from the drop-down list. Choose the <Always perform simple action> item to have the simple action executed 24/7. You can add a new schedule or modify an existing one by clicking the “+” icon or the pencil icon on the right of the drop-down list. The schedules are described in detail here.
To add an existing simple action, use Add > Simple Action > Name of created simple action. You can also add all simple actions, with their assigned schedules, from another alert using Add > Alert > Name of existing alert.
To save the new alert, click the OK button. Now it can be used by any alerting rule. All simple actions are described in detail here.
The Edit Alert form opens on clicking the Edit button. Using this form you can:
- Change the current alert name;
- Add a simple action to the current alert;
- Remove a selected simple action from the current alert;
- Edit a selected simple action or its copy;
- Change time schedules assigned to simple actions.
The Edit Simple Action form is similar to the New Simple Action Form and allows you to edit the message recipient and contents (if available) of a simple action, and to modify its name. After the changes are complete press the OK button to save the action.
For example, you can create a named simple action ‘Send mail to ALL’ and configure the message to be sent to all the persons concerned: system administrator, project manager, etc. This simple action can be used in any number of combined alerts. If an address changes, you only need to replace it in one place: in the parameters of this simple action.
You can temporarily disable all alerting. While alerting is disabled, you will not receive a notification when any monitor changes its state. To disable all alerting temporarily, uncheck the menu item Alerting > Enable alerting or uncheck Enable alerting in the system tray menu. When you disable alerting, a dialog opens in which you choose whether alerting should be re-enabled automatically after a selected time interval:
You will see a warning indicator at the top of the Client and Web Interface window while alerting is disabled. Also, on the Settings > Alerts page of the Settings dialog you can configure whether regular notifications are sent while alerting is disabled. This option prevents you from unintentionally leaving alerting disabled for a long time:
All simple actions are provided on the Settings > Alerts > Simple Actions tab, where you can create a new simple action and edit, copy or delete an existing simple action. The image below shows the Alerts tab with the Simple Actions tab opened and a simple action selected. The bottom part of the page shows the content of the selected simple action:
All simple actions available to construct an alert are listed below.
Send an e-mail to specified recipients. Variables can be used in the message template, such as $MonitorName, $CurrentState, etc. (see the full list of variables), so a single template can serve all state changes.
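The idea of one template serving every state change can be illustrated with Python’s standard string.Template class, which happens to use the same $Name placeholder syntax. This is only an analogy: how IPHost Network Monitor expands its variables internally is not described here, and the template text and sample values below are made up.

```python
from string import Template

# Hypothetical message template using two variables named in the text above.
MESSAGE = Template("Monitor '$MonitorName' is now in $CurrentState state")


def render(values):
    """Fill in the template; unknown placeholders are left untouched."""
    return MESSAGE.safe_substitute(values)


if __name__ == "__main__":
    print(render({"MonitorName": "CPU usage", "CurrentState": "Down"}))
    print(render({"MonitorName": "CPU usage", "CurrentState": "OK"}))
```

The same template text produces both the problem and the recovery message; only the variable values differ.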
Send an SMS message directly over the GSM modem or GSM cell phone attached to the computer. This action allows you to specify one or several phone numbers to send notifications to. To use this action you need to configure the GSM modem as explained here.
By default, this action uses a short one-line message template to ensure that a single SMS message is enough to deliver the action.
Send an SMS message over e-mail to a cell phone. This action allows you to select a provider from the predefined list; the e-mail address stub for this provider is created automatically and you should enter a valid user name or phone number into the bracketed placeholder. You can also specify the e-mail address manually. Make sure to enable the e-mail to SMS service for your phone number (this can be done on the provider’s web portal).
By default, this action uses a short one-line message template to ensure that a single SMS message is enough to deliver the action.
Send a message to an Internet pager. All XMPP (Jabber) clients are supported. Generally, you need two accounts to use this action: the first account represents IPHost Network Monitor (the sender), and you should enter a password for it in the monitor settings; the second account is the destination address for the message (you should start your IM client for this account to receive messages from IPHost Network Monitor).
By default, this action uses a short one-line message template to ensure that a single line in the IM chat log is enough to view the action. Hence, the chat log looks like a common log file.
It is a standard pop-up balloon with the default notification connected to the application icon in the system tray. The message body can’t be configured.
It is a standard window with a message, or a standard dialog with a message on a local or remote machine. The send method ‘NET SEND’ is not available on Windows Vista and later, hence the ‘MSG’ method should be used on all modern Windows versions.
This simple action starts a native binary or executes script code in a language supported by Windows Script Host. The path to the file and its arguments (if any) should be specified. You can use variables to specify program arguments. You can also specify a custom account for the program or script by selecting the appropriate user credentials.
This simple action starts a Python script using the Python interpreter configured here in the Settings dialog. The path to the script, its arguments and input to the script (if any) should be specified. You can also specify a custom account for the script by selecting the appropriate user credentials. This is necessary, for example, when the script has to access LAN resources (the monitoring service runs under the Local System account by default and has no LAN access).
It allows you to start any script or command on a given remote host over SSH. The command and credentials to access the remote host should be specified as explained here. You can specify program arguments using the variables.
This action plays a sound file either directly on the monitoring host or in the web browser. The supported audio format is MP3; .wav files are converted to MP3 ‘on the fly’ when you configure the action. To hear sound alerts in the web browser, you should:
- select the Play Sound in Web Browser checkbox in the action parameters
- open the Dashboard, Reports, or Alerts tab in the IPHost Web interface
A sample ‘rooster cry’ sound can be found in the ‘mp3’ subdirectory of the IPHost Network Monitor data directory; a sample alert using this sound is already part of the default alerting rule but this rule is not enabled initially (not assigned to any state changes).
Note: If you are connected to the machine running IPHost Network Monitor via RDP (terminal services client), you will not hear audio alerts generated by the monitoring service.
This action sets a given value via the SNMP protocol.
The SNMP account used by this action should be granted write permissions for the variable to set. Changing some variables (sysName, sysLocation) may be prohibited by the SNMP agent even though they are formally writable. The ‘Value Encoding Type’ setting is normally detected automatically based on the OID (using an SNMP GET request issued when you click the ‘New Value’ setting), but you can also select it manually. The default value for this setting is ‘MIB lookup (auto detect)’, which instructs IPHost Network Monitor to determine the value type using the current set of MIB files, so the SNMP variable should be defined by one of the MIB files.
This action sends a GET/POST request via the HTTP(S) protocol.
Send a GET/POST request to the server with the specified URL. Variables can be used in the GET/POST data, such as $MonitorName, $CurrentState, etc. (see the full list of variables), so a single template can serve all state changes. The HTTP(S) action has two types of validation: response text validation and response code validation. If the response doesn’t pass validation, the action is marked as failed in the log panel.
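The two validation types can be pictured with a short Python sketch built on the standard urllib.request module. It is an illustration under assumptions, not the action’s actual implementation: the function names, the JSON payload and the example.invalid URL are placeholders, and treating text validation as a substring check is a guess at one reasonable interpretation.

```python
import urllib.request
from string import Template


def build_request(url, payload_template, values):
    """Prepare a POST request whose body is a template with variables filled in.

    Note: values are inserted verbatim, so a value containing a double quote
    would break the JSON body in this simplified sketch.
    """
    body = Template(payload_template).safe_substitute(values).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def response_is_valid(status, text, expected_status=200, expected_text=None):
    """Apply response code validation, then optional response text validation."""
    if status != expected_status:
        return False
    if expected_text is not None and expected_text not in text:
        return False
    return True


if __name__ == "__main__":
    request = build_request(
        "https://example.invalid/hook",  # placeholder URL, not a real endpoint
        '{"text": "$MonitorName is $CurrentState"}',
        {"MonitorName": "CPU usage", "CurrentState": "Down"},
    )
    print(request.get_method(), request.data)
    # Sending is left out on purpose; with a real endpoint it could look like:
    #   with urllib.request.urlopen(request, timeout=10) as response:
    #       ok = response_is_valid(response.status, response.read().decode(), 200, "ok")
    print(response_is_valid(200, "ok"), response_is_valid(500, "error"))
```

An action would count as failed whenever response_is_valid returns False, for either the status code or the response text.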
This action can be used for integration with many web-based tools and applications. For example, you can post alerts to the following destinations:
We plan to add additional types of actions to IPHost Network Monitor, such as sending an SMS message via HTTP forms, and sending messages to Skype and other messengers.
This chapter contains a useful simple action sample.
The following one-line script named log.bat writes state change messages to the text log file (assuming that the Cygwin toolkit is installed on the monitoring service host):
C:\cygwin\bin\echo.exe %1 | C:\cygwin\bin\tee.exe -a c:\NetworkMonitor\logs\custom.log
Specify the full path to log.bat as the Program name, and specify the string to log using the Arguments setting of the Execute program action. Sample arguments might look like:
“[$Time] ‘$MonitorName’ on ‘$HostName’ is in $NewState state”
The following alerting rule is configured and assigned to a monitor. On each monitor’s state change the alert “Save logs” will be executed:
The following simple action is used to construct “Save logs” alert:
The script produces the following log:
[28.05.2014 11:28:44] ‘File on remote standalone share’ on ‘linux1’ is in warning state
[28.05.2014 11:29:19] ‘File on remote standalone share’ on ‘linux1’ is in ok state
[28.05.2014 11:30:55] ‘File on remote standalone share’ on ‘linux1’ is in warning state
[28.05.2014 11:31:33] ‘File on remote standalone share’ on ‘linux1’ is in ok state
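If Cygwin is not installed, a similar logger could be written as a Python script and run through the Python script simple action. The sketch below is an assumed equivalent of log.bat, not a script shipped with the product: the append_log function is hypothetical, and the log path is simply the one used by log.bat above.

```python
import sys
from pathlib import Path

# Same log file as in log.bat above; adjust to your installation.
DEFAULT_LOG = r"c:\NetworkMonitor\logs\custom.log"


def append_log(message, log_path=DEFAULT_LOG):
    """Append one line to the log file and echo it, like 'echo | tee -a'."""
    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    with path.open("a", encoding="utf-8") as log:
        log.write(message + "\n")
    print(message)


if __name__ == "__main__":
    # The simple action passes the text to log as script arguments.
    if len(sys.argv) > 1:
        append_log(" ".join(sys.argv[1:]))
```

The Arguments setting can stay the same as for log.bat, so each state change still adds one line to the log.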