IPHost monitoring database is damaged, how do I recover database?

Q: IPHost monitoring service gets stopped right after it’s started. How do I check whether monitoring data and settings are fine?

A: try following the steps below, or contact our Tech. Support.

Important! Fixing IPHost monitoring database is a set of advanced techniques. Make sure you have DBA (database administration) experience before going on. If unsure, either find and ask someone with DBA experience to do that for you, or ask us to help. Unless you fully understand what you are doing, you can irreversibly damage database, so the remaining option would be to install IPHost Network Monitor with clean database.

When shall I check my monitoring database?

The following are typical syndromes of monitoring database inconsistency that could suggest checking the database

  • IPHost monitoring service stops soon after it’s started, or refuses to start
  • there are messages in iphost.log file telling that a SQL query can’t be executed
  • frequent, spurious 10054 and/or 10061 error records in firebird.log
  • after upgrading to newer version, IPHost Network Monitor refuses to accept license key
  • new monitors can’t be added, or monitoring data is not updated

Note that we strongly recommend adding monitor to check monitoring database health, with optional capability to create weekly set of database backups. If the above monitor ever goes Down, you should immediately check the database health.

Prerequisites for database analysis

Note: you can use any supported platform to perform database analysis and fixing, provided Firebird server 2.5 and administration tool (we recommend freeware Flamerobin) are supported for it. That includes Windows, Linux and Mac OSX platforms. Please refer to the links below to learn more of supported platforms.

Install Firebird database server

Open Firebird site and download the latest 32-bit Firebird server version 2.5 (use Superserver build, if there’s any choice).

Note: if your plan to install Firebird database server on the same computer where IPHost is running (we do not recommend such an installation), stop IPHost monitoring service first and make sure there’s no “fbserver.exe” process running.

When installing, on “Select Components” choose “Full installation of Server and development tools”.

On “Select Additional Tasks” window, do the following:

  • Uncheck “Use the Guardian to control the server”
  • Select “Run as a Service”
  • Uncheck “Start Firebird automatically every time you boot up”

Use the rest of options as they are.

The above settings will prevent Firebird from automatically running. That would require manually starting it, but is safe if you install Firebird on the same computer where IPHost is running.

Note that we do not generally recommend installing Firebird database server on the same computer where IPHost is running: while using Firebird server, IPHost monitoring service should be stopped, since it relying on its own Firebid server (two of them can’t run simultaneously on the same system).

We recommend adding Firebird bin\ directory to the Path environment variable. Typical string you would need to add to Path is

C:\Program Files (x86)\Firebird\Firebird_2_5\bin

, please adjust according to actual Firebird database software version you installed and OS architecture (32- or 64-bit).

Install Flamerobin database administrator tool

Please open Flamerobin site and download the latest version of Flamerobin installer (0.9.2 at the moment of writing this article). Install it with defaults.

After you finish installation, copy file fbclient.dll from Firebird 2.5 bin\ folder from installation directory (typically “C:\Program Files (x86)\Firebird\Firebird_2_5\bin”), to Flamerobin installation directory (typically “C:\Program Files (x86)\Flamerobin”).

Copy monitoring database

Make a copy of your database to inspect, rather than handling the original IPHost monitoring database in place. Attempt to modify in-use monitoring database may have disastrous consequences.

Stop monitoring service (from IPHost GUI client Tools menu), the start Task Manager and make sure there are no fbserver.exe processes running. If there are, terminate them.

Locate your database file (nms.fdb). Its typical location:
“C:\ProgramData\IPHost Network Monitor\data\nms.fdb”
if Windows Vista or above is in use, or
“C:\Documents and Settings\All Users\Application Data\IPHost Network Monitor\data\nms.fdb”
if Windows 2003 or XP is being used.

Copy nms.fdb file elsewhere. For the sake of example, we assume you have copied it to C:\Firebird (assuming you’re using Windows to handle database; wherever required, there will be notices for other OSes).

Checking monitoring database for problems

Before you proceed, make sure all the setting are in place. Open command-line interpreter window (in Windows: cmd.exe) and navigate to the directory (folder) where database copy to inspect is located:

C:
cd \Firebird

Use the actual directory if you copied your database to different location

Make sure you have added Firebird 2.5 bin\ directory to your user’s Path variable. Type

gfix

in your command-line window to make sure it is available.If you have added the path, the program will print it options.

After that, set environment variables ISC_USER and ISC_PASSWORD, to authenticate against the database file. The corresponding commands in Windows:

set ISC_USER=sysdba
set ISC_PASSWORD=masterkey

, typical commands in Linux:

export ISC_USER=sysdba
export ISC_PASSWORD=masterkey

Note that you can also set the above environment variables for current user, as well, in order you haven’t to issue the above two commands every time you open command-line box.

Find and fix database errors

Below is short sequence of typical commands to perform analysis and fixing the database. If you need in-depth explanations what those commands mean, please refer to Firebird 2.5 language reference update.

1. Run database problem detection command (hereinafter we assume that monitoring database file we fix is named nms.fdb):

gfix -v -i -full -n nms.fdb

If there are problems, gfix will print something like

Summary of validation errors
        Number of data page errors      : 2
        Number of index page errors     : 9
        Number of database page errors  : 4

(actual numbers may differ in your case). If there are no problems with database, gfix will print nothing.

Important: if you see diagnostics like

I/O error during "open" operation for file "nms.fdb.delta"
, then you should issue command
nbackup -F nms.fdb
and re-attempt the above gfix call again.

2. If there are errors, prepare database to backing up

gfix -mend nms.fdb

3. Do backup copy of database, ignoring, if we have to, damaged records. Try this command first:

gbak -b nms.fdb nms.gbak

If there are errors printed about inability to back up, remove file nms.gbak and issue command

gbak -b -i nms.fdb nms.gbak

If “gbak: ERROR:internal Firebird consistency check” is reported, remove nms.gbak. if exists, and issue command

gbak -g -b -i nms.fdb nms.gbak

If an error is still reported, database file may be beyond repair. Please send us the database and the error message gbak reported, in case we could know how to handle the situation.

4. Restore database backup without creating indices:

gbak -r -n -i nms.gbak nms-noindex.fdb

5. Shut down and reopen the database file:

gfix -shut multi -force 0 nms-noindex.fdb
gfix -online nms-noindex.fdb

Note: you may choose any other file name instead of “nms-noindex.fdb” above. This is temporary database file name we will further process in Firebird database administrator utility, to restore indices and check sequences states.

Fix database tables indices

At this moment you need to run database administration tool (Flamerobin). if this is your first run, please do a quick setup.

You will see a single Firebird server entry name “Localhost”. Right-click and select “Server registration info”. Make sure port number field is either empty, or contains default port value, 3050. Save changes if necessary.

Note: you can use IPHost bundled Firebird server for this purpose. You should make sure monitoring service is running and specify port 3055 in the above server registration info:
Flamerobin: server registration

Once again right-click server name and select “Register existing database”. The fields values are as follows:

  • Display name: any string of your choice
  • Database path: either type it manually, or click ellipsis button to the right to select nms-noindex.fdb generated on previous steps
  • Authentication: leave as is (“Use saved user name and password”)
  • User name: sysdba
  • Password: masterkey
  • Charset: UTF8

Flamerobin: database registration
Click on “Save”, then right-click on newly created database entry and select “Connect”. In case of any errors select again database registration info and make sure all the data are entered correctly.

a) Now right click on connected database and select “Run a query…”. Enter the following text block into query window, click “Execute” (or press F4) and then “Commit” (or F5):

SET TERM !! ;

EXECUTE BLOCK AS
DECLARE VARIABLE stmt VARCHAR(1000);
BEGIN
for select 'ALTER INDEX '||rdb$index_name ||' ACTIVE;'
from rdb$indices
where (rdb$system_flag is null or rdb$system_flag = 0)
order by rdb$foreign_key nulls first
into :stmt
do EXECUTE STATEMENT :stmt;

END!!

SET TERM ; !!

What it does: scans database for inactive indices and tries to enable them. Click on “Execute” (or press F4). It should perform without errors. Then click “Commit” (or press F5). If database is inconsistent, there will be errors, hinting what constraint for which table prevents query from being committed.

b) Typical error looks like this:
Fixing database indices

c) For this given example: there’s problem with constraint INTEG_271 for table RESULT_STAT. In the Flamerobin data tree for database connected, click on Tables to list the entries, find RESULT_STAT, right-click on it and select Show Properties. Click on Constraints tab:
Finding table constraints

In the above example there’s problem matching field MONITOR_ID against ID field of MONITORS table. To fix it, let’s find what entries in RESULT_STAT do not refer to existing IDs entries in table MONITORS: open another “Run a query” window and execute the following query:

SELECT ID FROM RESULT_STAT WHERE (MONITOR_ID NOT IN (SELECT ID FROM MONITORS));

If the list of IDs is non-empty, you can inspect them if necessary, or simply delete those “orphan” entries that conflict with the constraint:

DELETE FROM RESULT_STAT WHERE (MONITOR_ID NOT IN (SELECT ID FROM MONITORS));

Press “Execute” and “Commit”. Now switch to SQL query with execution block (see step a) above) and press “Rollback” (or F8).

Now repeat attempt to re-enable indices (on step a) above). Fix the found table problems according to what error commit operation displays.

As soon as the re-enabling indices worked without errors, we can switch to next operation (checking sequences).

Fix database sequences (generators)

For every row in the table below do the following:
a) Select corresponding sequence (generator) from Generators branch in database tree view and click Show Properties
b) For corresponding table and its field name run the following SQL query:

SELECT MAX(fieldname) FROM tablename

(replace entries in green with actual field name and table name)
c) If generator data value less than the result of query, alter generator value by clicking to icon to the right of the value:
Finding: change generatorSet new value to the result of table query increased by 1, then click on “OK” and “Commit” in resulting pop-up.

Sequence name Table name Table field Notes
ACS_LINES_ID_G A_CONFIG_SECTION_LINES ID
ACTIONS_ID_G ACTIONS ID
ACTION_TYPES_ID_G ACTION_TYPES ID
AC_SECTIONS_ID_G A_CONFIG_SECTIONS ID
APPLICATIONS_ID_G0 APPLICATIONS ID Version 5 and above
APPLICATION_LINES_ID_G0 APPLICATION_LINES ID Version 5 and above
A_CONFIGS_ID_G A_CONFIGS ID
A_SETTINGS_G A_SETTINGS ID
A_SETTING_ENUMS_G A_SETTING_ENUMS ID
A_SETTING_TYPES_G A_SETTING_TYPES ID
A_STATE_ID_G A_STATE ID
COMP_ALERTS_ID_G COMPOSITE_ALERTS ID
C_ALERT_LINES_ID_G COMPOSITE_ALERT_LINES ID
G_SETTINGS_G G_SETTINGS ID
HOSTS_ID_G0 HOSTS ID
HOST_TYPES_G HOST_TYPES ID
MAINTENANCE_G MAINTENANCE ID
MAIN_CONFIGS_ID_G MAIN_CONFIGS ID
MONITORS_ID_G0 MONITORS ID
MONITOR_CATEGORIES_G MONITOR_CATEGORIES ID
MONITOR_TYPES_ID_G0 MONITOR_TYPES ID
M_SETTINGS_FOR_MONITORS_G M_SETTINGS_FOR_MONITORS ID
M_SETTINGS_FOR_TEMPLATES_ID_G0 M_SETTINGS_FOR_TEMPLATES ID Version 5 and above
M_SETTINGS_FOR_TYPES_G M_SETTINGS_FOR_TYPES ID
M_SETTING_ENUMS_G M_SETTING_ENUMS ID
M_SETTING_TYPES_G M_SETTING_TYPES ID
NETWORKS_ID_G0 NETWORKS ID
NETWORK_DISCOVERY_PARAMS_ID_G0 NETWORK_DISCOVERY_PARAMS ID Version 5 and above
PLUGINS_G PLUGINS ID
REPORTING_G REPORTING ID
REP_MONITORS_REPORT_ID_G REP_MONITORS REPORT_ID
RESULT_GRAPH_G RESULT_GRAPH ID
RESULT_LOG_G RESULT_LOG ID
RESULT_RAW_G RESULT_RAW ID
RESULT_STAT_G RESULT_STAT ID
SCHEDULES_ID_G SCHEDULES ID
SC_CONFIGS_ID_G SC_CONFIGS ID
SC_CONFIG_SECTIONS_ID_G SC_CONFIG_SECTIONS ID
SC_SECTION_LINES_ID_G SC_SECTION_LINES ID
SNMP_CRED_ID_G SNMP_CREDENTIALS ID
SYSLOG_G SYSLOG ID
TEMPLATES_ID_G0 TEMPLATES ID Version 5 and above
TEMPLATE_CATEGORIES_ID_G0 TEMPLATE_CATEGORIES ID Version 5 and above
TEMPLATE_MONITORS_ID_G0 TEMPLATE_MONITORS ID Version 5 and above
UNIX_CRED_ID_G UNIX_CREDENTIALS ID
WIN_CRED_ID_G WIN_CREDENTIALS ID

Check database in action

After you have fixed the indices and generators, disconnect Flamerobin, stop IPHost monitoring service, copy the fixed database as nms.fdb, replacing existing database file, and start IPHost GUI client and monitoring service again.

Watch the monitoring for at least few minutes. Make sure you can change monitors data, and that reporting data are updated as time goes by. Check iphost.log file for possible database problems after the monitoring started (you should pay extra attention to possible SQL error reports). If there are any inconsistencies, resume fixing nms-noindex.fdb, according to what you have found.

Special cases

Huge database

If you are dealing with databases more than 50-75Mb in size, it might be convenient to create temporary indices, to speed up the process of fixing up. Run and commit the following queries (in “Run a query” window)

CREATE INDEX TMP_IDX_MONITORS_ID ON MONITORS(ID);
CREATE INDEX TMP_IDX_RESULT_STAT_MID ON RESULT_STAT(MONITOR_ID);
CREATE INDEX TMP_IDX_RESULT_LOG_MID ON RESULT_LOG(MONITOR_ID);
CREATE INDEX TMP_IDX_RESULT_RAW_MID ON RESULT_RAW(MONITOR_ID);
CREATE INDEX TMP_IDX_RESULT_GRAPH_MID ON RESULT_GRAPH(MONITOR_ID);
CREATE INDEX TMP_IDX_SETTINGS_MID ON M_SETTINGS_FOR_MONITORS(MONITOR_ID);
CREATE INDEX TMP_IDX_STATE_MID ON A_STATE(MONITOR_ID);

After all the fixing is done, drop the temporary indices:

DROP INDEX TMP_IDX_MONITORS_ID;
DROP INDEX TMP_IDX_RESULT_STAT_MID;
DROP INDEX TMP_IDX_RESULT_RAW_MID;
DROP INDEX TMP_IDX_RESULT_LOG_MID;
DROP INDEX TMP_IDX_RESULT_GRAPH_MID;
DROP INDEX TMP_IDX_SETTINGS_MID;
DROP INDEX TMP_IDX_STATE_MID; 

Duplicate entries in primary indices

If you see SQL error that duplicate entries cannot be added to primary index, perhaps there are duplicate values. For the table in question, choose Show Properties, switch to Indices tab, and for the index with PRIMARY string in its name memorize which fields are participating in primary index. Then run SQL query like

SELECT 
  fieldlist,
  COUNT(*)
FROM tablename
GROUP BY fieldlist
HAVING COUNT(*) > 1

Remove all but single copy of entries with duplicate values for one of the above field(s). Index and table names can be found in RDB$INDICES system table, if not directly reported.

Preventing subsequent database failures

After you have restored the monitoring database, please take some precautions to diminish or nullify chances to have the database broken in similar manner in the future:

Keep database tidy

Huge database mean more disk system load; unless you really need that, try to avoid storing monitoring and log data for large period of time. Open, in IPHost GUI client, Settings -> Monitoring, and set, in Monitoring Settings section, proper values for data retention periods. In most cases, 30 days is enough.

Note that you can empty RESULT_* and SYSLOG tables at database recovery time, to remove those records (which will be re-filled with new data as monitoring goes on), by executing queries like

DELETE FROM RESULT_GRAPH;
DELETE FROM RESULT_LOG;
DELETE FROM RESULT_RAW;
DELETE FROM RESULT_STATS;
DELETE FROM SYSLOG;

Note: do not remove those data unless database became way too large.

Avoid bottlenecks

If you notice IPHost became slow to response, or alerts are executed with significant delay, make sure the computer with IPHost running has enough resources:

  • disk drive is fast and can handle much I/O
  • There’s at least 2Gb RAM and corresponding swap space available
  • network speed is fast and connectivity is stable

You can use tool like Process Explorer to detect possible bottlenecks.

Hardware health checks

Do not forget to check hardware for possible problem (on computer where IPHost is running). Note that you can utilize IPHost-provided hardware monitors to find possible problems as soon as possible.

Detect monitoring service failure

You are advised to use external monitoring of IPHost monitoring service to be alerted when/if monitoring service is stopped. if it is stopped, it’s mos probably indicates an error.

Monitor database health

You are strongly encouraged to setup and run the two monitors to ensure database health is in good state:

The backup mentioned is optional, but we recommend to make it mandatory.

Related topics

 

Related links