Nagios, Parent Hosts, and traceroute on the Internet
Nagios has the - very useful - feature of "parent hosts". If it deems a host A being down, it first checks its parent host, B, and reports A only as down if B is up. This goes back recursively until a host with state "up" is found and only the first "down" host is actually reported. This keeps on-call people from being bombed with alerts in case of major network outages and makes sure that the alerts that are actually sent out do reasonably accurately describe the actual outage.
As an individual who has some "external" servers in various data centers on the Internet, I would like to not be alerted multiple times that my servers at ISP C, D, and E are down if there is an outage at the ISP F hosting my Nagios installation or at one of the various exchange points temporarily rendering the servers unreachable (without me being able to do anything).
The solution sounds easy but is surprisingly hard.
Continue reading "Nagios, Parent Hosts, and traceroute on the Internet"