Naemon

From Segfault

Installation

This will install Naemon, Thruk and pnp4nagios on a Debian/wheezy system. To keep things simple, let's use Apache as a webserver and use packages instead of building everything from source.

Add repository and its key:

apt-get install apt-transport-https                              # So we can add https repositories
echo 'deb https://labs.consol.de/repo/stable/debian jessie main' > /etc/apt/sources.list.d/naemon.list
gpg --keyserver keys.gnupg.net --recv-keys F8C1CA08A57B9ED7
gpg --armor --export F8C1CA08A57B9ED7 | apt-key add -

Install needed packages:

apt-get update
apt-get install naemon naemon-thruk nagios-plugins pnp4nagios apache2 php5-fpm php-apc libapache2-mod-fastcgi libapache2-mod-fcgid bsd-mailx

Note: after installing, Naemon might not be able to start[1] ("Permission denied on /var/cache/naemon/live"). Restarting Apache should do the trick though.

Configuration

Apache

Since we're using Apache in this setup, the only important configuration file is /etc/apache2/conf.d/thruk.conf[2] which should be provided by the naemon-thruk package.

naemon.cfg

# log_rotation_method=m                 # This is handled via logrotate now!
process_performance_data=1
date_format=iso8601                     # YYYY-MM-DD HH:MM:SS
use_regexp_matching=1                   # Needed for e.g. dependent_service_description to work with wildcards

cgi.cfg

authorized_for_all_services=admin,guest
authorized_for_all_hosts=admin,guest

Be sure to generate password entries for each of these users!

$ sudo htpasswd /etc/naemon/htpasswd admin

commands.cfg

To send the full (multi-line)[3] output via email, use the LONGSERVICEOUTPUT[4] macro:

define command {
       command_name                    notify-service-by-email
       command_line                    [...]nAdditional Info:\n\n$SERVICEOUTPUT$\n$LONGSERVICEOUTPUT$\n" [...]
}

thruk.conf

Disable the "Send notification"[5] checkbox by default:

<cmd_defaults>
[...]
send_notification      = 0
</cmd_defaults>

PNP4Nagios

PNP4Nagios[6] stores the performance data collected by Nagios or Naemon in RRD databases and renders it as graphs. The setup[7], in short:

sudo apt-get install rrdtool librrds-perl g++ php5-cli php5-gd           # We need PHP for PNP4Nagios to work!

As there's currently no Debian package for PNP4Nagios[8], we have to build this ourselves:

git clone https://github.com/lingej/pnp4nagios.git pnp4nagios-git
cd pnp4nagios-git
./configure --prefix=/opt/pnp4nagios --with-nagios-user=naemon --with-nagios-group=naemon

make all
sudo make fullinstall                                                    # install install-webconf install-config install-init
sudo cp -i sample-config/httpd.conf /etc/apache2/conf.d/pnp4nagios.conf

You may want to adjust AuthName in the pnp4nagios.conf to the value configured in /etc/apache2/conf.d/thruk.conf.
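
To compare the two AuthName values quickly, something like the following could be used. This is only a sketch: the helper name auth_name is made up for this page, and it assumes each file contains a single AuthName directive written in exactly that capitalisation.

```shell
#!/bin/sh
# auth_name FILE: print the argument of the first AuthName directive in FILE.
# (Hypothetical helper, not part of Thruk or PNP4Nagios.)
auth_name() {
    sed -n 's/^[[:space:]]*AuthName[[:space:]]*//p' "$1" | head -n 1
}

# Usage, with the paths mentioned above:
#   auth_name /etc/apache2/conf.d/thruk.conf
#   auth_name /etc/apache2/conf.d/pnp4nagios.conf
```

If both commands print the same string, a browser should only ask for credentials once for Thruk and PNP4Nagios.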

Once installed properly, we need to move the install.php out of the way:

sudo mv /opt/pnp4nagios/share/install.php{,.installed}

Naemon performance data

For PNP4Nagios to work, performance data collection must be enabled in Naemon, i.e. process_performance_data=1 in naemon.cfg (see the naemon.cfg section above).
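
A small check for that setting could look like this. It is a sketch under assumptions: cfg_value is a hypothetical helper, and /etc/naemon/naemon.cfg is assumed to be where the package puts the main configuration file.

```shell
#!/bin/sh
# cfg_value FILE KEY: print the value of the last uncommented "KEY=value"
# line in FILE, with any trailing "# comment" and whitespace removed.
# (Hypothetical helper, not part of Naemon.)
cfg_value() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*\([^#]*\).*/\1/p" "$1" \
        | sed 's/[[:space:]]*$//' | tail -n 1
}

# Usage (path is an assumption):
#   cfg_value /etc/naemon/naemon.cfg process_performance_data
```

An output of 1 means performance data processing is switched on; empty output means the option is missing or commented out.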

pnp wrapper

Not every Nagios plugin delivers performance data; a good example is check_procs[9]:

$ check_procs -a foo -w 1: -c 0:
PROCS OK: 2 processes with args 'foo'

To add performance data to check_procs, we could use a wrapper[10] script:

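#!/bin/bash
# Wrapper around check_procs that appends performance data to its output.
LINE=$(/usr/local/nagios/libexec/check_procs "$@")
RC=$?
# The third word of the output is the number of matching processes.
COUNT=$(echo "$LINE" | awk '{print $3}')
# Report one process less (presumably because the wrapper itself matches too).
PROCS=$((COUNT - 1))
LINE=$(echo "$LINE" | sed "s/: $COUNT /: $PROCS /")
echo "$LINE | procs=$PROCS"
# Pass the plugin's exit code through unchanged.
exit $RC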

With that in place, performance data will be shown:

$ check_procs.sh -a foo -w 1: -c 0:
PROCS OK: 2 processes with args 'foo'| procs=2
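
To check from the command line that a plugin really emits performance data, the part after the "|" can be split off. A minimal sketch; the function name perfdata_of is made up for this page:

```shell
#!/bin/sh
# perfdata_of LINE: print everything after the first "|" of a plugin output
# line, or return 1 if the line carries no performance data at all.
# (Hypothetical helper, not part of the Nagios plugins.)
perfdata_of() {
    case $1 in
        *'|'*) printf '%s\n' "${1#*\|}" | sed 's/^[[:space:]]*//' ;;
        *)     return 1 ;;
    esac
}

# Usage:
#   perfdata_of "$(check_procs.sh -a foo -w 1: -c 0:)"
```

With the wrapper in place this should print the procs=... part; with the plain check_procs output it prints nothing and returns a non-zero status.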

Note: check_procs will include performance data in later releases.[11][12]

TODO:

NRPE

On macOS, we may have to create a new user first:

dscl . -create /Users/nagios
dscl . -create /Users/nagios UniqueID 500
dscl . -create /Users/nagios PrimaryGroupID 80
dscl . -create /Users/nagios UserShell /usr/bin/false
dscl . -create /Users/nagios NFSHomeDirectory /var/lib/nagios

Create its home directory:

mkdir -m0500 /var/lib/nagios{,/.ssh}
chown -R nagios:admin /var/lib/nagios/

...and install an SSH configuration with the appropriate permissions, if needed:

$ cat ~nagios/.ssh/config 
Host nagios.example.org
       User                    nagios
       IdentityFile            ~/.ssh/hostname-key
       RemoteForward           1234 localhost:5666
       ExitOnForwardFailure    yes
       ServerAliveInterval     60
       ServerAliveCountMax     3

$ chmod u+w ~nagios/.ssh/
$ sudo -u nagios ssh-keygen -t ed25519 -f ~nagios/.ssh/hostname-key

Test once to get the server's host key added to ~/.ssh/known_hosts and then lock down the user's home directory again:

$ sudo -u nagios ssh -v -N nagios.example.org
$ chmod -R a-w ~nagios/

Bugs

NRPE: Could not complete SSL handshake

The logs kept filling up with this message:

NRPE: Could not complete SSL handshake

Now, this would be expected on a bad connection or if the other host is simply not responding, but NRPE would print it every time. Turns out I was using check_tcp to check if NRPE is listening on port 5666. But port 5666 is an SSL-enabled port and check_tcp wasn't using SSL; it just wanted to check if something is listening. So the check returned OK (as expected) but left this nasty message in the logs every time it ran :-)[13]

NRPE: Socket timeout after 10 seconds

Adjust the timeout for check_nrpe:

define command {
       command_name             check_nrpe
       command_line             $USER1$/check_nrpe -H localhost -t 60 -p $ARG1$ -c $ARG2$
}

Magic number checking on storable file failed

thruk.log would print this:

[ERROR][Thruk.Controller.Root] Caught exception in Thruk::Controller::status->index "Magic number checking on storable file failed at /usr/lib/naemon/perl5/x86_64-linux-gnu-thread-multi/Storable.pm line 381, at /usr/share/naemon/script/../lib/Thruk/Utils/Cache.pm line 154"
[ERROR][Thruk.Controller.error] internal server error
[ERROR][Thruk.Controller.error] on page: https://naemon.example.org/thruk/cgi-bin/status.cgi?...

Apparently this happens when the thruk.cache file or token file is corrupted or empty.[14] After it has been deleted, Naemon should work again:

$ ls -l /var/cache/naemon/thruk/thruk.cache
-rw-rw----. 1 www-data naemon 0 Nov 12 12:31 /var/cache/naemon/thruk/thruk.cache
$ perl -MStorable -e 'retrieve "/var/cache/naemon/thruk/thruk.cache"'
Magic number checking on storable file failed at /usr/lib/perl/5.14/Storable.pm line 379, at -e line 1

$ rm /var/cache/naemon/thruk/thruk.cache
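
If this keeps happening, the cleanup can be scripted. The following sketch only covers the empty-file case shown above, not a corrupted but non-empty cache; the function name clear_if_empty is made up for this page.

```shell
#!/bin/sh
# clear_if_empty FILE: remove FILE only if it is a regular file of size zero.
# Non-empty or missing files are left alone and nothing is printed.
# (Hypothetical helper, not part of Thruk.)
clear_if_empty() {
    if [ -f "$1" ] && [ ! -s "$1" ]; then
        rm -f -- "$1" && echo "removed empty $1"
    fi
}

# Usage, with the path from above:
#   clear_if_empty /var/cache/naemon/thruk/thruk.cache
```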

Exclude processes from check_procs

TBD...

Service notifications while in downtime

Windows

TBD!

Links

References