How to install Nagios Core 4.4.5 on Ubuntu Server 18.04

User avatar
LHammonds
Site Admin
Site Admin
Posts: 764
Joined: Fri Jul 31, 2009 6:27 pm
Are you a filthy spam bot?: No
Location: Behind You
Contact:

Post: # 755Post LHammonds
Wed Sep 18, 2019 8:45 am

------------- WORK-IN-PROGRESS -------------

Greetings and salutations,

I hope this thread will be helpful to those who follow in my footsteps, and I welcome any advice based on what I have done and documented.

To discuss this thread, please participate here: >> INSERT UBUNTU FORUMS LINK HERE <<

Overview

This thread will cover installation of a Nagios monitoring system on a dedicated Ubuntu server. The server will be installed inside a virtual machine.

I chose to build Nagios from the source download rather than install it from the repository with apt-get. Building from source gets you a newer version and gives you full control over the installation options.

This documentation only covers one very specific installation. Nagios was designed to monitor just about anything, so every install will be different. Even with identical hardware, two administrators may decide differently on what needs to be monitored.

Tools utilized in this process

Helpful links

The links below are sources of information that helped me configure this system, as well as some places that might be helpful to me later on as this process continues.

Assumptions

This documentation makes use of some very specific information that will most likely be different for each person / location. I note these values in this section. They are highlighted in red throughout the document as a reminder that you should plug in your own value rather than my placeholder value.

Under no circumstance should you use the actual values I list below. They are placeholders for the real thing. This is just a checklist template you need to have answered before you start the install process.

The red variables below are used throughout this document. Substitute the values your company uses.

  • Ubuntu Server name: srv-nagios
  • Internet domain: mydomain.com
  • Ubuntu Server IP address: 192.168.107.21
  • Ubuntu Admin ID: administrator
  • Ubuntu Admin Password: myadminpass
  • Nagios Admin Password: mynagiospass
  • Nagios NSClient Port #: 12489
  • Nagios NRPE Port #: 5666
  • Nagios Service Password: myservicepass
  • Email Server (remote): 192.168.107.25
  • Windows Share ID: myshare
  • Windows Share Password: mysharepass

I also assume the reader knows how to use the vi editor. If not, you will need to beef up your skill set or use a different editor in its place.

Post: # 756Post LHammonds
Wed Sep 18, 2019 8:49 am

Install Ubuntu Server

Ubuntu Server Long-Term Support (LTS) is free, but we have the option of buying support, and that is the main reason this server was selected.

The steps for setting up the base server are covered in this article: How to install and configure Ubuntu Server

It is assumed that the server was configured according to that article, except that the assumptions in red (the variables above) are used instead of the assumptions in that document.

Post: # 757Post LHammonds
Wed Sep 18, 2019 8:54 am

Nagios Prerequisites
  1. Install the required programs:

    Code: Select all

    sudo apt -y install autoconf gcc libc6 make apache2 php7.2 libapache2-mod-php7.2 libgd-dev wget unzip
  2. Disable default web sites:

    Code: Select all

    sudo a2dissite 000-default
    sudo a2dissite default-ssl
    sudo systemctl restart apache2
  3. Create users and groups:

    Code: Select all

    sudo mkdir -p /etc/nagios /var/nagios
    sudo groupadd --system --gid 9000 nagios
    sudo groupadd --system --gid 9001 nagcmd
    sudo adduser --system --gid 9000 --home /usr/local/nagios nagios
    sudo usermod --groups nagcmd nagios
    sudo usermod --append --groups nagcmd www-data
    sudo chown nagios:nagios /usr/local/nagios /etc/nagios /var/nagios
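
To confirm the accounts ended up the way the steps above intend, a quick check along these lines may help. This is only a sketch: user_in_group and report_nagios_accounts are helper names I made up for illustration, not part of Nagios or Ubuntu.

```shell
#!/usr/bin/env bash
# Sketch: confirm that a user belongs to a group.
# user_in_group is a hypothetical helper name, not a standard tool.
user_in_group() {
  local user="$1" group="$2"
  # id -nG prints every group name the user belongs to, space-separated
  id -nG "$user" 2>/dev/null | tr ' ' '\n' | grep -qx "$group"
}

# Report the memberships this guide relies on
report_nagios_accounts() {
  local pair user group
  for pair in nagios:nagios nagios:nagcmd www-data:nagcmd; do
    user="${pair%%:*}"
    group="${pair##*:}"
    if user_in_group "$user" "$group"; then
      echo "OK: $user is a member of $group"
    else
      echo "MISSING: $user is not a member of $group"
    fi
  done
}
```

Calling report_nagios_accounts after step 3 should print one line per membership. Any MISSING line points at the groupadd / usermod command to re-run.
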
Firewall Rules

Edit the firewall script that was created during the initial setup of the server (if you followed my instructions):

Code: Select all

sudo vi /var/scripts/prod/en-firewall.sh
Add (or enable) the following:

Code: Select all

echo "Adding Web Server rules"
ufw allow proto tcp to any port 80 comment 'HTTP Service' 1>/dev/null 2>&1
echo "Allowing Nagios connections"
ufw allow proto tcp to any port 12489 comment 'Nagios Server' 1>/dev/null 2>&1
ufw allow proto tcp to any port 5666 comment 'Nagios Server' 1>/dev/null 2>&1
Run the updated rules:

Code: Select all

sudo /var/scripts/prod/en-firewall.sh
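
After reloading the rules, it can be worth checking from another machine that a port is actually reachable. The sketch below assumes bash (it relies on the /dev/tcp feature) and the coreutils timeout command. port_open is a hypothetical helper name. Note that a failed check cannot tell a firewall block apart from a service that is simply not listening.

```shell
#!/usr/bin/env bash
# Sketch: test whether a TCP port accepts connections.
# port_open is a hypothetical helper; it relies on bash's /dev/tcp feature.
port_open() {
  local host="$1" port="$2"
  # Give up after 3 seconds so silently dropped packets cannot hang the check
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Example, using the placeholder IP address of this guide:
#   port_open 192.168.107.21 80 && echo "HTTP reachable" || echo "HTTP blocked or not listening"
```
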

Post: # 758Post LHammonds
Wed Sep 18, 2019 9:20 am

Build and Install Nagios from Source
  • Download Nagios software (NOTE: You can use newer links once new versions become available):

    Code: Select all

    cd /usr/local/src
    sudo wget https://assets.nagios.com/downloads/nagioscore/releases/nagios-4.4.5.tar.gz
    
  • Build and install Nagios Core:

    Code: Select all

    sudo tar -xzvf /usr/local/src/nagios-4.4.5.tar.gz -C /usr/local/src
    cd /usr/local/src/nagios-4.4.5
    sudo ./configure --with-httpd-conf=/etc/apache2/sites-enabled --sysconfdir=/etc/nagios --localstatedir=/var/nagios --prefix=/usr/local/nagios --with-nagios-user=nagios --with-nagios-group=nagios --with-command-group=nagcmd --with-mail=/usr/bin/sendemail
    sudo make all
    sudo make install
    sudo make install-init
    sudo make install-daemoninit
    sudo make install-commandmode
    sudo make install-config
    sudo make install-webconf
    sudo a2enmod rewrite
    sudo a2enmod cgi
    
  • Correct the Apache2 configuration file to conform to Ubuntu standards:

    Code: Select all

    sudo mv /etc/apache2/sites-enabled/nagios.conf /etc/apache2/sites-available/.
    sudo a2ensite nagios
    sudo systemctl restart apache2
    
  • Edit the resource file:

    Code: Select all

    sudo vi /etc/nagios/resource.cfg
  • Set the passwords to match your environment:

    Code: Select all

    # Sets $USER1$ to be the path to the plugins
    $USER1$=/usr/local/nagios/libexec
    # Password to access NSClient on Windows servers.
    $USER5$=my-nsclient-password
    # Password to access ESXi servers.
    $USER6$=my-esxi-password
    # Sets $USER2$ to be the path to event handlers
    #$USER2$=/usr/local/nagios/libexec/eventhandlers
    # Store some usernames and passwords (hidden from the CGIs)
    #$USER3$=someuser
    #$USER4$=somepassword
    
  • Edit the commands:

    Code: Select all

    sudo vi /etc/nagios/objects/commands.cfg
  • Change both sendemail references to match the correct sendemail syntax:

    Code: Select all

    define command{
     command_name    notify-host-by-email
     command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /usr/bin/sendemail -s srv-mail:25 -f "admin <admin@nagios.server>" -t $CONTACTEMAIL$ -u "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **"
    }
     
    define command{
    command_name    notify-service-by-email
    command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$" | /usr/bin/sendemail -s srv-mail:25 -f "admin <admin@nagios.server>" -t $CONTACTEMAIL$ -u "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **"
    }
  • Save and close commands.cfg
  • Edit the contacts:

    Code: Select all

    sudo vi /etc/nagios/objects/contacts.cfg
  • Change the following:

    Code: Select all

    define contact{
        contact_name     nagiosadmin             ; Short name of user
        use              generic-contact         ; Inherit default values from generic-contact template (defined above)
        alias            John Doe                ; Full name of user
        email            John.Doe@mydomain.com   ; <<***** CHANGE THIS TO YOUR EMAIL ADDRESS ******
    }
  • Save and close contacts.cfg
  • Set the nagiosadmin password to mynagiospass by typing the following (htpasswd will prompt for the password):

    Code: Select all

    sudo htpasswd -c /etc/nagios/htpasswd.users nagiosadmin
    sudo chown root:www-data /etc/nagios/htpasswd.users
    sudo chmod 640 /etc/nagios/htpasswd.users
    sudo systemctl reload apache2
  • Type the following to avoid startup problems: (NOTE: This is not documented anywhere, it is just my trial, error and observation)

    Code: Select all

    sudo mkdir -p /var/nagios/spool/checkresults
    sudo chown nagios:nagios /var/nagios/spool/checkresults
    sudo chown nagios:nagios /var/nagios/spool
    sudo chown nagios:nagios /var/nagios
  • Check your Nagios configuration file for errors. Look for errors in red.

    Code: Select all

    /usr/local/nagios/bin/nagios -v /etc/nagios/nagios.cfg
  • NOTE TO SELF: I need to generate (and document) an SSL certificate to enable SSL to protect the password during authentication. Self-Signed Certs
  • Start Nagios for the 1st time.

    Code: Select all

    sudo systemctl start nagios
  • Access the web-based administration utility at http://192.168.107.21/nagios/ (use nagiosadmin for the ID and mynagiospass for the password)

Fixing Error: Could not open command file

If you get this error message when trying to re-submit a service check from the web interface:

Code: Select all

Error:  Could not open command file '/var/nagios/rw/nagios.cmd' for update!

The permissions on the external command file and/or directory may be incorrect.  Read the FAQs on how to setup proper permissions.
Run this command to fix the group permissions:

Code: Select all

sudo chown --recursive nagios:nagcmd /var/nagios/rw
This group should already have been set correctly by the "--with-command-group=nagcmd" option in the "./configure" step earlier. This might be fixed in later versions though.
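
Several steps in this post depend on exact ownership and permissions (htpasswd.users, the command file folder). Here is a minimal sketch for checking them, assuming GNU stat as shipped with Ubuntu. check_perms is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: compare a path's owner:group and octal mode with expected values.
# check_perms is a hypothetical helper; stat -c is the GNU coreutils syntax.
check_perms() {
  local path="$1" want_owner="$2" want_mode="$3"
  local have_owner have_mode
  have_owner=$(stat -c '%U:%G' "$path") || return 2
  have_mode=$(stat -c '%a' "$path") || return 2
  if [ "$have_owner" = "$want_owner" ] && [ "$have_mode" = "$want_mode" ]; then
    echo "OK: $path is $have_owner $have_mode"
  else
    echo "MISMATCH: $path is $have_owner $have_mode, expected $want_owner $want_mode"
    return 1
  fi
}

# Example, using the values set earlier in this post:
#   check_perms /etc/nagios/htpasswd.users root:www-data 640
```

The same helper can be pointed at /var/nagios/rw once you know which owner and mode you expect there.
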

NOTE: The Nagios server is now up-and-running but doing absolutely nothing. ;) We need plugins to actually make it do something so we will install a base plugin pack. However, we will eventually need to get other plugins and maybe write our own in order to monitor everything we want.

Plugins

Post: # 759Post LHammonds
Wed Sep 18, 2019 9:22 am

Nagios Plugin Prerequisites

Base requirements:

Code: Select all

sudo apt install -y libmcrypt-dev dc build-essential
Nagios Plugin Requirements for check_snmp:

Code: Select all

sudo perl -MCPAN -e 'install Net::SNMP'
# When prompted "Configure as much as possible automatically?", answer: yes
sudo apt -y install snmp
Requirements for check_mysql: (NOTE: For my site, this is not necessary because I will run it locally on MariaDB/MySQL server)

Code: Select all

sudo apt -y install libmysqlclient-dev
Requirements for check_nrpe:

Code: Select all

sudo apt -y install libssl-dev
Build and Install Nagios Plugins from Source

Download, build and install Nagios plugins (NOTE: You can use newer links once new versions become available):

Code: Select all

cd /usr/local/src
sudo wget https://nagios-plugins.org/download/nagios-plugins-2.2.1.tar.gz
sudo tar xzf /usr/local/src/nagios-plugins-2.2.1.tar.gz
cd /usr/local/src/nagios-plugins-2.2.1
sudo ./configure --sysconfdir=/etc/nagios --localstatedir=/var/nagios --with-nagios-user=nagios --with-nagios-group=nagios --with-openssl
sudo make
sudo make install
Download, build and install NRPE plugin (for 64-bit servers)

Code: Select all

cd /usr/local/src
sudo wget https://github.com/NagiosEnterprises/nrpe/releases/download/nrpe-3.2.1/nrpe-3.2.1.tar.gz
sudo tar xzf /usr/local/src/nrpe-3.2.1.tar.gz
cd /usr/local/src/nrpe-3.2.1
sudo ./configure --sysconfdir=/etc/nagios --libexecdir=/usr/local/nagios/libexec --prefix=/usr/local/nagios --localstatedir=/var/nagios --with-nagios-user=nagios --with-nagios-group=nagios --with-nrpe-user=nagios --with-nrpe-group=nagios --enable-ssl=yes --with-ssl=/usr/bin/openssl --with-ssl-lib=/usr/lib/x86_64-linux-gnu
sudo make all
sudo make install-plugin
Verify that Plugins are Working!

For all the plugins we intend to use, we need to verify they are working before trying to integrate them into Nagios. However, not all plugins will work without first configuring the target to be monitored.

Ping an IP address you know to be active:

Code: Select all

/usr/local/nagios/libexec/check_icmp -H 192.168.107.20
Check for an HTTP reply from a web server:

Code: Select all

/usr/local/nagios/libexec/check_http -H 192.168.107.20
Check for a response from an HP LaserJet printer:

Code: Select all

/usr/local/nagios/libexec/check_hpjd -H 192.168.107.51 -C public
Check the uptime of a router via SNMP:

Code: Select all

/usr/local/nagios/libexec/check_snmp -H 192.168.107.1 -C public -o sysUpTime.0
NOTE: For whatever reason, this command hangs for me. I am not sure what I did wrong this time, but I will track it down, fix it and update these docs.

Check a MySQL/MariaDB server (if on local host):

Code: Select all

/usr/local/nagios/libexec/check_mysql -H 192.168.107.20 -P 3306 -u mysqlid -p mysqlpassword
NOTE: This will fail if you do not configure the MariaDB/MySQL server 1st. However, you might want to run the MySQL command remotely via NRPE instead.
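
Nagios plugins report their result through the exit status: 0 is OK, 1 is WARNING, 2 is CRITICAL and 3 is UNKNOWN. The sketch below wraps a manual test so the state is spelled out and a hanging check (like the check_snmp call above) is cut off by the coreutils timeout command. state_name and run_check are hypothetical helper names, and treating every other exit code (including the 124 that timeout returns) as UNKNOWN is my own simplification.

```shell
#!/usr/bin/env bash
# Sketch: run a plugin by hand with a time limit and name its exit status.
# state_name and run_check are hypothetical helper names.
state_name() {
  case "$1" in
    0) echo "OK" ;;
    1) echo "WARNING" ;;
    2) echo "CRITICAL" ;;
    *) echo "UNKNOWN" ;;   # 3, plus anything unexpected such as a timeout
  esac
}

# Usage: run_check SECONDS COMMAND [ARGS...]
run_check() {
  local limit="$1"; shift
  local output rc
  output=$(timeout "$limit" "$@" 2>&1)
  rc=$?
  printf '%s (exit %d): %s\n' "$(state_name "$rc")" "$rc" "$output"
  return "$rc"
}

# Example:
#   run_check 15 /usr/local/nagios/libexec/check_icmp -H 192.168.107.20
```
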

Post: # 760Post LHammonds
Wed Sep 18, 2019 9:24 am

Configuration Framework

The 1st thing I like to do is create the folder structure I plan to use and then copy or rename all example configuration files to unused text files. This ensures the originals are preserved as a reference.

Code: Select all

sudo mkdir -p /etc/nagios/servers
sudo mkdir -p /etc/nagios/printers
sudo mkdir -p /etc/nagios/switches
sudo mkdir -p /etc/nagios/workstations
sudo cp /etc/nagios/nagios.cfg /etc/nagios/example-nagios.txt
sudo cp /etc/nagios/resource.cfg /etc/nagios/example-resource.txt
sudo mv /etc/nagios/objects/windows.cfg /etc/nagios/servers/example-win.txt
sudo cp /etc/nagios/objects/localhost.cfg /etc/nagios/servers/localhost.cfg
sudo mv /etc/nagios/objects/localhost.cfg /etc/nagios/servers/example-local.txt
sudo mv /etc/nagios/objects/switch.cfg /etc/nagios/switches/example-sw.txt
sudo mv /etc/nagios/objects/printer.cfg /etc/nagios/printers/example-ptr.txt
sudo cp /etc/nagios/objects/commands.cfg /etc/nagios/objects/example-commands.txt
sudo cp /etc/nagios/objects/contacts.cfg /etc/nagios/objects/example-contacts.txt
sudo cp /etc/nagios/objects/templates.cfg /etc/nagios/objects/example-templates.txt
sudo cp /etc/nagios/objects/timeperiods.cfg /etc/nagios/objects/example-timeperiods.txt
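sudo chown --recursive nagios:nagios /etc/nagios/*
# The recursive chown above also hits htpasswd.users; restore the ownership Apache needs
sudo chown root:www-data /etc/nagios/htpasswd.users
sudo chmod --recursive 0664 /etc/nagios/*.cfg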
Edit nagios.cfg

Code: Select all

sudo vi /etc/nagios/nagios.cfg
On line 35, place a comment (#) character in front of this line, since the copy in the servers folder will be loaded automatically simply because it exists there:

Code: Select all

#cfg_file=/etc/nagios/objects/localhost.cfg
Uncomment/add lines around 51 thru 54 so it looks like this:

Code: Select all

cfg_dir=/etc/nagios/servers
cfg_dir=/etc/nagios/printers
cfg_dir=/etc/nagios/switches
cfg_dir=/etc/nagios/workstations
This allows you to place config files in those folders and have them picked up automatically without editing nagios.cfg (Nagios only reads files ending in .cfg from a cfg_dir folder). I keep one file per object. You could place all objects into a single file, but that gets harder to edit the more you monitor.
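
To see which files a cfg_dir entry would actually load, a small listing helper can be handy. This is a sketch under the assumption that Nagios processes only *.cfg files and descends into subfolders. list_cfg_files is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: list the files a cfg_dir entry would load.
# list_cfg_files is a hypothetical helper name.
list_cfg_files() {
  local dir="$1"
  # Only *.cfg files count; subfolders are searched as well
  find "$dir" -type f -name '*.cfg' | sort
}

# Example:
#   list_cfg_files /etc/nagios/servers
```

Files that were renamed to .txt (the preserved examples) should not show up in the output.
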


verify.sh

Anytime you need to make a configuration change, you should always run a verification against your changes to ensure the Nagios service will be able to start up once you restart the service for the change to take effect. This is called the pre-flight check and this script will make it easier to run.

The full command is this:

Code: Select all

/usr/local/nagios/bin/nagios -v /etc/nagios/nagios.cfg
As you can see, it is a lot to type/remember. I prefer to have a handy little script in the configuration folder to make it easier to run a verification.

/etc/nagios/verify.sh

Code: Select all

# sudo does not apply to a shell redirect, so write the file through "sudo tee" instead
printf '%s\n' '#!/bin/bash' '/usr/local/nagios/bin/nagios -v /etc/nagios/nagios.cfg' | sudo tee /etc/nagios/verify.sh > /dev/null
sudo chmod 0755 /etc/nagios/verify.sh
Now all that has to be done is to run the verify script.

If you are in the /etc/nagios folder, you type:

Code: Select all

./verify.sh
If currently sitting in a sub-folder, just type:

Code: Select all

../verify.sh
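
Going one step further, the pre-flight check can gate the restart so a broken configuration never takes the service down. The following is a sketch, not a tested part of my setup: verify_then_restart and the three variables are names I picked for illustration, with defaults matching the paths used in this guide.

```shell
#!/usr/bin/env bash
# Sketch: restart Nagios only when the pre-flight check passes.
# The variable and function names are hypothetical; the defaults match this guide.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/etc/nagios/nagios.cfg}"
RESTART_CMD="${RESTART_CMD:-sudo systemctl restart nagios}"

verify_then_restart() {
  if "$NAGIOS_BIN" -v "$NAGIOS_CFG"; then
    echo "Pre-flight check passed, restarting Nagios"
    # Intentionally unquoted so the command and its arguments are split
    $RESTART_CMD
  else
    echo "Pre-flight check failed, Nagios left untouched" >&2
    return 1
  fi
}
```

Call verify_then_restart after each configuration change instead of restarting the service by hand.
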

Host Groups

Post: # 761Post LHammonds
Wed Sep 18, 2019 9:25 am

Configuration - Host Groups

I group all of my objects according to how I like to see them separated. This is done using "hostgroups" when defining a host. I keep all of these hostgroups defined in a single configuration file.

The file is referenced in /etc/nagios/nagios.cfg with the following line:

Code: Select all

cfg_file=/etc/nagios/objects/hostgroups.cfg
Here is a sample of what is contained in that file:

/etc/nagios/objects/hostgroups.cfg

Code: Select all

###############################################################################
###############################################################################
#
# HOST GROUP DEFINITIONS
#
###############################################################################
###############################################################################
 
define hostgroup{
  hostgroup_name    ibm-servers
  alias             IBM Servers
}
 
define hostgroup{
  hostgroup_name    aix-servers
  alias             IBM AIX Servers
}
 
define hostgroup{
  hostgroup_name    ubuntu-servers
  alias             Ubuntu Servers
}
 
define hostgroup{
  hostgroup_name    esx-servers
  alias             ESX Servers
}
 
define hostgroup{
  hostgroup_name    windows2016-servers
  alias             Windows 2016 Servers
}
 
define hostgroup{
  hostgroup_name    windows2012-servers
  alias             Windows 2012 Servers
}
 
define hostgroup{
  hostgroup_name    windows2008-servers
  alias             Windows 2008 Servers
}
 
define hostgroup{
  hostgroup_name    windows2003-servers
  alias             Windows 2003 Servers
}
 
define hostgroup{
  hostgroup_name    windows2000-servers
  alias             Windows 2000 Servers
}
 
define hostgroup{
  hostgroup_name    win10-pcs
  alias             Windows 10 PCs
}
 
define hostgroup{
  hostgroup_name    win7-pcs
  alias             Windows 7 PCs
}
 
define hostgroup{
  hostgroup_name    winxp-pcs
  alias             Windows XP PCs
}
 
define hostgroup{
  hostgroup_name    switches
  alias             Network Switches
}
 
define hostgroup{
  hostgroup_name    wireless
  alias             Wireless Access Points
}
 
define hostgroup{
  hostgroup_name    printers-hp
  alias             HP Printers
}
 
define hostgroup{
  hostgroup_name    printers-brother
  alias             Brother Printers
}
 
define hostgroup{
  hostgroup_name    copiers-toshiba
  alias             Toshiba Copiers
}
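
Since every hostgroup block follows the same two-line pattern, the file could also be generated from a short list. This is a minimal sketch. emit_hostgroup and emit_hostgroups are hypothetical helper names, and the "name|alias" input format is my own convention.

```shell
#!/usr/bin/env bash
# Sketch: print hostgroup definitions from "name|alias" pairs.
# emit_hostgroup and emit_hostgroups are hypothetical helper names.
emit_hostgroup() {
  local name="$1" alias_text="$2"
  printf 'define hostgroup{\n'
  printf '  hostgroup_name    %s\n' "$name"
  printf '  alias             %s\n' "$alias_text"
  printf '}\n\n'
}

# Read "name|alias" lines from standard input and print one block per line
emit_hostgroups() {
  local name alias_text
  while IFS='|' read -r name alias_text; do
    [ -n "$name" ] && emit_hostgroup "$name" "$alias_text"
  done
  return 0
}

# Example:
#   printf '%s\n' 'switches|Network Switches' 'wireless|Wireless Access Points' | emit_hostgroups
```

Redirect the output through "sudo tee" to write it to the hostgroups file, then run the pre-flight check before restarting.
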

Contacts

Post: # 762Post LHammonds
Wed Sep 18, 2019 9:25 am

Configuration - Contacts

It is a good idea to define contact groups and associate services with them. Further down, define your contacts and the contact groups they belong to. This keeps things easy to maintain whether you run a small shop or grow into a large one, because maintenance is primarily handled at the contact level.

Here is an example of what I mean:

/etc/nagios/objects/contacts.cfg

Code: Select all

###############################################################################
# CONTACTS.CFG - CONTACT/CONTACTGROUP DEFINITIONS
#
# Last Modified: 2012-05-25
###############################################################################
 
###############################################################################
#
# CONTACT GROUPS
#
###############################################################################

define contactgroup{
  contactgroup_name       windows-server-admins
  alias                   Windows Server Administrators
}

define contactgroup{
  contactgroup_name       linux-server-admins
  alias                   Linux Server Administrators
}

define contactgroup{
  contactgroup_name       windows-pc-admins
  alias                   Windows PC Administrators
}

define contactgroup{
  contactgroup_name       ibm-bladecenter-admins
  alias                   IBM BladeCenter Administrators
}

define contactgroup{
  contactgroup_name       network-admins
  alias                   Network Administrators
}

define contactgroup{
  contactgroup_name       phone-admins
  alias                   Phone Administrators
}

define contactgroup{
  contactgroup_name       printer-admins
  alias                   Printer Administrators
}

define contactgroup{
  contactgroup_name       copier-admins
  alias                   Copier/Fax/Scanner Administrators
}

###############################################################################
#
# CONTACTS
#
###############################################################################

define contact{
  contact_name    dirkdiggler-email
  use             generic-contact
  alias           Dirk Diggler
  email           dirk.diggler@mydomain.com
  contactgroups   windows-server-admins,windows-pc-admins,linux-server-admins,ibm-bladecenter-admins,network-admins,phone-admins,printer-admins,copier-admins
}

define contact{
  contact_name    johndoe-email
  use             generic-contact
  alias           John Doe
  email           john.doe@mydomain.com
  contactgroups   windows-server-admins,windows-pc-admins,ibm-bladecenter-admins,network-admins,printer-admins,copier-admins
}

define contact{
  contact_name    maryjane-email
  use             generic-contact
  alias           Mary Jane
  email           mary.jane@mydomain.com
  contactgroups   windows-server-admins,windows-pc-admins,network-admins,phone-admins,printer-admins,copier-admins
}

define contact{
  contact_name    dirkdiggler-pager
  use             generic-pager
  alias           Dirk Diggler Pager
  pager           8005551234@txt.att.net
}

define contact{
  contact_name    johndoe-pager
  use             generic-pager
  alias           John Doe Pager
  pager           8005555678@txt.att.net
}

define contact{
  contact_name    maryjane-pager
  use             generic-pager
  alias           Mary Jane Pager
  pager           8005559876@txt.att.net
}
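
With this many groups it is easy to reference one that was never defined. The sketch below lists such names before Nagios complains about them. It assumes bash (process substitution) and comma-separated group lists without spaces, as in the example above. undefined_contactgroups is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: list contact groups that are referenced but never defined.
# undefined_contactgroups is a hypothetical helper; pass one or more config files.
undefined_contactgroups() {
  # comm -13 keeps the lines that appear only in the second (referenced) list
  comm -13 \
    <(awk '$1 == "contactgroup_name" { print $2 }' "$@" | sort -u) \
    <(awk '$1 == "contactgroups" || $1 == "contact_groups" { print $2 }' "$@" \
        | tr ',' '\n' | sed '/^$/d' | sort -u)
}

# Example:
#   undefined_contactgroups /etc/nagios/objects/contacts.cfg /etc/nagios/objects/templates.cfg
```

No output means every referenced group has a matching contactgroup definition in the files you passed.
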

Timeperiods

Post: # 763Post LHammonds
Wed Sep 18, 2019 9:26 am

Configuration - Timeperiods

The /etc/nagios/objects/timeperiods.cfg file is fairly straightforward. You can adjust what is defined as "work hours" and leave it at that, or add some custom settings depending on your situation.

Code: Select all

###############################################################################
# TIMEPERIODS.CFG - SAMPLE TIMEPERIOD DEFINITIONS
#
#
# NOTES: This config file provides you with some example timeperiod definitions
#        that you can reference in host, service, contact, and dependency
#        definitions.
#
#        You don't need to keep timeperiods in a separate file from your other
#        object definitions.  This has been done just to make things easier to
#        understand.
#
###############################################################################

###############################################################################
#
# TIMEPERIOD DEFINITIONS
#
###############################################################################

# This defines a timeperiod where all times are valid for checks,
# notifications, etc.  The classic "24x7" support nightmare. :-)

define timeperiod {

    name                    24x7
    timeperiod_name         24x7
    alias                   24 Hours A Day, 7 Days A Week

    sunday                  00:00-24:00
    monday                  00:00-24:00
    tuesday                 00:00-24:00
    wednesday               00:00-24:00
    thursday                00:00-24:00
    friday                  00:00-24:00
    saturday                00:00-24:00
}

# This defines a timeperiod that is normal workhours for
# those of us monitoring networks and such in the U.S.

define timeperiod {

    name                    workhours
    timeperiod_name         workhours
    alias                   Normal Work Hours

    monday                  09:00-17:00
    tuesday                 09:00-17:00
    wednesday               09:00-17:00
    thursday                09:00-17:00
    friday                  09:00-17:00
}

# This defines the *perfect* check and notification
# timeperiod

define timeperiod {

    name                    none
    timeperiod_name         none
    alias                   No Time Is A Good Time
}

# Some U.S. holidays
# Note: The timeranges for each holiday are meant to *exclude* the holidays from being
# treated as a valid time for notifications, etc.  You probably don't want your pager
# going off on New Year's.  Although your employer might... :-)

define timeperiod {

    name                    us-holidays
    timeperiod_name         us-holidays
    alias                   U.S. Holidays

    january 1               00:00-00:00     ; New Years
    monday -1 may           00:00-00:00     ; Memorial Day (last Monday in May)
    july 4                  00:00-00:00     ; Independence Day
    monday 1 september      00:00-00:00     ; Labor Day (first Monday in September)
    thursday 4 november     00:00-00:00     ; Thanksgiving (4th Thursday in November)
    december 25             00:00-00:00     ; Christmas
}



# This defines a modified "24x7" timeperiod that covers every day of the
# year, except for U.S. holidays (defined in the timeperiod above).

define timeperiod {

    name                    24x7_sans_holidays
    timeperiod_name         24x7_sans_holidays
    alias                   24x7 Sans Holidays

    use                     us-holidays     ; Get holiday exceptions from other timeperiod

    sunday                  00:00-24:00
    monday                  00:00-24:00
    tuesday                 00:00-24:00
    wednesday               00:00-24:00
    thursday                00:00-24:00
    friday                  00:00-24:00
    saturday                00:00-24:00
}

Templates

Post: # 764Post LHammonds
Wed Sep 18, 2019 9:27 am

Configuration - Templates

The templates file is also fairly straightforward. You can use the templates defined there, change them, or add new ones as I have done. In case you spotted references to them in earlier examples, I will share some of the custom ones here (this is only part of the entire file):

/etc/nagios/objects/templates.cfg

Code: Select all

define host{
  name                            template-server-critical
  notifications_enabled           1
  check_period                    24x7
  check_interval                  5
  retry_interval                  1
  check_command                   check-host-alive
  max_check_attempts              10
  event_handler_enabled           1
  flap_detection_enabled          1
  failure_prediction_enabled      1
  process_perf_data               1
  retain_status_information       1
  retain_nonstatus_information    1
  notification_period             24x7
  notification_interval           120
  notification_options            d,u,r
  register                        0
}

define host{
  name                            template-server-non-critical
  notifications_enabled           1
  check_period                    workhours
  check_interval                  5
  retry_interval                  1
  check_command                   check-host-alive
  max_check_attempts              10
  event_handler_enabled           1
  flap_detection_enabled          1
  failure_prediction_enabled      1
  process_perf_data               1
  retain_status_information       1
  retain_nonstatus_information    1
  notification_period             workhours
  notification_interval           120
  notification_options            d,u,r
  register                        0
}

define host{
  name                    linux-server
  use                     template-server-critical
  contact_groups          linux-server-admins
  register                0
}

define host{
  name                    esx-server
  use                     template-server-critical
  contact_groups          ibm-bladecenter-admins
  register                0
}

define host{
  name                    toshiba-copier
  use                     template-server-non-critical
  notification_options    n
  contact_groups          copier-admins
  register                0
}

define host{
  name                    wirelessap
  use                     template-server-non-critical
  notification_options    n
  contact_groups          network-admins
  register                0
}
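
Because templates build on each other through "use", it helps to see the whole inheritance chain for a given name. Here is a rough sketch with awk. It assumes single inheritance and that "name" appears before "use" inside each block, as in the examples above. template_chain is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print the "use" inheritance chain of a template.
# template_chain is a hypothetical helper; assumes "name" precedes "use" in each block.
template_chain() {
  local file="$1" name="$2"
  awk -v start="$name" '
    $1 == "}"                     { current = "" }         # block ended
    $1 == "name"                  { current = $2 }         # template being defined
    $1 == "use" && current != ""  { parent[current] = $2 } # its parent template
    END {
      chain = start
      node = start
      # Follow parents until the top is reached; stop if a loop is detected
      while ((node in parent) && !(parent[node] in seen)) {
        seen[node] = 1
        node = parent[node]
        chain = chain " -> " node
      }
      print chain
    }
  ' "$file"
}

# Example:
#   template_chain /etc/nagios/objects/templates.cfg toshiba-copier
```
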

Custom Sounds

Post: # 765Post LHammonds
Wed Sep 18, 2019 9:27 am

Configuration - Custom Sounds

Nagios allows custom WAV sounds to be played as alerts on the web page but does not come with any (that I could tell). So I went through my audio collection and pulled out a few clips I thought would work well for the various event types.

Each audio clip was converted to WAV format and stereo turned into mono.

Nagios-Sounds.7z (80 files, 10 MB)

To enable this, here is what you can do.

Edit /etc/nagios/cgi.cfg
Around line #313, you will find the following section commented out:

Code: Select all

#host_unreachable_sound=host-unreachable.wav
#host_down_sound=host-down.wav
#service_critical_sound=critical.wav
#service_warning_sound=warning.wav
#service_unknown_sound=warning.wav
#normal_sound=noproblem.wav
I do not know about you, but I tend to have services in the red (WindowsUpdate) more often than not, so I am not interested in service alert sounds at all. However, I would like to have audible alerts for host issues, and this is how I modified mine:

Code: Select all

host_unreachable_sound=host-unreachable.wav
host_down_sound=host-down.wav
#service_critical_sound=critical.wav
#service_warning_sound=warning.wav
#service_unknown_sound=warning.wav
normal_sound=noproblem.wav
I then picked the following files from my audio collection, copied them to my Samba share, and moved/renamed them to the proper location.

Star Trek\command-path-discontinuity.wav
Star Trek\losing-power.wav
Star Trek\i-hate-prototypes.wav

On the server, I then typed these commands:

Code: Select all

mv /srv/samba/share/command-path-discontinuity.wav /usr/local/nagios/share/media/host-unreachable.wav
mv /srv/samba/share/losing-power.wav /usr/local/nagios/share/media/host-down.wav
mv /srv/samba/share/i-hate-prototypes.wav /usr/local/nagios/share/media/noproblem.wav
chown nagios:nagios /usr/local/nagios/share/media/*.wav
chmod 0444 /usr/local/nagios/share/media/*.wav
As long as your browser can play WAV files, you will hear any change in host status while your web browser stays on a Nagios monitoring page. I typically leave mine on the tactical overview page.
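
A quick way to confirm that every sound enabled in cgi.cfg has a readable file behind it is a small shell check like the sketch below. The function name check_sounds is made up for this example, and it assumes sound file names without spaces.

```shell
# check_sounds CGI_CFG MEDIA_DIR   (hypothetical helper, not part of Nagios)
# Prints one line per enabled *_sound entry and returns non-zero if any
# referenced file is missing or unreadable.
check_sounds() {
  cfg="$1"; media="$2"; status=0
  # Only uncommented lines such as "host_down_sound=host-down.wav" match.
  for entry in $(grep -E '^[a-z_]+_sound=' "$cfg"); do
    key=${entry%%=*}     # text before the first "="
    file=${entry#*=}     # text after the first "="
    if [ -r "$media/$file" ]; then
      echo "OK       $key -> $media/$file"
    else
      echo "MISSING  $key -> $media/$file"
      status=1
    fi
  done
  return "$status"
}
```

With the paths used in this post, it would be called as: check_sounds /etc/nagios/cgi.cfg /usr/local/nagios/share/media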

Sample Configs

Post: # 766Post LHammonds
Wed Sep 18, 2019 9:28 am

Configuration - Sample Ubuntu Server Config File

Here is my basic shell for an Ubuntu server:

/etc/nagios/servers/srv-php.cfg

Code: Select all

###############################################################################
#
# HOST DEFINITION
#
###############################################################################

define host{
  use             ubuntu-server
  host_name       srv-php
  alias           SRV-php
  address         192.168.107.23
  hostgroups      ubuntu-servers
  contacts        linux-admin-pager
  parents         srv-esxi1
}

###############################################################################
#
# SERVICE DEFINITIONS
#
###############################################################################

define service{
  use                     generic-service
  host_name               srv-php
  service_description     PING
  check_command           check_icmp!100.0,20%!500.0,60%
}

define service{
  use                     generic-service
  host_name               srv-php
  service_description     HTTP
  check_command           check_http
}

define service{
  use                     generic-service
  host_name               srv-php
  service_description     APT Upgrade
  check_command           check_nrpe!check_apt
}

define service{
  use                     generic-service
  host_name               srv-php
  service_description     APT Upgrade MotD
  check_command           check_nrpe!check_apt_motd
}

define service{
  use                     generic-service
  host_name               srv-php
  service_description     All Disks
  check_command           check_nrpe!check_disk_all
  notifications_enabled   1
}

define service{
  use                     generic-service
  host_name               srv-php
  service_description     Current Load
  check_command           check_nrpe!check_load
  notifications_enabled   1
}

define service{
  use                     generic-service
  host_name               srv-php
  service_description     Total Processes
  check_command           check_nrpe!check_total_procs
  notifications_enabled   1
}

define service{
  use                     generic-service
  host_name               srv-php
  service_description     Swap Usage
  check_command           check_nrpe!check_swap
  notifications_enabled   1
}

define service{
  use                     generic-service
  host_name               srv-php
  service_description     Zombie Processes
  check_command           check_nrpe!check_zombie_procs
  notifications_enabled   1
}

define service{
  use                     generic-service
  host_name               srv-php
  service_description     Users
  check_command           check_nrpe!check_users
}
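
Since this file is meant as a template, one way to reuse it for another server is to copy it with the host name and address swapped. The following is only a sketch: clone_host_cfg is a made-up helper, it relies on GNU sed (for \b), and the new name and IP in the usage line are placeholders.

```shell
# clone_host_cfg TEMPLATE OLD_NAME OLD_IP NEW_NAME NEW_IP   (hypothetical helper)
# Writes a copy of TEMPLATE to stdout with the host name and address replaced.
clone_host_cfg() {
  # Escape the dots so the old IP is matched literally, not as a regex.
  old_ip_re=$(printf '%s' "$3" | sed 's/\./\\./g')
  # \b (GNU sed) keeps "srv-php" from matching inside "srv-php2",
  # and 192.168.107.23 from matching inside 192.168.107.230.
  sed -e "s/\\b$2\\b/$4/g" -e "s/\\b$old_ip_re\\b/$5/g" "$1"
}
```

Example: clone_host_cfg /etc/nagios/servers/srv-php.cfg srv-php 192.168.107.23 srv-web 192.168.107.24 > /etc/nagios/servers/srv-web.cfg. The alias and parents lines are not touched and still need a manual review afterwards.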
Configuration - Sample Windows Server Config File

Here is my basic shell for a Windows server:

/etc/nagios/servers/srv-mssql.cfg

Code: Select all

define host{
  use             windows-server
  host_name       srv-mssql
  alias           Win2008-SRV-GP
  address         192.168.107.69
  hostgroups      windows2012-servers
  contacts        windows-admin-email
  parents         srv-esxi2
}

###############################################################################
#
# SERVICE DEFINITIONS
#
###############################################################################
define service{
  use                     generic-service
  host_name               srv-mssql
  service_description     NSClient++ Version
  check_command           check_nt!CLIENTVERSION -H $HOSTADDRESS$ -p 12489 -s $USER5$
}

define service{
  use                     generic-service
  host_name               srv-mssql
  service_description     Uptime
  check_command           check_nt!UPTIME -H $HOSTADDRESS$ -p 12489 -s $USER5$
}

define service{
  use                     generic-service
  host_name               srv-mssql
  service_description     CPU Load
  check_command           check_nt!CPULOAD!-l 5,80,90 -H $HOSTADDRESS$ -p 12489 -s $USER5$
}

define service{
  use                     generic-service
  host_name               srv-mssql
  service_description     Memory Usage
  check_command           check_nt!MEMUSE!-w 80 -c 90 -H $HOSTADDRESS$ -p 12489 -s $USER5$
}

define service{
  use                     hd-service
  host_name               srv-mssql
  service_description     Drive C:
  check_command           check_nt!USEDDISKSPACE!-l c -w 80 -c 90 -H $HOSTADDRESS$ -p 12489 -s $USER5$
}

define service{
  use                     hd-service
  host_name               srv-mssql
  service_description     Drive D:
  check_command           check_nt!USEDDISKSPACE!-l d -w 80 -c 90 -H $HOSTADDRESS$ -p 12489 -s $USER5$
}

define service{
  use                     generic-service
  host_name               srv-mssql
  service_description     MS SQL Server
  check_command           check_nt!SERVICESTATE!-d SHOWALL -l MSSQLSERVER -H $HOSTADDRESS$ -p 12489 -s $USER5$
}

define service{
  use                     generic-service
  host_name               srv-mssql
  service_description     SQL Server Agent
  check_command           check_nt!SERVICESTATE!-d SHOWALL -l SQLSERVERAGENT -H $HOSTADDRESS$ -p 12489 -s $USER5$
}

define service{
  use                     generic-service
  host_name               srv-mssql
  service_description     WindowsUpdates
  check_command           check_nrpe!check_updates!1
}

## This can be used for servers that require the console to be logged in.
#define service{
#  use                     generic-service
#  host_name               srv-mssql
#  service_description     Explorer
#  check_command           check_nt!PROCSTATE!-d SHOWALL -l Explorer.exe -H $HOSTADDRESS$ -p 12489 -s $USER5$
#}

More Sample Configs

Post: # 767Post LHammonds
Wed Sep 18, 2019 9:29 am

Configuration - Sample Network Switch Config File

Here is my basic shell for a switch:

NOTE: The numeric OIDs are specific to the hardware. You will probably need to research the MIB that matches your hardware.

Code: Select all

###############################################################################
# Switches.cfg
#
# Last Modified: 2012-05-25
###############################################################################

###############################################################################
#
# HOST DEFINITIONS
#
###############################################################################

define host{
  use             summit-switch
  host_name       SW-TX-IS
  alias           Texas IS Area
  address         192.168.107.230
  hostgroups      switches
  parents         SW-TX-Core
}

define host{
  use             cisco-switch
  host_name       SW-TX-FD
  alias           Texas Front Desk
  address         192.168.107.231
  hostgroups      switches
  parents         SW-TX-Core
}

###############################################################################
#
# SERVICE DEFINITIONS
#
###############################################################################

# Ping switch

define service{
  use                     switch-critical-service
  host_name               SW-TX-IS,SW-TX-FD
  service_description     PING
  check_command           check_ping!200.0,20%!600.0,60%
}

# Monitor uptime via SNMP

define service{
  use                     switch-noncritical-service
  host_name               SW-TX-IS,SW-TX-FD
  service_description     Uptime
  check_command           check_snmp!-C public -o sysUpTime.0
}

# Monitor Contact via SNMP

define service{
  use                     switch-noncritical-service
  host_name               SW-TX-IS,SW-TX-FD
  service_description     Contact
  check_command           check_snmp!-C public -o sysContact.0
}

# Monitor Location via SNMP

define service{
  use                     switch-noncritical-service
  host_name               SW-TX-IS,SW-TX-FD
  service_description     Location
  check_command           check_snmp!-C public -o sysLocation.0
}

# Monitor Over Temperature Alarm via SNMP

define service{
  use                     switch-noncritical-service
  host_name               SW-TX-IS,SW-TX-FD
  service_description     Temperature Over Alarm
  check_command           check_snmp!-C public -o .1.3.6.1.4.1.1916.1.1.1.7.0
}

# Monitor Current Temperature via SNMP

define service{
  use                     switch-noncritical-service
  host_name               SW-TX-IS,SW-TX-FD
  service_description     Temperature Current
  check_command           check_snmp!-C public -o .1.3.6.1.4.1.1916.1.1.1.8.0
}

# Monitor the Primary Software Revision Number via SNMP

define service{
  use                     switch-noncritical-service
  host_name               SW-TX-IS,SW-TX-FD
  service_description     Software Rev 1st
  check_command           check_snmp!-C public -o .1.3.6.1.4.1.1916.1.1.1.13.0
}

# Monitor the Secondary Software Revision Number via SNMP

define service{
  use                     switch-noncritical-service
  host_name               SW-TX-IS,SW-TX-FD
  service_description     Software Rev 2nd
  check_command           check_snmp!-C public -o .1.3.6.1.4.1.1916.1.1.1.14.0
}
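
Before adding an OID to a service definition, it can help to query it by hand from the Nagios server. The sketch below wraps net-snmp's snmpget. It assumes snmpget is installed and on the PATH, and it uses SNMP v1 on the assumption that this matches what check_snmp sends when no protocol option is given. probe_oid and the SNMPGET variable are names invented for this example.

```shell
# Assumption: net-snmp's snmpget is installed; set SNMPGET to use another binary.
SNMPGET="${SNMPGET:-snmpget}"

# probe_oid HOST OID [COMMUNITY]   (hypothetical helper)
# Queries a single OID; the community string defaults to "public".
probe_oid() {
  "$SNMPGET" -v 1 -c "${3:-public}" "$1" "$2"
}
```

For example: probe_oid 192.168.107.230 .1.3.6.1.4.1.1916.1.1.1.8.0. Symbolic names such as sysUpTime.0 only resolve if the standard MIB files are installed on the machine running snmpget. Numeric OIDs work either way.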
Configuration - Sample HP Printer Config File

Here is my basic shell for an HP printer:

Code: Select all

###############################################################################
# Printer-HP.cfg
#
# Last Modified: 2012-05-25
###############################################################################

###############################################################################
#
# HOST DEFINITIONS
#
###############################################################################

define host{
  use             generic-printer
  host_name       PTR-TX-ADMIN
  alias           Texas Admin
  address         192.168.107.254
  hostgroups      printers-hp
  parents         SW-TX-Core
}

define host{
  use             generic-printer
  host_name       PTR-TX-ADMIN-COLOR
  alias           Texas Admin - HPColor
  address         192.168.107.253
  hostgroups      printers-hp
  parents         SW-TX-Core
}

###############################################################################
#
# SERVICE DEFINITIONS
#
###############################################################################

define service{
  use                     hp-noncritical-service
  host_name               PTR-TX-ADMIN,PTR-TX-ADMIN-COLOR
  service_description     PING
  check_command           check_ping!3000.0,80%!5000.0,100%
}

define service{
  use                     hp-noncritical-service
  host_name               PTR-TX-ADMIN,PTR-TX-ADMIN-COLOR
  service_description     Printer Status
  check_command           check_hpjd!-C public
}
Configuration - Sample Brother Printer Config File

Here is my basic shell for a Brother printer:

Code: Select all

###############################################################################
# Printer-Brother.cfg
#
# Last Modified: 2010-05-25
###############################################################################

###############################################################################
#
# HOST DEFINITIONS
#
###############################################################################

define host{
  use             generic-printer
  host_name       PTR-TX-IS
  alias           Texas IS - ISHP
  address         192.168.107.252
  hostgroups      printers-brother
  parents         SW-TX-Core
}

define host{
  use             generic-printer
  host_name       PTR-TX-FD
  alias           Texas Front Desk
  address         192.168.107.251
  hostgroups      printers-brother
  parents         SW-TX-Core
}

###############################################################################
#
# SERVICE DEFINITIONS
#
###############################################################################

# Create a service for "pinging" the printer occasionally.  Useful for monitoring RTA, packet loss, etc.

define service{
  use                     brother-noncritical-service
  host_name               PTR-TX-IS,PTR-TX-FD
  service_description     PING
  check_command           check_ping!3000.0,80%!5000.0,100%
  check_interval          10
  retry_interval          1
}
Configuration - Sample Toshiba Copier Config File

Here is my basic shell for a Toshiba Copier:

Code: Select all

###############################################################################
# Copier-Toshiba.cfg
#
# Last Modified: 2012-05-25
###############################################################################

###############################################################################
#
# HOST DEFINITIONS
#
###############################################################################

define host{
  use             toshiba-copier
  host_name       TE-COPIER-01
  alias           Toshiba e-Studio255
  address         192.168.107.250
  hostgroups      copiers-toshiba
  parents         SW-TX-Core
}

define host{
  use             toshiba-copier
  host_name       TE-COPIER-02
  alias           Toshiba e-Studio255
  address         192.168.107.249
  hostgroups      copiers-toshiba
  parents         SW-TX-Core
}

###############################################################################
#
# SERVICE DEFINITIONS
#
###############################################################################

# Create a service for "pinging" the copier occasionally.  Useful for monitoring RTA, packet loss, etc.

define service{
  use                     copier-service
  host_name               TE-COPIER-01,TE-COPIER-02
  service_description     PING
  check_command           check_ping!3000.0,80%!5000.0,100%
}

define service{
  use                     copier-service
  host_name               TE-COPIER-01,TE-COPIER-02
  service_description     Contact
  check_command           check_snmp!-C public -o sysContact.0
}

define service{
  use                     copier-service
  host_name               TE-COPIER-01,TE-COPIER-02
  service_description     Location
  check_command           check_snmp!-C public -o sysLocation.0
}

Managing User Accounts

Post: # 768Post LHammonds
Wed Sep 18, 2019 9:29 am

Managing User Accounts

It is recommended to replace the default nagiosadmin account with individual accounts. Here is how to do it.

We are going to add 3 administrators with the same level of access as nagiosadmin.

ID / Password: lhammonds / abc123
ID / Password: ddiggler / jiggler69
ID / Password: jdoe / jlow9876
  1. Login with your administrator account.
  2. Type the following commands to add the users to the web interface (this also updates the passwords of existing users). htpasswd prompts for each password; the value to enter is shown on the line below each command:

    Code: Select all

    sudo htpasswd /etc/nagios/htpasswd.users lhammonds
    abc123
    sudo htpasswd /etc/nagios/htpasswd.users ddiggler
    jiggler69
    sudo htpasswd /etc/nagios/htpasswd.users jdoe
    jlow9876
  3. Edit cgi.cfg

    Code: Select all

    sudo vi /etc/nagios/cgi.cfg
    Search/replace "nagiosadmin" with "lhammonds,ddiggler,jdoe"
    For example, in vi, you type:

    Code: Select all

    :%s/nagiosadmin/lhammonds,ddiggler,jdoe/g
  4. Restart the apache service:

    Code: Select all

    sudo systemctl restart apache2
  5. Now open a web browser and go to http://192.168.107.21/nagios and check that you can log in with your new accounts. NOTE: There is no logout option; you will need to close the browser and re-open it to test different accounts.
  6. Once you have verified your accounts work, you can safely delete the nagiosadmin account by typing the following:

    Code: Select all

    sudo htpasswd -D /etc/nagios/htpasswd.users nagiosadmin
To fine-tune user accounts, you can add them to or remove them from the following permission directives in /etc/nagios/cgi.cfg

Code: Select all

authorized_for_system_information
authorized_for_configuration_information
authorized_for_system_commands
authorized_for_all_services
authorized_for_all_hosts
authorized_for_all_service_commands
authorized_for_all_host_commands
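
The search/replace in cgi.cfg can also be scripted. The sketch below is one possible way, assuming GNU sed; replace_cgi_admin and list_cgi_permissions are made-up helper names. Unlike the vi command, it only touches the authorized_for_* lines, and it keeps a backup copy next to the original.

```shell
# replace_cgi_admin CGI_CFG NEW_USERS   (hypothetical helper)
# Saves a backup as CGI_CFG.bak, then swaps nagiosadmin for NEW_USERS
# (a comma-separated list) on every authorized_for_* line.
replace_cgi_admin() {
  cp -p "$1" "$1.bak" &&
  sed -i "/^authorized_for_/ s/\\bnagiosadmin\\b/$2/g" "$1"
}

# list_cgi_permissions CGI_CFG   (hypothetical helper)
# Prints the active permission lines so the result can be reviewed.
list_cgi_permissions() {
  grep -E '^authorized_for_' "$1"
}
```

Run as root, this would be: replace_cgi_admin /etc/nagios/cgi.cfg lhammonds,ddiggler,jdoe followed by list_cgi_permissions /etc/nagios/cgi.cfg to review the result. The user list must not contain "/" or "&", since it is pasted into the sed expression as-is.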

Monitoring Remote Windows Servers

Post: # 769Post LHammonds
Wed Sep 18, 2019 9:30 am

Monitoring Remote Windows Servers

Monitoring Windows servers and workstations requires installing an agent service if you need more data than a simple ping provides.

For this, we will be using NSClient++. In particular, we will be downloading the Win32 and x64 "zip" files for version 0.5.2.35.

I chose the ZIP files instead of the MSI files because they are easier (for me) to configure and roll out. You can use the MSI files if that is what you are comfortable with. The process will be quite a bit different, but the same tasks still have to be handled (installation, configuration, firewall rules).

Configure Install Repository

For this example, we will be using a network share to store the software and configurations. We will refer to this share as the X: drive, but it could just as well be a direct UNC path such as \\ServerNameOrIP\ShareName\

I will assume NSClient will reside in C:\NSClient\ on each server/workstation to be monitored. If you want a different folder, make sure you change all references to this location in the batch and .ini files.

Extract the Win32 ZIP file to X:\NSClient\nsclient-32bit\
Extract the Win64 ZIP file to X:\NSClient\nsclient-64bit\

Create a new file: X:\NSClient\boot.ini

Copy/paste the following into the file:

Code: Select all

[settings]
1=c:\nsclient\nsclient.ini
Create a new file: X:\NSClient\nsclient.ini

Copy/paste the following into the file:

Code: Select all

[/modules]
; A list of modules.
; CheckDisk - CheckDisk can check various file and disk related things.
CheckDisk=enabled
; NRPEServer - A server that listens for incoming NRPE connection and processes incoming requests.
NRPEServer=enabled
; NSClientServer - A server that listens for incoming check_nt connection and processes incoming requests.
NSClientServer=enabled
CheckLogFile=enabled
CheckEventLog=enabled
CheckExternalScripts=enabled
CheckHelpers=enabled
CheckNSCP=enabled
CheckSystem=enabled
CheckWMI=enabled

[/settings/default]
; ALLOWED HOSTS - A comma-separated list of allowed hosts. You can use netmasks (/ syntax) or * to create ranges.
allowed hosts=127.0.0.1,192.168.107.21
; PASSWORD - Password used to authenticate against server
password=my-nsclient-password
; BIND TO - Allows you to bind server to a specific local address. This has to be a dotted ip address not a host name. Leaving this blank will bind to all available IP addresses.
;bind to=127.0.0.1
; CACHE ALLOWED HOSTS - If host names (DNS entries) should be cached, improves speed and security somewhat but won't allow you to have dynamic IPs for your Nagios server.
cache allowed hosts=true
; INBOX - The default channel to post incoming messages on
inbox=inbox
; SOCKET QUEUE SIZE - Number of sockets to queue before starting to refuse new incoming connections.
socket queue size=0
thread pool=10
; TIMEOUT - Timeout when reading packets on incoming sockets. If the data has not arrived within this time we will bail out.
timeout=90

[/settings/NSClient/server]
; ALLOWED CIPHERS - A better value is: ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH
; allowed ciphers = Old setting is: TLSv1+HIGH:!SSLv2:!aNULL:!eNULL:!3DES:@STRENGTH
; allowed ciphers = ADH
allowed ciphers=ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH
; BIND TO - Allows you to bind server to a specific local address. This has to be a dotted ip address not a host name. Leaving this blank will bind to all available IP addresses.
;bind to=127.0.0.1
; ALLOWED HOSTS - A comma-separated list of allowed hosts. You can use netmasks (/ syntax) or * to create ranges.
;allowed hosts=127.0.0.1,192.168.107.21
ca=${certificate-path}/ca.pem
; CACHE ALLOWED HOSTS - (should use setting in default section)
;cache allowed hosts=true
certificate=${certificate-path}/certificate.pem
certificate format=PEM
# SSL CERTIFICATE
;certificate key=
;dh=${certificate-path}/nrpe_dh_512.pem
; PASSWORD - Password used to authenticate against server (should use setting in default section)
;password=my-nsclient-password
; PERFORMANCE DATA - Send performance data back to Nagios (set this to 0 to remove all performance data).
performance data=true
; PORT NUMBER - Port to use for check_nt.
port=12489
socket queue size=0
; THREAD POOL - (should use setting in default section)
;thread pool=10
timeout=30
; ENABLE SSL ENCRYPTION - This option controls if SSL should be enabled.
use ssl=true
; VERIFY MODE - Comma separated list of verification flags to set on the SSL socket.
verify mode=none

[/settings/NRPE/server]
; COMMAND ARGUMENT PROCESSING - This option determines whether or not we will allow clients to specify arguments to commands that are executed.
allow arguments=true
; COMMAND ALLOW NASTY META CHARS - This option determines whether or not we will allow clients to specify nasty (as in |`&><'"\[]{}) characters in arguments.
allow nasty characters=false
; ALLOWED CIPHERS - The ciphers which are allowed to be used. The default here will differ depending on whether "insecure" mode is used or not.
allowed ciphers=ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH
; ALLOWED HOSTS - (should use setting in default section)
;allowed hosts=127.0.0.1,192.168.107.21
;ca=${certificate-path}/ca.pem
; CACHE ALLOWED HOSTS - (should use setting in default section)
;cache allowed hosts=true
;certificate=${certificate-path}/certificate.pem
;certificate format=PEM
;dh=${certificate-path}/nrpe_dh_512.pem
; EXTENDED RESPONSE - Send more than 1 return packet to allow the response to go beyond the payload size (requires a modified client; if legacy is true this defaults to false).
extended response=true
; ALLOW INSECURE CIPHERS and ENCRYPTION - Only enable this if you are using a legacy check_nrpe client.
;insecure=false
; PAYLOAD LENGTH - Length of payload to/from the NRPE agent. This is a hard specific value and would require a recompile to change.
payload length=1024
; PERFORMANCE DATA - Send performance data back to nagios.
performance data=true
; PORT NUMBER - Port to use for NRPE.
port=5666
; SOCKET QUEUE SIZE - (should use setting in default section)
;socket queue size=0
; THREAD POOL - (should use setting in default section)
;thread pool=10
; TIMEOUT - (should use setting in default section)
;timeout=30
; ENABLE SSL ENCRYPTION - This option controls if SSL should be enabled.
use ssl=true
; VERIFY MODE - Comma separated list of verification flags to set on the SSL socket.
verify mode=none

[/settings/NRPE/client]
; Section for NRPE active/passive check module.
; CHANNEL - The channel to listen to.
channel=NRPE

[/settings/NRPE/client/targets/default]
; Target definition for: default
; TARGET ADDRESS - Target host address
;address=
; PAYLOAD LENGTH - Length of payload to/from the NRPE agent. This is a hard specific value so you have to "configure" (read recompile) your NRPE agent to use the same value for it to work.
payload length=1024
retries=3
; TIMEOUT - Timeout when reading/writing packets to/from sockets.
timeout=180
; Allowing an old "legacy" check_nrpe to connect to NSClient++ requires enabling the insecure mode.
; VERIFY MODE - Comma separated list of verification flags to set on the SSL socket.
verify mode=none

[/settings/shared session]

[/settings/log/file]
max size = 2048000

[/settings/log]
; FILENAME - The file to write log data to. Set this to none to disable log to file.
file name=${exe-path}/nsclient.log

; DATEMASK - The date format used for timestamps in the log file.
date format=%Y-%m-%d %H:%M:%S

; LOG LEVEL - Log level to use. Available levels are error,warning,info,debug,trace
level=info

[/settings/crash]
; SUBMISSION URL - The url to submit crash reports to
submit url=https://crash.nsclient.org/post

; RESTART SERVICE NAME - The name of the service to restart after a crash
restart target=NSCP

; CRASH ARCHIVE LOCATION - The folder to archive crash dumps in
archive folder=${crash-folder}

; ARCHIVE CRASHREPORTS - Archive crash reports in the archive folder
archive=true

; RESTART - Restart the service after a crash
restart=true

[/paths]
; Path for shared-path - 
shared-path=C:\nsclient

; Path for module-path - 
module-path=${shared-path}/modules

; Path for exe-path - 
exe-path=C:\nsclient

; Path for crash-folder - 
crash-folder=${shared-path}/crash-dumps

; Path for certificate-path - 
certificate-path=${shared-path}/security

[/settings/external scripts/scripts]
timeout=300
;** Old VBS check ** CheckWSUS=cscript.exe //T:160 //NoLogo "scripts\nm-check-available-updates.vbs"
CheckWSUS=cmd /c echo scripts\check_windows_updates.ps1; exit($LastExitCode) | powershell.exe -command -
CheckReboot=cscript.exe //T:60 //NoLogo "scripts\nm-check-reboot-status.vbs"
CheckUpdates=check_updates.vbs

; Files to be included in the configuration
[/includes]
Reference Documentation

On line 19 of nsclient.ini in the [/settings/default] section, set the IP of the Nagios server (192.168.107.21) to limit access to just that host. For example:

Code: Select all

allowed hosts=127.0.0.1,192.168.107.21
On line 21 of nsclient.ini in the [/settings/default] section, set the password that will be required to access the remote functions.

Code: Select all

password=my-nsclient-password
On the Nagios server, you will need to match this password in your resource file which will then be referenced in your server config file.
/etc/nagios/resource.cfg

Code: Select all

$USER5$=my-nsclient-password
On line 56 of nsclient.ini in the [/settings/NSClient/server] section, set the port number that will be used for communication with Nagios via check_nt. It would be wise to use a port other than the default. This example is using the default port of 12489:

Code: Select all

port=12489
On line 62 of nsclient.ini in the [/settings/NSClient/server] section, enable SSL. For example:

Code: Select all

use ssl=true
On line 90 of nsclient.ini in the [/settings/NRPE/server] section, set the port number that will be used for communication with Nagios via check_nrpe. It would be wise to use a port other than the default. This example is using the default port of 5666:

Code: Select all

port=5666
On line 98 of nsclient.ini in the [/settings/NRPE/server] section, enable SSL. For example:

Code: Select all

use ssl=true
On line 172 of nsclient.ini in the [/settings/external scripts/scripts] section, enable the custom checks you want to use. For example:

Code: Select all

CheckWSUS=cmd /c echo scripts\check_windows_updates.ps1; exit($LastExitCode) | powershell.exe -command -
CheckReboot=cscript.exe //T:60 //NoLogo "scripts\nm-check-reboot-status.vbs"
CheckUpdates=check_updates.vbs
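
Because the line numbers above shift as soon as the file is edited, it can be easier to look settings up by section and key. The sketch below is one way to do that with awk; ini_get is a made-up helper name. It only matches uncommented lines written as key=value with no spaces around the equals sign, and it tolerates Windows line endings.

```shell
# ini_get FILE SECTION KEY   (hypothetical helper)
# Prints the value of KEY inside [SECTION]; commented-out lines are ignored.
ini_get() {
  awk -v section="[$2]" -v key="$3" '
    { sub(/\r$/, "") }                            # strip a trailing CR (Windows file)
    /^\[/ { in_section = ($0 == section); next }  # track the current section
    in_section && index($0, key "=") == 1 {       # uncommented "key=value" line
      print substr($0, length(key) + 2)
      exit
    }
  ' "$1"
}
```

For example, with a copy of the file in the current directory, ini_get nsclient.ini /settings/default "allowed hosts" prints the allowed hosts list, and ini_get nsclient.ini /settings/NRPE/server port prints the NRPE port.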
To make rolling this out a snap, create these batch files to install NSClient and to manage the service directly after installation.

X:\NSClient\service-install.bat

Code: Select all

@ECHO OFF
ECHO Installing the NSClient service...
nscp.exe service --install --name NSClientpp
ECHO Starting the NSClient service...
nscp.exe service --start --name NSClientpp
PAUSE
X:\NSClient\service-uninstall.bat

Code: Select all

@ECHO OFF
ECHO Stopping the NSClient service...
nscp.exe service --stop --name NSClientpp
ECHO Uninstalling the NSClient service...
nscp.exe service --uninstall --name NSClientpp
PAUSE
Be sure to update the SOURCEPATH and NAGIOSIP variables at the beginning of the following batch file.

X:\NSClient\install.bat

Code: Select all

@ECHO OFF
REM ** SourcePath can also be a UNC path such as \\ServerNameIP\ShareName **
SET SOURCEPATH=X:\NSClient
SET TARGETPATH=C:\NSClient
SET NAGIOSIP=192.168.107.21
TITLE Installing Nagios NSCP...
ECHO.
ECHO NOTE: NSClient requires Visual C++ Redist 2012 Update4
ECHO.
PAUSE
MKDIR %TARGETPATH%
REM
REM Check 32/64bit
REM 
IF %PROCESSOR_ARCHITECTURE% == AMD64 GOTO 64BIT
GOTO 32BIT

:64BIT
ECHO Copying client files...
XCOPY %SOURCEPATH%\nsclient-64bit\*.* %TARGETPATH%\ /E /V /Q /Y
ECHO Setting PowerShell 32-bit/64-bit to allow Remote Scripts to execute:
%SystemRoot%\System32\WindowsPowerShell\v1.0\powershell.exe "Set-ExecutionPolicy RemoteSigned"
%SystemRoot%\SysWOW64\WindowsPowerShell\v1.0\powershell.exe "Set-ExecutionPolicy RemoteSigned"
GOTO COMMON

:32BIT
ECHO Copying client files...
XCOPY %SOURCEPATH%\nsclient-32bit\*.* %TARGETPATH%\ /E /V /Q /Y
ECHO Setting PowerShell 32-bit to allow Remote Scripts to execute:
%SystemRoot%\System32\WindowsPowerShell\v1.0\powershell.exe "Set-ExecutionPolicy RemoteSigned"
GOTO COMMON

:COMMON
REM Copy scripts
XCOPY %SOURCEPATH%\*.vbs %TARGETPATH%\scripts\
XCOPY %SOURCEPATH%\*.ps1 %TARGETPATH%\scripts\
XCOPY %SOURCEPATH%\service*.bat %TARGETPATH%\
REM Copy configurations
XCOPY %SOURCEPATH%\*.ini %TARGETPATH%\

REM Docs - https://technet.microsoft.com/en-us/library/dd734783%28v=ws.10%29.aspx
REM
REM Open the firewall for NSCP Agent
REM
netsh advfirewall firewall add rule name="Nagios 12489 TCP" dir=in action=allow localport=12489 remoteport=any protocol=tcp remoteip=%NAGIOSIP% profile=Domain,Public,Private
netsh advfirewall firewall add rule name="Nagios 12489 UDP" dir=in action=allow localport=12489 remoteport=any protocol=udp remoteip=%NAGIOSIP% profile=Domain,Public,Private
netsh advfirewall firewall add rule name="Nagios 5666 TCP" dir=in action=allow localport=5666 remoteport=any protocol=tcp remoteip=%NAGIOSIP% profile=Domain,Public,Private

REM
REM Install the service and start it
REM
ECHO Installing NSCP service...
%TARGETPATH%\nscp.exe service --install --name NSClientpp
ECHO Starting NSCP service...
%TARGETPATH%\nscp.exe service --start --name NSClientpp
ECHO.
pause
Go to each Windows host you want to monitor and run X:\NSClient\install.bat (Run as Administrator). It copies the correct architecture folder to C:\NSClient, adds the firewall rules, and then installs and starts the service.

The install batch file attempts to add the following rules to the Windows firewall. This may fail if you are using a different software firewall or a version of Windows that does not provide the netsh advfirewall command.

Inbound Rule Name: Nagios 12489 TCP
- Check: Enabled
- Action: Allow the connection
- Protocol Type: TCP
- Local Port: 12489
- Remote Port: All Ports
- Profile: Domain, Private, Public
- Local IP address: Any IP address
- Remote IP address: These IP addresses: 192.168.107.21

Inbound Rule Name: Nagios 12489 UDP
- Check: Enabled
- Action: Allow the connection
- Protocol Type: UDP
- Local Port: 12489
- Remote Port: All Ports
- Profile: Domain, Private, Public
- Local IP address: Any IP address
- Remote IP address: These IP addresses: 192.168.107.21

Inbound Rule Name: Nagios 5666 TCP
- Check: Enabled
- Action: Allow the connection
- Protocol Type: TCP
- Local Port: 5666
- Remote Port: All Ports
- Profile: Domain, Private, Public
- Local IP address: Any IP address
- Remote IP address: These IP addresses: 192.168.107.21

On the Nagios server, create or copy a Windows config file and make appropriate changes such as server name and IP. See the Sample Windows config file posted earlier in the thread.

The final step is to verify that nothing is broken in the configuration:

Code: Select all

/etc/nagios/verify.sh
If there were no errors or warnings, restart Nagios to load the new configuration:

Code: Select all

sudo systemctl stop nagios
sudo systemctl start nagios
NOTE: The Win32 version will work on 64-bit servers. The only problem is if you need to check for the existence of running 64-bit processes such as Explorer.exe or Notepad.exe. The Win32 client cannot properly detect 64-bit programs.

Monitoring Remote Linux Servers

Post: # 770Post LHammonds
Wed Sep 18, 2019 9:31 am

Since there are other Linux boxes that need to be monitored, the NRPE plugin and NRPE service will be installed on each Linux box.

Set up the remote Linux server to be monitored:

Create the Nagios user and group:

Code: Select all

groupadd --system --gid 9000 nagios
adduser --system --gid 9000 --home /usr/local/nagios nagios
chown nagios:nagios /usr/local/nagios
chmod 0755 /usr/local/nagios
Install the standard Nagios plugins and the NRPE server. Rather than compiling from source, we will just use what comes with the repository.

Code: Select all

apt -y install nagios-plugins nagios-nrpe-server
Make a backup of the NRPE configuration files before modifying them:

Code: Select all

cp /etc/nagios/nrpe.cfg /etc/nagios/nrpe.cfg.bak
cp /etc/nagios/nrpe_local.cfg /etc/nagios/nrpe_local.cfg.bak
Edit the local configuration:

Code: Select all

vi /etc/nagios/nrpe_local.cfg
Add the IP of your Nagios server (192.168.107.21) to the "allowed_hosts" line and list only the plugins that will be used:

Code: Select all

allowed_hosts=192.168.107.21,127.0.0.1
command[check_users]=/usr/lib/nagios/plugins/check_users -w 5 -c 10
command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
command[check_disk_app]=/usr/lib/nagios/plugins/check_disk -p /var -w 20% -c 10%
command[check_disk_root]=/usr/lib/nagios/plugins/check_disk -p / -w 20% -c 10%
command[check_disk_all]=/usr/lib/nagios/plugins/check_disk -w 15% -c 10%
command[check_zombie_procs]=/usr/lib/nagios/plugins/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/lib/nagios/plugins/check_procs -w 200 -c 240
command[check_swap]=/usr/lib/nagios/plugins/check_swap -w 15% -c 10%
command[check_apt]=/usr/lib/nagios/plugins/check_apt
TIP: If you define separate disk checks like the ones above, you can assign different notifications to each. For example, the Linux administrator could get an email notification when the root partition reaches the warning threshold (during business hours) and an alert to his pager (at any time of the day) when the root partition reaches critical. The application manager could get different notices for /var, such as both warnings and criticals going through SMS to his phone at any time of the day.

Check the status of the NRPE server:

Code: Select all

systemctl status nagios-nrpe-server
If the NRPE server is not running, this is how you can start it:

Code: Select all

sudo systemctl start nagios-nrpe-server
If the NRPE server was already running and you made configuration changes, use this command to load the new changes:

Code: Select all

sudo systemctl reload nagios-nrpe-server
Now see if your configured commands will run locally on the remote server (before trying to test them from the Nagios server):

Code: Select all

/usr/lib/nagios/plugins/check_users -w 5 -c 10
/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
/usr/lib/nagios/plugins/check_disk -w 15% -c 10%
/usr/lib/nagios/plugins/check_procs -w 5 -c 10 -s Z
/usr/lib/nagios/plugins/check_procs -w 200 -c 240
/usr/lib/nagios/plugins/check_swap -w 15% -c 10%
/usr/lib/nagios/plugins/check_apt
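
Instead of retyping each plugin command, a small loop can read the command[...] lines straight from the config file and run them locally. This is only a sketch: run_nrpe_commands is a made-up helper name (not part of NRPE), and it assumes every definition sits on a single line in the form command[name]=command line.

```shell
#!/bin/bash
# Hypothetical helper: run every command[...] entry of an NRPE config
# file locally and print its name together with the exit code.
run_nrpe_commands() {
  local cfg="$1" line name cmd rc
  while IFS= read -r line; do
    # Skip everything that is not a command definition.
    case "${line}" in
      command\[*\]=*) ;;
      *) continue ;;
    esac
    name="${line#command\[}"   # drop the leading "command["
    name="${name%%\]*}"        # keep the text before the first "]"
    cmd="${line#*\]=}"         # everything after "]=" is the command line
    # Run the command line; stdin is detached so it cannot eat the loop input.
    bash -c "${cmd}" > /dev/null 2>&1 < /dev/null
    rc=$?
    echo "${name} exit=${rc}"
  done < "${cfg}"
}
```

Calling run_nrpe_commands /etc/nagios/nrpe_local.cfg prints one line per command. Exit codes 0, 1, 2 and 3 mean OK, WARNING, CRITICAL and UNKNOWN; a code of 127 usually means the plugin path in the config file is wrong.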
Test Connectivity of NRPE Plugin

Test the connectivity of the NRPE service on your server to be monitored by trying to access the server via telnet using the NRPE port number.

If we installed the NRPE server on a machine with the address of 192.168.107.20, type the following at the console of your Nagios server:

Code: Select all

telnet 192.168.107.20 5666
If you get a response of Escape character is '^]'., then you have a good connection. Type exit to close the connection; if the session does not close, press Ctrl+] and then type quit at the telnet> prompt.
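
If telnet is not installed on the Nagios server, bash can make the same TCP connection test by itself. A minimal sketch, assuming bash was built with /dev/tcp support (the Ubuntu package is) and that the timeout utility from coreutils is available; port_open is a hypothetical helper name:

```shell
#!/bin/bash
# Hypothetical helper: port_open HOST PORT [TIMEOUT_SECONDS]
# Returns 0 if a TCP connection could be opened, non-zero otherwise.
port_open() {
  local host="$1" port="$2" wait="${3:-3}"
  # The child bash opens the socket on file descriptor 3 and exits at once;
  # timeout kills it if the connection attempt hangs (e.g. packets dropped).
  timeout "${wait}" bash -c "exec 3<>/dev/tcp/${host}/${port}" 2> /dev/null
}

# Example usage (addresses are the placeholders used in this thread):
# port_open 192.168.107.20 5666 && echo "NRPE port reachable"
```

A refused connection returns quickly, while a firewall that silently drops packets makes the call wait for the full timeout, which is a useful hint about where the problem is.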

If the command fails with a timeout, you might need to add firewall rules to the remote server:

UFW (Uncomplicated Firewall)

Code: Select all

ufw allow proto tcp to any port 5666 comment 'Nagios Server'
IPTABLES

Code: Select all

iptables -A INPUT -p tcp --dport 5666 -j ACCEPT
iptables -A OUTPUT -p tcp --sport 5666 -j ACCEPT
service iptables save
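
The rules above accept connections to port 5666 from any address. A tighter variant, sketched below, only admits the Nagios server. print_nrpe_firewall_rules is a hypothetical helper that merely prints the commands so they can be reviewed before being run as root; 192.168.107.21 is the placeholder Nagios address used throughout this thread.

```shell
#!/bin/bash
# Hypothetical helper: print (not execute) firewall rules that only let
# the given Nagios server reach the NRPE port on this machine.
print_nrpe_firewall_rules() {
  local nagios_ip="$1" port="${2:-5666}"
  echo "ufw allow proto tcp from ${nagios_ip} to any port ${port} comment 'Nagios Server'"
  echo "iptables -A INPUT -p tcp -s ${nagios_ip} --dport ${port} -j ACCEPT"
}

print_nrpe_firewall_rules 192.168.107.21
```

Pick the line that matches the firewall in use (UFW or plain iptables), not both.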
Now, from the Nagios server, try executing some of the commands you configured on your remote Linux server 192.168.107.20 (the command names defined in its nrpe_local.cfg file):

Code: Select all

/usr/local/nagios/libexec/check_nrpe -H 192.168.107.20 -p 5666 -c check_users
/usr/local/nagios/libexec/check_nrpe -H 192.168.107.20 -p 5666 -c check_load
/usr/local/nagios/libexec/check_nrpe -H 192.168.107.20 -p 5666 -c check_disk_all
/usr/local/nagios/libexec/check_nrpe -H 192.168.107.20 -p 5666 -c check_zombie_procs
/usr/local/nagios/libexec/check_nrpe -H 192.168.107.20 -p 5666 -c check_total_procs
/usr/local/nagios/libexec/check_nrpe -H 192.168.107.20 -p 5666 -c check_apt
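
When several hosts need the same round of tests, the calls above can be wrapped in a loop. A rough sketch: run_remote_checks and the CHECK_NRPE variable are names invented for this example, and the default path is the one used in the commands above.

```shell
#!/bin/bash
# Path to the check_nrpe plugin; override it if yours lives elsewhere.
CHECK_NRPE="${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}"

# Hypothetical helper: run_remote_checks HOST CHECK [CHECK...]
# Runs each NRPE command on HOST and returns the highest exit code seen.
run_remote_checks() {
  local host="$1"
  shift
  local check rc worst=0
  for check in "$@"; do
    "${CHECK_NRPE}" -H "${host}" -p 5666 -c "${check}"
    rc=$?
    echo "--> ${check}: exit code ${rc}"
    if [ "${rc}" -gt "${worst}" ]; then
      worst="${rc}"
    fi
  done
  return "${worst}"
}

# Example usage:
# run_remote_checks 192.168.107.20 check_users check_load check_disk_all
```

A return value of 0 means every check came back OK; anything higher points at the check lines to look at.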
If it all looks good, you can then use commands in a server configuration file. See the sample configurations posted earlier.

Monitoring MariaDB/MySQL Server

Post: # 771Post LHammonds
Tue Sep 24, 2019 4:22 pm

The script will be executed on the remote Linux server so we will be making use of NRPE.

On the remote MariaDB/MySQL server, install the Nagios plugins, NRPE server and NRPE plugin as mentioned earlier for remote Linux servers.

An extra step to allow the check_mysql plugin to work is to grant the nagios user access to a database. Rather than granting access to an existing database (for security reasons), let's create an empty database just for Nagios.

Type the following commands to create a Nagios database, a Nagios user, and read-only access to just that empty database:

Code: Select all

mysql
CREATE DATABASE nagiosdb;
CREATE USER 'nagiosuser'@'%' IDENTIFIED BY 'nagiosuserpass';
GRANT SELECT ON nagiosdb.* TO 'nagiosuser'@'%';
FLUSH PRIVILEGES;
exit
Now see if the command will run locally on the database server (before trying to test it from the Nagios server):

Code: Select all

/usr/lib/nagios/plugins/check_mysql -w 20 -c 10 -d nagiosdb -u nagiosuser -p nagiosuserpass
Add the plugin to the trusted NRPE commands to be executed.

Code: Select all

vi /etc/nagios/nrpe_local.cfg

Code: Select all

command[check_mysql]=/usr/lib/nagios/plugins/check_mysql -w 20 -c 10 -d nagiosdb -u nagiosuser -p nagiosuserpass
Even though we are using a low-access, read-only ID, the password is exposed in the config file, so make sure the file ownership and permissions are set accordingly:

Code: Select all

chown root:nagios /etc/nagios/nrpe_local.cfg
chmod 0640 /etc/nagios/nrpe_local.cfg
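
A quick way to confirm the result is to look at the file's octal mode and make sure the last digit is 0 (no access for "other" users). A small sketch, assuming GNU stat as shipped with Ubuntu; is_world_accessible is a hypothetical helper name:

```shell
#!/bin/bash
# Hypothetical helper: returns 0 when "other" users have any permission
# on the file, 1 when they have none, 2 when the file cannot be examined.
is_world_accessible() {
  local mode
  mode="$(stat -c '%a' "$1")" || return 2
  # The last octal digit holds the permissions for "other".
  [ "${mode: -1}" != "0" ]
}

# Example usage (path taken from the steps above):
# if is_world_accessible /etc/nagios/nrpe_local.cfg; then
#   echo "WARNING: nrpe_local.cfg is accessible by everyone"
# fi
```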
The NRPE server now needs to reload the configuration for the changes to take effect.

Code: Select all

sudo systemctl reload nagios-nrpe-server
On the Nagios server, see if the command will successfully connect to the remote server:

Code: Select all

/usr/local/nagios/libexec/check_nrpe -H 192.168.107.20 -c check_mysql
On the Nagios server, add the following service definition to the remote MariaDB/MySQL Linux server's configuration file:

/etc/nagios/servers/srv-mariadb.cfg

Code: Select all

define service{
       use                             generic-service
       host_name                       srv-mariadb
       service_description             Server Health
       check_command                   check_nrpe!check_mysql
       }
The final step is to verify that nothing is broken in the configuration files:

Code: Select all

/etc/nagios/verify.sh
If there were no errors or warnings, restart Nagios to load the new configuration:

Code: Select all

sudo systemctl restart nagios
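
The verify-then-restart pair comes up after every configuration change, so it can be chained so that a failed check never reaches the restart. This sketch rests on an assumption that is not confirmed in this thread: that /etc/nagios/verify.sh exits with a non-zero status when it finds a problem. VERIFY_CMD, RESTART_CMD and verify_then_restart are names made up for the example.

```shell
#!/bin/bash
# Assumption: the verify script exits non-zero when it finds a problem.
VERIFY_CMD="${VERIFY_CMD:-/etc/nagios/verify.sh}"
RESTART_CMD="${RESTART_CMD:-sudo systemctl restart nagios}"

verify_then_restart() {
  # Word splitting of the two variables is intentional so that each can
  # hold a command plus its arguments.
  if ${VERIFY_CMD}; then
    ${RESTART_CMD}
  else
    echo "Configuration check failed, Nagios was not restarted." >&2
    return 1
  fi
}
```

If the verify script only prints warnings without changing its exit status, this wrapper will not catch them, so reading its output is still worthwhile.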

Custom Plugin - Check HTTPS

Post: # 772Post LHammonds
Tue Sep 24, 2019 4:22 pm

On one of my Linux servers, I have a web mail service that I wanted to keep an eye on. However, the check_http did not work because the server only uses SSL (HTTPS) on port 443. I did not see a check_https command so I tried my hand at making one and it works like a champ.

Here is how I made and implemented the custom HTTPS checking function.

The first thing was to create a script that would communicate with the server. We already have WGET installed as one of the prerequisite programs, so I used that program. Create the empty script file with the right ownership and permissions, then paste in the script shown after it:

Code: Select all

touch /usr/local/nagios/libexec/check_https
chown nagios:nagios /usr/local/nagios/libexec/check_https
chmod 0755 /usr/local/nagios/libexec/check_https
/usr/local/nagios/libexec/check_https

Code: Select all

#!/bin/bash
###########################################
## Name         : check_https
## Version      : 1.0
## Date         : 2012-01-03
## Author       : LHammonds
## Purpose      : Check for response from HTTPS server
## Requirements : WGET
## Parameters   :
##    1 = Server IP Address (Required)
##    2 = Port Number (Optional)
## Exit Codes   : (Nagios plugin convention)
##    0 = OK, server responded
##    2 = CRITICAL, request failed
##    3 = UNKNOWN, missing required parameter
###########################################
OUTFILE="/tmp/check_https_out.$$"
ERRFILE="/tmp/check_https_err.$$"
WGETCMD="$(which wget)"

## Do basic check on arguments passed to the script.
if [ "$1" = "" ]; then
  echo "Missing required parameter"
  exit 3
fi
if [ "$2" = "" ]; then
  ## Assume default port.
  SSLPORT="443"
else
  SSLPORT="$2"
fi
"${WGETCMD}" --no-check-certificate --output-document="${OUTFILE}" -S "https://$1:${SSLPORT}" 2> "${ERRFILE}"
RETURNVALUE=$?
if [ ${RETURNVALUE} -eq 0 ]; then
  echo "HTTPS OK"
  EXITCODE=0
else
  ## Any non-zero wget exit code (refused, timeout, HTTP error) is a failure.
  echo "HTTPS CRITICAL - wget failed. Code=${RETURNVALUE}"
  EXITCODE=2
fi
## Clean up the temporary files.
rm -f "${OUTFILE}" "${ERRFILE}"
exit ${EXITCODE}
To test it out, run the command against a server running HTTPS and then against a server not running HTTPS. Example:

Code: Select all

/usr/local/nagios/libexec/check_https 192.168.107.25 443
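
Nagios decides the state of a service from the plugin's exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN), so it helps to look at that code while testing and not only at the text that is printed. A small sketch with two hypothetical helper functions:

```shell
#!/bin/bash
# Hypothetical helper: translate a plugin exit code into the state name
# that the Nagios plugin convention assigns to it.
nagios_state_name() {
  case "$1" in
    0) echo "OK" ;;
    1) echo "WARNING" ;;
    2) echo "CRITICAL" ;;
    3) echo "UNKNOWN" ;;
    *) echo "INVALID (outside 0-3)" ;;
  esac
}

# Hypothetical helper: run any plugin command line and print the state
# name that matches its exit code.
show_plugin_state() {
  "$@" > /dev/null 2>&1
  nagios_state_name "$?"
}

# Example usage:
# show_plugin_state /usr/local/nagios/libexec/check_https 192.168.107.25 443
```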
Next, we add this script to the commands file.

Code: Select all

vi /etc/nagios/objects/commands.cfg

Find the existing "check_http" command, copy the definition, add "s" to the end of http in both places, remove the "-I" option, and replace $ARG1$ with the port number.

Find this:

Code: Select all

define command{
        command_name     check_http
        command_line     $USER1$/check_http -I $HOSTADDRESS$ $ARG1$
        }
Copy and change to this:

Code: Select all

define command{
       command_name     check_https
       command_line     $USER1$/check_https $HOSTADDRESS$ 443
       }
Now we can add a service to monitor HTTPS by adding the following to the server configuration file:

Code: Select all

define service{
  use                     generic-service
  host_name               srv-securewebserver
  service_description     web mail server
  check_command           check_https
}

Custom Plugin - Check APT MotD

Post: # 773Post LHammonds
Tue Sep 24, 2019 4:23 pm

Reference: Original source

This plugin is a bit different from the built-in APT check for Linux servers. This plugin was designed to give the same kind of messages that you get when you login to an Ubuntu console.

One thing this script will catch that the built-in APT check will not is the "reboot required" state of the server.

The script will be executed on the remote Linux server so we will be making use of NRPE.

On the remote Linux server, create the script:

Code: Select all

touch /usr/lib/nagios/plugins/check_apt_motd.sh
chown root:root /usr/lib/nagios/plugins/check_apt_motd.sh
chmod 0755 /usr/lib/nagios/plugins/check_apt_motd.sh
vi /usr/lib/nagios/plugins/check_apt_motd.sh
/usr/lib/nagios/plugins/check_apt_motd.sh

Code: Select all

#!/bin/sh
#
# check_apt_packages - nagios plugin
#
# Checks for any packages to be applied
# Built for Ubuntu 10 (LTS), see following URL for further info
# - http://www.sandfordit.com/vwiki/index.php/Nagios#Ubuntu_Software_Updates_Monitor
#
# By Simon Strutt
# Version 1 - Jan 2012

# Include standard Nagios library
. /usr/lib/nagios/plugins/utils.sh || exit 3

if [ ! -f /usr/lib/update-notifier/apt-check ]; then
        exit $STATE_UNKNOWN
fi

APTRES=$(/usr/lib/update-notifier/apt-check 2>&1)
PKGS=$(echo $APTRES | cut -f1 -d';')
SEC=$(echo $APTRES | cut -f2 -d';')

if [ -f /var/run/reboot-required ]; then
        REBOOT=1
        TOAPPLY=`cat /var/run/reboot-required.pkgs`
else
        REBOOT=0
fi

if [ "${PKGS}" -eq 0 ]; then
        if [ "${REBOOT}" -eq 1 ]; then
                RET=$STATE_WARNING
                RESULT="Reboot required to apply ${TOAPPLY}"
        else
                RET=$STATE_OK
                RESULT="No packages to be updated"
        fi
elif [ "${SEC}" -eq 0 ]; then
        RET=$STATE_WARNING
        RESULT="${PKGS} packages to update (no security updates)"
else
        RET=$STATE_CRITICAL
        RESULT="${PKGS} packages to update (including ${SEC} security updates)"
fi

echo $RESULT
exit $RET
Test the script to see if it is working:

Code: Select all

/usr/lib/nagios/plugins/check_apt_motd.sh
The output should look something like one of these:

Code: Select all

Reboot required to apply libssl0.9.8
or

Code: Select all

1 packages to update (no security updates)
or

Code: Select all

No packages to be updated
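
The script relies on apt-check reporting two numbers separated by a semicolon (pending updates first, then security updates), which it splits with cut. The same split can be sketched with the shell's own read. parse_apt_check is a hypothetical name and "12;3" is made-up sample input, not real output from a server.

```shell
#!/bin/bash
# Hypothetical helper: split an "updates;security" string as printed by
# apt-check into its two counters.
parse_apt_check() {
  local pkgs sec
  IFS=';' read -r pkgs sec <<< "$1"
  echo "updates=${pkgs} security=${sec}"
}

parse_apt_check "12;3"   # prints: updates=12 security=3
```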
Add the script to the trusted NRPE commands to be executed.

Code: Select all

sudo vi /etc/nagios/nrpe_local.cfg

Code: Select all

command[check_apt_motd]=/usr/lib/nagios/plugins/check_apt_motd.sh
The NRPE server now needs to reload the configuration for the changes to take effect.

Code: Select all

sudo systemctl reload nagios-nrpe-server
On the Nagios server, add the following service definition to the remote Linux server's configuration file:

/etc/nagios/servers/srv-wiki.cfg

Code: Select all

define service{
       use                             generic-service
       host_name                       srv-wiki
       service_description             APT Upgrade MotD
       check_command                   check_nrpe!check_apt_motd
       }
The final step is to verify that nothing is broken in the configuration:

Code: Select all

/etc/nagios/verify.sh
If there were no errors or warnings, restart Nagios to load the new configuration:

Code: Select all

sudo systemctl restart nagios

Custom Plugin - Check ESXi Hardware

Post: # 774Post LHammonds
Tue Sep 24, 2019 4:24 pm

Reference: Original source

I use this custom script to check the health of my ESXi servers. It is run directly from the Nagios server.

This script requires the PyWBEM Python library. Here is how to install it:

Code: Select all

apt-get -y install python-pywbem
You then need to add a command to call the script. Edit /etc/nagios/objects/commands.cfg and add the following:

Code: Select all

# 'check_esxi_hardware' command definition

define command{
      command_name    check_esxi_hardware
      command_line    $USER1$/check_esxi_hardware.py -H $HOSTADDRESS$ -U $ARG1$ -P $ARG2$ -V $ARG3$ $ARG4$
      }
To access the ESXi data, you will need to supply an ID/password. The password can be placed in the "resources.cfg" file, but let's make sure it is secured first.

Code: Select all

chmod 0600 /etc/nagios/resources.cfg
chown nagios:nagios /etc/nagios/resources.cfg
Edit /etc/nagios/resources.cfg and add the following:

Code: Select all

# Password to access ESXi servers.
$USER6$=your-esxi-password-here
To monitor an ESXi server with this command, add the following service definition to its config file:

/etc/nagios/servers/srv-esxi1.cfg

Code: Select all

define service{
      use                             generic-service
      host_name                       srv-esxi1
      service_description             Server Health
      check_command                   check_esxi_hardware!your-esxi-userid-here!$USER6$!ibm
      }
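
The plugin's own help text (the -P option in the script) says that a password given as file:<path> is read from the first line of that file, which keeps the password off the command line. A sketch of creating such a file with tight permissions follows; write_secret_file and the example path are hypothetical and are not part of the setup described here.

```shell
#!/bin/bash
# Hypothetical helper: store a secret in a file only its owner can access.
write_secret_file() {
  local path="$1" secret="$2"
  # The subshell keeps the restrictive umask from leaking into the caller.
  ( umask 077 && printf '%s\n' "${secret}" > "${path}" ) || return 1
  # umask only affects newly created files, so enforce the mode explicitly.
  chmod 0600 "${path}"
}

# Example usage (hypothetical path, run as root):
# write_secret_file /etc/nagios/esxi.pass 'your-esxi-password-here'
# chown nagios:nagios /etc/nagios/esxi.pass
# The password argument then becomes file:/etc/nagios/esxi.pass
```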
Now it is time to create the script:

Code: Select all

touch /usr/local/nagios/libexec/check_esxi_hardware.py
chown nagios:nagios /usr/local/nagios/libexec/check_esxi_hardware.py
chmod 0755 /usr/local/nagios/libexec/check_esxi_hardware.py
vi /usr/local/nagios/libexec/check_esxi_hardware.py
/usr/local/nagios/libexec/check_esxi_hardware.py

Code: Select all

#!/usr/bin/python
# -*- coding: UTF-8 -*-
#
# Script for checking global health of host running VMware ESX/ESXi
#
# Licence : GNU General Public Licence (GPL) http://www.gnu.org/
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
# 02110-1301, USA.
#
# Pre-req : pywbem
#
# Copyright (c) 2008 David Ligeret
# Copyright (c) 2009 Joshua Daniel Franklin
# Copyright (c) 2010 Branden Schneider
# Copyright (c) 2010-2012 Claudio Kuenzler
# Copyright (c) 2010 Samir Ibradzic
# Copyright (c) 2010 Aaron Rogers
# Copyright (c) 2011 Ludovic Hutin
# Copyright (c) 2011 Carsten Schoene
# Copyright (c) 2011-2012 Phil Randal
# Copyright (c) 2011 Fredrik Aslund
# Copyright (c) 2011 Bertrand Jomin
# Copyright (c) 2011 Ian Chard
# Copyright (c) 2012 Craig Hart
#
# The VMware 4.1 CIM API is documented here:
#
#   http://www.vmware.com/support/developer/cim-sdk/4.1/smash/cim_smash_410_prog.pdf
#
#   http://www.vmware.com/support/developer/cim-sdk/smash/u2/ga/apirefdoc/
#
# This Nagios plugin is maintained here:
# http://www.claudiokuenzler.com/nagios-plugins/check_esxi_hardware.php
#
#@---------------------------------------------------
#@ History
#@---------------------------------------------------
#@ Date   : 20080820
#@ Author : David Ligeret
#@ Reason : Initial release
#@---------------------------------------------------
#@ Date   : 20080821
#@ Author : David Ligeret
#@ Reason : Add verbose mode
#@---------------------------------------------------
#@ Date   : 20090219
#@ Author : Joshua Daniel Franklin
#@ Reason : Add try/except to catch AuthError and CIMError
#@---------------------------------------------------
#@ Date   : 20100202
#@ Author : Branden Schneider
#@ Reason : Added HP Support (HealthState)
#@---------------------------------------------------
#@ Date   : 20100512
#@ Author : Claudio Kuenzler www.claudiokuenzler.com
#@ Reason : Combined different versions (Joshua and Branden)
#@ Reason : Added hardware type switch (dell or hp)
#@---------------------------------------------------
#@ Date   : 20100626/28
#@ Author : Samir Ibradzic www.brastel.com
#@ Reason : Added basic server info
#@ Reason : Wanted to have server name, serial number & bios version at output
#@ Reason : Set default return status to Unknown
#@---------------------------------------------------
#@ Date   : 20100702
#@ Author : Aaron Rogers www.cloudmark.com
#@ Reason : GlobalStatus was incorrectly getting (re)set to OK with every CIM element check
#@---------------------------------------------------
#@ Date   : 20100705
#@ Author : Claudio Kuenzler www.claudiokuenzler.com
#@ Reason : Due to change 20100702 all Dell servers would return UNKNOWN instead of OK...
#@ Reason : ... so added Aaron's logic at the end of the Dell checks as well
#@---------------------------------------------------
#@ Date   : 20101028
#@ Author : Claudio Kuenzler www.claudiokuenzler.com
#@ Reason : Changed text in Usage and Example so people dont forget to use https://
#@---------------------------------------------------
#@ Date   : 20110110
#@ Author : Ludovic Hutin (Idea and Coding) / Claudio Kuenzler (Bugfix)
#@ Reason : If Dell Blade Servers are used, Serial Number of Chassis was returned
#@---------------------------------------------------
#@ Date   : 20110207
#@ Author : Carsten Schoene carsten.schoene.cc
#@ Reason : Bugfix for Intel systems (in this case Intel SE7520) - use 'intel' as system type
#@---------------------------------------------------
#@ Date   : 20110215
#@ Author : Ludovic Hutin
#@ Reason : Plugin now catches Socket Error (Timeout Error) and added a timeout parameter
#@---------------------------------------------------
#@ Date   : 20110217/18
#@ Author : Ludovic Hutin / Tom Murphy
#@ Reason : Bugfix in Socket Error if clause
#@---------------------------------------------------
#@ Date   : 20110221
#@ Author : Claudio Kuenzler www.claudiokuenzler.com
#@ Reason : Remove recently added Timeout due to incompabatility on Windows
#@ Reason : and changed name of plugin to check_esxi_hardware
#@---------------------------------------------------
#@ Date   : 20110426
#@ Author : Claudio Kuenzler www.claudiokuenzler.com
#@ Reason : Added 'ibm' hardware type (compatible to Dell output). Tested by Keith Erekson.
#@---------------------------------------------------
#@ Date   : 20110426
#@ Author : Phil Randal
#@ Reason : URLise Dell model and tag numbers (as in check_openmanage)
#@ Reason : Return performance data (as in check_openmanage, using similar names where possible)
#@ Reason : Minor code tidyup - use elementName instead of instance['ElementName']
#@---------------------------------------------------
#@ Date   : 20110428
#@ Author : Phil Randal (phil.randal@gmail.com)
#@ Reason : If hardware type is specified as 'auto' try to autodetect vendor
#@ Reason : Return performance data for some HP models
#@ Reason : Indent 'verbose' output to make it easier to read
#@ Reason : Use OptionParser to give better parameter parsing (retaining compatability with original)
#@---------------------------------------------------
#@ Date   : 20110503
#@ Author : Phil Randal (phil.randal@gmail.com)
#@ Reason : Fix bug in HP Virtual Fan percentage output
#@ Reason : Slight code reorganisation
#@ Reason : Sort performance data
#@ Reason : Fix formatting of current output
#@---------------------------------------------------
#@ Date   : 20110504
#@ Author : Phil Randal (phil.randal@gmail.com)
#@ Reason : Minor code changes and documentation improvements
#@ Reason : Remove redundant mismatched ' character in performance data output
#@ Reason : Output non-integral values for all sensors to fix problem seen with system board voltage sensors
#@          on an IBM server (thanks to Attilio Drei for the sample output)
#@---------------------------------------------------
#@ Date   : 20110505
#@ Author : Fredrik Aslund
#@ Reason : Added possibility to use first line of a file as password (file:)
#@---------------------------------------------------
#@ Date   : 20110505
#@ Author : Phil Randal (phil.randal@gmail.com)
#@ Reason : Simplfy 'verboseoutput' to use 'verbose' as global variable instead of as parameter
#@ Reason : Don't look at performance data from CIM_NumericSensor if we're not using it
#@ Reason : Add --no-power, --no-volts, --no-current, --no-temp, and --no-fan options
#@---------------------------------------------------
#@ Date   : 20110506
#@ Author : Phil Randal (phil.randal@gmail.com)
#@ Reason : Reinstate timeouts with --timeout parameter (but not on Windows)
#@ Reason : Allow file:passwordfile in old-style arguments too
#@---------------------------------------------------
#@ Date   : 20110507
#@ Author : Phil Randal (phil.randal@gmail.com)
#@ Reason : On error, include numeric sensor value in output
#@---------------------------------------------------
#@ Date   : 20110520
#@ Author : Bertrand Jomin
#@ Reason : Plugin had problems to handle some S/N from IBM Blade Servers
#@---------------------------------------------------
#@ Date   : 20110614
#@ Author : Claudio Kuenzler (www.claudiokuenzler.com)
#@ Reason : Rewrote file handling and file can now be used for user AND password
#@---------------------------------------------------
#@ Date   : 20111003
#@ Author : Ian Chard (ian@chard.org)
#@ Reason : Allow a list of unwanted elements to be specified, which is useful
#@          in cases where hardware isn't well supported by ESXi
#@---------------------------------------------------
#@ Date   : 20120402
#@ Author : Claudio Kuenzler (www.claudiokuenzler.com)
#@ Reason : Making plugin GPL compatible (Copyright) and preparing for OpenBSD port
#@---------------------------------------------------
#@ Date   : 20120405
#@ Author : Phil Randal (phil.randal@gmail.com)
#@ Reason : Fix lookup of warranty info for Dell
#@---------------------------------------------------
#@ Date   : 20120501
#@ Author : Craig Hart
#@ Reason : Bugfix in manufacturer discovery when cim entry not found or empty
#@---------------------------------------------------


import sys
import time
import pywbem
import re
import string
from optparse import OptionParser,OptionGroup

version = '20120501'

NS = 'root/cimv2'

# define classes to check 'OperationStatus' instance
ClassesToCheck = [
  'OMC_SMASHFirmwareIdentity',
  'CIM_Chassis',
  'CIM_Card',
  'CIM_ComputerSystem',
  'CIM_NumericSensor',
  'CIM_Memory',
  'CIM_Processor',
  'CIM_RecordLog',
  'OMC_DiscreteSensor',
  'OMC_Fan',
  'OMC_PowerSupply',
  'VMware_StorageExtent',
  'VMware_Controller',
  'VMware_StorageVolume',
  'VMware_Battery',
  'VMware_SASSATAPort'
]

sensor_Type = {
  0:'unknown',
  1:'Other',
  2:'Temperature',
  3:'Voltage',
  4:'Current',
  5:'Tachometer',
  6:'Counter',
  7:'Switch',
  8:'Lock',
  9:'Humidity',
  10:'Smoke Detection',
  11:'Presence',
  12:'Air Flow',
  13:'Power Consumption',
  14:'Power Production',
  15:'Pressure',
  16:'Intrusion',
  32768:'DMTF Reserved',
  65535:'Vendor Reserved'
}

data = []

perf_Prefix = {
  1:'Pow',
  2:'Vol',
  3:'Cur',
  4:'Tem',
  5:'Fan',
  6:'FanP'
}


# parameters

# host name
hostname=''

# user
user=''

# password
password=''

# vendor - possible values are 'unknown', 'auto', 'dell', 'hp', 'ibm', 'intel'
vendor='unknown'

# verbose
verbose=False

# Produce performance data output for nagios
perfdata=False

# timeout
timeout = 0

# elements to ignore (full SEL, broken BIOS, etc)
ignore_list=[]

# urlise model and tag numbers (currently only Dell supported, but the code does the right thing for other vendors)
urlise_country=''

# collect perfdata for each category
get_power   = True
get_volts   = True
get_current = True
get_temp    = True
get_fan     = True

# define exit codes
ExitOK = 0
ExitWarning = 1
ExitCritical = 2
ExitUnknown = 3

def urlised_server_info(vendor, country, server_info):
  #server_inf = server_info
  if vendor == 'dell' :
    # Dell support URLs (idea and tables borrowed from check_openmanage)
    du = 'http://support.dell.com/support/edocs/systems/pe'
    if (server_info is not None) :
      p=re.match('(.*)PowerEdge (.*) (.*)',server_info)
      if (p is not None) :
        md=p.group(2)
        if (re.match('M',md)) :
          md = 'm'
        server_info = p.group(1) + '<a href="' + du + md + '/">PowerEdge ' + p.group(2)+'</a> ' + p.group(3)
  elif vendor == 'hp':
    return server_info
  elif vendor == 'ibm':
    return server_info
  elif vendor == 'intel':
    return server_info

  return server_info

# ----------------------------------------------------------------------

def system_tag_url(vendor,country):
  url = {'xx':''}
  if vendor == 'dell':
    # Dell support sites
    supportsite = 'http://www.dell.com/support/troubleshooting/'
    dellsuffix = 'nodhs1/Index?t=warranty&servicetag='

    # warranty URLs for different country codes
    # EMEA
    url['at'] = supportsite + 'at/de/' + dellsuffix  # Austria
    url['be'] = supportsite + 'be/nl/' + dellsuffix  # Belgium
    url['cz'] = supportsite + 'cz/cs/' + dellsuffix  # Czech Republic
    url['de'] = supportsite + 'de/de/' + dellsuffix  # Germany
    url['dk'] = supportsite + 'dk/da/' + dellsuffix  # Denmark
    url['es'] = supportsite + 'es/es/' + dellsuffix  # Spain
    url['fi'] = supportsite + 'fi/fi/' + dellsuffix  # Finland
    url['fr'] = supportsite + 'fr/fr/' + dellsuffix  # France
    url['gr'] = supportsite + 'gr/en/' + dellsuffix  # Greece
    url['it'] = supportsite + 'it/it/' + dellsuffix  # Italy
    url['il'] = supportsite + 'il/en/' + dellsuffix  # Israel
    url['me'] = supportsite + 'me/en/' + dellsuffix  # Middle East
    url['no'] = supportsite + 'no/no/' + dellsuffix  # Norway
    url['nl'] = supportsite + 'nl/nl/' + dellsuffix  # The Netherlands
    url['pl'] = supportsite + 'pl/pl/' + dellsuffix  # Poland
    url['pt'] = supportsite + 'pt/en/' + dellsuffix  # Portugal
    url['ru'] = supportsite + 'ru/ru/' + dellsuffix  # Russia
    url['se'] = supportsite + 'se/sv/' + dellsuffix  # Sweden
    url['uk'] = supportsite + 'uk/en/' + dellsuffix  # United Kingdom
    url['za'] = supportsite + 'za/en/' + dellsuffix  # South Africa
    # America
    url['br'] = supportsite + 'br/pt/' + dellsuffix  # Brazil
    url['ca'] = supportsite + 'ca/en/' + dellsuffix  # Canada
    url['mx'] = supportsite + 'mx/es/' + dellsuffix  # Mexico
    url['us'] = supportsite + 'us/en/' + dellsuffix  # USA
    # Asia/Pacific
    url['au'] = supportsite + 'au/en/' + dellsuffix  # Australia
    url['cn'] = supportsite + 'cn/zh/' + dellsuffix  # China
    url['in'] = supportsite + 'in/en/' + dellsuffix  # India
    # default fallback
    url['xx'] = supportsite + 'us/en/' + dellsuffix  # default
  # elif vendor == 'hp':
  # elif vendor == 'ibm':
  # elif vendor == 'intel':

  return url.get(country,url['xx'])

# ----------------------------------------------------------------------

def urlised_serialnumber(vendor,country,SerialNumber):
  if SerialNumber is not None :
    tu = system_tag_url(vendor,country)
    if tu != '' :
      SerialNumber = '<a href="' + tu + SerialNumber + '">' + SerialNumber + '</a>'
  return SerialNumber

# ----------------------------------------------------------------------

def verboseoutput(message) :
  if verbose:
    print "%s %s" % (time.strftime("%Y%m%d %H:%M:%S"), message)

# ----------------------------------------------------------------------

def getopts() :
  global hosturl,user,password,vendor,verbose,perfdata,urlise_country,timeout,ignore_list,get_power,get_volts,get_current,get_temp,get_fan
  usage = "usage: %prog  https://hostname user password system [verbose]\n" \
    "example: %prog https://my-shiny-new-vmware-server root fakepassword dell\n\n" \
    "or, using new style options:\n\n" \
    "usage: %prog -H hostname -U username -P password [-V system -v -p -I XX]\n" \
    "example: %prog -H my-shiny-new-vmware-server -U root -P fakepassword -V auto -I uk\n\n" \
    "or, verbosely:\n\n" \
    "usage: %prog --host=hostname --user=username --pass=password [--vendor=system --verbose --perfdata --html=XX]\n"

  parser = OptionParser(usage=usage, version="%prog "+version)
  group1 = OptionGroup(parser, 'Mandatory parameters')
  group2 = OptionGroup(parser, 'Optional parameters')

  group1.add_option("-H", "--host", dest="host", help="report on HOST", metavar="HOST")
  group1.add_option("-U", "--user", dest="user", help="user to connect as", metavar="USER")
  group1.add_option("-P", "--pass", dest="password", \
      help="password, if password matches file:<path>, first line of given file will be used as password", metavar="PASS")

  group2.add_option("-V", "--vendor", dest="vendor", help="Vendor code: auto, dell, hp, ibm, intel, or unknown (default)", \
      metavar="VENDOR", type='choice', choices=['auto','dell','hp','ibm','intel','unknown'],default="unknown")
  group2.add_option("-v", "--verbose", action="store_true", dest="verbose", default=False, \
      help="print status messages to stdout (default is to be quiet)")
  group2.add_option("-p", "--perfdata", action="store_true", dest="perfdata", default=False, \
      help="collect performance data for pnp4nagios (default is not to)")
  group2.add_option("-I", "--html", dest="urlise_country", default="", \
      help="generate html links for country XX (default is not to)", metavar="XX")
  group2.add_option("-t", "--timeout", action="store", type="int", dest="timeout", default=0, \
      help="timeout in seconds - no effect on Windows (default = no timeout)")
  group2.add_option("-i", "--ignore", action="store", type="string", dest="ignore", default="", \
      help="comma-separated list of elements to ignore")
  group2.add_option("--no-power", action="store_false", dest="get_power", default=True, \
      help="don't collect power performance data")
  group2.add_option("--no-volts", action="store_false", dest="get_volts", default=True, \
      help="don't collect voltage performance data")
  group2.add_option("--no-current", action="store_false", dest="get_current", default=True, \
      help="don't collect current performance data")
  group2.add_option("--no-temp", action="store_false", dest="get_temp", default=True, \
      help="don't collect temperature performance data")
  group2.add_option("--no-fan", action="store_false", dest="get_fan", default=True, \
      help="don't collect fan performance data")

  parser.add_option_group(group1)
  parser.add_option_group(group2)

  # check input arguments
  if len(sys.argv) < 2:
    print "no parameters specified\n"
    parser.print_help()
    sys.exit(-1)
  # if first argument starts with 'https://' we have old-style parameters, so handle in old way
  if re.match("https://",sys.argv[1]):
    # check input arguments
    if len(sys.argv) < 5:
      print "too few parameters\n"
      parser.print_help()
      sys.exit(-1)
    if len(sys.argv) > 5 :
      if sys.argv[5] == "verbose" :
        verbose = True
    hosturl = sys.argv[1]
    user = sys.argv[2]
    password = sys.argv[3]
    vendor = sys.argv[4]
  else:
    # we're dealing with new-style parameters, so go get them!
    (options, args) = parser.parse_args()

    # Making sure all mandatory options appeared.
    mandatories = ['host', 'user', 'password']
    for m in mandatories:
      if not options.__dict__[m]:
        print "mandatory parameter '--" + m + "' is missing\n"
        parser.print_help()
        sys.exit(-1)

    hostname=options.host.lower()
    # if user has put "https://" in front of hostname out of habit, do the right thing
    # hosturl will end up as https://hostname
    if re.match('^https://',hostname):
      hosturl = hostname
    else:
      hosturl = 'https://' + hostname

    user=options.user
    password=options.password
    vendor=options.vendor.lower()
    verbose=options.verbose
    perfdata=options.perfdata
    urlise_country=options.urlise_country.lower()
    timeout=options.timeout
    ignore_list=options.ignore.split(',')
    get_power=options.get_power
    get_volts=options.get_volts
    get_current=options.get_current
    get_temp=options.get_temp
    get_fan=options.get_fan

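  # if user starts with 'file:', read user and password (in that order) from the first line of that file;
  # otherwise, if password starts with 'file:', read only the password from the first line of that file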
  if (re.match('^file:', user) or re.match('^file:', password)):
        if re.match('^file:', user):
          filextract = re.sub('^file:', '', user)
          filename = open(filextract, 'r')
          filetext = filename.readline().split()
          user = filetext[0]
          password = filetext[1]
          filename.close()
        elif re.match('^file:', password):
          filextract = re.sub('^file:', '', password)
          filename = open(filextract, 'r')
          filetext = filename.readline().split()
          password = filetext[0]
          filename.close()

# ----------------------------------------------------------------------

getopts()

# if running on Windows, don't use timeouts and signal.alarm
on_windows = True
os_platform = sys.platform
if os_platform != "win32":
  on_windows = False
  import signal
  def handler(signum, frame):
    print 'CRITICAL: Execution time too long!'
    sys.exit(ExitCritical)

# connection to host
verboseoutput("Connection to "+hosturl)
wbemclient = pywbem.WBEMConnection(hosturl, (user,password), NS)

# Add a timeout for the script. When used with Nagios, the Nagios timeout must not be shorter than the plugin timeout.
if on_windows == False and timeout > 0:
  signal.signal(signal.SIGALRM, handler)
  signal.alarm(timeout)

# run the check for each defined class
GlobalStatus = ExitUnknown
server_info = ""
bios_info = ""
SerialNumber = ""
ExitMsg = ""

# if vendor is specified as 'auto', try to get vendor from CIM
# note: the default vendor is 'unknown'
if vendor=='auto':
  c=wbemclient.EnumerateInstances('CIM_Chassis')
  man=c[0][u'Manufacturer']
  if re.match("Dell",man):
    vendor="dell"
  elif re.match("HP",man):
    vendor="hp"
  elif re.match("IBM",man):
    vendor="ibm"
  elif re.match("Intel",man):
    vendor="intel"
  else:
    vendor='unknown'

for classe in ClassesToCheck :
  verboseoutput("Check classe "+classe)
  try:
    instance_list = wbemclient.EnumerateInstances(classe)
  except pywbem.cim_operations.CIMError,args:
    if ( args[1].find('Socket error') >= 0 ):
      print "CRITICAL: %s" %args
      sys.exit (ExitCritical)
    else:
      verboseoutput("Unknown CIM Error: %s" % args)
  except pywbem.cim_http.AuthError,arg:
    verboseoutput("Global exit set to CRITICAL")
    GlobalStatus = ExitCritical
    ExitMsg = " : Authentication Error! "
  else:
    # GlobalStatus = ExitOK #ARR
    for instance in instance_list :
      sensor_value = ""
      elementName = instance['ElementName']
      elementNameValue = elementName
      verboseoutput("  Element Name = "+elementName)

      # Ignore element if we don't want it
      if elementName in ignore_list :
        verboseoutput("    (ignored)")
        continue

      # BIOS & Server info
      if elementName == 'System BIOS' :
        bios_info =     instance[u'Name'] + ': ' \
            + instance[u'VersionString'] + ' ' \
            + str(instance[u'ReleaseDate'].datetime.date())
        verboseoutput("    VersionString = "+instance[u'VersionString'])

      elif elementName == 'Chassis' :
        man = instance[u'Manufacturer']
        if man is None :
          man = 'Unknown Manufacturer'
        verboseoutput("    Manufacturer = "+man)
        SerialNumber = instance[u'SerialNumber']
        if SerialNumber:
          verboseoutput("    SerialNumber = "+SerialNumber)
        server_info = man + ' '
        if vendor != 'intel':
          model = instance[u'Model']
          if model:
            verboseoutput("    Model = "+model)
            server_info +=  model + ' s/n:'

      elif elementName == 'Server Blade' :
        SerialNumber = instance[u'SerialNumber']
        if SerialNumber:
          verboseoutput("    SerialNumber = "+SerialNumber)

      # Report detail of Numeric Sensors and generate nagios perfdata

      if classe == "CIM_NumericSensor" :
        sensorType = instance[u'sensorType']
        sensStr = sensor_Type.get(sensorType,"Unknown")
        if sensorType:
          verboseoutput("    sensorType = %d - %s" % (sensorType,sensStr))
        units = instance[u'BaseUnits']
        if units:
          verboseoutput("    BaseUnits = %d" % units)
        # grab some of these values for Nagios performance data
        scale = 10**instance[u'UnitModifier']
        verboseoutput("    Scaled by = %f " % scale)
        cr = int(instance[u'CurrentReading'])*scale
        verboseoutput("    Current Reading = %f" % cr)
        elementNameValue = "%s: %g" % (elementName,cr)
        ltnc = 0
        utnc = 0
        ltc  = 0
        utc  = 0
        if instance[u'LowerThresholdNonCritical'] is not None:
          ltnc = instance[u'LowerThresholdNonCritical']*scale
          verboseoutput("    Lower Threshold Non Critical = %f" % ltnc)
        if instance[u'UpperThresholdNonCritical'] is not None:
          utnc = instance[u'UpperThresholdNonCritical']*scale
          verboseoutput("    Upper Threshold Non Critical = %f" % utnc)
        if instance[u'LowerThresholdCritical'] is not None:
          ltc = instance[u'LowerThresholdCritical']*scale
          verboseoutput("    Lower Threshold Critical = %f" % ltc)
        if instance[u'UpperThresholdCritical'] is not None:
          utc = instance[u'UpperThresholdCritical']*scale
          verboseoutput("    Upper Threshold Critical = %f" % utc)
        #
        if perfdata:
          perf_el = elementName.replace(' ','_')

          # Power and Current
          if sensorType == 4:               # Current or Power Consumption
            if units == 7:            # Watts
              if get_power:
                data.append( ("%s=%g;%g;%g " % (perf_el, cr, utnc, utc),1) )
            elif units == 6:          # Current
              if get_current:
                data.append( ("%s=%g;%g;%g " % (perf_el, cr, utnc, utc),3) )

          # PSU Voltage
          elif sensorType == 3:               # Voltage
            if get_volts:
              data.append( ("%s=%g;%g;%g " % (perf_el, cr, utnc, utc),2) )

          # Temperatures
          elif sensorType == 2:               # Temperature
            if get_temp:
              data.append( ("%s=%g;%g;%g " % (perf_el, cr, utnc, utc),4) )

          # Fan speeds
          elif sensorType == 5:               # Tachometer
            if get_fan:
              if units == 65:           # percentage
                data.append( ("%s=%g%%;%g;%g " % (perf_el, cr, utnc, utc),6) )
              else:
                data.append( ("%s=%g;%g;%g " % (perf_el, cr, utnc, utc),5) )

      elif classe == "CIM_Processor" :
        verboseoutput("    Family = %d" % instance['Family'])
        verboseoutput("    CurrentClockSpeed = %dMHz" % instance['CurrentClockSpeed'])


      # HP Check
      if vendor == "hp" :
        if instance['HealthState'] is not None :
          elementStatus = instance['HealthState']
          verboseoutput("    Element HealthState = %d" % elementStatus)
          interpretStatus = {
            0  : ExitOK,    # Unknown
            5  : ExitOK,    # OK
            10 : ExitWarning,  # Degraded
            15 : ExitWarning,  # Minor
            20 : ExitCritical,  # Major
            25 : ExitCritical,  # Critical
            30 : ExitCritical,  # Non-recoverable Error
          }[elementStatus]
          if (interpretStatus == ExitCritical) :
            verboseoutput("Global exit set to CRITICAL")
            GlobalStatus = ExitCritical
            ExitMsg += " CRITICAL : %s " % elementNameValue
          if (interpretStatus == ExitWarning and GlobalStatus != ExitCritical) :
            verboseoutput("Global exit set to WARNING")
            GlobalStatus = ExitWarning
            ExitMsg += " WARNING : %s " % elementNameValue
          # Added the following for when GlobalStatus is ExitCritical and a warning is detected
          # This way the ExitMsg gets added but GlobalStatus isn't changed
          if (interpretStatus == ExitWarning and GlobalStatus == ExitCritical) : # ARR
            ExitMsg += " WARNING : %s " % elementNameValue #ARR
          # Added the following so that GlobalStatus gets set to OK if there's no warning or critical
          if (interpretStatus == ExitOK and GlobalStatus != ExitWarning and GlobalStatus != ExitCritical) : #ARR
            GlobalStatus = ExitOK #ARR



      # Dell, Intel, IBM and unknown hardware check
      elif (vendor == "dell" or vendor == "intel" or vendor == "ibm" or vendor=="unknown") :
        if instance['OperationalStatus'] is not None :
          elementStatus = instance['OperationalStatus'][0]
          verboseoutput("    Element Op Status = %d" % elementStatus)
          interpretStatus = {
            0  : ExitOK,            # Unknown
            1  : ExitCritical,      # Other
            2  : ExitOK,            # OK
            3  : ExitWarning,       # Degraded
            4  : ExitWarning,       # Stressed
            5  : ExitWarning,       # Predictive Failure
            6  : ExitCritical,      # Error
            7  : ExitCritical,      # Non-Recoverable Error
            8  : ExitWarning,       # Starting
            9  : ExitWarning,       # Stopping
            10 : ExitCritical,      # Stopped
            11 : ExitOK,            # In Service
            12 : ExitWarning,       # No Contact
            13 : ExitCritical,      # Lost Communication
            14 : ExitCritical,      # Aborted
            15 : ExitOK,            # Dormant
            16 : ExitCritical,      # Supporting Entity in Error
            17 : ExitOK,            # Completed
            18 : ExitOK,            # Power Mode
            19 : ExitOK,            # DMTF Reserved
            20 : ExitOK             # Vendor Reserved
          }[elementStatus]
          if (interpretStatus == ExitCritical) :
            verboseoutput("Global exit set to CRITICAL")
            GlobalStatus = ExitCritical
            ExitMsg += " CRITICAL : %s " % elementNameValue
          if (interpretStatus == ExitWarning and GlobalStatus != ExitCritical) :
            verboseoutput("Global exit set to WARNING")
            GlobalStatus = ExitWarning
            ExitMsg += " WARNING : %s " % elementNameValue
          # Added same logic as in 20100702 here, otherwise Dell servers would return UNKNOWN instead of OK
          if (interpretStatus == ExitWarning and GlobalStatus == ExitCritical) : # ARR
            ExitMsg += " WARNING : %s " % elementNameValue #ARR
          if (interpretStatus == ExitOK and GlobalStatus != ExitWarning and GlobalStatus != ExitCritical) : #ARR
            GlobalStatus = ExitOK #ARR
        if elementName == 'Server Blade' :
          if SerialNumber :
            if SerialNumber.find(".") != -1 :
              SerialNumber = SerialNumber.split('.')[1]


# Munge the output to give links to documentation and warranty info
if (urlise_country != '') :
  SerialNumber = urlised_serialnumber(vendor,urlise_country,SerialNumber)
  server_info = urlised_server_info(vendor,urlise_country,server_info)

# Output performance data
perf = '|'
if perfdata:
  sdata=[]
  ctr=[0,0,0,0,0,0,0]
  # sort the data so we always get perfdata in the right order
  # we make no assumptions about the order in which CIM returns data
  # first sort by element name (effectively) and insert sequence numbers
  for p in sorted(data):
    p1 = p[1]
    sdata.append( ("P%d%s_%d_%s") % (p1,perf_Prefix[p1], ctr[p1], p[0]) )
    ctr[p1] += 1
  # then sort perfdata into groups and output perfdata string
  for p in sorted(sdata):
    perf += p

# sanitise perfdata - don't output "|" if nothing to report
if perf == '|':
  perf = ''

if GlobalStatus == ExitOK :
  print "OK - Server: %s %s %s%s" % (server_info, SerialNumber, bios_info, perf)

elif GlobalStatus == ExitUnknown :
  print "UNKNOWN: %s" % (ExitMsg) #ARR

else:
  print "%s- Server: %s %s %s%s" % (ExitMsg, server_info, SerialNumber, bios_info, perf)

sys.exit (GlobalStatus)
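
The way the listing above rolls individual element states up into one overall result can be hard to follow in the full script. The sketch below is my own condensed illustration of that logic, not part of the plugin. It assumes the standard Nagios exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), and the function name aggregate_status is hypothetical.

```python
# Standard Nagios plugin exit codes (assumed to match the Exit* constants in the script).
EXIT_OK, EXIT_WARNING, EXIT_CRITICAL, EXIT_UNKNOWN = 0, 1, 2, 3

def aggregate_status(element_states):
    """Roll per-element states up into one overall state.

    Starts at UNKNOWN, becomes OK once a healthy element is seen,
    is raised to WARNING by any warning, and to CRITICAL by any critical.
    """
    overall = EXIT_UNKNOWN
    for state in element_states:
        if state == EXIT_CRITICAL:
            overall = EXIT_CRITICAL
        elif state == EXIT_WARNING and overall != EXIT_CRITICAL:
            overall = EXIT_WARNING
        elif state == EXIT_OK and overall not in (EXIT_WARNING, EXIT_CRITICAL):
            overall = EXIT_OK
    return overall
```

This also shows why a run that returns no elements at all (for example after an authentication problem) ends up as UNKNOWN rather than OK.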
Now test the script to make sure it works.

Code: Select all

/usr/local/nagios/libexec/check_esxi_hardware.py -H 192.168.107.44 -U esxiuser -P esxipassword -V ibm
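
If you would rather script this test than read the output by eye, something like the following may help. It is only a sketch in Python 3 (the plugin itself is Python 2 and is simply run as an external command). It assumes the standard Nagios exit codes and the usual "message|perfdata" output layout. The helper names interpret_result and run_check are my own and are not part of Nagios or the plugin.

```python
import subprocess

# Standard Nagios plugin exit codes mapped to their names.
STATUS_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}

def interpret_result(returncode, output):
    """Turn a plugin exit code and its output into (status, message, perfdata)."""
    # Any exit code outside 0-3 is treated as UNKNOWN.
    status = STATUS_NAMES.get(returncode, "UNKNOWN")
    # Performance data, if present, follows the first "|" character.
    message, _, perfdata = output.strip().partition("|")
    return status, message.strip(), perfdata.strip()

def run_check(command):
    """Run the plugin (command is a list of arguments) and interpret the result."""
    proc = subprocess.run(command, stdout=subprocess.PIPE, universal_newlines=True)
    return interpret_result(proc.returncode, proc.stdout)

# Example call, using the same placeholder host and credentials as the command above:
# run_check(["/usr/local/nagios/libexec/check_esxi_hardware.py",
#            "-H", "192.168.107.44", "-U", "esxiuser", "-P", "esxipassword", "-V", "ibm"])
```

A status of OK with the server model and BIOS details in the message means the plugin can reach and query the ESXi host.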
The final step is to verify that nothing is broken in the configuration:

Code: Select all

/etc/nagios/verify.sh
If there were no errors or warnings, restart Nagios to load the new configuration:

Code: Select all

sudo systemctl restart nagios
