My Notes for Installing Nagios on Ubuntu Server 16.04 LTS

User avatar
LHammonds
Site Admin
Site Admin
Posts: 670
Joined: Fri Jul 31, 2009 6:27 pm
Are you a filthy spam bot?: No
Location: Behind You
Contact:

My Notes for Installing Nagios on Ubuntu Server 16.04 LTS

Post: # 491Post LHammonds
Mon Feb 20, 2017 7:17 pm

Greetings and salutations,

I hope this thread will be helpful to those who follow in my footsteps, and I welcome any advice based on what I have done and documented.

This is a Work-In-Progress topic so I will be updating this thread as I complete/update my notes.

To discuss this thread, please participate here: Ubuntu Forums (NEED TO INSERT DISCUSSION LINK HERE)

High-level overview

This thread will cover installation of a dedicated Ubuntu server and Nagios monitoring system. The server will be installed inside a virtual machine. If you have any advice on doing things better, please let me know. I love feedback and learning better ways of doing things!

I choose to build Nagios from the source download rather than install from the repository with apt-get. The reason is that you get the newer version this way and you have full control over the installation options.

This documentation will only cover a very specific installation. Nagios was designed to be able to handle just about anything you want to monitor so it will be different for each install and even with the same hardware needing to be monitored, two administrators may decide differently on what needs to be monitored.

Tools utilized in this process

Helpful links

The list below are sources of information that helped me configure this system as well as some places that might be helpful to me later on as this process continues.
Assumptions

This documentation will need to make use of some very-specific information that will most-likely be different for each person / location. And as such, I will note some of these in this section. They will be highlighted in red throughout the document as a reminder that you should plug-in your own value rather than actually using my "place-holder" value.

Under no circumstance should you use the actual values I list below. They are place-holders for the real thing. This is just a checklist template you need to have answered before you start the install process.

Wherever you see RED in this document, you need to replace it with the value your company uses. Use the list below as a template you need to have answered before you continue.

  • Ubuntu Server name: srv-nagios
  • Internet domain: mydomain.com
  • Ubuntu Server IP address: 192.168.107.21
  • Ubuntu Admin ID: administrator
  • Ubuntu Admin Password: myadminpass
  • Nagios Admin Password: mynagiospass
  • Nagios NSClient Port #: 12489
  • Nagios NRPE Port #: 5666
  • Nagios Service Password: myservicepass
  • Email Server (remote): 192.168.107.25
  • Windows Share ID: myshare
  • Windows Share Password: mysharepass
I also assume the reader knows how to use the VI editor. If not, you will need to beef up your skill set or use a different editor in place of it.

Re: My Notes for Installing Nagios on Ubuntu Server 16.04 LT

Post: # 492Post LHammonds
Mon Feb 20, 2017 7:42 pm

Install Ubuntu Server

Ubuntu Server Long-Term Support (LTS) is free, but we have the option of buying support, and that is the main reason this server was selected.

The steps for setting up the base server are covered in this article: How to install and configure Ubuntu Server

It is assumed that the server was configured according to that article, except that the assumptions in red (variables above) are used instead of the assumptions in that document since we are building a monitoring server.

Prerequisites

Post: # 493Post LHammonds
Mon Feb 20, 2017 7:46 pm

Nagios Prerequisites
  1. Install the required programs:
    apt -y install build-essential apache2 php libapache2-mod-php7.0 php-gd libgd-dev unzip
  2. Create users and groups:
    mkdir -p /etc/nagios /var/nagios
    groupadd --system --gid 9000 nagios
    groupadd --system --gid 9001 nagcmd
    adduser --system --gid 9000 --home /usr/local/nagios nagios
    usermod --groups nagcmd nagios
    usermod --append --groups nagcmd www-data
    chown nagios:nagios /usr/local/nagios /etc/nagios /var/nagios

Build and Install Nagios

Post: # 494Post LHammonds
Mon Feb 20, 2017 7:46 pm

Build and Install Nagios from Source
  • Download Nagios software (NOTE: You can use newer links once new versions become available):

    Code: Select all

    cd /usr/local/src
    wget https://assets.nagios.com/downloads/nagioscore/releases/nagios-4.3.1.tar.gz
    
  • Build and install Nagios Core:

    Code: Select all

    tar -xzvf /usr/local/src/nagios-4.3.1.tar.gz
    cd /usr/local/src/nagios-4.3.1
    ./configure --sysconfdir=/etc/nagios --localstatedir=/var/nagios --prefix=/usr/local/nagios --with-nagios-user=nagios --with-nagios-group=nagios --with-command-group=nagcmd --with-mail=/usr/bin/sendemail
    make all
    make install
    make install-init
    make install-config
    make install-commandmode
    /usr/bin/install -c -m 644 sample-config/httpd.conf /etc/apache2/sites-available/nagios.conf
    cp -R contrib/eventhandlers/ /usr/local/nagios/libexec/
    chown -R nagios:nagios /usr/local/nagios/libexec/eventhandlers
  • Edit the commands:

    Code: Select all

    vi /etc/nagios/objects/commands.cfg
  • Change both sendemail references to match the correct sendemail syntax:

    Code: Select all

    define command{
     command_name    notify-host-by-email
     command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /usr/bin/sendemail -s srv-mail:25 -f "admin <admin@nagios.server>" -t $CONTACTEMAIL$ -u "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **"
    }
     
    define command{
    command_name    notify-service-by-email
    command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$" | /usr/bin/sendemail -s srv-mail:25 -f "admin <admin@nagios.server>" -t $CONTACTEMAIL$ -u "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **"
    }
  • Save and close commands.cfg
  • Edit the contacts:

    Code: Select all

    vi /etc/nagios/objects/contacts.cfg
  • Change the following:
    define contact{
        contact_name    nagiosadmin             ; Short name of user
        use             generic-contact         ; Inherit default values from generic-contact template (defined above)
        alias           John Doe                ; Full name of user
        email           John.Doe@mydomain.com   ; <<***** CHANGE THIS TO YOUR EMAIL ADDRESS ******
        }
  • Save and close contacts.cfg
  • Type the following:
    cd /usr/local/src/nagios-4.3.1
    make install-webconf
  • Set the nagiosadmin password to mynagiospass by typing the following:
    htpasswd -c /etc/nagios/htpasswd.users nagiosadmin
    service apache2 reload
  • Ensure Nagios can execute by typing chmod +x /etc/init.d/nagios (NOTE: On this version of Ubuntu and Nagios, it is already set correctly)
  • Type the following to avoid startup problems: (NOTE: This is not documented anywhere, it is just my trial, error and observation)
    mkdir -p /var/nagios/spool/checkresults
    chown nagios:nagios /var/nagios/spool/checkresults
    chown nagios:nagios /var/nagios/spool
    chown nagios:nagios /var/nagios
  • If Nagios does not start up automatically whenever the server is rebooted, you can use these commands to fix it (probably do not need to do this):
    /usr/sbin/update-rc.d -f nagios defaults 99
    ln -s /etc/init.d/nagios /etc/rcS.d/S99nagios
  • Check your Nagios configuration file for errors. Look for errors in red.
    /usr/local/nagios/bin/nagios -v /etc/nagios/nagios.cfg
  • NOTE TO SELF: I need to generate (and document) an SSL certificate to enable SSL to protect the password during authentication. Self-Signed Certs
  • Start Nagios for the 1st time.
    /etc/init.d/nagios start
  • Access the web-based administration utility at http://192.168.107.21/nagios/ (use nagiosadmin for the ID and mynagiospass for the password)
NOTE: The Nagios server is now up-and-running but doing absolutely nothing. ;) We need plugins to actually make it do something so we will install a base plugin pack. However, we will eventually need to get other plugins and maybe write our own in order to monitor everything we want.

Plugins

Post: # 495Post LHammonds
Mon Feb 20, 2017 8:45 pm

Nagios Plugin Prerequisites

Nagios Plugin Requirements for check_snmp:

Code: Select all

# When CPAN asks "Configure as much as possible automatically?", answer yes
perl -MCPAN -e 'install Net::SNMP'
apt -y install snmp
Requirements for check_mysql: (NOTE: For my site, this is not necessary because I will run it locally on MySQL server)

Code: Select all

apt -y install libmysqlclient-dev
Requirements for check_nrpe:

Code: Select all

apt -y install libssl-dev

Nagios Plugins

Download, build and install Nagios plugins (NOTE: You can use newer links once new versions become available):

Code: Select all

cd /usr/local/src
wget https://nagios-plugins.org/download/nagios-plugins-2.1.4.tar.gz
tar xzf /usr/local/src/nagios-plugins-2.1.4.tar.gz
cd /usr/local/src/nagios-plugins-2.1.4
./configure --sysconfdir=/etc/nagios --localstatedir=/var/nagios --with-nagios-user=nagios --with-nagios-group=nagios --with-openssl
make
make install

Download, build and install NRPE plugin (for 64-bit servers)

Code: Select all

cd /usr/local/src
wget https://github.com/NagiosEnterprises/nrpe/archive/3.0.1.tar.gz
mv /usr/local/src/3.0.1.tar.gz /usr/local/src/nrpe-3.0.1.tar.gz
tar xzf /usr/local/src/nrpe-3.0.1.tar.gz
cd /usr/local/src/nrpe-3.0.1
./configure --sysconfdir=/etc/nagios --libexecdir=/usr/local/nagios/libexec --prefix=/usr/local/nagios --localstatedir=/var/nagios --with-nagios-user=nagios --with-nagios-group=nagios --with-nrpe-user=nagios --with-nrpe-group=nagios --enable-ssl=yes --with-ssl=/usr/bin/openssl --with-ssl-lib=/usr/lib/x86_64-linux-gnu
make all
make install-plugin
For 32-bit servers, do the above but change the configure line to this:

Code: Select all

./configure --sysconfdir=/etc/nagios --libexecdir=/usr/local/nagios/libexec --prefix=/usr/local/nagios --localstatedir=/var/nagios --with-nagios-user=nagios --with-nagios-group=nagios --with-nrpe-user=nagios --with-nrpe-group=nagios --enable-ssl=yes --with-ssl=/usr/bin/openssl --with-ssl-lib=/usr/lib/i386-linux-gnu
To get the SNMP commands (sysContact.0, sysName.0, sysLocation.0, sysUpTime.0) to work without hanging up, type the following commands:

Code: Select all

cd /usr/share/mibs/netsnmp
wget ftp://ftp.cisco.com/pub/mibs/v2/SNMPv2-MIB.my
wget ftp://ftp.cisco.com/pub/mibs/v1/RFC1213-MIB.my
wget ftp://ftp.cisco.com/pub/mibs/v2/IANAifType-MIB.my
wget ftp://ftp.cisco.com/pub/mibs/v2/SNMPv2-TC.my
wget ftp://ftp.cisco.com/pub/mibs/v2/SNMPv2-SMI.my
Verify that Plugins are Working!

For all the plugins we intend on using, we need to verify they are working before trying to integrate them into Nagios. However, not all plugins will work without first configuring the target to be monitored.

Ping an IP address you know to be active:

Code: Select all

/usr/local/nagios/libexec/check_icmp -H 192.168.107.20

Check for an HTTP reply from a web server:

Code: Select all

/usr/local/nagios/libexec/check_http -H 192.168.107.20

Check for a response from an HP LaserJet printer:

Code: Select all

/usr/local/nagios/libexec/check_hpjd -H 192.168.107.51 -C public

Check the uptime of a router via SNMP:

Code: Select all

/usr/local/nagios/libexec/check_snmp -H 192.168.107.1 -C public -o sysUpTime.0
NOTE: For whatever reason, this command hangs on me. Not sure what I did wrong this time but I'll track it down, fix it and update these docs.

Check a MySQL server (if on local host):

Code: Select all

/usr/local/nagios/libexec/check_mysql -H 192.168.107.20 -P 3306 -u mysqlid -p mysqlpassword
NOTE: This will fail if you do not configure the MySQL server 1st. However, you might want to run the MySQL command remotely via NRPE instead.
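
Each of the plugin commands above reports its result through its exit status as well as its text output (0 is OK, 1 is WARNING, 2 is CRITICAL and 3 is UNKNOWN). As a minimal sketch, the helper below turns that number into a name. Note that plugin_state is a made-up name for illustration and is not part of Nagios, and lumping every other value in with UNKNOWN is my own simplification.

```shell
# plugin_state: hypothetical helper, not part of Nagios or its plugins.
# Translates a plugin exit status into the matching state name.
plugin_state() {
  case "$1" in
    0) echo "OK" ;;
    1) echo "WARNING" ;;
    2) echo "CRITICAL" ;;
    *) echo "UNKNOWN" ;;   # 3 is UNKNOWN; any other value is lumped in here (assumption)
  esac
}

# Example (run on the Nagios server):
#   /usr/local/nagios/libexec/check_icmp -H 192.168.107.20
#   plugin_state $?
```

Running a plugin by hand and checking the state this way shows what Nagios will see before the check is added to a config file.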

User Accounts

Post: # 507Post LHammonds
Thu May 04, 2017 5:31 pm

Managing User Accounts

It is recommended to replace the nagiosadmin with a different account and here is how you do it.

We are going to add 3 administrators with the same level of access as the nagiosadmin.

ID / Password: lhammonds / abc123
ID / Password: ddiggler / jigglier69
ID / Password: jdoe / jlow9876
  1. Log in with your administrator account. At the $ prompt, temporarily grant yourself superuser privileges by typing sudo su {ENTER} and then provide the administrator password (myadminpass).
  2. Type the following commands to add the users to the web interface (this will also update passwords of existing users). The -b option tells htpasswd to take the password from the command line:
    htpasswd -b /etc/nagios/htpasswd.users lhammonds abc123
    htpasswd -b /etc/nagios/htpasswd.users ddiggler jigglier69
    htpasswd -b /etc/nagios/htpasswd.users jdoe jlow9876
  3. Edit cgi.cfg

    Code: Select all

    vi /etc/nagios/cgi.cfg
    Search/replace "nagiosadmin" with "lhammonds,ddiggler,jdoe"
    For example, in VI, you type:

    Code: Select all

    :%s/nagiosadmin/lhammonds,ddiggler,jdoe/g
  4. Restart the apache service:

    Code: Select all

    service apache2 stop
    service apache2 start
  5. Now open a web browser and go to http://192.168.107.21/nagios and see if you can login with your new accounts. NOTE: There is no logout option, you will need to close the browser and re-open it to test different accounts.
  6. Once you have verified your accounts work, you can safely delete the nagiosadmin account by typing the following:

    Code: Select all

    htpasswd -D /etc/nagios/htpasswd.users nagiosadmin

To fine-tune user accounts, you can add or remove them from the following permission branches in /etc/nagios/cgi.cfg

authorized_for_system_information
authorized_for_configuration_information
authorized_for_system_commands
authorized_for_all_services
authorized_for_all_hosts
authorized_for_all_service_commands
authorized_for_all_host_commands
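
The search/replace in step 3 can also be done without opening an editor. This is only a sketch: replace_cgi_user is a hypothetical helper name, it assumes GNU sed (the version shipped with Ubuntu), and it keeps a copy of the original file with a .bak extension. Afterwards it prints the permission lines so you can confirm who is listed on each branch.

```shell
# replace_cgi_user: hypothetical helper, not shipped with Nagios.
# Usage: replace_cgi_user <cgi.cfg path> <old user> <new user list>
replace_cgi_user() {
  cfg="$1"; old="$2"; new="$3"
  # Replace every occurrence in place, keeping the original as <file>.bak
  sed -i.bak "s/$old/$new/g" "$cfg"
  # Show the resulting permission lines for review
  grep '^authorized_for_' "$cfg"
}

# Example:
#   replace_cgi_user /etc/nagios/cgi.cfg nagiosadmin lhammonds,ddiggler,jdoe
```

The same grep line on its own is a quick way to review the permission branches after any later manual edit.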

Custom Sounds

Post: # 571Post LHammonds
Thu Feb 15, 2018 9:09 am

Custom Sounds

Nagios allows custom WAV sounds to be played as alerts on the web page but does not come with any (that I could tell). So I went through my audio collection and pulled out a few clips I thought would work well for the various event types.

Each audio clip was converted to WAV format and stereo turned into mono.

Nagios-Sounds.7z (80 files, 10 MB)

To enable this, here is what you can do.

Edit /etc/nagios/cgi.cfg
Around line #313, you will find the following section commented out:

Code: Select all

#host_unreachable_sound=host-unreachable.wav
#host_down_sound=host-down.wav
#service_critical_sound=critical.wav
#service_warning_sound=warning.wav
#service_unknown_sound=warning.wav
#normal_sound=noproblem.wav
I do not know about you but I tend to have services in the red all the time (WindowsUpdate) which tend to stay that way more often than not...so I would not be interested at all in service alert sounds. However, I would like to have audible alerts for host issues and this is how I modified mine:

Code: Select all

host_unreachable_sound=host-unreachable.wav
host_down_sound=host-down.wav
#service_critical_sound=critical.wav
#service_warning_sound=warning.wav
#service_unknown_sound=warning.wav
normal_sound=noproblem.wav
I then used the following files from my audio collection and copied them to my Samba share and moved/renamed them to the proper location.

Star Trek\command-path-discontinuity.wav
Star Trek\losing-power.wav
Star Trek\i-hate-prototypes.wav

On the server, I then typed these commands:

Code: Select all

mv /srv/samba/share/command-path-discontinuity.wav /usr/local/nagios/share/media/host-unreachable.wav
mv /srv/samba/share/losing-power.wav /usr/local/nagios/share/media/host-down.wav
mv /srv/samba/share/i-hate-prototypes.wav /usr/local/nagios/share/media/noproblem.wav
chown nagios:nagios /usr/local/nagios/share/media/*.wav
chmod 0444 /usr/local/nagios/share/media/*.wav
As long as your browser can play wav files, you will be able to hear any changes in host status while you leave your web browser on the nagios monitoring page. I typically have mine on the tactical overview page.
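
If a sound does not play, one thing worth ruling out is a file name in cgi.cfg that does not exist in the media folder. The sketch below lists every enabled (uncommented) sound setting whose file is missing. missing_sounds is a hypothetical helper name, and the paths in the example are the ones used in this thread.

```shell
# missing_sounds: hypothetical helper, not part of Nagios.
# Usage: missing_sounds <cgi.cfg path> <media directory>
missing_sounds() {
  cgi_cfg="$1"; media_dir="$2"
  # Pull the file name from every uncommented *_sound= line
  sed -n 's/^[a-z_]*_sound=\(.*\)$/\1/p' "$cgi_cfg" | sort -u |
  while IFS= read -r wav; do
    # Print only the names that are not present in the media directory
    [ -f "$media_dir/$wav" ] || printf '%s\n' "$wav"
  done
}

# Example:
#   missing_sounds /etc/nagios/cgi.cfg /usr/local/nagios/share/media
```

No output means every enabled sound has a matching file.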

Monitoring Remote Linux Servers

Post: # 572Post LHammonds
Thu Feb 15, 2018 10:13 am

Monitoring Remote Linux Servers

Since there are other Linux boxes that need to be monitored, the NRPE plugin and NRPE service will be installed on each Linux box.

Set up the remote Linux server to be monitored:

Create the Nagios user and group:

Code: Select all

groupadd --system --gid 9000 nagios
adduser --system --gid 9000 --home /usr/local/nagios nagios
chown nagios:nagios /usr/local/nagios
chmod 0755 /usr/local/nagios
Install the standard Nagios plugins and the NRPE server. Rather than compiling from source, we will just use what comes with the repository.

Code: Select all

apt -y install nagios-plugins nagios-nrpe-server
Make a backup of the NRPE configuration files before modifying them:

Code: Select all

cp /etc/nagios/nrpe.cfg /etc/nagios/nrpe.cfg.bak
cp /etc/nagios/nrpe_local.cfg /etc/nagios/nrpe_local.cfg.bak
Edit the local configuration:

Code: Select all

vi /etc/nagios/nrpe_local.cfg
Add the IP of your Nagios server to the "allowed_hosts" line and list only the plugins that will be used:
allowed_hosts=192.168.107.21,127.0.0.1
command[check_users]=/usr/lib/nagios/plugins/check_users -w 5 -c 10
command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
command[check_disk_app]=/usr/lib/nagios/plugins/check_disk -p /var -w 20% -c 10%
command[check_disk_root]=/usr/lib/nagios/plugins/check_disk -p / -w 20% -c 10%
command[check_disk_all]=/usr/lib/nagios/plugins/check_disk -w 15% -c 10%
command[check_zombie_procs]=/usr/lib/nagios/plugins/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/lib/nagios/plugins/check_procs -w 200 -c 240
command[check_swap]=/usr/lib/nagios/plugins/check_swap -w 15% -c 10%
command[check_apt]=/usr/lib/nagios/plugins/check_apt
TIP: If you define separate disk checks like the above, you can assign different notifications. For example, you could have the Linux administrator get an email notification when the root partition reaches the warning threshold (during business hours) and send an alert to his pager (at any time of the day) if the root partition reaches critical. The application manager could get a different notice for /var, such as both warnings and criticals going by SMS to his phone at any time of the day.

Check the status of the NRPE server:

Code: Select all

/etc/init.d/nagios-nrpe-server status
If the NRPE server is not running, this is how you can start it:

Code: Select all

/etc/init.d/nagios-nrpe-server start
If the NRPE server was already running and you made configuration changes, use this command to load the new changes:

Code: Select all

/etc/init.d/nagios-nrpe-server reload
Now see if your configured commands will run on your server (before trying to test them remotely on the Nagios server)

Code: Select all

/usr/lib/nagios/plugins/check_users -w 5 -c 10
/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
/usr/lib/nagios/plugins/check_disk -w 15% -c 10%
/usr/lib/nagios/plugins/check_procs -w 5 -c 10 -s Z
/usr/lib/nagios/plugins/check_procs -w 200 -c 240
/usr/lib/nagios/plugins/check_swap -w 15% -c 10%
/usr/lib/nagios/plugins/check_apt
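
Instead of retyping each command, the local test above can be driven straight from the config file so the test always matches what NRPE will actually run. This is a rough sketch under a few assumptions: run_nrpe_commands is a made-up helper name, every command is on a single line in the form command[name]=..., and only the exit status is reported (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```shell
# run_nrpe_commands: hypothetical helper, not part of NRPE.
# Usage: run_nrpe_commands <nrpe config file>
run_nrpe_commands() {
  cfg="$1"
  grep '^command\[' "$cfg" | while IFS= read -r line; do
    # Text between the square brackets is the command name
    name=$(printf '%s\n' "$line" | sed 's/^command\[\([^]]*\)\]=.*/\1/')
    # Everything after the first "=" is the command line to execute
    cmd=${line#*=}
    sh -c "$cmd" >/dev/null 2>&1 </dev/null
    printf '%s exit=%s\n' "$name" "$?"
  done
}

# Example:
#   run_nrpe_commands /etc/nagios/nrpe_local.cfg
```

Any command that reports exit=3 or higher is worth running by hand to see its message, since that usually points to a wrong path or bad arguments.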
Test Connectivity of NRPE Plugin

Test the connectivity of the NRPE service on your server to be monitored by trying to access the server via telnet using the NRPE port number.

If we installed the NRPE server on a machine with the address of 192.168.107.20, type the following at the console of your Nagios server:

Code: Select all

telnet 192.168.107.20 5666
If you get a response of Escape character is '^]'., then you have a good connection. Type exit to close the connection.

If the command fails with a timeout, you might need to add rules to your firewall:

Code: Select all

iptables -A INPUT -p tcp  --dport 5666 -j ACCEPT
iptables -A OUTPUT -p tcp  --dport 5666 -j ACCEPT
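# On Ubuntu there is no "iptables" service; rules can be saved with netfilter-persistent (assumes the iptables-persistent package is installed)
netfilter-persistent save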
Now try executing some of the commands you have configured on your remote Linux server (that stuff in the nrpe_local.cfg file)
/usr/local/nagios/libexec/check_nrpe -H 192.168.107.20 -p 5666 -c check_users
/usr/local/nagios/libexec/check_nrpe -H 192.168.107.20 -p 5666 -c check_load
/usr/local/nagios/libexec/check_nrpe -H 192.168.107.20 -p 5666 -c check_disk_all
/usr/local/nagios/libexec/check_nrpe -H 192.168.107.20 -p 5666 -c check_zombie_procs
/usr/local/nagios/libexec/check_nrpe -H 192.168.107.20 -p 5666 -c check_total_procs
/usr/local/nagios/libexec/check_nrpe -H 192.168.107.20 -p 5666 -c check_apt
If it all looks good, you can then use commands in a server configuration file. See the sample configurations posted earlier.

Monitoring Remote Windows Servers

Post: # 573Post LHammonds
Thu Feb 15, 2018 10:20 am

Monitoring Remote Windows Servers

Monitoring Windows servers and workstations requires installing a service if you need more data than a simple ping provides.

For this, we will be using NSClient++. In particular, we will be downloading the Win32 and x64 "zip" files for version 0.3.9.

The reason I chose the ZIP files instead of the MSI files is that they are much simpler to configure and roll out.

Extract the Win32 ZIP file to C:\NSClient\ and edit C:\NSClient\nsc.ini

Uncomment the DLL files you will be using between lines 10 and 22. For example:

Code: Select all

FileLogger.dll
CheckSystem.dll
CheckDisk.dll
NSClientListener.dll
NRPEListener.dll
SysTray.dll
CheckEventLog.dll
CheckHelpers.dll
;CheckWMI.dll
CheckNSCP.dll
 
; Script to check external scripts and/or internal aliases.
CheckExternalScripts.dll

On line 56, set the password that will be required to access the remote functions. For example:
password=my-nsclient-password

On the Nagios server, you will need to match this password in your resource file which will then be referenced in your server config file.
/etc/nagios/resource.cfg
$USER5$=my-nsclient-password

On line 62, set the IP of the Nagios server to limit access to just that host. For example:
allowed_hosts=192.168.107.21

On line 67, tell it to use this file to obtain settings rather than the registry.

Code: Select all

use_file=1

On line 100, set the IP of the Nagios server to limit access to just that host. For example:
allowed_hosts=192.168.107.21

On line 104, set the port number that will be used for communication with Nagios via check_nt. It would be wise to use a port other than the default. This example is using the default port:
port=12489

On line 118, set the port number that will be used for communication with Nagios via check_nrpe. It would be wise to use a port other than the default. This example is using the default port:
port=5666

On line 134, enable SSL. For example:

Code: Select all

use_ssl=1

On line 144, set the IP of the Nagios server to limit access to just that host. For example:
allowed_hosts=192.168.107.21

On line 244, enable the check for Windows Update script. For example:

Code: Select all

check_updates=check_updates.vbs

Now, to make rolling this out a snap, create a couple of batch files to install / remove the NSClient service:

C:\NSClient\service-install.bat

Code: Select all

@ECHO OFF
NSCP.exe service --install
START NET START NSCP /WAIT
pause

C:\NSClient\service-uninstall.bat

Code: Select all

@ECHO OFF
START NET STOP NSCP /WAIT
NSCP.exe service --uninstall
pause

Copy the C:\NSClient folder to a network share and then go to each Windows host you want to monitor and copy the folder to C:\NSClient and run the "Service-Install.bat" file as administrator.

You will also need to add rules to your firewall to allow communication from the Nagios server.

Inbound Rule Name: Nagios 12489 TCP
- Check: Enabled
- Action: Allow the connection
- Protocol Type: TCP
- Local Port: 12489
- Remote Port: All Ports
- Profile: Domain
- Local IP address: Any IP address
- Remote IP address: These IP addresses: 192.168.107.21

Inbound Rule Name: Nagios 5666 TCP
- Check: Enabled
- Action: Allow the connection
- Protocol Type: TCP
- Local Port: 5666
- Remote Port: All Ports
- Profile: Domain
- Local IP address: Any IP address
- Remote IP address: These IP addresses: 192.168.107.21
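
Once the service is running and the firewall rules are in place, a quick test from the Nagios server confirms that the port and password work before any config file is written. The sketch below uses the check_nt plugin from the plugin pack installed earlier. check_windows_host is a made-up helper name, the host address in the example is a placeholder, and the port is the default NSClient port from the assumptions list.

```shell
# Path to check_nt as installed by the plugin build earlier in this thread
CHECK_NT="${CHECK_NT:-/usr/local/nagios/libexec/check_nt}"

# check_windows_host: hypothetical helper, not part of Nagios.
# Usage: check_windows_host <host IP> <NSClient password>
check_windows_host() {
  host="$1"; password="$2"
  # Ask for the client version and the uptime, two of the simplest check_nt variables
  for var in CLIENTVERSION UPTIME; do
    "$CHECK_NT" -H "$host" -p 12489 -s "$password" -v "$var"
  done
}

# Example (placeholder address):
#   check_windows_host 192.168.107.30 my-nsclient-password
```

A timeout here points at the firewall or the allowed_hosts setting, while a password error points at the password line in nsc.ini.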

On the Nagios server, create or copy a Windows config file and make appropriate changes such as server name and IP. See the Sample Windows config file posted earlier in the thread.

The final step is to verify that nothing is broken in the configuration:

Code: Select all

/etc/nagios/verify.sh

If there were no errors or warnings, restart Nagios to load the new configuration:

Code: Select all

service nagios stop
service nagios start

Lather, rinse, repeat for the x64 version if you have 64-bit servers.

NOTE: The Win32 version will work on 64-bit servers. The only problem is if you need to check for the existence of running processes such as Explorer.exe or Notepad.exe which are 64-bit. The Win32 client cannot properly detect 64-bit programs.

Configuration Framework

Post: # 574Post LHammonds
Thu Feb 15, 2018 10:44 am

Configuration Framework

The 1st thing I like to do is the creation of the folder structure I plan to use and then copy or rename all example configuration files to unused text files. This ensures the originals are preserved as a reference.

Code: Select all

mkdir -p /etc/nagios/servers
mkdir -p /etc/nagios/printers
mkdir -p /etc/nagios/switches
mkdir -p /etc/nagios/workstations
cp /etc/nagios/nagios.cfg /etc/nagios/example-nagios.txt
cp /etc/nagios/resource.cfg /etc/nagios/example-resource.txt
mv /etc/nagios/objects/windows.cfg /etc/nagios/servers/example-win.txt
mv /etc/nagios/objects/localhost.cfg /etc/nagios/servers/example-local.txt
mv /etc/nagios/objects/switch.cfg /etc/nagios/switches/example-sw.txt
mv /etc/nagios/objects/printer.cfg /etc/nagios/printers/example-ptr.txt
cp /etc/nagios/objects/commands.cfg /etc/nagios/objects/example-commands.txt
cp /etc/nagios/objects/contacts.cfg /etc/nagios/objects/example-contacts.txt
cp /etc/nagios/objects/templates.cfg /etc/nagios/objects/example-templates.txt
cp /etc/nagios/objects/timeperiods.cfg /etc/nagios/objects/example-timeperiods.txt
chown --recursive nagios:nagios /etc/nagios/*
find /etc/nagios -type f -name '*.cfg' -exec chmod 0664 {} +
Edit nagios.cfg and uncomment/add the cfg_dir lines (around lines 52 to 54) so they look like this:

Code: Select all

vi /etc/nagios/nagios.cfg

Code: Select all

cfg_dir=/etc/nagios/servers
cfg_dir=/etc/nagios/printers
cfg_dir=/etc/nagios/switches
cfg_dir=/etc/nagios/workstations
This allows you to place config files in those folders and they will be automatically picked up without having to edit the nagios.cfg file. I have a file for each object. You could instead place all objects into a single file, but that gets harder to edit the more you monitor.
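
Nagios only reads files ending in .cfg from a cfg_dir folder (including its sub-folders), which is why the examples were renamed to .txt. As a small sketch, the helper below lists the files that would be picked up. list_active_configs is a made-up name for illustration.

```shell
# list_active_configs: hypothetical helper, not part of Nagios.
# Usage: list_active_configs <dir> [<dir> ...]
list_active_configs() {
  for d in "$@"; do
    # Only regular files with a .cfg extension are loaded from a cfg_dir
    find "$d" -type f -name '*.cfg'
  done | sort
}

# Example:
#   list_active_configs /etc/nagios/servers /etc/nagios/printers \
#                       /etc/nagios/switches /etc/nagios/workstations
```

If a host you expect to see is missing from the web interface, checking this list for its file is a quick first step.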

verify.sh

Anytime you need to make a configuration change, you should always run a verification against your changes to ensure the Nagios service will be able to start up once you restart the service for the change to take effect. This is called the pre-flight check and this script will make it easier to run.

The full command is this:

Code: Select all

/usr/local/nagios/bin/nagios -v /etc/nagios/nagios.cfg
As you can see, it is a lot to type/remember. I prefer to have a handy little script in the configuration folder to make it easier to run a verification.

/etc/nagios/verify.sh

Code: Select all

touch /etc/nagios/verify.sh
chmod 0755 /etc/nagios/verify.sh
printf '#!/bin/bash\n' >> /etc/nagios/verify.sh
printf '/usr/local/nagios/bin/nagios -v /etc/nagios/nagios.cfg\n' >> /etc/nagios/verify.sh
Now all that has to be done is to run the verify script.

If you are in the /etc/nagios folder, you type:

Code: Select all

./verify.sh
If currently sitting in a sub-folder, just type:

Code: Select all

../verify.sh
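
Since the whole point of the pre-flight check is to avoid restarting into a broken configuration, the two steps can be tied together so the restart only happens when the check passes. This is a sketch, not something that ships with Nagios: verify_then_restart and the three variables are names I made up, and it relies on the nagios -v command returning a non-zero exit status when it finds errors.

```shell
# Defaults match the paths used in this thread; override them if yours differ
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/etc/nagios/nagios.cfg}"
RESTART_CMD="${RESTART_CMD:-service nagios restart}"

# verify_then_restart: hypothetical helper.
# Runs the pre-flight check and restarts Nagios only if it succeeds.
verify_then_restart() {
  if "$NAGIOS_BIN" -v "$NAGIOS_CFG"; then
    $RESTART_CMD
  else
    echo "Pre-flight check failed; Nagios was NOT restarted." >&2
    return 1
  fi
}

# Example (as root):
#   verify_then_restart
```

The running Nagios process is left untouched when the check fails, so monitoring continues on the old configuration while you fix the error.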
Host Groups

I group all of my objects according to how I like to see them separated. This is done using "hostgroups" when defining a host. I keep all of these hostgroups defined in a single configuration file.

The file is referenced in /etc/nagios/nagios.cfg with the following line:

Code: Select all

cfg_file=/etc/nagios/objects/hostgroups.cfg
Here is a sample of what is contained in that file:

/etc/nagios/objects/hostgroups.cfg

Code: Select all

###############################################################################
###############################################################################
#
# HOST GROUP DEFINITIONS
#
###############################################################################
###############################################################################
 
define hostgroup{
        hostgroup_name  ibm-servers
        alias           IBM Servers
        }
 
define hostgroup{
        hostgroup_name  aix-servers
        alias           IBM AIX Servers
        }
 
define hostgroup{
        hostgroup_name  ubuntu-servers
        alias           Ubuntu Servers
        }
 
define hostgroup{
        hostgroup_name  esx-servers
        alias           ESX Servers
        }
 
define hostgroup{
    hostgroup_name    windows2000-servers
    alias        Windows 2000 Servers
    }
 
define hostgroup{
    hostgroup_name    windows2003-servers
    alias        Windows 2003 Servers
    }
 
define hostgroup{
    hostgroup_name    windows2008-servers
    alias        Windows 2008 Servers
    }
 
define hostgroup{
    hostgroup_name    win7-pcs
    alias        Windows 7 PCs
    }
 
define hostgroup{
    hostgroup_name    winxp-pcs
    alias        Windows XP PCs
    }
 
define hostgroup{
    hostgroup_name    switches
    alias        Network Switches
    }
 
define hostgroup{
    hostgroup_name    wireless
    alias        Wireless Access Points
    }
 
define hostgroup{
    hostgroup_name    printers-hp
    alias        HP Printers
    }
 
define hostgroup{
    hostgroup_name    printers-brother
    alias        Brother Printers
    }
 
define hostgroup{
    hostgroup_name    copiers-toshiba
    alias        Toshiba Copiers
    }
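
Because every hostgroup entry has the same shape, new ones can be generated rather than copied and edited by hand. A minimal sketch, where hostgroup_stanza is a made-up helper name and the output simply mirrors the layout used in the file above:

```shell
# hostgroup_stanza: hypothetical helper, not part of Nagios.
# Usage: hostgroup_stanza <hostgroup_name> <alias>
hostgroup_stanza() {
  printf 'define hostgroup{\n    hostgroup_name    %s\n    alias        %s\n    }\n' "$1" "$2"
}

# Example: append a new group, then run the verification script
#   hostgroup_stanza ubuntu-servers "Ubuntu Servers" >> /etc/nagios/objects/hostgroups.cfg
```

Remember that hostgroup_name must match the value used on the hostgroups line of each host definition.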

Sample Ubuntu Server Config File

Here is my basic shell for an Ubuntu server:

/etc/nagios/servers/srv-wiki.cfg

Code: Select all

###############################################################################
#
# HOST DEFINITION
#
###############################################################################

define host{
        use             ubuntu-server
        host_name       srv-wiki
        alias           SRV-Wiki
        address         192.168.107.23
        hostgroups      ubuntu-servers
        contacts        linux-admin-pager
        parents         srv-esxi1
        }

###############################################################################
#
# SERVICE DEFINITIONS
#
###############################################################################

define service{
    use                     generic-service
    host_name               srv-wiki
    service_description     PING
    check_command           check_icmp!100.0,20%!500.0,60%
    }

define service{
    use                     generic-service
    host_name               srv-wiki
    service_description     HTTP
    check_command           check_http
    }

define service{
    use                     generic-service
    host_name               srv-wiki
    service_description     APT Upgrade
    check_command           check_nrpe!check_apt
    }

define service{
    use                     generic-service
    host_name               srv-wiki
    service_description     APT Upgrade MotD
    check_command           check_nrpe!check_apt_motd
    }

define service{
    use                     generic-service
    host_name               srv-wiki
    service_description     All Disks
    check_command           check_nrpe!check_disk_all
    notifications_enabled   1
    }

define service{
    use                     generic-service
    host_name               srv-wiki
    service_description     Current Load
    check_command           check_nrpe!check_load
    notifications_enabled   1
    }

define service{
    use                     generic-service
    host_name               srv-wiki
    service_description     Total Processes
    check_command           check_nrpe!check_total_procs
    notifications_enabled   1
    }

define service{
    use                     generic-service
    host_name               srv-wiki
    service_description     Swap Usage
    check_command           check_nrpe!check_swap
    notifications_enabled   1
    }

define service{
    use                     generic-service
    host_name               srv-wiki
    service_description     Zombie Processes
    check_command           check_nrpe!check_zombie_procs
    notifications_enabled   1
    }

define service{
    use                     generic-service
    host_name               srv-wiki
    service_description     Users
    check_command           check_nrpe!check_users
    }

Sample Windows Server Config File

Here is my basic shell for a Windows server:

/etc/nagios/servers/srv-mssql.cfg

Code: Select all

define host{
    use             windows-server
    host_name       srv-mssql
    alias           Win2008-SRV-GP
    address         192.168.107.69
    hostgroups      windows2008-servers
    contacts        windows-admin-email
    parents         srv-esxi2
    }

###############################################################################
#
# SERVICE DEFINITIONS
#
###############################################################################
define service{
        use                     generic-service
        host_name               srv-mssql
        service_description     NSClient++ Version
        check_command           check_nt!CLIENTVERSION -H $HOSTADDRESS$ -p 12489 -s $USER5$
        }

define service{
        use                     generic-service
        host_name               srv-mssql
        service_description     Uptime
        check_command           check_nt!UPTIME -H $HOSTADDRESS$ -p 12489 -s $USER5$
        }

define service{
        use                     generic-service
        host_name               srv-mssql
        service_description     CPU Load
        check_command           check_nt!CPULOAD!-l 5,80,90 -H $HOSTADDRESS$ -p 12489 -s $USER5$
        }

define service{
        use                     generic-service
        host_name               srv-mssql
        service_description     Memory Usage
        check_command           check_nt!MEMUSE!-w 80 -c 90 -H $HOSTADDRESS$ -p 12489 -s $USER5$
        }

define service{
        use                     hd-service
        host_name               srv-mssql
        service_description     Drive C:
        check_command           check_nt!USEDDISKSPACE!-l c -w 80 -c 90 -H $HOSTADDRESS$ -p 12489 -s $USER5$
        }

define service{
        use                     hd-service
        host_name               srv-mssql
        service_description     Drive D:
        check_command           check_nt!USEDDISKSPACE!-l d -w 80 -c 90 -H $HOSTADDRESS$ -p 12489 -s $USER5$
        }

define service{
        use                     generic-service
        host_name               srv-mssql
        service_description     MS SQL Server
        check_command           check_nt!SERVICESTATE!-d SHOWALL -l MSSQLSERVER -H $HOSTADDRESS$ -p 12489 -s $USER5$
        }

define service{
        use                     generic-service
        host_name               srv-mssql
        service_description     SQL Server Agent
        check_command           check_nt!SERVICESTATE!-d SHOWALL -l SQLSERVERAGENT -H $HOSTADDRESS$ -p 12489 -s $USER5$
        }

define service{
        use                     generic-service
        host_name               srv-mssql
        service_description     WindowsUpdates
        check_command           check_nrpe!check_updates!1
        }

## This can be used for servers that require the console to be logged in.
#define service{
#        use                     generic-service
#        host_name               srv-mssql
#        service_description     Explorer
#        check_command           check_nt!PROCSTATE!-d SHOWALL -l Explorer.exe -H $HOSTADDRESS$ -p 12489 -s $USER5$
#        }
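
The check_command values above pack the command name and its arguments into one string separated by "!", and Nagios hands the pieces to the command definition as $ARG1$, $ARG2$ and so on. Here is a minimal shell sketch of that split, using a shortened version of the CPU Load line. The function name split_check_command is made up for illustration, and this only mimics the split; it is not how Nagios itself is implemented.

```shell
#!/bin/sh
# Illustration only: mimic how a check_command string is divided on "!".
# split_check_command is a hypothetical helper, not part of Nagios.
split_check_command() {
    rest=$1
    echo "command=${rest%%!*}"
    n=0
    # Peel off one "!"-separated field at a time.
    while [ "${rest#*!}" != "$rest" ]; do
        rest=${rest#*!}
        n=$((n + 1))
        echo "ARG${n}=${rest%%!*}"
    done
}

split_check_command 'check_nt!CPULOAD!-l 5,80,90'
```

Everything after the second "!" stays together as a single argument, spaces included, which is why the option strings in the services above can be written as one piece.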


Samples continued

Post: # 575Post LHammonds
Thu Feb 15, 2018 10:48 am

Sample Network Switch Config File

Here is my basic shell for a switch:

NOTE: The numeric MIB codes (OIDs) are specific to the hardware, so you will probably need to research the ones that match your own equipment.
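
One way to do that research is to query the device directly with the Net-SNMP command-line tools before putting a code into a service definition. The sketch below assumes snmpwalk is installed on the Nagios server and that the device answers SNMP v2c with the community string "public". The helper name walk_oid and the SNMPWALK override variable are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: walk part of a device's MIB tree to see what it answers.
# Assumes Net-SNMP's snmpwalk; set SNMPWALK to another program for a dry run.
walk_oid() {
    host=$1
    community=$2
    oid=$3
    "${SNMPWALK:-snmpwalk}" -v 2c -c "$community" "$host" "$oid"
}

# Example (not run here): list everything under the vendor subtree used below.
# walk_oid 192.168.107.230 public .1.3.6.1.4.1.1916.1.1.1
```

If the device only speaks SNMP v1, change "-v 2c" to "-v 1".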

Code: Select all

###############################################################################
# Switches.cfg
#
# Last Modified: 2012-05-25
###############################################################################

###############################################################################
#
# HOST DEFINITIONS
#
###############################################################################

define host{
        use             summit-switch
        host_name       SW-TX-IS
        alias           Texas IS Area
        address         192.168.107.230
        hostgroups      switches
        parents         SW-TX-Core
        }

define host{
        use             cisco-switch
        host_name       SW-TX-FD
        alias           Texas Front Desk
        address         192.168.107.231
        hostgroups      switches
        parents         SW-TX-Core
        }

###############################################################################
#
# SERVICE DEFINITIONS
#
###############################################################################

# Ping switch

define service{
        use                     switch-critical-service
        host_name               SW-TX-IS,SW-TX-FD
        service_description     PING
        check_command           check_ping!200.0,20%!600.0,60%
        }

# Monitor uptime via SNMP

define service{
        use                     switch-noncritical-service
        host_name               SW-TX-IS,SW-TX-FD
        service_description     Uptime
        check_command           check_snmp!-C public -o sysUpTime.0
        }

# Monitor Contact via SNMP

define service{
        use                     switch-noncritical-service
        host_name               SW-TX-IS,SW-TX-FD
        service_description     Contact
        check_command           check_snmp!-C public -o sysContact.0
        }

# Monitor Location via SNMP

define service{
        use                     switch-noncritical-service
        host_name               SW-TX-IS,SW-TX-FD
        service_description     Location
        check_command           check_snmp!-C public -o sysLocation.0
        }

# Monitor Over Temperature Alarm via SNMP

define service{
        use                     switch-noncritical-service
        host_name               SW-TX-IS,SW-TX-FD
        service_description     Temperature Over Alarm
        check_command           check_snmp!-C public -o .1.3.6.1.4.1.1916.1.1.1.7.0
        }

# Monitor Current Temperature via SNMP

define service{
        use                     switch-noncritical-service
        host_name               SW-TX-IS,SW-TX-FD
        service_description     Temperature Current
        check_command           check_snmp!-C public -o .1.3.6.1.4.1.1916.1.1.1.8.0
        }

# Monitor the Primary Software Revision Number via SNMP

define service{
        use                     switch-noncritical-service
        host_name               SW-TX-IS,SW-TX-FD
        service_description     Software Rev 1st
        check_command           check_snmp!-C public -o .1.3.6.1.4.1.1916.1.1.1.13.0
        }

# Monitor the Secondary Software Revision Number via SNMP

define service{
        use                     switch-noncritical-service
        host_name               SW-TX-IS,SW-TX-FD
        service_description     Software Rev 2nd
        check_command           check_snmp!-C public -o .1.3.6.1.4.1.1916.1.1.1.14.0
        }

Sample HP Printer Config File

Here is my basic shell for an HP printer:

Code: Select all

###############################################################################
# Printer-HP.cfg
#
# Last Modified: 2012-05-25
###############################################################################

###############################################################################
#
# HOST DEFINITIONS
#
###############################################################################

define host{
        use             generic-printer
        host_name       PTR-TX-ADMIN
        alias           Texas Admin
        address         192.168.107.254
        hostgroups      printers-hp
        parents         SW-TX-Core
        }

define host{
        use             generic-printer
        host_name       PTR-TX-ADMIN-COLOR
        alias           Texas Admin - HPColor
        address         192.168.107.253
        hostgroups      printers-hp
        parents         SW-TX-Core
        }

###############################################################################
#
# SERVICE DEFINITIONS
#
###############################################################################

define service{
        use                     hp-noncritical-service
        host_name               PTR-TX-ADMIN,PTR-TX-ADMIN-COLOR
        service_description     PING
        check_command           check_ping!3000.0,80%!5000.0,100%
        }

define service{
        use                     hp-noncritical-service
        host_name               PTR-TX-ADMIN,PTR-TX-ADMIN-COLOR
        service_description     Printer Status
        check_command           check_hpjd!-C public
        }

Sample Brother Printer Config File

Here is my basic shell for a Brother printer:

Code: Select all

###############################################################################
# Printer-Brother.cfg
#
# Last Modified: 2010-05-25
###############################################################################

###############################################################################
#
# HOST DEFINITIONS
#
###############################################################################

define host{
        use             generic-printer
        host_name       PTR-TX-IS
        alias           Texas IS - ISHP
        address         192.168.107.252
        hostgroups      printers-brother
        parents         SW-TX-Core
        }

define host{
        use             generic-printer
        host_name       PTR-TX-FD
        alias           Texas Front Desk
        address         192.168.107.251
        hostgroups      printers-brother
        parents         SW-TX-Core
        }

###############################################################################
#
# SERVICE DEFINITIONS
#
###############################################################################

# Create a service for "pinging" the printer occasionally.  Useful for monitoring RTA, packet loss, etc.

define service{
        use                     brother-noncritical-service
        host_name               PTR-TX-IS,PTR-TX-FD
        service_description     PING
        check_command           check_ping!3000.0,80%!5000.0,100%
        normal_check_interval   10
        retry_check_interval    1
        }

Sample Toshiba Copier Config File

Here is my basic shell for a Toshiba Copier:

Code: Select all

###############################################################################
# Copier-Toshiba.cfg
#
# Last Modified: 2012-05-25
###############################################################################

###############################################################################
#
# HOST DEFINITIONS
#
###############################################################################

define host{
        use             toshiba-copier
        host_name       TE-COPIER-01
        alias           Toshiba e-Studio255
        address         192.168.107.250
        hostgroups      copiers-toshiba
        parents         SW-TX-Core
        }

define host{
        use             toshiba-copier
        host_name       TE-COPIER-02
        alias           Toshiba e-Studio255
        address         192.168.107.249
        hostgroups      copiers-toshiba
        parents         SW-TX-Core
        }

###############################################################################
#
# SERVICE DEFINITIONS
#
###############################################################################

# Create a service for "pinging" the copier occasionally.  Useful for monitoring RTA, packet loss, etc.

define service{
        use                     copier-service
        host_name               TE-COPIER-01,TE-COPIER-02
        service_description     PING
        check_command           check_ping!3000.0,80%!5000.0,100%
        }

define service{
        use                     copier-service
        host_name               TE-COPIER-01,TE-COPIER-02
        service_description     Contact
        check_command           check_snmp!-C public -o sysContact.0
        }

define service{
        use                     copier-service
        host_name               TE-COPIER-01,TE-COPIER-02
        service_description     Location
        check_command           check_snmp!-C public -o sysLocation.0
        }


Contacts

Post: # 576Post LHammonds
Thu Feb 15, 2018 10:51 am

Contacts

It is a good idea to define contact groups and associate hosts and services with them rather than with individual contacts. Further down in the same file, define your contacts and list the contact groups each one belongs to. This keeps things easy to maintain whether you run a small shop or grow into a large one, because staffing changes are handled in the contact definitions instead of in every host and service.

Here is an example of what I mean:

/etc/nagios/objects/contacts.cfg

Code: Select all

###############################################################################
# CONTACTS.CFG - CONTACT/CONTACTGROUP DEFINITIONS
#
# Last Modified: 2012-05-25
###############################################################################
 
###############################################################################
#
# CONTACT GROUPS
#
###############################################################################

define contactgroup{
        contactgroup_name       windows-server-admins
        alias                   Windows Server Administrators
        }

define contactgroup{
        contactgroup_name       linux-server-admins
        alias                   Linux Server Administrators
        }

define contactgroup{
        contactgroup_name       windows-pc-admins
        alias                   Windows PC Administrators
        }

define contactgroup{
        contactgroup_name       ibm-bladecenter-admins
        alias                   IBM BladeCenter Administrators
        }

define contactgroup{
        contactgroup_name       network-admins
        alias                   Network Administrators
        }

define contactgroup{
        contactgroup_name       phone-admins
        alias                   Phone Administrators
        }

define contactgroup{
        contactgroup_name       printer-admins
        alias                   Printer Administrators
        }

define contactgroup{
        contactgroup_name       copier-admins
        alias                   Copier/Fax/Scanner Administrators
        }

###############################################################################
#
# CONTACTS
#
###############################################################################

define contact{
        contact_name    dirkdiggler-email
        use             generic-contact
        alias           Dirk Diggler
        email           dirk.diggler@mydomain.com
        contactgroups   windows-server-admins,windows-pc-admins,linux-server-admins,ibm-bladecenter-admins,network-admins,phone-admins,printer-admins,copier-admins
        }

define contact{
        contact_name    johndoe-email
        use             generic-contact
        alias           John Doe
        email           john.doe@mydomain.com
        contactgroups   windows-server-admins,windows-pc-admins,ibm-bladecenter-admins,network-admins,printer-admins,copier-admins
        }

define contact{
        contact_name    maryjane-email
        use             generic-contact
        alias           Mary Jane
        email           mary.jane@mydomain.com
        contactgroups   windows-server-admins,windows-pc-admins,network-admins,phone-admins,printer-admins,copier-admins
        }

define contact{
        contact_name    dirkdiggler-pager
        use             generic-pager
        alias           Dirk Diggler Pager
        pager           8005551234@txt.att.net
        }

define contact{
        contact_name    johndoe-pager
        use             generic-pager
        alias           John Doe Pager
        pager           8005555678@txt.att.net
        }

define contact{
        contact_name    maryjane-pager
        use             generic-pager
        alias           Mary Jane Pager
        pager           8005559876@txt.att.net
        }

Timeperiods

The /etc/nagios/objects/timeperiods.cfg file is fairly straightforward. You can adjust what is defined as "work hours" and leave it at that, or add some custom settings depending on your situation.

Templates

The /etc/nagios/objects/templates.cfg file is also fairly straightforward. You can use the templates defined there as they are, change them, or add new ones as I have done. In case you spotted references to them in earlier examples, here are some of my custom ones (this is only part of the file):

/etc/nagios/objects/templates.cfg

Code: Select all

define host{
        name                            template-server-critical
        notifications_enabled           1
        check_period                    24x7
        check_interval                  5
        retry_interval                  1
        check_command                   check-host-alive
        max_check_attempts              10
        event_handler_enabled           1
        flap_detection_enabled          1
        failure_prediction_enabled      1
        process_perf_data               1
        retain_status_information       1
        retain_nonstatus_information    1
        notification_period             24x7
        notification_interval           120
        notification_options            d,u,r
        register                        0
        }

define host{
        name                            template-server-non-critical
        notifications_enabled           1
        check_period                    workhours
        check_interval                  5
        retry_interval                  1
        check_command                   check-host-alive
        max_check_attempts              10
        event_handler_enabled           1
        flap_detection_enabled          1
        failure_prediction_enabled      1
        process_perf_data               1
        retain_status_information       1
        retain_nonstatus_information    1
        notification_period             workhours
        notification_interval           120
        notification_options            d,u,r
        register                        0
        }

define host{
        name                    linux-server
        use                     template-server-critical
        contact_groups          linux-server-admins
        register                0
        }

define host{
        name                    esx-server
        use                     template-server-critical
        contact_groups          ibm-bladecenter-admins
        register                0
        }

define host{
        name                    toshiba-copier
        use                     template-server-non-critical
        notification_options    n
        contact_groups          copier-admins
        register                0
        }

define host{
        name                    wirelessap
        use                     template-server-non-critical
        notification_options    n
        contact_groups          network-admins
        register                0
        }


Monitoring MySQL Server

Post: # 577Post LHammonds
Thu Feb 15, 2018 10:56 am

Monitoring MySQL Server

The script will be executed on the remote Linux server so we will be making use of NRPE.

On the remote MySQL server, install the Nagios plugins, NRPE server and NRPE plugin as mentioned earlier for remote Linux servers.

An extra step to allow the check_mysql plugin to work is to grant the nagios user access to a database. Rather than granting access to an existing database (for security reasons), let's create an empty database just for Nagios.

Type the following commands to create a database for Nagios, a nagiosuser account, and read-only access for that account to just the empty database:

mysql
CREATE DATABASE nagiosdb;
CREATE USER 'nagiosuser'@'%' IDENTIFIED BY 'nagiosuserpass';
GRANT SELECT ON nagiosdb.* TO 'nagiosuser'@'%';
FLUSH PRIVILEGES;
exit

Now see if the check will run locally on the MySQL server (before trying to test it remotely from the Nagios server):

/usr/lib/nagios/plugins/check_mysql -w 20 -c 10 -d nagiosdb -u nagiosuser -p nagiosuserpass
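
When testing a plugin by hand, look at its exit status as well as its text, because Nagios judges the result by the status: 0 is OK, 1 is WARNING, 2 is CRITICAL and 3 is UNKNOWN. Here is a minimal sketch for doing that. The helper names state_name and run_plugin are made up for illustration.

```shell
#!/bin/sh
# Map a plugin exit status to the state name Nagios uses for it.
state_name() {
    case $1 in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "out of range" ;;
    esac
}

# Run any plugin command line, show its output, then report the resulting state.
run_plugin() {
    "$@"
    status=$?
    echo "exit status ${status} = $(state_name "${status}")"
    return "${status}"
}

# Example (not run here), using the account created above:
# run_plugin /usr/lib/nagios/plugins/check_mysql -d nagiosdb -u nagiosuser -p nagiosuserpass
```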
Add the plugin to the trusted NRPE commands to be executed.

Code: Select all

vi /etc/nagios/nrpe_local.cfg
command[check_mysql]=/usr/lib/nagios/plugins/check_mysql -w 20 -c 10 -d nagiosdb -u nagiosuser -p nagiosuserpass

Even though we are using a low-access, read-only ID, the password is exposed in the config file, so make sure the file ownership and permissions are set accordingly:

Code: Select all

chown root:nagios /etc/nagios/nrpe_local.cfg
chmod 0640 /etc/nagios/nrpe_local.cfg
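
To confirm the result, the mode and ownership can be read back with stat. This is a small sketch assuming GNU stat (the one shipped with Ubuntu). The helper name has_mode is made up for illustration.

```shell
#!/bin/sh
# Succeeds only if FILE has exactly the expected octal mode (for example 640).
# Relies on the -c format option of GNU stat.
has_mode() {
    file=$1
    want=$2
    [ "$(stat -c '%a' "$file")" = "$want" ]
}

# Example (not run here):
# has_mode /etc/nagios/nrpe_local.cfg 640 && echo "permissions look right"
# stat -c '%U:%G %a %n' /etc/nagios/nrpe_local.cfg
```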

The NRPE server now needs to reload its configuration for the changes to take effect.

Code: Select all

/etc/init.d/nagios-nrpe-server reload
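
Before touching the Nagios configuration, it can help to confirm from the Nagios server that NRPE will actually run the new command. This sketch assumes check_nrpe was installed to /usr/local/nagios/libexec (adjust the path to match your build) and that the name srv-mysql resolves from the Nagios server. The helper name nrpe_test and the CHECK_NRPE variable are made up for illustration.

```shell
#!/bin/sh
# Ask the NRPE daemon on a remote host to run one of its configured commands.
# CHECK_NRPE defaults to an assumed install path; override it if yours differs.
nrpe_test() {
    host=$1
    command_name=$2
    "${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}" -H "$host" -c "$command_name"
}

# Example (not run here):
# nrpe_test srv-mysql check_mysql
```

If this prints the same text you saw when running the plugin locally, the NRPE side is working.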

On the Nagios server, add the following service to the remote MySQL Linux server's configuration file:

/etc/nagios/servers/srv-mysql.cfg

define service{
        use                     generic-service
        host_name               srv-mysql
        service_description     Server Health
        check_command           check_nrpe!check_mysql
        }

The final step is to verify that nothing is broken in the configuration:

Code: Select all

/etc/nagios/verify.sh
If there were no errors or warnings, restart Nagios to load the new configuration:

Code: Select all

/etc/init.d/nagios stop
/etc/init.d/nagios start
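
The verify and restart steps can be chained so that a restart only happens when verification passes. This sketch uses the same paths as above and assumes /etc/nagios/verify.sh exits non-zero when it finds a problem, which you should confirm for your own script. The helper name safe_restart and the VERIFY_CMD and INIT_SCRIPT variables are made up for illustration.

```shell
#!/bin/sh
# Restart Nagios only if the configuration check succeeds.
safe_restart() {
    verify=${VERIFY_CMD:-/etc/nagios/verify.sh}
    init=${INIT_SCRIPT:-/etc/init.d/nagios}
    if "$verify"; then
        "$init" stop
        "$init" start
    else
        echo "Configuration check failed; Nagios was not restarted." >&2
        return 1
    fi
}

# Example (not run here):
# safe_restart
```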


Custom Plugin - Check HTTPS

Post: # 578Post LHammonds
Thu Feb 15, 2018 11:08 am

Custom Plugin - Check HTTPS

On one of my Linux servers, I have a web mail service that I wanted to keep an eye on. However, the check_http did not work because the server only uses SSL (HTTPS) on port 443. I did not see a check_https command so I tried my hand at making one and it works like a champ.

Here is how I made and implemented a custom HTTPS check.

The first step was to create a script that would communicate with the server. We already have wget installed as one of the prerequisite programs, so I used it. Here is what the script looks like:

/usr/local/nagios/libexec/check_https

Code: Select all

#!/bin/bash
###########################################
## Name         : check_https
## Version      : 1.0
## Date         : 2012-01-03
## Author       : LHammonds
## Purpose      : Check for response from HTTPS server
## Requirements : WGET
## Parameters   :
##    1 = Server IP Address (Required)
##    2 = Port Number (Optional)
## Exit Codes   :
##    0 = Success
##    1 = Failure
##    2 = Error, missing required parameter
###########################################
OUTFILE="/tmp/check_https_out.$$"
ERRFILE="/tmp/check_https_err.$$"
WGETCMD="$(which wget)"

## Do basic check on arguments passed to the script.
if [ "$1" = "" ]; then
  echo "Missing required parameter"
  exit 2
fi
if [ "$2" = "" ]; then
  ## Assume default port.
  SSLPORT="443"
else
  SSLPORT=$2
fi
"${WGETCMD}" --no-check-certificate --output-document="${OUTFILE}" -S "https://$1:${SSLPORT}" 2> "${ERRFILE}"
RETURNVALUE=$?
if [ ${RETURNVALUE} -eq 0 ];  then
  echo "HTTPS OK"
  EXITCODE=0
else
  echo "Connection refused. Code=${RETURNVALUE}"
  EXITCODE=1
fi
if [ -f ${OUTFILE} ]; then
  rm ${OUTFILE}
fi
if [ -f ${ERRFILE} ]; then
  rm ${ERRFILE}
fi
exit ${EXITCODE}
After creating the file, you need to set the correct ownership and permissions as follows:

Code: Select all

chown nagios:nagios /usr/local/nagios/libexec/check_https
chmod 0755 /usr/local/nagios/libexec/check_https
To test it out, run the command against a server running HTTPS and then against a server not running HTTPS. Example:

Code: Select all

/usr/local/nagios/libexec/check_https 192.168.107.25 443
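
When testing, note that the script exits with 1 on failure, which Nagios displays as WARNING rather than CRITICAL. If you would rather have a down server show as CRITICAL, a variant along these lines is one option. It is only a sketch and not the script above: the function name https_state and the WGET override variable are made up, and it uses wget's --spider option to check the page without saving it, so no temp files are needed.

```shell
#!/bin/sh
# Variant sketch: report HTTPS reachability using Nagios plugin exit codes
# (0 = OK, 2 = CRITICAL, 3 = UNKNOWN). Not the original check_https script.
https_state() {
    host=$1
    port=${2:-443}
    if [ -z "$host" ]; then
        echo "HTTPS UNKNOWN - missing host parameter"
        return 3
    fi
    # --spider checks that the page is there without downloading it to a file.
    if "${WGET:-wget}" -q --spider --no-check-certificate --tries=1 --timeout=10 \
            "https://${host}:${port}/"; then
        echo "HTTPS OK - ${host}:${port} responded"
        return 0
    fi
    echo "HTTPS CRITICAL - no valid response from ${host}:${port}"
    return 2
}

# Example (not run here):
# https_state 192.168.107.25 443; echo "exit status: $?"
```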
Next, we add this script to the commands file.

Code: Select all

vi /usr/local/nagios/etc/objects/commands.cfg

Find the existing "check_http" command, copy its definition, add an "s" to the end of "http" in the copy, and remove the "-I" option.

Find this:

Code: Select all

define command{
        command_name     check_http
        command_line     $USER1$/check_http -I $HOSTADDRESS$ $ARG1$
        }

Add a modified copy like this:

Code: Select all

define command{
       command_name     check_https
       command_line     $USER1$/check_https $HOSTADDRESS$ 443
       }
Now we can add a service to monitor HTTPS by adding the following to the server configuration file:

Code: Select all

define service{
       use                     generic-service
       host_name               srv-securewebserver
       service_description     web mail server
       check_command           check_https
       }


Custom Plugin - Check APT MotD

Post: # 579Post LHammonds
Thu Feb 15, 2018 11:08 am

Custom Plugin - Check APT MotD

Reference: Original source

This plugin is a bit different from the built-in APT check for Linux servers. It was designed to give the same kind of messages that you get when you log in to an Ubuntu console.

One thing this script will catch that the built-in APT check will not is the "reboot required" state of the server.

The script will be executed on the remote Linux server so we will be making use of NRPE.

On the remote Linux server, create the script:

Code: Select all

touch /usr/lib/nagios/plugins/check_apt_motd.sh
chown root:root /usr/lib/nagios/plugins/check_apt_motd.sh
chmod 0755 /usr/lib/nagios/plugins/check_apt_motd.sh
vi /usr/lib/nagios/plugins/check_apt_motd.sh

/usr/lib/nagios/plugins/check_apt_motd.sh

Code: Select all

#!/bin/sh
#
# check_apt_packages - nagios plugin
#
# Checks for any packages to be applied
# Built for Ubuntu 10 (LTS), see following URL for further info
# - http://www.sandfordit.com/vwiki/index.php/Nagios#Ubuntu_Software_Updates_Monitor
#
# By Simon Strutt
# Version 1 - Jan 2012

# Include standard Nagios library
. /usr/lib/nagios/plugins/utils.sh || exit 3

if [ ! -f /usr/lib/update-notifier/apt-check ]; then
        exit $STATE_UNKNOWN
fi

APTRES=$(/usr/lib/update-notifier/apt-check 2>&1)
PKGS=$(echo $APTRES | cut -f1 -d';')
SEC=$(echo $APTRES | cut -f2 -d';')

if [ -f /var/run/reboot-required ]; then
        REBOOT=1
        TOAPPLY=`cat /var/run/reboot-required.pkgs`
else
        REBOOT=0
fi

if [ "${PKGS}" -eq 0 ]; then
        if [ "${REBOOT}" -eq 1 ]; then
                RET=$STATE_WARNING
                RESULT="Reboot required to apply ${TOAPPLY}"
        else
                RET=$STATE_OK
                RESULT="No packages to be updated"
        fi
elif [ "${SEC}" -eq 0 ]; then
        RET=$STATE_WARNING
        RESULT="${PKGS} packages to update (no security updates)"
else
        RET=$STATE_CRITICAL
        RESULT="${PKGS} packages (including ${SEC} security) to update"
fi

echo $RESULT
exit $RET
Test the script to see if it is working:

Code: Select all

/usr/lib/nagios/plugins/check_apt_motd.sh
The output should look something like one of these:

Code: Select all

Reboot required to apply libssl0.9.8
or

Code: Select all

1 packages to update (no security updates)
or

Code: Select all

No packages to be updated
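
The script's decision comes down to the two numbers that apt-check prints, separated by a semicolon (pending updates, then security updates), plus whether /var/run/reboot-required exists. The sketch below repeats that logic as a function that takes those values as arguments, so it can be tried without touching the package system. The name apt_state is made up for illustration.

```shell
#!/bin/sh
# Mirror of the plugin's decision logic, fed with values instead of live data.
# $1 = apt-check style string "updates;security", $2 = 1 if a reboot is pending.
apt_state() {
    pkgs=${1%%;*}
    sec=${1#*;}
    reboot=${2:-0}
    if [ "$pkgs" -eq 0 ]; then
        if [ "$reboot" -eq 1 ]; then
            echo "WARNING"
        else
            echo "OK"
        fi
    elif [ "$sec" -eq 0 ]; then
        echo "WARNING"
    else
        echo "CRITICAL"
    fi
}

apt_state "0;0" 0   # no updates, no reboot pending
apt_state "0;0" 1   # reboot pending
apt_state "5;0" 0   # updates waiting, none of them security
apt_state "5;2" 0   # security updates waiting
```

As in the script itself, a pending reboot is only reported when there are no packages left to update.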
Add the script to the trusted NRPE commands to be executed.

Code: Select all

vi /etc/nagios/nrpe_local.cfg

Code: Select all

command[check_apt_motd]=/usr/lib/nagios/plugins/check_apt_motd.sh

The NRPE server now needs to reload its configuration for the changes to take effect.

Code: Select all

/etc/init.d/nagios-nrpe-server reload

On the Nagios server, add the following service to the remote Linux server's configuration file:

/etc/nagios/servers/srv-wiki.cfg

define service{
        use                     generic-service
        host_name               srv-wiki
        service_description     APT Upgrade MotD
        check_command           check_nrpe!check_apt_motd
        }

The final step is to verify that nothing is broken in the configuration:

Code: Select all

/etc/nagios/verify.sh
If there were no errors or warnings, restart Nagios to load the new configuration:

Code: Select all

/etc/init.d/nagios stop
/etc/init.d/nagios start


Custom Plugin - Check ESXi Hardware

Post: # 580Post LHammonds
Thu Feb 15, 2018 11:09 am

Custom Plugin - Check ESXi Hardware

Reference: Original source

I use this custom script to check the health of my ESXi servers. It is run directly from the Nagios server.

This script requires the PyWBEM Python library. Here is how to install it:

Code: Select all

aptitude -y install python-pywbem
You then need to add a command to call the script. Edit /etc/nagios/objects/commands.cfg and add the following:

Code: Select all

# 'check_esxi_hardware' command definition
 
define command{
      command_name    check_esxi_hardware
      command_line    $USER1$/check_esxi_hardware.py -H $HOSTADDRESS$ -U $ARG1$ -P $ARG2$ -V $ARG3$ $ARG4$
      }

To access the ESXi data, you will need to supply an ID and password. The password can be placed in the "resources.cfg" file, but let's make sure that file is secured first.

Code: Select all

chmod 0600 /etc/nagios/resources.cfg
chown nagios:nagios /etc/nagios/resources.cfg
Edit /etc/nagios/resources.cfg and add the following:
# Password to access ESXi servers.
$USER6$=your-esxi-password-here
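Since this file now holds a password, it is worth confirming that the chmod 0600 above really stuck. Here is a small sketch of such a check in Python 3. The helper name is_owner_only is my own, and the demonstration runs against a throwaway temporary file rather than the real config so it is safe to try anywhere.

```python
import os
import stat
import tempfile

def is_owner_only(path):
    """Return True if group and others have no permissions on path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & 0o077) == 0

# Demonstration on a temporary file (stand-in for resources.cfg).
handle, demo_path = tempfile.mkstemp()
os.close(handle)

os.chmod(demo_path, 0o600)   # same mode as used for resources.cfg
print(is_owner_only(demo_path))

os.chmod(demo_path, 0o644)   # world-readable: not what we want
print(is_owner_only(demo_path))

os.remove(demo_path)
```

On the Nagios server you would call is_owner_only('/etc/nagios/resources.cfg') and expect True.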
To use this command for an ESXi host, add the following service definition to that host's configuration file:

/etc/nagios/servers/srv-esxi1.cfg
define service{
        use                     generic-service
        host_name               srv-esxi1
        service_description     Server Health
        check_command           check_esxi_hardware!your-esxi-userid-here!$USER6$!ibm
        }
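It may help to see how the pieces fit together. The check_command value is split on "!" into the command name followed by $ARG1$, $ARG2$, $ARG3$ and so on, and those are substituted into the command_line from commands.cfg. The sketch below imitates that substitution in Python 3. It is a simplified illustration, not how Nagios itself is implemented. The value for $USER1$ is an assumption (the directory where the script is created in this post), the host address is the one from my test command, and the sketch treats any unset macro such as $ARG4$ as an empty string.

```python
import re

MACRO = re.compile(r'\$([A-Z0-9_]+)\$')

def substitute(text, values):
    """Replace every $NAME$ in text; unknown macros become empty strings."""
    return MACRO.sub(lambda m: values.get(m.group(1), ''), text)

def expand_command(command_line, check_command, macros):
    """Fill $ARGn$ from the '!'-separated check_command, then expand command_line."""
    parts = check_command.split('!')
    values = dict(macros)
    for index, argument in enumerate(parts[1:], start=1):
        # An argument may itself be a macro, e.g. $USER6$ from resources.cfg.
        values['ARG%d' % index] = substitute(argument, macros)
    return substitute(command_line, values)

command_line = ('$USER1$/check_esxi_hardware.py -H $HOSTADDRESS$ '
                '-U $ARG1$ -P $ARG2$ -V $ARG3$ $ARG4$')
check_command = 'check_esxi_hardware!your-esxi-userid-here!$USER6$!ibm'
macros = {
    'USER1': '/usr/local/nagios/libexec',   # assumed plugin directory
    'USER6': 'your-esxi-password-here',     # placeholder from resources.cfg
    'HOSTADDRESS': '192.168.107.44',        # example ESXi address
}

print(expand_command(command_line, check_command, macros).strip())
```

The fourth argument is left open in the command definition so that extra options (for example -p for performance data) can be appended from the service definition later.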
Now it is time to create the script:

Code: Select all

touch /usr/local/nagios/libexec/check_esxi_hardware.py
chown nagios:nagios /usr/local/nagios/libexec/check_esxi_hardware.py
chmod 0755 /usr/local/nagios/libexec/check_esxi_hardware.py
vi /usr/local/nagios/libexec/check_esxi_hardware.py
/usr/local/nagios/libexec/check_esxi_hardware.py

Code: Select all

#!/usr/bin/python
# -*- coding: UTF-8 -*-
#
# Script for checking global health of host running VMware ESX/ESXi
#
# Licence : GNU General Public Licence (GPL) http://www.gnu.org/
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
# 02110-1301, USA.
#
# Pre-req : pywbem
#
# Copyright (c) 2008 David Ligeret
# Copyright (c) 2009 Joshua Daniel Franklin
# Copyright (c) 2010 Branden Schneider
# Copyright (c) 2010-2012 Claudio Kuenzler
# Copyright (c) 2010 Samir Ibradzic
# Copyright (c) 2010 Aaron Rogers
# Copyright (c) 2011 Ludovic Hutin
# Copyright (c) 2011 Carsten Schoene
# Copyright (c) 2011-2012 Phil Randal
# Copyright (c) 2011 Fredrik Aslund
# Copyright (c) 2011 Bertrand Jomin
# Copyright (c) 2011 Ian Chard
# Copyright (c) 2012 Craig Hart
#
# The VMware 4.1 CIM API is documented here:
#
#   http://www.vmware.com/support/developer/cim-sdk/4.1/smash/cim_smash_410_prog.pdf
#
#   http://www.vmware.com/support/developer/cim-sdk/smash/u2/ga/apirefdoc/
#
# This Nagios plugin is maintained here:
# http://www.claudiokuenzler.com/nagios-plugins/check_esxi_hardware.php
#
#@---------------------------------------------------
#@ History
#@---------------------------------------------------
#@ Date   : 20080820
#@ Author : David Ligeret
#@ Reason : Initial release
#@---------------------------------------------------
#@ Date   : 20080821
#@ Author : David Ligeret
#@ Reason : Add verbose mode
#@---------------------------------------------------
#@ Date   : 20090219
#@ Author : Joshua Daniel Franklin
#@ Reason : Add try/except to catch AuthError and CIMError
#@---------------------------------------------------
#@ Date   : 20100202
#@ Author : Branden Schneider
#@ Reason : Added HP Support (HealthState)
#@---------------------------------------------------
#@ Date   : 20100512
#@ Author : Claudio Kuenzler www.claudiokuenzler.com
#@ Reason : Combined different versions (Joshua and Branden)
#@ Reason : Added hardware type switch (dell or hp)
#@---------------------------------------------------
#@ Date   : 20100626/28
#@ Author : Samir Ibradzic www.brastel.com
#@ Reason : Added basic server info
#@ Reason : Wanted to have server name, serial number & bios version at output
#@ Reason : Set default return status to Unknown
#@---------------------------------------------------
#@ Date   : 20100702
#@ Author : Aaron Rogers www.cloudmark.com
#@ Reason : GlobalStatus was incorrectly getting (re)set to OK with every CIM element check
#@---------------------------------------------------
#@ Date   : 20100705
#@ Author : Claudio Kuenzler www.claudiokuenzler.com
#@ Reason : Due to change 20100702 all Dell servers would return UNKNOWN instead of OK...
#@ Reason : ... so added Aaron's logic at the end of the Dell checks as well
#@---------------------------------------------------
#@ Date   : 20101028
#@ Author : Claudio Kuenzler www.claudiokuenzler.com
#@ Reason : Changed text in Usage and Example so people dont forget to use https://
#@---------------------------------------------------
#@ Date   : 20110110
#@ Author : Ludovic Hutin (Idea and Coding) / Claudio Kuenzler (Bugfix)
#@ Reason : If Dell Blade Servers are used, Serial Number of Chassis was returned
#@---------------------------------------------------
#@ Date   : 20110207
#@ Author : Carsten Schoene carsten.schoene.cc
#@ Reason : Bugfix for Intel systems (in this case Intel SE7520) - use 'intel' as system type
#@---------------------------------------------------
#@ Date   : 20110215
#@ Author : Ludovic Hutin
#@ Reason : Plugin now catches Socket Error (Timeout Error) and added a timeout parameter
#@---------------------------------------------------
#@ Date   : 20110217/18
#@ Author : Ludovic Hutin / Tom Murphy
#@ Reason : Bugfix in Socket Error if clause
#@---------------------------------------------------
#@ Date   : 20110221
#@ Author : Claudio Kuenzler www.claudiokuenzler.com
#@ Reason : Remove recently added Timeout due to incompabatility on Windows
#@ Reason : and changed name of plugin to check_esxi_hardware
#@---------------------------------------------------
#@ Date   : 20110426
#@ Author : Claudio Kuenzler www.claudiokuenzler.com
#@ Reason : Added 'ibm' hardware type (compatible to Dell output). Tested by Keith Erekson.
#@---------------------------------------------------
#@ Date   : 20110426
#@ Author : Phil Randal
#@ Reason : URLise Dell model and tag numbers (as in check_openmanage)
#@ Reason : Return performance data (as in check_openmanage, using similar names where possible)
#@ Reason : Minor code tidyup - use elementName instead of instance['ElementName']
#@---------------------------------------------------
#@ Date   : 20110428
#@ Author : Phil Randal (phil.randal@gmail.com)
#@ Reason : If hardware type is specified as 'auto' try to autodetect vendor
#@ Reason : Return performance data for some HP models
#@ Reason : Indent 'verbose' output to make it easier to read
#@ Reason : Use OptionParser to give better parameter parsing (retaining compatability with original)
#@---------------------------------------------------
#@ Date   : 20110503
#@ Author : Phil Randal (phil.randal@gmail.com)
#@ Reason : Fix bug in HP Virtual Fan percentage output
#@ Reason : Slight code reorganisation
#@ Reason : Sort performance data
#@ Reason : Fix formatting of current output
#@---------------------------------------------------
#@ Date   : 20110504
#@ Author : Phil Randal (phil.randal@gmail.com)
#@ Reason : Minor code changes and documentation improvements
#@ Reason : Remove redundant mismatched ' character in performance data output
#@ Reason : Output non-integral values for all sensors to fix problem seen with system board voltage sensors
#@          on an IBM server (thanks to Attilio Drei for the sample output)
#@---------------------------------------------------
#@ Date   : 20110505
#@ Author : Fredrik Aslund
#@ Reason : Added possibility to use first line of a file as password (file:)
#@---------------------------------------------------
#@ Date   : 20110505
#@ Author : Phil Randal (phil.randal@gmail.com)
#@ Reason : Simplfy 'verboseoutput' to use 'verbose' as global variable instead of as parameter
#@ Reason : Don't look at performance data from CIM_NumericSensor if we're not using it
#@ Reason : Add --no-power, --no-volts, --no-current, --no-temp, and --no-fan options
#@---------------------------------------------------
#@ Date   : 20110506
#@ Author : Phil Randal (phil.randal@gmail.com)
#@ Reason : Reinstate timeouts with --timeout parameter (but not on Windows)
#@ Reason : Allow file:passwordfile in old-style arguments too
#@---------------------------------------------------
#@ Date   : 20110507
#@ Author : Phil Randal (phil.randal@gmail.com)
#@ Reason : On error, include numeric sensor value in output
#@---------------------------------------------------
#@ Date   : 20110520
#@ Author : Bertrand Jomin
#@ Reason : Plugin had problems to handle some S/N from IBM Blade Servers
#@---------------------------------------------------
#@ Date   : 20110614
#@ Author : Claudio Kuenzler (www.claudiokuenzler.com)
#@ Reason : Rewrote file handling and file can now be used for user AND password
#@---------------------------------------------------
#@ Date   : 20111003
#@ Author : Ian Chard (ian@chard.org)
#@ Reason : Allow a list of unwanted elements to be specified, which is useful
#@          in cases where hardware isn't well supported by ESXi
#@---------------------------------------------------
#@ Date   : 20120402
#@ Author : Claudio Kuenzler (www.claudiokuenzler.com)
#@ Reason : Making plugin GPL compatible (Copyright) and preparing for OpenBSD port
#@---------------------------------------------------
#@ Date   : 20120405
#@ Author : Phil Randal (phil.randal@gmail.com)
#@ Reason : Fix lookup of warranty info for Dell
#@---------------------------------------------------
#@ Date   : 20120501
#@ Author : Craig Hart
#@ Reason : Bugfix in manufacturer discovery when cim entry not found or empty
#@---------------------------------------------------
 
 
import sys
import time
import pywbem
import re
import string
from optparse import OptionParser,OptionGroup
 
version = '20120501'
 
NS = 'root/cimv2'
 
# define classes to check 'OperationStatus' instance
ClassesToCheck = [
  'OMC_SMASHFirmwareIdentity',
  'CIM_Chassis',
  'CIM_Card',
  'CIM_ComputerSystem',
  'CIM_NumericSensor',
  'CIM_Memory',
  'CIM_Processor',
  'CIM_RecordLog',
  'OMC_DiscreteSensor',
  'OMC_Fan',
  'OMC_PowerSupply',
  'VMware_StorageExtent',
  'VMware_Controller',
  'VMware_StorageVolume',
  'VMware_Battery',
  'VMware_SASSATAPort'
]
 
sensor_Type = {
  0:'unknown',
  1:'Other',
  2:'Temperature',
  3:'Voltage',
  4:'Current',
  5:'Tachometer',
  6:'Counter',
  7:'Switch',
  8:'Lock',
  9:'Humidity',
  10:'Smoke Detection',
  11:'Presence',
  12:'Air Flow',
  13:'Power Consumption',
  14:'Power Production',
  15:'Pressure',
  16:'Intrusion',
  32768:'DMTF Reserved',
  65535:'Vendor Reserved'
}
 
data = []
 
perf_Prefix = {
  1:'Pow',
  2:'Vol',
  3:'Cur',
  4:'Tem',
  5:'Fan',
  6:'FanP'
}
 
 
# parameters
 
# host name
hostname=''
 
# user
user=''
 
# password
password=''
 
# vendor - possible values are 'unknown', 'auto', 'dell', 'hp', 'ibm', 'intel'
vendor='unknown'
 
# verbose
verbose=False
 
# Produce performance data output for nagios
perfdata=False
 
# timeout
timeout = 0
 
# elements to ignore (full SEL, broken BIOS, etc)
ignore_list=[]
 
# urlise model and tag numbers (currently only Dell supported, but the code does the right thing for other vendors)
urlise_country=''
 
# collect perfdata for each category
get_power   = True
get_volts   = True
get_current = True
get_temp    = True
get_fan     = True
 
# define exit codes
ExitOK = 0
ExitWarning = 1
ExitCritical = 2
ExitUnknown = 3
 
def urlised_server_info(vendor, country, server_info):
  #server_inf = server_info
  if vendor == 'dell' :
    # Dell support URLs (idea and tables borrowed from check_openmanage)
    du = 'http://support.dell.com/support/edocs/systems/pe'
    if (server_info is not None) :
      p=re.match('(.*)PowerEdge (.*) (.*)',server_info)
      if (p is not None) :
        md=p.group(2)
        if (re.match('M',md)) :
          md = 'm'
        server_info = p.group(1) + '<a href="' + du + md + '/">PowerEdge ' + p.group(2)+'</a> ' + p.group(3)
  elif vendor == 'hp':
    return server_info
  elif vendor == 'ibm':
    return server_info
  elif vendor == 'intel':
    return server_info
 
  return server_info
 
# ----------------------------------------------------------------------
 
def system_tag_url(vendor,country):
  url = {'xx':''}
  if vendor == 'dell':
    # Dell support sites
    supportsite = 'http://www.dell.com/support/troubleshooting/'
    dellsuffix = 'nodhs1/Index?t=warranty&servicetag='
 
    # warranty URLs for different country codes
    # EMEA
    url['at'] = supportsite + 'at/de/' + dellsuffix  # Austria
    url['be'] = supportsite + 'be/nl/' + dellsuffix  # Belgium
    url['cz'] = supportsite + 'cz/cs/' + dellsuffix  # Czech Republic
    url['de'] = supportsite + 'de/de/' + dellsuffix  # Germany
    url['dk'] = supportsite + 'dk/da/' + dellsuffix  # Denmark
    url['es'] = supportsite + 'es/es/' + dellsuffix  # Spain
    url['fi'] = supportsite + 'fi/fi/' + dellsuffix  # Finland
    url['fr'] = supportsite + 'fr/fr/' + dellsuffix  # France
    url['gr'] = supportsite + 'gr/en/' + dellsuffix  # Greece
    url['it'] = supportsite + 'it/it/' + dellsuffix  # Italy
    url['il'] = supportsite + 'il/en/' + dellsuffix  # Israel
    url['me'] = supportsite + 'me/en/' + dellsuffix  # Middle East
    url['no'] = supportsite + 'no/no/' + dellsuffix  # Norway
    url['nl'] = supportsite + 'nl/nl/' + dellsuffix  # The Netherlands
    url['pl'] = supportsite + 'pl/pl/' + dellsuffix  # Poland
    url['pt'] = supportsite + 'pt/en/' + dellsuffix  # Portugal
    url['ru'] = supportsite + 'ru/ru/' + dellsuffix  # Russia
    url['se'] = supportsite + 'se/sv/' + dellsuffix  # Sweden
    url['uk'] = supportsite + 'uk/en/' + dellsuffix  # United Kingdom
    url['za'] = supportsite + 'za/en/' + dellsuffix  # South Africa
    # America
    url['br'] = supportsite + 'br/pt/' + dellsuffix  # Brazil
    url['ca'] = supportsite + 'ca/en/' + dellsuffix  # Canada
    url['mx'] = supportsite + 'mx/es/' + dellsuffix  # Mexico
    url['us'] = supportsite + 'us/en/' + dellsuffix  # USA
    # Asia/Pacific
    url['au'] = supportsite + 'au/en/' + dellsuffix  # Australia
    url['cn'] = supportsite + 'cn/zh/' + dellsuffix  # China
    url['in'] = supportsite + 'in/en/' + dellsuffix  # India
    # default fallback
    url['xx'] = supportsite + 'us/en/' + dellsuffix  # default
  # elif vendor == 'hp':
  # elif vendor == 'ibm':
  # elif vendor == 'intel':
 
  return url.get(country,url['xx'])
 
# ----------------------------------------------------------------------
 
def urlised_serialnumber(vendor,country,SerialNumber):
  if SerialNumber is not None :
    tu = system_tag_url(vendor,country)
    if tu != '' :
      SerialNumber = '<a href="' + tu + SerialNumber + '">' + SerialNumber + '</a>'
  return SerialNumber
 
# ----------------------------------------------------------------------
 
def verboseoutput(message) :
  if verbose:
    print "%s %s" % (time.strftime("%Y%m%d %H:%M:%S"), message)
 
# ----------------------------------------------------------------------
 
def getopts() :
  global hosturl,user,password,vendor,verbose,perfdata,urlise_country,timeout,ignore_list,get_power,get_volts,get_current,get_temp,get_fan
  usage = "usage: %prog  https://hostname user password system [verbose]\n" \
    "example: %prog https://my-shiny-new-vmware-server root fakepassword dell\n\n" \
    "or, using new style options:\n\n" \
    "usage: %prog -H hostname -U username -P password [-V system -v -p -I XX]\n" \
    "example: %prog -H my-shiny-new-vmware-server -U root -P fakepassword -V auto -I uk\n\n" \
    "or, verbosely:\n\n" \
    "usage: %prog --host=hostname --user=username --pass=password [--vendor=system --verbose --perfdata --html=XX]\n"
 
  parser = OptionParser(usage=usage, version="%prog "+version)
  group1 = OptionGroup(parser, 'Mandatory parameters')
  group2 = OptionGroup(parser, 'Optional parameters')
 
  group1.add_option("-H", "--host", dest="host", help="report on HOST", metavar="HOST")
  group1.add_option("-U", "--user", dest="user", help="user to connect as", metavar="USER")
  group1.add_option("-P", "--pass", dest="password", \
      help="password, if password matches file:<path>, first line of given file will be used as password", metavar="PASS")
 
  group2.add_option("-V", "--vendor", dest="vendor", help="Vendor code: auto, dell, hp, ibm, intel, or unknown (default)", \
      metavar="VENDOR", type='choice', choices=['auto','dell','hp','ibm','intel','unknown'],default="unknown")
  group2.add_option("-v", "--verbose", action="store_true", dest="verbose", default=False, \
      help="print status messages to stdout (default is to be quiet)")
  group2.add_option("-p", "--perfdata", action="store_true", dest="perfdata", default=False, \
      help="collect performance data for pnp4nagios (default is not to)")
  group2.add_option("-I", "--html", dest="urlise_country", default="", \
      help="generate html links for country XX (default is not to)", metavar="XX")
  group2.add_option("-t", "--timeout", action="store", type="int", dest="timeout", default=0, \
      help="timeout in seconds - no effect on Windows (default = no timeout)")
  group2.add_option("-i", "--ignore", action="store", type="string", dest="ignore", default="", \
      help="comma-separated list of elements to ignore")
  group2.add_option("--no-power", action="store_false", dest="get_power", default=True, \
      help="don't collect power performance data")
  group2.add_option("--no-volts", action="store_false", dest="get_volts", default=True, \
      help="don't collect voltage performance data")
  group2.add_option("--no-current", action="store_false", dest="get_current", default=True, \
      help="don't collect current performance data")
  group2.add_option("--no-temp", action="store_false", dest="get_temp", default=True, \
      help="don't collect temperature performance data")
  group2.add_option("--no-fan", action="store_false", dest="get_fan", default=True, \
      help="don't collect fan performance data")
 
  parser.add_option_group(group1)
  parser.add_option_group(group2)
 
  # check input arguments
  if len(sys.argv) < 2:
    print "no parameters specified\n"
    parser.print_help()
    sys.exit(-1)
  # if first argument starts with 'https://' we have old-style parameters, so handle in old way
  if re.match("https://",sys.argv[1]):
    # check input arguments
    if len(sys.argv) < 5:
      print "too few parameters\n"
      parser.print_help()
      sys.exit(-1)
    if len(sys.argv) > 5 :
      if sys.argv[5] == "verbose" :
        verbose = True
    hosturl = sys.argv[1]
    user = sys.argv[2]
    password = sys.argv[3]
    vendor = sys.argv[4]
  else:
    # we're dealing with new-style parameters, so go get them!
    (options, args) = parser.parse_args()
 
    # Making sure all mandatory options appeared.
    mandatories = ['host', 'user', 'password']
    for m in mandatories:
      if not options.__dict__[m]:
        print "mandatory parameter '--" + m + "' is missing\n"
        parser.print_help()
        sys.exit(-1)
 
    hostname=options.host.lower()
    # if user has put "https://" in front of hostname out of habit, do the right thing
    # hosturl will end up as https://hostname
    if re.match('^https://',hostname):
      hosturl = hostname
    else:
      hosturl = 'https://' + hostname
 
    user=options.user
    password=options.password
    vendor=options.vendor.lower()
    verbose=options.verbose
    perfdata=options.perfdata
    urlise_country=options.urlise_country.lower()
    timeout=options.timeout
    ignore_list=options.ignore.split(',')
    get_power=options.get_power
    get_volts=options.get_volts
    get_current=options.get_current
    get_temp=options.get_temp
    get_fan=options.get_fan
 
  # if user or password starts with 'file:', use the first string in file as user, second as password
  if (re.match('^file:', user) or re.match('^file:', password)):
        if re.match('^file:', user):
          filextract = re.sub('^file:', '', user)
          filename = open(filextract, 'r')
          filetext = filename.readline().split()
          user = filetext[0]
          password = filetext[1]
          filename.close()
        elif re.match('^file:', password):
          filextract = re.sub('^file:', '', password)
          filename = open(filextract, 'r')
          filetext = filename.readline().split()
          password = filetext[0]
          filename.close()
 
# ----------------------------------------------------------------------
 
getopts()
 
# if running on Windows, don't use timeouts and signal.alarm
on_windows = True
os_platform = sys.platform
if os_platform != "win32":
  on_windows = False
  import signal
  def handler(signum, frame):
    print 'CRITICAL: Execution time too long!'
    sys.exit(ExitCritical)
 
# connection to host
verboseoutput("Connection to "+hosturl)
wbemclient = pywbem.WBEMConnection(hosturl, (user,password), NS)
 
# Add a timeout for the script. When using with Nagios, the Nagios timeout cannot be < than plugin timeout.
if on_windows == False and timeout > 0:
  signal.signal(signal.SIGALRM, handler)
  signal.alarm(timeout)
 
# run the check for each defined class
GlobalStatus = ExitUnknown
server_info = ""
bios_info = ""
SerialNumber = ""
ExitMsg = ""
 
# if vendor is specified as 'auto', try to get vendor from CIM
# note: the default vendor is 'unknown'
if vendor=='auto':
  c=wbemclient.EnumerateInstances('CIM_Chassis')
  man=c[0][u'Manufacturer']
  if re.match("Dell",man):
    vendor="dell"
  elif re.match("HP",man):
    vendor="hp"
  elif re.match("IBM",man):
    vendor="ibm"
  elif re.match("Intel",man):
    vendor="intel"
  else:
    vendor='unknown'
 
for classe in ClassesToCheck :
  verboseoutput("Check classe "+classe)
  try:
    instance_list = wbemclient.EnumerateInstances(classe)
  except pywbem.cim_operations.CIMError,args:
    if ( args[1].find('Socket error') >= 0 ):
      print "CRITICAL: %s" %args
      sys.exit (ExitCritical)
    else:
      verboseoutput("Unknown CIM Error: %s" % args)
  except pywbem.cim_http.AuthError,arg:
    verboseoutput("Global exit set to CRITICAL")
    GlobalStatus = ExitCritical
    ExitMsg = " : Authentication Error! "
  else:
    # GlobalStatus = ExitOK #ARR
    for instance in instance_list :
      sensor_value = ""
      elementName = instance['ElementName']
      elementNameValue = elementName
      verboseoutput("  Element Name = "+elementName)
 
      # Ignore element if we don't want it
      if elementName in ignore_list :
        verboseoutput("    (ignored)")
        continue
 
      # BIOS & Server info
      if elementName == 'System BIOS' :
        bios_info =     instance[u'Name'] + ': ' \
            + instance[u'VersionString'] + ' ' \
            + str(instance[u'ReleaseDate'].datetime.date())
        verboseoutput("    VersionString = "+instance[u'VersionString'])
 
      elif elementName == 'Chassis' :
        man = instance[u'Manufacturer']
        if man is None :
          man = 'Unknown Manufacturer'
        verboseoutput("    Manufacturer = "+man)
        SerialNumber = instance[u'SerialNumber']
        if SerialNumber:
          verboseoutput("    SerialNumber = "+SerialNumber)
        server_info = man + ' '
        if vendor != 'intel':
          model = instance[u'Model']
          if model:
            verboseoutput("    Model = "+model)
            server_info +=  model + ' s/n:'
 
      elif elementName == 'Server Blade' :
        SerialNumber = instance[u'SerialNumber']
        if SerialNumber:
          verboseoutput("    SerialNumber = "+SerialNumber)
 
      # Report detail of Numeric Sensors and generate nagios perfdata
 
      if classe == "CIM_NumericSensor" :
        sensorType = instance[u'sensorType']
        sensStr = sensor_Type.get(sensorType,"Unknown")
        if sensorType:
          verboseoutput("    sensorType = %d - %s" % (sensorType,sensStr))
        units = instance[u'BaseUnits']
        if units:
          verboseoutput("    BaseUnits = %d" % units)
        # grab some of these values for Nagios performance data
        scale = 10**instance[u'UnitModifier']
        verboseoutput("    Scaled by = %f " % scale)
        cr = int(instance[u'CurrentReading'])*scale
        verboseoutput("    Current Reading = %f" % cr)
        elementNameValue = "%s: %g" % (elementName,cr)
        ltnc = 0
        utnc = 0
        ltc  = 0
        utc  = 0
        if instance[u'LowerThresholdNonCritical'] is not None:
          ltnc = instance[u'LowerThresholdNonCritical']*scale
          verboseoutput("    Lower Threshold Non Critical = %f" % ltnc)
        if instance[u'UpperThresholdNonCritical'] is not None:
          utnc = instance[u'UpperThresholdNonCritical']*scale
          verboseoutput("    Upper Threshold Non Critical = %f" % utnc)
        if instance[u'LowerThresholdCritical'] is not None:
          ltc = instance[u'LowerThresholdCritical']*scale
          verboseoutput("    Lower Threshold Critical = %f" % ltc)
        if instance[u'UpperThresholdCritical'] is not None:
          utc = instance[u'UpperThresholdCritical']*scale
          verboseoutput("    Upper Threshold Critical = %f" % utc)
        #
        if perfdata:
          perf_el = elementName.replace(' ','_')
 
          # Power and Current
          if sensorType == 4:               # Current or Power Consumption
            if units == 7:            # Watts
              if get_power:
                data.append( ("%s=%g;%g;%g " % (perf_el, cr, utnc, utc),1) )
            elif units == 6:          # Current
              if get_current:
                data.append( ("%s=%g;%g;%g " % (perf_el, cr, utnc, utc),3) )
 
          # PSU Voltage
          elif sensorType == 3:               # Voltage
            if get_volts:
              data.append( ("%s=%g;%g;%g " % (perf_el, cr, utnc, utc),2) )
 
          # Temperatures
          elif sensorType == 2:               # Temperature
            if get_temp:
              data.append( ("%s=%g;%g;%g " % (perf_el, cr, utnc, utc),4) )
 
          # Fan speeds
          elif sensorType == 5:               # Tachometer
            if get_fan:
              if units == 65:           # percentage
                data.append( ("%s=%g%%;%g;%g " % (perf_el, cr, utnc, utc),6) )
              else:
                data.append( ("%s=%g;%g;%g " % (perf_el, cr, utnc, utc),5) )
 
      elif classe == "CIM_Processor" :
        verboseoutput("    Family = %d" % instance['Family'])
        verboseoutput("    CurrentClockSpeed = %dMHz" % instance['CurrentClockSpeed'])
 
 
      # HP Check
      if vendor == "hp" :
        if instance['HealthState'] is not None :
          elementStatus = instance['HealthState']
          verboseoutput("    Element HealthState = %d" % elementStatus)
          interpretStatus = {
            0  : ExitOK,    # Unknown
            5  : ExitOK,    # OK
            10 : ExitWarning,  # Degraded
            15 : ExitWarning,  # Minor
            20 : ExitCritical,  # Major
            25 : ExitCritical,  # Critical
            30 : ExitCritical,  # Non-recoverable Error
          }[elementStatus]
          if (interpretStatus == ExitCritical) :
            verboseoutput("Global exit set to CRITICAL")
            GlobalStatus = ExitCritical
            ExitMsg += " CRITICAL : %s " % elementNameValue
          if (interpretStatus == ExitWarning and GlobalStatus != ExitCritical) :
            verboseoutput("Global exit set to WARNING")
            GlobalStatus = ExitWarning
            ExitMsg += " WARNING : %s " % elementNameValue
          # Added the following for when GlobalStatus is ExitCritical and a warning is detected
          # This way the ExitMsg gets added but GlobalStatus isn't changed
          if (interpretStatus == ExitWarning and GlobalStatus == ExitCritical) : # ARR
            ExitMsg += " WARNING : %s " % elementNameValue #ARR
          # Added the following so that GlobalStatus gets set to OK if there's no warning or critical
          if (interpretStatus == ExitOK and GlobalStatus != ExitWarning and GlobalStatus != ExitCritical) : #ARR
            GlobalStatus = ExitOK #ARR
 
 
 
      # Dell, Intel, IBM and unknown hardware check
      elif (vendor == "dell" or vendor == "intel" or vendor == "ibm" or vendor=="unknown") :
        if instance['OperationalStatus'] is not None :
          elementStatus = instance['OperationalStatus'][0]
          verboseoutput("    Element Op Status = %d" % elementStatus)
          interpretStatus = {
            0  : ExitOK,            # Unknown
            1  : ExitCritical,      # Other
            2  : ExitOK,            # OK
            3  : ExitWarning,       # Degraded
            4  : ExitWarning,       # Stressed
            5  : ExitWarning,       # Predictive Failure
            6  : ExitCritical,      # Error
            7  : ExitCritical,      # Non-Recoverable Error
            8  : ExitWarning,       # Starting
            9  : ExitWarning,       # Stopping
            10 : ExitCritical,      # Stopped
            11 : ExitOK,            # In Service
            12 : ExitWarning,       # No Contact
            13 : ExitCritical,      # Lost Communication
            14 : ExitCritical,      # Aborted
            15 : ExitOK,            # Dormant
            16 : ExitCritical,      # Supporting Entity in Error
            17 : ExitOK,            # Completed
            18 : ExitOK,            # Power Mode
            19 : ExitOK,            # DMTF Reserved
            20 : ExitOK             # Vendor Reserved
          }[elementStatus]
          if (interpretStatus == ExitCritical) :
            verboseoutput("Global exit set to CRITICAL")
            GlobalStatus = ExitCritical
            ExitMsg += " CRITICAL : %s " % elementNameValue
          if (interpretStatus == ExitWarning and GlobalStatus != ExitCritical) :
            verboseoutput("Global exit set to WARNING")
            GlobalStatus = ExitWarning
            ExitMsg += " WARNING : %s " % elementNameValue
          # Added same logic as in 20100702 here, otherwise Dell servers would return UNKNOWN instead of OK
          if (interpretStatus == ExitWarning and GlobalStatus == ExitCritical) : # ARR
            ExitMsg += " WARNING : %s " % elementNameValue #ARR
          if (interpretStatus == ExitOK and GlobalStatus != ExitWarning and GlobalStatus != ExitCritical) : #ARR
            GlobalStatus = ExitOK #ARR
        if elementName == 'Server Blade' :
          if SerialNumber :
            if SerialNumber.find(".") != -1 :
              SerialNumber = SerialNumber.split('.')[1]
 
 
# Munge the output to give links to documentation and warranty info
if (urlise_country != '') :
  SerialNumber = urlised_serialnumber(vendor,urlise_country,SerialNumber)
  server_info = urlised_server_info(vendor,urlise_country,server_info)
 
# Output performance data
perf = '|'
if perfdata:
  sdata=[]
  ctr=[0,0,0,0,0,0,0]
  # sort the data so we always get perfdata in the right order
  # we make no assumptions about the order in which CIM returns data
  # first sort by element name (effectively) and insert sequence numbers
  for p in sorted(data):
    p1 = p[1]
    sdata.append( ("P%d%s_%d_%s") % (p1,perf_Prefix[p1], ctr[p1], p[0]) )
    ctr[p1] += 1
  # then sort perfdata into groups and output perfdata string
  for p in sorted(sdata):
    perf += p
 
# sanitise perfdata - don't output "|" if nothing to report
if perf == '|':
  perf = ''
 
if GlobalStatus == ExitOK :
  print "OK - Server: %s %s %s%s" % (server_info, SerialNumber, bios_info, perf)
 
elif GlobalStatus == ExitUnknown :
  print "UNKNOWN: %s" % (ExitMsg) #ARR
 
else:
  print "%s- Server: %s %s %s%s" % (ExitMsg, server_info, SerialNumber, bios_info, perf)
 
sys.exit (GlobalStatus)
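The part of the script that is easiest to misread is how the overall result is decided. It starts as UNKNOWN, any CRITICAL element wins outright, a WARNING only counts if nothing CRITICAL has been seen, and OK is only set when no WARNING or CRITICAL has been recorded. Here is that rule pulled out into a small standalone sketch in Python 3. The function names fold_status and overall are mine and do not exist in the plugin.

```python
EXIT_OK, EXIT_WARNING, EXIT_CRITICAL, EXIT_UNKNOWN = 0, 1, 2, 3

def fold_status(global_status, element_status):
    """Combine the running global status with one element's status."""
    if element_status == EXIT_CRITICAL:
        return EXIT_CRITICAL
    if element_status == EXIT_WARNING and global_status != EXIT_CRITICAL:
        return EXIT_WARNING
    if element_status == EXIT_OK and global_status not in (EXIT_WARNING, EXIT_CRITICAL):
        return EXIT_OK
    return global_status  # nothing changes, e.g. WARNING after CRITICAL

def overall(element_states):
    """Fold a sequence of element states, starting from UNKNOWN like the plugin."""
    status = EXIT_UNKNOWN
    for state in element_states:
        status = fold_status(status, state)
    return status

print(overall([]))                                   # no elements checked at all
print(overall([EXIT_OK, EXIT_OK]))
print(overall([EXIT_OK, EXIT_WARNING, EXIT_OK]))
print(overall([EXIT_WARNING, EXIT_CRITICAL, EXIT_OK]))
```

This is why a host that returns no CIM data at all shows up as UNKNOWN rather than OK.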
Now test the script to make sure it works.

Code: Select all

/usr/local/nagios/libexec/check_esxi_hardware.py -H 192.168.107.44 -U esxiuser -P esxipassword -V ibm
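When testing by hand, remember that Nagios looks at the exit code of the plugin as well as the text it prints (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN, matching the exit codes defined in the script). The sketch below, in Python 3, runs a command and reports both. The helper name run_plugin is my own, and a tiny stand-in command is used in place of the real plugin so the example runs anywhere; swap in the real command line from above to test against your ESXi host.

```python
import subprocess
import sys

STATES = {0: 'OK', 1: 'WARNING', 2: 'CRITICAL', 3: 'UNKNOWN'}

def run_plugin(argv):
    """Run a plugin command and return (state_name, first_line_of_output)."""
    process = subprocess.Popen(argv, stdout=subprocess.PIPE,
                               universal_newlines=True)
    output, _ = process.communicate()
    # Any exit code outside 0-3 is reported as UNKNOWN in this sketch.
    state = STATES.get(process.returncode, 'UNKNOWN')
    first_line = output.splitlines()[0] if output.strip() else ''
    return state, first_line

# Stand-in for the real plugin (hypothetical output text).
fake_plugin = [sys.executable, '-c',
               "import sys; print('OK - Server: demo'); sys.exit(0)"]

print(run_plugin(fake_plugin))
```

From a shell, the same information is available by running the plugin and then checking $? right afterwards.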
The final step is to verify that nothing is broken in the configuration:

Code: Select all

/etc/nagios/verify.sh
If there were no errors or warnings, restart Nagios to load the new configuration:

Code: Select all

/etc/init.d/nagios stop
/etc/init.d/nagios start
