Saturday, 6 October 2012

NAGIOS - VMware ESXi


This article describes how to monitor a VMware ESXi or vSphere host with Nagios, using the OP5 Check ESX Plugin, which is written in Perl. The plugin can monitor either a single ESXi/vSphere server or a VirtualCenter/vCenter Server, as well as individual virtual machines. Here we’ll see how to monitor an ESXi 4 host.
This tutorial was written on a CentOS server; you may have to adapt some paths on other distributions.

Installation

The plugin requires the VMware Perl SDK, available on the manufacturer’s website.
Download the file to your server, for example into /root, untar it and run the installer like this:
# cd /root
# tar xvzf VMware-vSphere-Perl-SDK-4.1.0-254719.i386.tar.gz 
# cd vmware-vsphere-cli-distrib/
# ./vmware-install.pl
Follow the instructions given by the script. Depending on your setup, some Perl dependencies may have to be installed first for the SDK to work correctly. When that is done, get the plugin here and copy it to /usr/lib/nagios/plugins/. Make it executable:
# cd /usr/lib/nagios/plugins/
# chmod a+x check_esx
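Before wiring the plugin into Nagios, it can help to run it once by hand and look at its exit code, since that is what Nagios uses to decide the service state. The sketch below is only an illustration: `nagios_state` and `run_check` are hypothetical helper names, the exit-code mapping follows the standard Nagios plugin convention (0, 1, 2, 3), and the commented command line reuses the flags, address and placeholder credentials that appear later in this tutorial.

```shell
#!/bin/sh
# Map a plugin exit code to the state name used by the Nagios plugin convention.
# Anything other than 0, 1 or 2 is reported here as UNKNOWN.
nagios_state() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# Run any plugin command line, let its own output through,
# then print the exit code and the matching state name.
run_check() {
    rc=0
    "$@" || rc=$?
    echo "exit code $rc -> $(nagios_state "$rc")"
}

# Example (placeholder credentials, host address from this tutorial):
# run_check /usr/lib/nagios/plugins/check_esx -H 192.168.1.100 \
#     -u username -p password -l cpu -s usage -w 80 -c 90
```

If the manual run already fails with an error about missing Perl modules, that usually points back to the SDK installation step rather than to the Nagios configuration.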

Configuration

Now we can start the actual Nagios configuration. We need a username and password to access the ESXi host. Define them as Nagios variables in /etc/nagios/resource.cfg, a safe place where this information stays hidden from the CGIs:
$USER11$=username
$USER12$=password
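A typo in resource.cfg only shows up later as failing checks, so a quick sanity check on the two macros may be worthwhile. This is a minimal sketch, not part of the plugin or of Nagios: `has_macro` and `check_resource_file` are hypothetical helper names, and the macro names are simply the ones chosen above.

```shell
#!/bin/sh
# Succeed if the resource file ($1) defines the macro named in $2
# (given without the surrounding dollar signs) at the start of a line.
has_macro() {
    grep -q "^[$]$2[$]=" "$1"
}

# Report, for the two macros used in this tutorial, whether each is defined.
check_resource_file() {
    for name in USER11 USER12; do
        if has_macro "$1" "$name"; then
            echo "$name defined"
        else
            echo "$name missing"
        fi
    done
}

# Usage: check_resource_file /etc/nagios/resource.cfg
```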
In this tutorial, we monitor these resources: CPU, memory usage, network usage, runtime status and I/O read/write. More checks are available; see the references here. Below are the new ESXi commands to add to the /etc/nagios/objects/command.cfg file (these are the ESXi-related commands only, NOT the full command.cfg; you can append them at the end of the file):
# check vmware esxi machine
# check cpu
define command{
        command_name check_esx_cpu
        command_line $USER1$/check_esx -H $HOSTADDRESS$ -u $USER11$ -p $USER12$ -l cpu -s usage -w $ARG1$ -c $ARG2$
        }
 
# check memory usage
define command{
        command_name check_esx_mem
        command_line $USER1$/check_esx -H $HOSTADDRESS$ -u $USER11$ -p $USER12$ -l mem -s usage -w $ARG1$ -c $ARG2$
        }
 
# check net usage
define command{
        command_name check_esx_net
        command_line $USER1$/check_esx -H $HOSTADDRESS$ -u $USER11$ -p $USER12$ -l net -s usage -w $ARG1$ -c $ARG2$
        }
 
# check runtime status
define command{
        command_name check_esx_runtime
        command_line $USER1$/check_esx -H $HOSTADDRESS$ -u $USER11$ -p $USER12$ -l runtime -s status
        }
 
# check io read
define command{
        command_name check_esx_ioread
        command_line $USER1$/check_esx -H $HOSTADDRESS$ -u $USER11$ -p $USER12$ -l io -s read -w $ARG1$ -c $ARG2$
        }
 
# check io write
define command{
        command_name check_esx_iowrite
        command_line $USER1$/check_esx -H $HOSTADDRESS$ -u $USER11$ -p $USER12$ -l io -s write -w $ARG1$ -c $ARG2$
        }
Here is an example configuration for a Nagios host called esxi01, in /etc/nagios/hosts/esxi01.cfg:
# Host esxi01
define host{
        use                     linux-server
        host_name               esxi01
        alias                   VMWare ESXi 01
        address                 192.168.1.100
        }
 
# Define a service to "ping" the ESXi host
define service{
        use                             generic-service
        host_name                       esxi01
        service_description             PING
        check_command                   check_ping!100.0,20%!500.0,60%
        }
 
# VMWare
# check cpu
define service{
        use                             generic-service
        host_name                       esxi01
        service_description             ESXi CPU Load
        check_command                   check_esx_cpu!80!90
        }
 
# check memory usage
define service{
        use                             generic-service
        host_name                       esxi01
        service_description             ESXi Memory usage
        check_command                   check_esx_mem!80!90
        }
 
# check net
define service{
        use                             generic-service
        host_name                       esxi01
        service_description             ESXi Network usage
        check_command                   check_esx_net!102400!204800
        }
 
# check runtime status
define service{
        use                             generic-service
        host_name                       esxi01
        service_description             ESXi Runtime status
        check_command                   check_esx_runtime
        }
 
# check io read
define service{
        use                             generic-service
        host_name                       esxi01
        service_description             ESXi IO read
        check_command                   check_esx_ioread!40!90
        }
 
# check io write
define service{
        use                             generic-service
        host_name                       esxi01
        service_description             ESXi IO write
        check_command                   check_esx_iowrite!40!90
        }
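The link between a service and its command is the `!` separator: in `check_esx_cpu!80!90`, the first field is the command name and the following fields become `$ARG1$` and `$ARG2$`, which the command definitions above pass to `-w` and `-c`. The sketch below only illustrates that splitting; `expand_check_command` is a hypothetical helper, not something Nagios provides.

```shell
#!/bin/sh
# Split a check_command value on "!" and print the command name
# followed by the values Nagios would assign to $ARG1$, $ARG2$, ...
expand_check_command() {
    old_ifs=$IFS
    IFS='!'
    set -- $1          # word-split the single argument on "!"
    IFS=$old_ifs
    echo "command_name=$1"
    shift
    n=1
    for arg in "$@"; do
        echo "ARG$n=$arg"
        n=$((n + 1))
    done
}

expand_check_command 'check_esx_cpu!80!90'
```

For the CPU service this yields a warning threshold of 80 and a critical threshold of 90, so the plugin ends up being called with `-w 80 -c 90`.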
That’s it. Restart Nagios and wait a while (or reschedule the checks) for the new resources to be monitored.
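It is usually safer to validate the configuration before restarting, so that a typo in the new files does not stop the daemon. A minimal sketch, assuming the main configuration file is /etc/nagios/nagios.cfg and that Nagios is managed with the `service` command (both are assumptions about your setup); `verify_and_restart` is a hypothetical helper name.

```shell
#!/bin/sh
# Validate the Nagios configuration and restart only if validation succeeds.
# The default path is an assumption; pass another one as the first argument.
verify_and_restart() {
    cfg=${1:-/etc/nagios/nagios.cfg}
    nagios -v "$cfg" && service nagios restart
}

# Usage (as root): verify_and_restart
```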
