Kyle Brandt

Original computing and productivity articles by a Linux administrator

Archive for the ‘Nagios’ tag

Book Review: Learning Nagios 3.0


Wojciech Kocjan’s Learning Nagios 3.0 is a clear and gentle introduction that takes readers through the basics of infrastructure monitoring with Nagios and then introduces the more advanced topics. The writing balances concise technical information with thorough, tutorial-style examples. This keeps the book from being too dense without making it so repetitive that it comes off as condescending.

The installation and initial configuration are covered together in the same chapter. The installation instructions are thorough; different package managers as well as compiling from source code are included. There are also troubleshooting instructions that cover common mistakes that people tend to encounter when first installing Nagios. One troubleshooting detail that the author neglected to include is a short CPAN tutorial. This would be useful because when standard plug-ins are missing necessary Perl dependencies, CPAN can be used to install them. The configuration of Nagios involves an inheritance engine that can often lead to a high level of complexity. This book includes illustrations for this and many other concepts that are more easily understood visually, and each illustration is explained well.

The more advanced topics covered include distributed monitoring, automated responses to problems (event handlers), and options to reduce the performance impact that monitoring can have. These chapters offer inspiring introductions to taking Nagios beyond a mere mechanism for problem notification. For example, event handlers can be created to automatically restart services that have failed. Configuring Nagios to escalate issues to certain people can also improve the organization of an IT administration team. The book also explores different organizational styles for configuration files.

Probably the most universal monitoring protocol is the Simple Network Management Protocol (SNMP). This book has one of the clearest explanations of SNMP I have read, as well as a very clear explanation of how to use SNMP with Nagios. I would recommend this chapter to anyone looking for a good SNMP introduction, even if Nagios is not the primary interest.

The one chapter I felt was lacking in thoroughness was ‘Extending Nagios,’ which gets into writing your own plug-ins. The first simple example is a thirty-line Python script, but an effective Nagios plug-in can be a shell script of only a few lines. There are also standards for writing Nagios plug-ins (see http://nagiosplug.sourceforge.net/developer-guidelines.html). These are discussed in Chapter 4, but glossed over in this chapter.
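
To give a sense of how little a plug-in needs, here is a minimal sketch of my own (it is not taken from the book). It follows the basic conventions from the developer guidelines: print one line of status text and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The disk usage check, the function names and the 80/90 percent thresholds are all assumptions chosen purely for illustration.

```python
import shutil

# Exit codes that Nagios expects from a plug-in.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(percent_used, warn=80.0, crit=90.0):
    """Map a usage percentage to (exit_code, status_line).

    The warn/crit defaults are illustrative assumptions, not recommendations.
    """
    if percent_used >= crit:
        return CRITICAL, "DISK CRITICAL - %.1f%% used" % percent_used
    if percent_used >= warn:
        return WARNING, "DISK WARNING - %.1f%% used" % percent_used
    return OK, "DISK OK - %.1f%% used" % percent_used


def check_disk(path="/"):
    """Check one filesystem and return (exit_code, status_line)."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as err:
        # Anything that prevents the check from running is UNKNOWN.
        return UNKNOWN, "DISK UNKNOWN - %s" % err
    if usage.total == 0:
        return UNKNOWN, "DISK UNKNOWN - %s reports zero size" % path
    return evaluate(100.0 * usage.used / usage.total)
```

A real plug-in would finish by printing the status line and passing the exit code to sys.exit(), since Nagios reads the result from the process exit status.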

Overall, I found this to be a well-written and informative book that guides an administrator through Nagios with more clarity than Nagios’ own documentation does.

Written by Kyle

January 13th, 2009 at 10:17 am

Posted in Linux


OpenVZ Bean Counters Nagios Script


“OpenVZ is container-based virtualization for Linux. OpenVZ creates multiple secure, isolated containers (otherwise known as VEs or VPSs) on a single physical server enabling better server utilization and ensuring that applications do not conflict.”

For each of these containers or VEs, there are resource limits. The pseudo file system, /proc, tracks various process and kernel information. The OpenVZ kernel provides the file /proc/user_beancounters, which tells us (among other information) whether any of these limits have been reached. This is important because a process (e.g. Tomcat) may fail to start if the limits have been reached. I wrote a script in Python designed to be executed on the OpenVZ host machine by Nagios.

The script parses /proc/user_beancounters and exits with the appropriate Nagios exit status if one of these limits has been reached. If you don’t want to run this script as root, I recommend writing a shell script that copies the user_beancounters file and gives ownership of the copy to an unprivileged user, then compiling that shell script with shc and making the result setuid root (Linux won’t usually allow setuid shell scripts, which is why shc is used to compile it. Does anyone think this might be dangerous if the script only copies the file to tmp?). This is what the script expects with its current configuration. The script is also easy to modify so that it checks parameters other than the fail count (failcnt).
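
For readers who just want the idea, the parsing can be sketched roughly as follows. This is a simplified illustration and not the nagios_vz_bean.py script itself. It assumes the usual file layout: a Version line, a header line, and then rows of resource, held, maxheld, barrier, limit and failcnt, where the first row of each container is prefixed with its ID and a colon. The function names and the default path are my assumptions, and the sketch is written for Python 3.

```python
# Exit codes that Nagios expects from a plug-in.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def parse_failcnts(text):
    """Return (container_id, resource, failcnt) for every failcnt > 0."""
    failures = []
    uid = None
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the "Version:" line and the column header.
        if not fields or fields[0] in ("Version:", "uid"):
            continue
        # The first row of each container starts with "<id>:".
        if fields[0].endswith(":"):
            uid = fields[0].rstrip(":")
            fields = fields[1:]
        if len(fields) != 6:
            continue  # not a resource row in the assumed layout
        resource, failcnt = fields[0], int(fields[5])
        if failcnt > 0:
            failures.append((uid, resource, failcnt))
    return failures


def check(text):
    """Turn the file contents into (exit_code, status_line)."""
    failures = parse_failcnts(text)
    if failures:
        detail = ", ".join("%s:%s failcnt=%d" % f for f in failures)
        return CRITICAL, "BEANCOUNTERS CRITICAL - " + detail
    return OK, "BEANCOUNTERS OK - no limits reached"


def main(path="/proc/user_beancounters"):
    """Read the file (or an unprivileged copy of it) and report."""
    try:
        with open(path) as handle:
            text = handle.read()
    except OSError as err:
        print("BEANCOUNTERS UNKNOWN - %s" % err)
        return UNKNOWN
    status, message = check(text)
    print(message)
    return status
```

To run it from Nagios, the file would end with a call such as sys.exit(main()), with the path changed to wherever the unprivileged copy is written. Checking a different column (held against barrier, for example) only requires changing which field parse_failcnts reads.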

You can get the script here: nagios_vz_bean.py

Written by Kyle

October 28th, 2008 at 4:54 am

Posted in Linux, Networking, Python, Scripting
