Archive for the ‘Scripting’ Category
Debugging a script that parses /proc/net/dev
An Intermittent Problem:
I wrote a Perl script for Nagios that would figure out the bandwidth of an interface by parsing TX (transmit) and RX (receive) bytes from /proc/net/dev. The proc file system is a virtual file system that provides the ability to view various kernel statistics as well as modify some kernel parameters. My script parses the file twice at a specified interval, and then subtracts the old value from the new value to return bytes per second. I realized that this wasn’t the most accurate method, but it was good enough for my purposes and I didn’t have to install snmp. Also, the larger the interval, the smaller the error would generally be assuming light load.
The problem was that this script would fail every so often with ‘Not numeric subtraction’. So I started saving snapshots of /proc/net/dev and noticed that the script would fail when the values were around 4 billion something. This I knew to be in the neighborhood of 2^32 (one more than the maximum value of an unsigned 32-bit integer). To confirm my hunch that this was the max value for this counter, I decided to have a poke around the kernel source code.
Into the Kernel:
I didn’t know where to look in the source for this, but /proc/net/dev has the string ‘Inter-|’ which I figured would be a unique enough string to give me a place to start. Sure enough, a recursive grep for this string returned only 3 lines of code. The function I wanted was dev_seq_printf_stats in net/core/dev.c:
static void dev_seq_printf_stats(struct seq_file *seq, struct net_device *dev)
{
    struct net_device_stats *stats = dev->get_stats(dev);
    seq_printf(seq, "%6s:%8lu %7lu %4lu %4lu %4lu %5lu %10lu %9lu "
               "%8lu %7lu %4lu %4lu %4lu %5lu %7lu %10lu\n",
               dev->name, stats->rx_bytes, stats->rx_packets,
               stats->rx_errors,
               ///.....
Looking at the printf specifiers for this, they were %lu (unsigned long integer), which on my system did indeed have a max of 4294967295 (2^32 - 1). I wanted to be extra sure, so I traced the net_device_stats struct to include/linux/netdevice.h and confirmed that the net_device_stats->rx_bytes member was in fact an unsigned long integer. So now I knew the error happened when the counter maxed out and then reset to zero, but why a non-numeric subtraction error?
Problem Found:
%8lu as an ANSI C standard library printf specifier sets a minimum field width of 8 characters, and defaults to right justify since there is no hyphen flag. To find out if the kernel did the same I traced seq_printf to lib/vsprintf.c and saw that the Linux kernel version formatted this in the same way. When the bytes value was less than 8 characters long, as it was right after the counter wrapped around to zero, there was leading white space that threw off my parser: splitting on /\s+/ then produced an empty string as the first field, and that empty string is what ended up in the subtraction. All I needed was to add the extra line at line 9 to eliminate any leading whitespace:
sub parseBandwidth {
    my $interface = shift;
    my @ifconfigOutput = @_;
    foreach my $line (@ifconfigOutput) {
        if ( $line =~ /:/ ) {
            my @interfaceLine = split( /:/, $line);
            if ($interfaceLine[0] =~ /$interface/) {
                # Next line is to sanitize leading whitespace
                $interfaceLine[1] =~ s/^\s+//;
                my @interfaceStats = split( /\s+/, $interfaceLine[1] );
                print( LOG "DEBUG I have parsed out: @interfaceStats\n") if $debug;
                return @interfaceStats;
            }
        }
    }
}
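For comparison, here is a minimal Python sketch of the same parsing step, together with a wrap-aware subtraction. It is not part of the Nagios script: the function names, the sample lines, and the exact-match test on the interface name are assumptions made for illustration, and the modulus assumes the 32-bit counters described above.

```python
COUNTER_MODULUS = 2 ** 32  # assumes 32-bit unsigned long counters


def parse_interface_stats(lines, interface):
    """Return the numeric fields for `interface` from /proc/net/dev-style lines."""
    for line in lines:
        name, sep, stats = line.partition(":")
        if sep and name.strip() == interface:
            # str.split() with no arguments ignores leading whitespace, so a
            # right-justified first field cannot yield an empty element.
            return [int(field) for field in stats.split()]
    return None


def counter_delta(old, new, modulus=COUNTER_MODULUS):
    """Difference between two readings, allowing for a single counter wrap."""
    return (new - old) % modulus


# Illustrative input only; the header lines are abbreviated.
sample = [
    "Inter-|   Receive  |  Transmit",
    " face |bytes packets ...",
    "  eth0:    1234      10    0    0    0     0          0         0"
    "     5678      12    0    0    0     0       0          0",
]

stats = parse_interface_stats(sample, "eth0")
print(stats[0], stats[8])                 # RX bytes and TX bytes
print(counter_delta(4294967000, 1234))    # reading taken after a wrap
```

Taking the difference modulo 2^32 keeps the result non-negative when the counter has wrapped once between the two readings; it cannot detect more than one wrap per interval.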
How to Cross Compile the Bash shell for Android 1.5
Introduction:
I just got a new G1 Android phone, and since it runs Linux I just had to get the Bash shell running on it; the built-in shell would just not do. I do need my tab completion after all. Cross compilation is the process of compiling software on one platform that is meant to run on another. With the following steps, an ARM executable is compiled on an x86 Linux machine.
Requirements (Not sure if all this is needed, but it is what I used):
- Cupcake 1.5 JS Build with root access: http://androidandme.com/2009/05/guides/beginners-guide-for-rooting-your-android-g1-to-install-cupcake/
- ARM Toolkit (The Cross Compiler): http://www.codesourcery.com/gnu_toolchains/arm/download.html
- Android SDK installed on Linux: http://developer.android.com/sdk/1.5_r2/index.htm
- A G1
- The source code for Bash 4.0: ftp://ftp.cwru.edu/pub/bash/bash-4.0.tar.gz
Step 1: Connect your PC to your phone with the SDK
You first have to be able to connect to your phone from your computer with adb, which is included with the SDK. To do this with Ubuntu Jaunty Jackalope you first need to create a /etc/udev/rules.d/51-android.rules file with the following contents:
SUBSYSTEM=="usb", SYSFS{idVendor}=="0bb4", MODE="0666"
After this run the following to restart udev: ‘sudo /etc/init.d/udev reload’. Lastly, on your phone make sure Settings :: Applications :: Development :: USB Debugging is enabled and then plug in your phone. When you run ‘./adb devices’ you should see a device listed.
Step 2: Build the Bash Shell
After installing the ARM toolkit in /home/kbrandt/bin/arm-toolkit (used for this example), set the following environment variables in your shell.
export CC='/home/kbrandt/bin/arm-toolkit/bin/arm-none-linux-gnueabi-gcc'
export PATH="$PATH:$HOME/bin/"
Then cd to the directory where you extracted the Bash source and run the following:
./configure --prefix=/opt/arm_bash/ --host=arm-linux --enable-static-link --without-bash-malloc
Assuming that worked, edit the ‘Makefile’ file and change ‘CFLAGS = -g -O2’ to ‘CFLAGS = -g -O2 -static’ and then run ‘make’. If this works, it should create a bash executable in the current directory. You can verify that this has been compiled for the ARM architecture with ‘file bash’. This should return:
bash: ELF 32-bit LSB executable, ARM, version 1 (SYSV), statically linked, for GNU/Linux 2.6.14, not stripped
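If the ‘file’ utility is not at hand, the same architecture check can be approximated by reading the ELF header directly. The following is a minimal Python sketch, not part of the original build steps; the function names are made up for illustration. It relies only on the ELF layout: the magic bytes, the byte-order flag at offset 5, and the two-byte e_machine field at offset 18, where 40 means ARM.

```python
import struct

EM_ARM = 40  # e_machine value assigned to ARM in the ELF specification


def elf_machine(header):
    """Return e_machine from the first 20 bytes of an ELF file, or None."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return None
    # Byte 5 (EI_DATA) is 1 for little-endian and 2 for big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return machine


def is_arm_elf(path):
    """True if the file at `path` is an ELF binary built for ARM."""
    with open(path, "rb") as fh:
        return elf_machine(fh.read(20)) == EM_ARM
```

Calling is_arm_elf('bash') in the build directory should return True for the cross-compiled binary and False for a native x86 build.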
Step 3: Copy the File to Your Android
From the host computer in the tools directory of the SDK run ‘./adb push ~/src/bash-4.0/bash /data/’ to copy the executable to the phone. If you try to copy it to your sdcard, make sure the sdcard is not mounted with the noexec mount option, as this prevents anything on that filesystem from being executed.
Step 4: Run, Enjoy, and Find Bugs.
You can now connect to your phone with ‘./adb shell’, cd to the data directory, and run ‘./bash’, and you should get a bash prompt. You might need to ‘chmod 555 bash’ if you get permission denied.
References:
http://jiggawatt.org/badc0de/android/index.html
My Not-So-Shabby Screen and Gnome-Terminal Setup
Introduction
For a system administrator it is important to have an efficient and comfortable interface to all your servers. GNU Screen is an excellent utility for having a single terminal connected to multiple servers that won’t disappear when you close the window. I have a setup that allows me to spawn gnome-terminal with a different screen session in each tab, one for each location I administer. Then each screen session has a named ‘tab’ for each server at that location, which automatically logs into that server. It ends up looking like this:

My main two recommendations for screen are to set up the meta character as back-tick ( ` ) and to give screen a ‘tab bar’. You can read how to do these two things here.
Setting up Screen
Once you have screen set up the way you like, you can then specify an additional screenrc file with the -c switch and your settings in the ~/.screenrc will still be used. The secondary screenrc is where you can list different server groups. This file will make it so there are named ‘tabs’ for each server, and each tab will log into the server you specify. Each line in the file should be something like ‘screen -t myServer ssh myServer’; the first myServer is the name of the tab, and then ssh myServer is the command that will be run. To simplify doing this in the future, I made a little Perl script that reads a file that has one server name per line and prints the rc file to standard out.
#!/usr/bin/perl
#===============================================================================
#         FILE: makeScreenRc.pl
#        USAGE: ./makeScreenRc.pl
#       AUTHOR: Kyle Brandt (kb), kyle@kbrandt.com
#===============================================================================
use strict;
use warnings;
print "zombie qr\n";
while (<>) {
    chomp;
    my $server = $_;
    print "screen -t $server ssh $server", "\n";
}
So if you called the above script with something like ‘perl makeScreenRc.pl myDmzList > myScreenDmzRc’ you can then use the created file with ‘screen -R DMZ -c myScreenDmzRc’. The capital R switch looks for an existing detached session and will attempt to reattach it before creating a new one. This will be useful with gnome-terminal in case gnome-terminal crashes.
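The same generator can be sketched in Python for anyone who would rather not use Perl. This is an illustrative port, not the script I actually run; the function name is made up, and skipping blank lines is an addition that the Perl version does not make.

```python
def make_screenrc(server_names):
    """Build screenrc text with one named window per server."""
    lines = ["zombie qr"]
    for raw in server_names:
        server = raw.strip()
        if not server:
            continue  # addition: ignore blank lines in the server list
        lines.append(f"screen -t {server} ssh {server}")
    return "\n".join(lines) + "\n"


# Hypothetical server names, as they would be read from a list file.
print(make_screenrc(["web1\n", "\n", "db1\n"]), end="")
```

Writing the returned text to a file gives something that can be passed to screen with the -c switch in the same way as the Perl output.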
Setting Up Gnome-Terminal
The next step is to create a profile for each of the screen sessions. You can do this by going to File::New Profile and then creating a profile with a relevant name for the screen session, e.g. ‘DMZ’. After that, go to the Title and Command tab, check ‘Run a custom Command instead of my shell’ and edit the command to be something like ‘screen -R DMZ -c myScreenDmzRc’. Then repeat this for each of the screen sessions you have set up. Then, you can run something like ‘gnome-terminal --tab-with-profile=DMZ --tab-with-profile=MyOffice’ where DMZ and MyOffice are the names of the gnome-terminal profiles you created. This automatically detaches itself from the controlling terminal, so if you close the terminal you launched this from, the new terminal will not close. Lastly, you can set up a shell alias to run the above command, so all you have to do to open up your command central is type something like ‘myservers’.
Parsing the American Recovery and Reinvestment Act with Perl
Introduction:
I think of the American government as a democratic republic. The government is run by a small group of people, a republic, that is elected by the public to represent them, a democracy. Congress, and the bills they pass, should have oversight from the people. Although the bills are made available to the public, the size makes them somewhat inaccessible, and in my opinion, the media fails at providing enough detailed information on the content of these bills.
My goal was to take the 2009 stimulus bill and try to parse out some information about where the money is going in this massive bill. Although my parser is incomplete, it was still able to parse out information that I could not find by searching for it on the Recovery Website. So I consider it useful.
Parsing the Bill:
My first step was to get the bill into something I could parse. When I started this, I wasn’t aware of THOMAS, so I got the PDF from the recovery.org website and then converted it. I did this using pdftotext, and I then converted the result to ASCII to make it easier to work with the funny start and end quotes (also called left-handed and right-handed quotes):
pdftotext -enc UTF-8 -eol unix recoveryAct.pdf recoveryAct2.txt
cat recoveryAct2.txt | uni2ascii -e > recoveryAct3.txt
The parsing is done with regular expressions. After looking over the file, I decided the best way in was to parse the file line by line. I did however want to look ahead at certain points, so I read the file into memory to make this simple. This is known as slurping. The first two regular expressions capture the page number and the section of the document. These are important because all this script really does is make a sort of index, so I can find things that might be of interest.
The group of four regular expressions joined with ‘or’ in the first if condition of the main loop does the bulk of the work. These parse the dollar amount after variations of a frequent phrase in the bill. The phrase goes something like ‘For an additional amount for … $50,000,000’. I put the regular expressions in the order of what I think will provide me the most accurate match first, since ‘or’ is a short-circuit operator. I also do this on a larger scale with the if and elsif blocks.
As an example, I will explain the first regular expression in that group. (I am going to explain the meta characters, not all the backtracking and how the Perl regular expression engine handles this. I would recommend “Pro Perl Parsing” or that you search the Perl Journal to learn about that.) ‘For’ simply matches the word ‘For’ with any case variation (for example for or fOr), because the /i modifier at the end makes the whole expression case insensitive. In ‘.*?’, the period means ‘any character’ and the asterisk means match ‘any character’ zero or more times. The question mark makes the asterisk non-greedy. Since zero or more of any character would also ‘consume’ the word ‘additional’, making the asterisk non-greedy makes it stop consuming characters as soon as it finds the word ‘additional’. With \`\`(.*?)\'\' I am capturing what appears between the ‘funny quotes’ I mentioned previously; this is the ASCII interpretation of these funny quotes. The parentheses around (.*?) capture what is between the funny quotes, which is what the money following is for as far as my parser is concerned. (\$[0-9,]*) captures any sort of dollar amount by looking for any combination of numbers and commas after a dollar sign. Lastly, the /g makes it so this will work if the pattern happens multiple times on the same line. I then print out the captured information with the page number.
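To make the walk-through concrete, here is a minimal Python sketch of that first pattern applied to a sample line. The pattern is a direct translation of the Perl one; the function name and the sample text (including ‘Example Program’) are invented for illustration and do not come from the bill.

```python
import re

# `` and '' are the ASCII stand-ins for the curly quotes in the converted text.
FIRST_PATTERN = re.compile(
    r"For.*?additional.*?``(.*?)''.*?(\$[0-9,]*)", re.IGNORECASE
)


def parse_amounts(line):
    """Return (what_for, amount) pairs for every match on the line."""
    results = []
    for what_for, amount in FIRST_PATTERN.findall(line):
        # Equivalent of tr/,$//d: strip commas and the dollar sign.
        digits = amount.replace(",", "").replace("$", "")
        if digits:
            results.append((what_for, int(digits)))
    return results


# Made-up line in the style of the bill.
sample = "For an additional amount for ``Example Program'', $50,000,000, to remain available"
print(parse_amounts(sample))
```

Note that the character class [0-9,] also swallows the comma that follows the amount, which is one reason the commas are stripped before the value is used as a number.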
In the first elsif block I made it so the script can read ahead a few lines in case my regular expression is interrupted by a new line. I did this by keeping track of which line I am at (which is the same as the index of the array), and then having a nested loop which reads ahead a few lines but without interrupting the flow of the main parsing loop.
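The read-ahead idea can be sketched in a few lines of Python. This is a simplified illustration rather than a port of the script: the names are made up, it returns only the first amount it finds in the window instead of recording every one, and its pattern requires at least one digit after the dollar sign.

```python
import re

DOLLAR = re.compile(r"\$([0-9][0-9,]*)")  # stricter than the original \$[0-9,]*
LOOKAHEAD = 6  # number of following lines to inspect, as in the script


def find_amount_ahead(lines, index, lookahead=LOOKAHEAD):
    """Return the first dollar amount found after lines[index], or None."""
    for following in lines[index + 1 : index + 1 + lookahead]:
        match = DOLLAR.search(following)
        if match:
            return int(match.group(1).replace(",", ""))
    return None


# Made-up lines showing a phrase split across line breaks.
bill = [
    "For an additional amount for ``Example Program'', to remain",
    "available until expended,",
    "$50,000,000, of which",
    "up to $1,000 may be used",
]
print(find_amount_ahead(bill, 0))
```

Because the slice simply comes up short near the end of the list, this version does not need the explicit bounds check that the index-based Perl loop uses.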
The last sections, the final two elsif blocks, I used to collect more basic matches and to look at them so I could see what dollar amounts were not captured, and use that information to improve my parser.
You can get the output of the script here. It has lines to show the start of each section of the bill, and then lines for the amount of money, what the parser thinks it is for, and the page of the bill that the line refers to. The page is important, because the script doesn’t understand that the following amount might not be the total amount of money, and might get confused elsewhere as well.
My Next Steps:
The next steps I would like to take are to start looking at the Lingua modules and look into incorporating natural language processing. It also might be helpful if I capture the html versions from THOMAS, as this will allow me to already have the sections parsable with HTML::TreeBuilder.
Conclusion:
I think Congress should develop, and actually start using, XML markup for its bills. This would allow people to develop proper parsers that could retrieve the information and display it in visual formats, so people could have a better handle on where the money is going. Our country now has a federal CIO, Vivek Kundra, and I think he should lead the government to provide more open and accessible information.
#!/usr/bin/perl
#===============================================================================
#
#         FILE: parseBill.pl
#
#        USAGE: ./parseBill.pl
#       AUTHOR: Kyle Brandt (kb), www.kbrandt.com
#      COMPANY: Boston, MA
#      VERSION: 1.0
#      CREATED: 03/19/2009 10:16:54 AM
#===============================================================================
use strict;
use warnings;
use Roman;
#Globals
my $printit = 1;
my $delim = "\t";
my $page = 1;
my $titleSection;
my $resolution = 1;
my $total = 0;
my %causeMoney;
my %notParsed;
#Build an alternation of the Roman numerals I through XX
my $romanRegex = '';
foreach my $number (1..20) {
    $romanRegex .= Roman($number);
    unless ($number == 20) {
        $romanRegex .= '|';
    }
}
#Slurp the whole bill so we can look ahead by index
my @Bill = <>;
my $index = 0;
foreach (@Bill) {
    #Get Page Number
    if (/H\. R\. 1.*?([0-9]{1,3})/) {
        #print $1, "\n";
        $page = $1;
    }
    if ( m/TITLE ($romanRegex)-[A-Z ]*/) {
        $titleSection = $&;
        print $titleSection, $delim, $page, "\n" if $printit;
    }
    #For additional is a common phrase, this gets the dollar amount after it and what it is for
    my @amounts;
    if (
        ( @amounts = /For.*?additional.*?\`\`(.*?)\'\'.*?(\$[0-9,]*)/gi) or
        ( @amounts = /For.*?additional.*?for(.*?)(\$[0-9,]*)/gi) or
        ( @amounts = /For an amount for \`\`(.*?)\'\'.*?(\$[0-9,]*)/gi) or
        ( @amounts = /For necessary expenses for(.*?)(\$[0-9,]*)/gi)
    ) {
        my $whatfor;
        my $amount;
        #Captures come back as a flat list of (what for, amount) pairs
        while (@amounts) {
            $whatfor = shift @amounts;
            $amount = shift @amounts;
            $amount =~ tr/,$//d;
            print $amount, $delim, $whatfor, $delim, $page, "\n" if $printit;
            $causeMoney{$whatfor . ':' . $page} = $amount;
            $total += $amount;
        }
    }
    #Maybe if we read ahead a few lines, we will find what we are looking for
    elsif ( @amounts = /For.*?additional.*?\`\`(.*?)\'\'/) {
        AMOUNT:
        while (@amounts) {
            my $whatfor = shift @amounts;
            if (length($whatfor) > 40) {
                next AMOUNT;
            }
            if ( $index < ($#Bill - 6 )) {
                for my $line (($index + 1) .. ($index + 6)) {
                    #List context returns the captured text; scalar context would only return 1 or 0
                    if (my ($amount) = $Bill[$line] =~ /(\$[0-9,]*)/) {
                        $amount =~ tr/,$//d;
                        print $amount, $delim, $whatfor, $delim, $page, "\n" if $printit;
                        $causeMoney{$whatfor . ':' . $page} = $amount;
                    }
                }
            }
        }
    }
    #Like above, but can't figure out what it is for
    elsif ( my @unknownAmounts = /For.*?additional.*?(\$[0-9,]*)/gi) {
        for my $unknownAmount (@unknownAmounts) {
            $unknownAmount =~ tr/,$//d;
            $causeMoney{'UnknownAtPage' . $page} = $unknownAmount;
            $total += $unknownAmount;
        }
    }
    #All money, that doesn't fit into the above, could be a portion of what is above.
    elsif ( my @dontKnow = /\$[0-9,]*/gi ) {
        for my $money ( @dontKnow ) {
            $money =~ tr/,$//d;
            if ( $money >= 1000000 ) {
                $notParsed{$money} = $page;
            }
        }
    }
    $index += 1;
}
OpenVZ Bean Counters Nagios Script
“OpenVZ is container-based virtualization for Linux. OpenVZ creates multiple secure, isolated containers (otherwise known as VEs or VPSs) on a single physical server enabling better server utilization and ensuring that applications do not conflict.”
For each of these containers or VEs, there are resource limits. The pseudo file system, /proc, tracks various process and kernel information. The OpenVZ kernel provides the file /proc/user_beancounters that tells us if any of these limits have been reached (among other information). This is important because a process (e.g. tomcat) may fail to start if the limits have been reached. I wrote a script in Python designed to be executed on the OpenVZ host machine by Nagios.
The script parses /proc/user_beancounters and will exit with the appropriate Nagios exit status if one of these limits has been reached. If you don’t want to run this script as root, I recommend compiling a shell script with shc that copies the bean_counters file and changes its owner to an unprivileged user, and then making that a setuid root program. (Linux won’t usually allow setuid shell scripts, which is why shc can be used to compile it. Does anyone think that this might be dangerous if the script only copies the file to tmp?) This is what the script expects with its current configuration. The script is also easy to modify to make it check for other parameters besides the fail count (failcnt).
You can get the script here: nagios_vz_bean.py
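For readers who only want the gist, here is a minimal Python sketch of the failcnt check. It is not the linked script: the function names are made up, the sample text is an illustrative excerpt in the usual layout of /proc/user_beancounters (failcnt as the last column), and it only distinguishes OK from CRITICAL using the standard Nagios exit codes 0 and 2.

```python
OK, CRITICAL = 0, 2  # standard Nagios plugin exit codes

# Illustrative excerpt; real files list many more resources per container.
SAMPLE = """\
Version: 2.5
       uid  resource           held    maxheld    barrier      limit    failcnt
      101:  kmemsize        2619621    2921472   11055923   11377049          0
            lockedpages           0          0        256        256          0
            numproc              40         52        240        240          3
"""


def failed_counters(text):
    """Return (uid, resource, failcnt) for every row with a non-zero failcnt."""
    failures = []
    uid = None
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[-1].isdigit():
            continue  # skips the Version line and the column header
        if fields[0].endswith(":"):
            uid = fields[0].rstrip(":")  # first row of a container carries its id
            fields = fields[1:]
        failcnt = int(fields[-1])
        if failcnt > 0:
            failures.append((uid, fields[0], failcnt))
    return failures


def nagios_status(text):
    """Return (exit_code, message) in the shape a Nagios plugin would report."""
    failures = failed_counters(text)
    if failures:
        detail = ", ".join(f"{uid}:{res}={cnt}" for uid, res, cnt in failures)
        return CRITICAL, "CRITICAL: failcnt " + detail
    return OK, "OK: no failcnt above zero"


print(nagios_status(SAMPLE))
```

A real plugin would read the text from /proc/user_beancounters, print the message, and pass the code to sys.exit.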
Quick Tip: Thinking about Bash Redirection and File Descriptors
A common question that comes up from people new to bash scripting is: “How do I redirect standard error to standard out?” There are a few ways to write this but the clearest way in my opinion is “command 2>&1”. File descriptor 2 is standard error, and 1 is standard out. So “2>&1” reads in the form of the question: “File descriptor 2 is being redirected to file descriptor 1.”
However, I think that question itself causes confusion. I don’t think the phrase should be “redirecting standard error to standard out.” Rather, you are redirecting standard error to where standard out points to. You can also think “the file descriptors describe the files they point to.” To see this behavior, you can run ‘xclock 1> ~/scrap/foo 2>&1’. Bash processes redirections from left to right, so this first redirects standard output to ‘~/scrap/foo’, and then redirects standard error to where standard out now points, which is that same file. If you run the following: ‘ls -l /proc/pid_of_xclock/fd’, you will see the above described behavior in action.
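A small experiment shows why “where standard out points to” is the right way to think about it. The Python sketch below runs the same tiny shell command twice with the two redirections in different orders; it assumes a POSIX ‘sh’ is available, and the helper name and the TARGET environment variable are made up for the example.

```python
import os
import subprocess
import tempfile

# Writes "out" to standard output and "err" to standard error.
SCRIPT = "{ echo out; echo err 1>&2; }"


def run_with(redirections):
    """Run SCRIPT under sh; return (file contents, stdout seen by the caller)."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "foo")
        env = dict(os.environ, TARGET=target)
        result = subprocess.run(
            ["sh", "-c", f"{SCRIPT} {redirections}"],
            capture_output=True, text=True, env=env,
        )
        with open(target) as fh:
            return fh.read(), result.stdout


# stdout goes to the file first, then stderr follows it there.
print(run_with('1> "$TARGET" 2>&1'))
# stderr is pointed at the original stdout before stdout moves to the file.
print(run_with('2>&1 1> "$TARGET"'))
```

In the first call both lines should end up in the file. In the second, only “out” reaches the file and “err” arrives on the caller’s standard output, because file descriptor 2 was copied from file descriptor 1 before 1 was pointed at the file.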
Bash: Getting Command Line Columns to Line up
Update: David Harding pointed out in his comment to this post that the column utility does exactly this. Therefore, the following is really just an academic exercise.
In my last post I showed how to get columns outputted in the command line to line up using python. In this post I am going to show you how to do it with Bash scripts (I think you could also use this same method with python using calls to the shell). Instead of padding the columns with spaces as I did in my previous post, this time we use a tab character for the delimiter and manually set the tab stops in the terminal itself. Since this is the terminal, not the shell, this will work with other shells as well (such as my favorite interactive shell, Zsh).
There is example code at the bottom. Instead of creating functions as I did with my previous post I have kept this example pretty tedious (repetitive code etc) to lessen the levels of abstraction and make the example a little clearer.
The first part finds the max width of each column of a text file. This example has 4 columns and a while loop that splits them on the tab character by setting the IFS (Internal Field Separator) variable to tab for the loop only. Each iteration of the while loop looks at the value of each column; it saves the length in the $max# variable if it is larger than the length saved from previous iterations (the variable substitution ${#variable} returns the length of the variable).
The second part, after the while loop, finds where the tab stops should be placed. The setterm command with the -tabs switch sets tab stops at absolute positions up to 160 (each argument specifies where the tab stop is relative to the start of the line, not relative to the previous tab stop). So for this example, the second tab stop position is found by adding the width of the first column to the width of the second column, which gives us the position relative to the start of the line. Lastly, after setting the tab stops the file is displayed on the terminal with cat.
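The arithmetic behind the tab stops is just a running total of the column widths. Here is a minimal Python sketch of that calculation only; it does not call setterm, the function names are made up, and the optional gap parameter is an addition for anyone who wants extra space between columns.

```python
from itertools import accumulate


def column_widths(lines, delimiter="\t"):
    """Return the maximum field length seen in each column."""
    widths = []
    for line in lines:
        for i, field in enumerate(line.rstrip("\n").split(delimiter)):
            if i == len(widths):
                widths.append(0)  # first time this column has been seen
            widths[i] = max(widths[i], len(field))
    return widths


def tab_stops(widths, gap=0):
    """Absolute tab-stop positions: the running total of the column widths."""
    return list(accumulate(width + gap for width in widths))


rows = ["a\tbbb\tcc\n", "aaaa\tb\tc\n"]
widths = column_widths(rows)
print(widths, tab_stops(widths))
```

Each position in the result is measured from the start of the line, which is the form the -tabs switch expects.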
A caveat is that it is hard to find out what to set $TERM to. On my machine, when I am in a screen session $TERM is equal to ‘screen’, but this doesn’t work with setterm; I have to set TERM to ‘linux’.
I hope this helps someone when creating their next command line utility that uses columns.
Getting Command Line Columns to Line up with Python
I created a solution for a program I am writing that makes columns line up when outputted to the command line. I am new to Python, and am hoping I might get some input on this topic.
In the text file that the program reads, the fields (columns) are delimited by ‘$’ and the records (lines) are delimited by newlines ‘\n’. So one line looks like: 1$foo$bar
I broke this task into two separate functions. The first function, opt_output, finds the max length of each column and returns a list such as [1, 23, 14] where 1 is the max width of the first column, 23 is the max width of the second column, etc. This function takes a list object as an argument; that list is the file described above with one record as an object in the list. It then iterates over the list, splitting each record into a list on its delimiter. Then the function iterates over an individual record, saving the length of each item if the length is larger than the length of the equivalent column in previous records. This is of course clearer in the code itself:
The second function, print_line, takes one record as a list object (already split into items) and the value returned from the previous function. It then uses the values returned from opt_output and pads each item in the list with spaces. The number of spaces to pad it with is figured out by subtracting the length of the item from the max length of the column that was found by the previous function. Finally, it rejoins the list with the delimiter ‘$’ and then splits the list again using the padded spaces as the delimiter for each column:
When calling these functions I record the value of the first function in a variable so it does not have to iterate over the list over and over again. The print_line function is called for each line in the file when outputting the file to the screen. As long as there is a fixed width font everything lines up nicely in the terminal.
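A minimal sketch of the two-function approach, written from the description above rather than copied from my program, might look like this. The name opt_output follows the text; format_line is a stand-in for print_line that returns the padded string instead of printing it, and it joins the padded items with a single space rather than rejoining and splitting on ‘$’.

```python
def opt_output(records, delimiter="$"):
    """Return the maximum item length for each column across all records."""
    widths = []
    for record in records:
        for i, item in enumerate(record.rstrip("\n").split(delimiter)):
            if i == len(widths):
                widths.append(0)  # first time this column has been seen
            widths[i] = max(widths[i], len(item))
    return widths


def format_line(items, widths):
    """Pad each item to its column width and join the columns with a space."""
    padded = (item.ljust(width) for item, width in zip(items, widths))
    return " ".join(padded).rstrip()


records = ["1$foo$bar", "22$a$bazqux"]
widths = opt_output(records)  # computed once, reused for every line
for record in records:
    print(format_line(record.split("$"), widths))
```

Computing the widths once and passing them in mirrors the point above about not iterating over the whole list for every line.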


