Friday, May 11, 2012

Text::CharWidth::PurePerl available on GitHub

I've put up a new perl module on github, Text::CharWidth::PurePerl.  As the name suggests, it's a pure-perl implementation of Text::CharWidth.  I needed it to be able to ship nowrap as a script without any dependencies.  Amusingly, it seems to be more correct than the "native" wcwidth routine.

It is packaged in the usual CPAN format, but I'm not currently planning on uploading it to CPAN (I don't have a CPAN id right now).  That seems like a real PITA right now, and I'm not sure if anyone else might even need it.  If you do want me to go through the steps, feel free to add a comment here on the blog or over on the github issue tracker.

Friday, April 15, 2011

Debugging Wily Stack Overflows in MPI Programs

I recently was debugging a problem in MPICH2 with the new MPI-3 nonblocking collectives implementation that involved a stack overflow.  The code in question was usually blowing up with a SIGABRT that came from a stack trace ending in __stack_chk_fail.  This bug was particularly nasty because it was a Heisenbug: it kept changing the failure location and mode whenever I would insert debugging code in order to track it down (hence the aforementioned "usually").

For reasons I will explain later, my usual trick for this sort of thing did not help.  Adding in those macros would cause the bug to vanish, or to reappear in a different function that had not been instrumented with MPIU_SG_ macros.

Luckily, I ran across this very helpful blog post that eventually helped me track down the real problem.  My basic approach was the same as the one outlined there, but it needed modification due to the pain inherent in debugging a parallel MPI application.  This particular bug required running at least 4 processes in order to manifest itself, and it would only appear after several iterations within the test program.  So I used mpiexec + xterm + gdb + a gdb script file to handle the job.

First, here's an example stack trace of the failure (using valgrind to easily obtain stack traces):

% mpiexec -l -n 4 valgrind -q ./icbarrier |& sort_labeled_output

=====================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 6
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
=====================================================================================
APPLICATION TERMINATED WITH THE EXIT STRING: Abort trap (signal 6)
[1] ==47313== 
[1] ==47313== Process terminating with default action of signal 6 (SIGABRT)
[1] ==47313==    at 0x10027D616: __kill (in /usr/lib/libSystem.B.dylib)
[1] ==47313==    by 0x1003022CB: __stack_chk_fail (in /usr/lib/libSystem.B.dylib)
[1] ==47313==    by 0x10005D9D3: MPIDI_CH3I_Progress (ch3_progress.c:539)
[1] ==47313==    by 0x100109B0C: MPIR_Wait_impl (wait.c:67)
[1] ==47313==    by 0x100109FFA: PMPI_Wait (wait.c:171)
[1] ==47313==    by 0x100004506: MPI_Barrier (nbc_pmpi_adapter.c:24)
[1] ==47313==    by 0x10000141D: main (icbarrier.c:37)
[2] ==47314== 
[2] ==47314== Process terminating with default action of signal 6 (SIGABRT)
[2] ==47314==    at 0x10027D616: __kill (in /usr/lib/libSystem.B.dylib)
[2] ==47314==    by 0x1003022CB: __stack_chk_fail (in /usr/lib/libSystem.B.dylib)
[2] ==47314==    by 0x10005D9D3: MPIDI_CH3I_Progress (ch3_progress.c:539)
[2] ==47314==    by 0x100109B0C: MPIR_Wait_impl (wait.c:67)
[2] ==47314==    by 0x100109FFA: PMPI_Wait (wait.c:171)
[2] ==47314==    by 0x100004506: MPI_Barrier (nbc_pmpi_adapter.c:24)
[2] ==47314==    by 0x10000143B: main (icbarrier.c:45)
[3] ==47315== 
[3] ==47315== Process terminating with default action of signal 6 (SIGABRT)
[3] ==47315==    at 0x10027D616: __kill (in /usr/lib/libSystem.B.dylib)
[3] ==47315==    by 0x1003022CB: __stack_chk_fail (in /usr/lib/libSystem.B.dylib)
[3] ==47315==    by 0x10005D9D3: MPIDI_CH3I_Progress (ch3_progress.c:539)
[3] ==47315==    by 0x100109B0C: MPIR_Wait_impl (wait.c:67)
[3] ==47315==    by 0x100109FFA: PMPI_Wait (wait.c:171)
[3] ==47315==    by 0x100004506: MPI_Barrier (nbc_pmpi_adapter.c:24)
[3] ==47315==    by 0x10000143B: main (icbarrier.c:45)

So I took a quick gander at the assembly for MPIDI_CH3I_Progress to find the canary value initialization and comparison code:

% objdump -d icbarrier > icbarrier.s
% sed -n -e '/MPIDI_CH3I_Progress>:/,/^$/ p' icbarrier.s | head -n 15

000000010005c7f0 <_MPIDI_CH3I_Progress>:
   10005c7f0:   55                      push   %rbp
   10005c7f1:   48 89 e5                mov    %rsp,%rbp
   10005c7f4:   41 57                   push   %r15
   10005c7f6:   41 56                   push   %r14
   10005c7f8:   41 55                   push   %r13
   10005c7fa:   41 54                   push   %r12
   10005c7fc:   53                      push   %rbx
   10005c7fd:   48 81 ec 08 02 00 00    sub    $0x208,%rsp
   10005c804:   48 89 bd 10 fe ff ff    mov    %rdi,-0x1f0(%rbp)
   10005c80b:   89 b5 0c fe ff ff       mov    %esi,-0x1f4(%rbp)
   10005c811:   48 8b 05 60 c8 10 00    mov    0x10c860(%rip),%rax        # 100169078 <_pvars+0x78>
   10005c818:   48 8b 10                mov    (%rax),%rdx
   10005c81b:   48 89 55 c8             mov    %rdx,-0x38(%rbp)
   10005c81f:   31 d2                   xor    %edx,%edx

So in this code it looks like the offset is -0x38 rather than -0x14.  Let's find the two addresses at which we will set our breakpoints:

% sed -n -e '/MPIDI_CH3I_Progress>:/,/^$/ p' icbarrier.s | grep -- '-0x38(%rbp)'

   10005c81b:   48 89 55 c8             mov    %rdx,-0x38(%rbp)
   10005ce61:   48 8b 55 c8             mov    -0x38(%rbp),%rdx

And now for our gdb script:

% cat gdb.script

break *0x10005c81b
break *0x10005ce61
commands 1
silent
set variable $cow = (unsigned long*)($rbp - 0x38)
watch *$cow
continue
end
set variable $count = 3
commands 2
silent
delete $count
set variable $count = $count + 1
continue
end
run

% mpiexec -n 4 xterm -e gdb -x gdb.script ./icbarrier

That command launches four xterm windows, each containing a single gdb instance that was running icbarrier as the inferior.  Here's the resulting xterm+gdb screenshot for the process that hit the watchpoint first:

Unlike in many situations, this backtrace alone was not sufficient for me to figure out what the real bug was.  But, given that I changed some high level code (the nonblocking version of barrier, MPIX_Ibarrier), I knew it was extremely unlikely that I had tickled some long-latent bug in the frequently used routine MPIDI_CH3U_Request_unpack_uebuf.  So after a few "WTFs" and some further head scratching, I took a hard look at the upper level code (error checking and boring code omitted for clarity):


int MPIR_Ibarrier_inter(MPID_Comm *comm_ptr, MPID_Sched_t s)
{
    int mpi_errno = MPI_SUCCESS;
    int size, rank, root;
    int buf = 0;

    /* do a barrier on the local intracommunicator */
    mpi_errno = MPIR_Ibarrier_intra(comm_ptr->local_comm, s);
    MPID_SCHED_BARRIER(s);

    /* rank 0 on each group does an intercommunicator broadcast to the
       remote group to indicate that all processes in the local group
       have reached the barrier. We do a 1-byte bcast because a 0-byte
       bcast will just return without doing anything. */

    /* first broadcast from left to right group, then from right to
       left group */
    if (comm_ptr->is_low_group) {
        root = (rank == 0) ? MPI_ROOT : MPI_PROC_NULL;
        mpi_errno = MPIR_Ibcast_inter(&buf, 1, MPI_BYTE, root, comm_ptr, s);
        MPID_SCHED_BARRIER(s);
        /* receive bcast from right */
        root = 0;
        mpi_errno = MPIR_Ibcast_inter(&buf, 1, MPI_BYTE, root, comm_ptr, s);
    }
    else {
        /* receive bcast from left */
        root = 0;
        mpi_errno = MPIR_Ibcast_inter(&buf, 1, MPI_BYTE, root, comm_ptr, s);
        MPID_SCHED_BARRIER(s);
        /* bcast to left */

        root = (rank == 0) ? MPI_ROOT : MPI_PROC_NULL;
        mpi_errno = MPIR_Ibcast_inter(&buf, 1, MPI_BYTE, root, comm_ptr, s);
    }


fn_exit:
    return mpi_errno;
fn_fail:
    goto fn_exit;
}

This function is called, adding operations to a communication schedule, then an MPI_Wait call is made later to actually force the bulk of the schedule to complete.

Do you see the bug?  I'd be surprised if you did, since there are few people familiar with the style of MPICH2 code, and even fewer people who know how the NBC implementation works.  But if you do, give yourself a pat on the back, and then please send me bug reports and patches for my other bugs  :)

Take a closer look at the buffer used for the intercomm broadcast.  It's a stack allocated variable, which worked great in the blocking version of this routine.  Unfortunately, this is a nonblocking barrier, so this routine exits before the ibcast is actually ever performed.  So now we've managed to schedule a broadcast to occur that will overwrite a location on the stack that no longer actually refers to int buf;.  Instead, if we are unlucky it will refer to some random piece of a different function's stack frame.  In the backtraces above, this random piece was the -fstack-protector/SSP canary memory.  The bug moved around when I would insert debugging variables because they would affect the specific location of the canary value in memory, sometimes yielding a corrupted parameter or local variable instead.

Changing that buffer to be malloced instead fixed the problem.

Sunday, April 3, 2011

Roku stands behind their product

My older Roku player (this model, N1000-B) died a couple of months ago.  The wireless NIC went out to lunch for some reason, showing 00:00:00:00:00:00 as its MAC address and totally unable to see any wireless networks, just as described here.  I was busy and lazy at the time, so we just switched over to watching Netflix streaming content on our Wii instead.

Last Friday I finally got around to calling Roku about the issue and the technician that I got was very helpful.  He didn't give me the "reboot runaround"; he just asked me what troubleshooting steps I had taken, and ensured that my MAC address was showing all zeroes.  After a bit more discussion, he offered to replace the unit for free, even though it was nearly a past its warranty expiration.  They mailed the replacement unit (looks like it might be factory refurb?) via FedEx and 3 business days later the new unit was in my hands.  The new unit setup was painless (as expected), and a return shipping label was included so even the round-trip shipping was free.

I was extremely pleased that Roku stood behind their product, even after it was out of warranty.  They did exactly the right thing to change my mind from "I'll never buy another one of these" to "I'd happily buy something from them again."

(full disclosure: I have no affiliation with Roku, other than being a fairly satisfied customer)

Saturday, March 5, 2011

m4_include and aclocal

I ran into some surprising behavior a little while back while hacking on the MPICH2 build system, that I thought I should mention here.  But first, some background info is in order.  We have numerous subsystems that are currently configured (in the autoconf sense) by running an autoconf-generated configure script from various points in the top level configure.  Think AC_CONFIG_SUBDIRS, but the subconfig happens at the place where the macro is used, rather than being deferred to config.status-time.  There are some advantages to this scheme, but there are two fairly severe disadvantages:
  1. The subconfigure scripts tend to re-test a whole bunch of information about the build environment (e.g., six repeated checks for "how to run the C preprocessor").  Not only is this wasteful in terms of time, but there is always unlikely possibility that two configure scripts will come to subtly different conclusions about the build environment, which could cause wacky bugs later in the build.
  2. There's no easy mechanism to utilize the results of subconfigure tests that should be global.  That is, if we add -I/path/to/includes to CPPFLAGS that should be used for all compilation, then this won't be seen in the top-level configure without some additional work.  We currently "pass" these values back via a "localdefs" file that is listed in the subconfigure's AC_CONFIG_FILES and contains some @FOO@ substitutions that are either AC_SUBSTed or "precious" (which implies AC_SUBST for that var).  This stinks because it's a fairly manual process, and it's impenetrable to outsiders who are unfamiliar with our build system because the sourcing of "localdefs" is hidden inside a custom autoconf macro.
So I'm taking a stab at converting these configure files to be configure.in fragments that are then m4_included.  This will cause those tests to be run in the same shell as the top-level configure and will only run the additional tests that are needed by the subsystem.  In the process of that conversion, I ran into some odd error messages from the autotools that are represented by the following example program.

Quick Quiz: The code in Listing A will successfully autoreconf and configure.  The code in Listing B will fail during autoreconf's invocation of aclocal.  Why?


Common code:
## contents of configure.in
AC_INIT([foo],[1.0])
AM_INIT_AUTOMAKE([-Wall -Werror foreign 1.11 subdir-objects])
LT_INIT([])
AC_PROG_CC
var="yes"
AM_CONDITIONAL([COND],[test "$var" = "yes"])
dnl this makes m4 bail, claiming that AM_COND_IF is undefined
m4_foreach([subsys_i],[[subconfigure]],[m4_include(subsys_i[.m4])])
AC_OUTPUT([Makefile]) 

Listing A
## contents of subconfigure.m4
AC_PROG_GREP

Listing B
## contents of subconfigure.m4
AM_COND_IF([COND],[echo COND is true],[echo COND is false])


Do you see the problem here?  I sure didn't, even after some moderate debugging.  Go ahead and think about it for a minute, I'll wait...
.
.
.
.
.
.
.
.
.
Luckily, the nice folks on the autoconf list helped me understand what was going on here.  There are two things happening here:
  1. Not all macros are created equal.  In my original world view, AC_PROG_GREP and AM_COND_IF are both just AC_DEFUNed m4 macros and should behave more or less identically.  However this assumption is totally false.  AC_PROG_GREP is a macro that is built-in to autoconf, and therefore does not get distributed with your package via aclocal.m4, while AM_COND_IF comes with automake and is managed by aclocal, and must be placed into your package's aclocal.m4 in order to be properly expanded.
  2. It turns out that aclocal isn't as smart as I thought it was.  When it traces the m4 files looking for macro definitions that it must add to aclocal.m4, it doesn't perform real m4_includes at that time.  Instead it uses some regex heuristics in order to determine what file is being included and then traces that file separately.  Since the name of the file that is being included is being computed via an m4 macro in the example above, aclocal gets rather confused and just doesn't trace the subconfigure.m4 file.
The upshot of all of this is that I can't use m4 code to programmatically include subsystem m4 fragments.  They must either be hard-coded m4 file names or be autogenerated by some other non-autotools step, such as a quick shell script run before autoreconf.

    Friday, January 21, 2011

    Midwest Ice Climbers Ice Festival, 2011



    Last weekend I got to try ice climbing out at the Ice Pit near Green Bay, Wisconsin.  It was a very cool setup, sort of an outdoor ice climbing gym in an active quarry.  Normally you must be a member to climb there, but for the festival it was just $20 for the whole weekend.  I did the trip with five other guys, all rock climbers, three of which also work at the Lab.

    I found the climbing itself surprisingly difficult, despite my (at least modest) rock climbing abilities.  According to others that I spoke with at the festival, most of the routes that we were climbing were in the WI3-WI4 range, all about 100 feet high.  The hardest part was getting accustomed to the crampons and really trusting my feet.  When I didn't trust my feet enough, I would put too much weight on my hands which led to totally pumped out forearms after 100 feet of ice.  We were each able to get three climbs in each of the two days, which was enough to tire me out pretty effectively.

    There was a fair amount of demo gear to borrow from various vendors, including boots, crampons, and ice tools.  However, we got in touch with Down Wind Sports (in Michigan's UP) via email before the festival and arranged to rent gear from them on Friday evening instead.  That way we were able to save a lot of time each morning and were certain that we would have gear that fit.  Bill @ Down Wind Sports was extremely nice and helpful, I'd do business with them again without hesitation.  For $40/person we had boots and crampons for the whole weekend.  Ice tools were easy to come by as demo gear, and someone in our party already had a set as well.

    Overall it was a very enjoyable trip.  I might do a little easy ice climbing in the future as the opportunity presents itself, but for now I'm not planning on picking it up as a regular hobby.  I already have too many hobbies as it is, and this isn't the cheapest or most accessible hobby around.  A brand new set of top-rope ice equipment (boots, crampons, tools) adds up to $800-$1000, while a used set is probably half of that plus more time hunting for a decent deal.  And northern Illinois isn't exactly teeming with waterfall ice to climb.

    You can find a few more photos of the event on my Smugmug page.

    Thursday, January 20, 2011

    sort_labeled_output script

    Quick MPICH2 tip: passing the "-l" option to MPICH2's mpiexec (hydra) yields rank-labled output like so:

    % mpiexec -l -n 2 ./examples/cpi
    [0] Process 0 of 2 is on g2-2.local
    [1] Process 1 of 2 is on g2-2.local
    [0] pi is approximately 3.1415926544231318, Error is 0.0000000008333387
    [0] wall clock time = 0.000210


    This is handy for figuring out which process is printing what without reaching in and modifying all of your printf statements to explicitly print the rank as well.

    For more complicated output, and especially multiline output that you would get from running valgrind, this interleaved output can be hard to follow. Enter sort_labeled_output:

    #!/bin/zsh
    # assumes output that matches the following regex: /^\[\d+\] .*$/
    exec sort --stable --key=1.2,1.0 -n


    The script just stably sorts the output by rank prefix, which makes it much easier to read multiline output.

    Given that this is just a thin wrapper around GNU sort, I actually didn't build this script for a long time. But it turns out that this is a task that I do all the time and reconstructing the sort arguments always distracts me a bit from the debugging task at hand. So the script has been a big productivity win.

    Climbing in Agio Farango

    Dave following on "Το ονειρο και ο φοβος" (Google trans. "The dream and the fear"), 5b (approx. 5.8+), in Agio Farango, Crete, Greece.



    I had business on Crete, Greece and was fortunate enough to be able to stay an extra day for some sport climbing.  We were staying east of Heraklion, near Hersonissos, so Agio Farango, on the south side of the island seemed the best option for a day trip.  We picked up a guide book online ahead of time, rented a car, and headed southwest about 100 km or so, probably 2.5 hours each way, to Kali Limenes.

    We took the approach from Kali Limenes, a hike from the east of the gorge a few hundred yards inland from the Mediterranean.  Despite online directions and directions in the guide book, it was extremely unclear where we should park and where the actual marked path (with blazes and cairns) started.  Below I've included a map based on my several-months-old recollection of the route and my interpretation of the satellite photos:


    View Agio Farango approach from Kali Limenes in a larger map

    If I were to do it all again, I would probably try the alternative approach from Moni Odigitrias at the north end of the gorge, even though this website advises the Kali Limenes route.  We burned up a bit of time just trying to figure out where the trail was.

    The climbs were fun, although I found the difficulty a bit stiff in some cases.  Also, many of the routes had sections of very sharp limestone rock, which was difficult to hold onto.  Good footwork was essential for such routes.  While most routes were well bolted, I also followed on my first trad route, an easy 4c (French scale, approx. 5.7 YDS, even though it felt easier than that) named Ξωτικο that started right on the beach and had a great view of the water and beach from the top.

    Goats are all over the area, especially down in the gorge itself, as well as soloing up on the cliffs above you.  They are amazing climbers, able to scale walls and routes that look difficult for us humans with little fear or difficulty.  However, watch out for other tourists and their dogs.  While I was top roping one of the routes someone's dog chased a goat between the rock and my belayer 30 feet below me, which was a rather unsettling experience.

    Overall, a great trip.  It's a pretty intense day trip from the Heraklion area, and the driving and trail finding can really eat into your climbing time.  If you have the time, I'd recommend bringing a tent and some food and staying overnight on or near the beach in order to fit more climbing into a more relaxed outing.

    More photos from the trip can be found here on Smugmug.