cuda-gdb walkthrough not working on OS X Lion

I am following the cuda-gdb walkthrough at http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/cuda-gdb.pdf (chapter 11), but I am not able to single-step through the code:

% cuda-gdb bitreverse
NVIDIA (R) CUDA Debugger
4.0 release
Portions Copyright (C) 2007-2011 NVIDIA Corporation
GNU gdb 6.3.50.20050815-cvs (Fri May 13 10:38:44 UTC 2011)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "--host=i686-apple-darwin10.0.0 --target="...unable to read unknown load command 0x24
unable to read unknown load command 0x26
unable to read unknown load command 0x24
unable to read unknown load command 0x26
unable to read unknown load command 0x24
unable to read unknown load command 0x26
unable to read unknown load command 0x24
unable to read unknown load command 0x26
unable to read unknown load command 0x24
unable to read unknown load command 0x26
unable to read unknown load command 0x24
unable to read unknown load command 0x26
unable to read unknown load command 0x24
unable to read unknown load command 0x26
Reading symbols for shared libraries ... done
unable to read unknown load command 0x24
unable to read unknown load command 0x26
(cuda-gdb) b bitreverse
Breakpoint 1 at 0x281c: file tmpxft_00007d16_00000000-1_bitreverse.cudafe1.stub.c, line 8.
(cuda-gdb) r
Starting program: /Users/geordan/src/cuda/bitreverse 
Reading symbols for shared libraries + done
unable to read unknown load command 0x24
unable to read unknown load command 0x26
(lots of this)
Reading symbols for shared libraries +.++........................................................................................................ done
Reading symbols for shared libraries .. done
Reading symbols for shared libraries .. done
[Context Create of context 0x69980e00 on Device 0]
[Launch of CUDA Kernel 0 (bitreverse<<<(1,1,1),(256,1,1)>>>) on Device 0]
[Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 0, lane 0]
Breakpoint 1, bitreverse<<<(1,1,1),(256,1,1)>>> (data=0x110000) at bitreverse.cu:9
9	   unsigned int *idata = (unsigned int*)data;
(cuda-gdb) bt
#0  bitreverse<<<(1,1,1),(256,1,1)>>> (data=0x110000) at bitreverse.cu:9
(cuda-gdb) next
[Termination of CUDA Kernel 0 (bitreverse<<<(1,1,1),(256,1,1)>>>) on Device 0]
[Switching to process 32169]
0x99b959c7 in __mtx_droplock ()

Why does the debugger let the kernel run to completion on `next` instead of stepping to the following line?

This is on OS X 10.7 preview, CUDA 4.0.19, toolkit 4.0.17.
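For context, here is a minimal sketch of a bitreverse-style kernel like the one being stepped through. It is an assumption that this resembles the walkthrough's bitreverse.cu; only line 9 (`unsigned int *idata = (unsigned int*)data;`) is confirmed by the session above. The helper name `reverse_bits_in_bytes` is hypothetical. The sketch reverses the bit order inside each byte of a 32-bit word using three mask-and-shift swaps, one thread per element. The build comment shows `nvcc -g -G`, where `-G` asks nvcc to emit device-side debug information, which cuda-gdb needs in order to step through kernel source lines.

```cuda
// Hypothetical sketch of a bitreverse-style kernel; the actual bitreverse.cu
// from the cuda-gdb manual may differ in detail.
// Build with host (-g) and device (-G) debug info so cuda-gdb can step
// through kernel source lines:
//   nvcc -g -G bitreverse.cu -o bitreverse

// Reverse the bit order inside each byte of a 32-bit word by swapping
// nibbles, then bit pairs, then single bits. The masks keep every swap
// within its own byte, so no bits leak into neighbouring bytes.
__host__ __device__ unsigned int reverse_bits_in_bytes(unsigned int x)
{
    x = ((0xf0f0f0f0u & x) >> 4) | ((0x0f0f0f0fu & x) << 4);  // swap nibbles
    x = ((0xccccccccu & x) >> 2) | ((0x33333333u & x) << 2);  // swap bit pairs
    x = ((0xaaaaaaaau & x) >> 1) | ((0x55555555u & x) << 1);  // swap single bits
    return x;
}

// One thread per element. Launching it as bitreverse<<<1, 256>>>(d_data)
// matches the (1,1,1) x (256,1,1) configuration shown in the session above.
__global__ void bitreverse(void *data)
{
    // Corresponds to line 9 in the session, where Breakpoint 1 stops.
    unsigned int *idata = (unsigned int *)data;
    idata[threadIdx.x] = reverse_bits_in_bytes(idata[threadIdx.x]);
}
```

Because `reverse_bits_in_bytes` is also marked `__host__`, the same function can be called on the CPU to check the values copied back from the device once stepping works. For example, 0x01 should become 0x80, and 0x0F should become 0xF0.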