cuda-gdb hangs in the CUDA 2.3 beta

I installed the CUDA 2.3 beta on the following Linux x86_64 system:

nvi:/proc/driver/nvidia>cat /etc/*release
CentOS release 5.2 (Final)
nvi:/proc/driver/nvidia>uname -a
Linux nvidia0.totalviewtech.com 2.6.18-92.1.22.el5 #1 SMP Tue Dec 16 11:57:43 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
nvi:/proc/driver/nvidia>cat version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 190.09 Mon Jun 15 16:53:35 PDT 2009
GCC version: gcc version 4.1.2 20071124 (Red Hat 4.1.2-42)
nvi:/proc/driver/nvidia>

The system has two graphics cards installed:

nvi:/proc/driver/nvidia>more cards/*
::::::::::::::
cards/0
::::::::::::::
Model: nForce 980a/780a SLI
IRQ: 185
Video BIOS: ??.??.??.??.??
Card Type: PCI
DMA Size: 40 bits
DMA Mask: 0xffffffffff
Bus Location: 02.00.0
::::::::::::::
cards/1
::::::::::::::
Model: Tesla C1060
IRQ: 169
Video BIOS: ??.??.??.??.??
Card Type: PCI-E
DMA Size: 40 bits
DMA Mask: 0xffffffffff
Bus Location: 05.00.0
nvi:/proc/driver/nvidia>

I have a simple matrix multiplication example taken from the CUDA programming guide. It runs correctly outside the debugger:

% ./tx_cuda_matmul
A:
[ 0][ 0] 0.000000
[ 0][ 1] 1.000000
[ 1][ 0] 10.000000
[ 1][ 1] 11.000000
B:
[ 0][ 0] 0.000000
[ 0][ 1] 1.000000
[ 1][ 0] 10.000000
[ 1][ 1] 11.000000
C:
[ 0][ 0] 10.000000
[ 0][ 1] 11.000000
[ 1][ 0] 110.000000
[ 1][ 1] 131.000000
%
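
For readers who want to try reproducing this, here is a minimal sketch of the kind of program involved. It is not the actual tx_cuda_matmul.cu source. The `Matrix` fields and the `MatMul` and `MatMulKernel` names are taken from the backtrace further down; everything else (the helper functions `check`, `bytesOf`, `deviceCopy` and `print`, the block size, and the fill pattern `10*i + j`) is an assumption chosen to match the A and B values printed above.

```cuda
// Hypothetical reconstruction for illustration; not the original test source.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Row-major matrix: M(row, col) = elements[row * stride + col]
struct Matrix {
    int width;
    int height;
    int stride;
    float *elements;
};

// Each thread computes one element of C = A * B.
__global__ void MatMulKernel(Matrix A, Matrix B, Matrix C)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= C.height || col >= C.width)
        return;  // thread falls outside the matrix
    float sum = 0.0f;
    for (int k = 0; k < A.width; ++k)
        sum += A.elements[row * A.stride + k] * B.elements[k * B.stride + col];
    C.elements[row * C.stride + col] = sum;
}

// Abort with a message if a CUDA runtime call failed.
static void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

static size_t bytesOf(const Matrix &m)
{
    return (size_t)m.stride * m.height * sizeof(float);
}

// Allocate a device matrix with the same shape; optionally upload the host data.
static Matrix deviceCopy(const Matrix &host, bool upload)
{
    Matrix dev = host;
    check(cudaMalloc((void **)&dev.elements, bytesOf(host)), "cudaMalloc");
    if (upload)
        check(cudaMemcpy(dev.elements, host.elements, bytesOf(host),
                         cudaMemcpyHostToDevice), "cudaMemcpy (host to device)");
    return dev;
}

// Host wrapper: copy A and B to the device, launch the kernel, copy C back.
void MatMul(Matrix A, Matrix B, Matrix C)
{
    Matrix dA = deviceCopy(A, true);
    Matrix dB = deviceCopy(B, true);
    Matrix dC = deviceCopy(C, false);

    dim3 block(16, 16);  // assumed block size
    dim3 grid((C.width + block.x - 1) / block.x,
              (C.height + block.y - 1) / block.y);
    MatMulKernel<<<grid, block>>>(dA, dB, dC);
    check(cudaGetLastError(), "kernel launch");
    check(cudaMemcpy(C.elements, dC.elements, bytesOf(C),
                     cudaMemcpyDeviceToHost), "cudaMemcpy (device to host)");

    cudaFree(dA.elements);
    cudaFree(dB.elements);
    cudaFree(dC.elements);
}

static void print(const char *name, const Matrix &m)
{
    printf("%s:\n", name);
    for (int i = 0; i < m.height; ++i)
        for (int j = 0; j < m.width; ++j)
            printf("[%2d][%2d] %f\n", i, j, m.elements[i * m.stride + j]);
}

int main()
{
    const int n = 2;
    float a[n * n], b[n * n], c[n * n] = {0};
    Matrix A = {n, n, n, a}, B = {n, n, n, b}, C = {n, n, n, c};
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            a[i * n + j] = b[i * n + j] = 10.0f * i + j;  // assumed fill pattern

    MatMul(A, B, C);
    print("A", A);
    print("B", B);
    print("C", C);
    return 0;
}
```

Building a file like this with `nvcc -g -G` includes host and device debug information, which cuda-gdb needs for source-level debugging of the kernel.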

However, if I run it under CUDA-GDB, it hangs:

% cuda-gdb ./tx_cuda_matmul
NVIDIA (R) CUDA Debugger
BETA release
Portions Copyright (C) 2008,2009 NVIDIA Corporation
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...
Using host libthread_db library "/lib64/libthread_db.so.1".
(cuda-gdb) run
Starting program: /nfs/jabod0/scratch/home/jdelsign/tvbld/linux-x86-64/nvidia0/totalview.mainline/debugger/src/tests/bld/nvcc_2.2_64/tx_cuda_matmul
[Thread debugging using libthread_db enabled]
[New process 13395]
[New Thread 47139247271856 (LWP 13395)]
Warning: a GPU was made unavailable to the application due to debugging
constraints. This may change the application behaviour!

The program hangs at this point, and I have to press Ctrl+C to get back to the cuda-gdb prompt:

Program received signal SIGINT, Interrupt.
[Switching to Thread 47139247271856 (LWP 13395)]
0x00002adf761c4e50 in ?? () from /usr/lib64/libcuda.so
(cuda-gdb) where
#0 0x00002adf761c4e50 in ?? () from /usr/lib64/libcuda.so
#1 0x00002adf761ae313 in ?? () from /usr/lib64/libcuda.so
#2 0x00002adf761adf8a in ?? () from /usr/lib64/libcuda.so
#3 0x00002adf7619c6b8 in ?? () from /usr/lib64/libcuda.so
#4 0x00002adf76192e26 in ?? () from /usr/lib64/libcuda.so
#5 0x00002adf761e66bf in ?? () from /usr/lib64/libcuda.so
#6 0x00002adf75f541a7 in ?? () from /usr/local/cuda/lib/libcudart.so.2
#7 0x00002adf75f43742 in cudaLaunch () from /usr/local/cuda/lib/libcudart.so.2
#8 0x00000000004011dd in cudaLaunch (...)
#9 0x0000000000400c2d in __device_stub__Z12MatMulKernel6MatrixS_S_ (
__par0=@0x7fff34b9a3f0, __par1=@0x7fff34b9a408, __par2=@0x7fff34b9a420)
at /tmp/tmpxft_00001ceb_00000000-1_tx_cuda_matmul.cudafe1.stub.c:13
#10 0x0000000000400c45 in MatMulKernel__entry (__cuda_0=
{width = 2, height = 2, stride = 2, elements = 0x1000700}, __cuda_1=
{width = 2, height = 2, stride = 2, elements = 0x1000800}, __cuda_2=
{width = 2, height = 2, stride = 2, elements = 0x1000900})
at /tmp/tmpxft_00001ceb_00000000-1_tx_cuda_matmul.cudafe1.stub.c:17
#11 0x0000000000400fc5 in MatMul (A=
{width = 2, height = 2, stride = 2, elements = 0x5f351e0}, B=
{width = 2, height = 2, stride = 2, elements = 0x5f35200}, C=
{width = 2, height = 2, stride = 2, elements = 0x5f35220})
at ../../src/tx_cuda_matmul.cu:77
#12 0x0000000000401125 in main (argc=1, argv=0x7fff34b9a6a8)
at ../../src/tx_cuda_matmul.cu:167
(cuda-gdb)

Before the upgrade to the CUDA 2.3 beta, this system was running the CUDA 2.2 beta, and the debugger worked fine.

Any ideas what the problem might be?