CUSPARSE v2 sparse matrix-matrix multiply issue (CUDA 5.0 preview)

I’m running into an issue with CUSPARSE (version 2) in the CUDA 5.0 preview. I have a cusparseScsrmm() call, which computes C = alpha * A * B + beta * C, that runs fine in most cases. However, once the sparse matrix grows past a certain size, going from the following dimensions:

(Case 1 - runs fine)
Sparse matrix A dimensions: 262144 x 65536
Sparse matrix A CSR row array length: 65537
Sparse matrix A CSR column array length: 5831316
Sparse matrix A CSR value array length: 5831316
Total size: 66 MB

To these dimensions:

(Case 2)
Sparse matrix A dimensions: 262144 x 65536
Sparse matrix A CSR row array length: 65537
Sparse matrix A CSR column array length: 6692228
Sparse matrix A CSR value array length: 6692228
Total size: 76 MB

I receive the following error message:

CUDA error unspecified launch failure in ucsf/CSystemMatrixCUSPARSEDeviceThrust.h at line 1335

Line 1335 corresponds to my cusparseScsrmm() call. In both cases, the CUSPARSE operation is CUSPARSE_OPERATION_NON_TRANSPOSE. When I use the CUSPARSE v1 code (making only the changes described in NVIDIA’s CUSPARSE manual), I don’t encounter any error. In addition, I have run this code in CUDA 4.0 (with CUSPARSE version 1) and did not encounter this issue.

Any suggestions?

Note that the dense matrices have the same dimensions in cases 1 and 2:
Dense matrix B dimensions: 65536 x 60 = 3932160
Dense matrix C dimensions: 262144 x 60 = 15728640
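
For anyone trying to reproduce this, the array lengths listed above can be cross-checked on the host before the cusparseScsrmm() call. The following is only a minimal sketch, not code from my application: checkCsr is a hypothetical helper name, and it assumes zero-based indexing, the standard CSR layout (a row array of m + 1 entries for a matrix with m rows), and host copies of the index arrays.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of CUSPARSE): checks that zero-based CSR
// index arrays are consistent with an m x k sparse matrix A.
// Returns an empty string if no problem is found, otherwise a short
// description of the first problem encountered.
std::string checkCsr(int m, int k,
                     const std::vector<int>& rowPtr,
                     const std::vector<int>& colInd,
                     std::size_t numValues)
{
    if (m < 0 || k < 0)
        return "negative dimension";
    // Standard CSR stores one row offset per row plus a final end marker.
    if (rowPtr.size() != static_cast<std::size_t>(m) + 1)
        return "row array must have m + 1 entries";
    if (colInd.size() != numValues)
        return "column and value arrays differ in length";
    if (rowPtr[0] != 0)
        return "row array must start at 0";
    // Row offsets must never decrease from one row to the next.
    for (int i = 0; i < m; ++i) {
        if (rowPtr[i] > rowPtr[i + 1])
            return "row array must be non-decreasing";
    }
    // The last row offset equals the number of stored nonzeros.
    if (static_cast<std::size_t>(rowPtr[m]) != colInd.size())
        return "last row offset must equal the number of nonzeros";
    // Every column index must address a valid column of A.
    for (std::size_t j = 0; j < colInd.size(); ++j) {
        if (colInd[j] < 0 || colInd[j] >= k)
            return "column index out of range";
    }
    return "";
}
```

Assuming the dimensions above are given as rows x columns, this check would be called with m = 262144 and k = 65536 and would expect a row array of 262145 entries, so the length of 65537 listed in both cases may be worth double-checking.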

System description:
CUDA toolkit release version: 5.0 preview
Compiler for CPU host code: g++
Operating System: Ubuntu 11.10 64-bit
CPU, memory: Dual AMD Opteron 6128 2.0 GHz, 32 GB DDR3 RAM
(2) Tesla M2070 cards

Please file a bug, attaching self-contained repro code. There is a link to the bug reporting form on the registered developer website. Thank you for your help.