64-bit CUDA 5.5 toolkit: SDK project compiled with the -m32 flag fails at cudaMalloc

Hi. I compiled the SDK 5.5 matrixMul sample with the 64-bit toolkit by adding the -m32 flag and linking against the 32-bit libraries. Compilation succeeds, but at run time cudaMalloc() reports an error. My GPU is a Tesla K20c (sm_35). The system is Ubuntu 12.10, x86_64, with CUDA 5.5.

/--------------------------/
compile command:

/usr/local/cuda-5.5/bin/nvcc -ccbin g++ -I…/…/common/inc -m32 -gencode arch=compute_10,code=sm_10 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code="sm_35,compute_35" -o matrixMul.o -c matrixMul.cu

/usr/local/cuda-5.5/bin/nvcc -ccbin g++ -m32 -o matrixMul matrixMul.o

/--------------------------/
error message:

GPU Device 0: "Tesla K20c" with compute capability 3.5

MatrixA(320,320), MatrixB(640,320)
cudaMalloc d_A returned error code 46, line(164)
/--------------------------/

I also tried CUDA 5.0 on the K20c and the problem remains. The error message there is “all CUDA-capable devices are busy or unavailable”.
However, it works well on a Tesla C2050 with both CUDA 5.5 and 5.0. Is this a problem specific to the K20c?
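
For reference, here is a minimal sketch of how the failing call can be made to print both the numeric code and its message, so the two reports above can be compared directly. It is not the SDK sample's own code; the buffer size simply mirrors MatrixA(320,320) from the log and assumes float elements.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    float *d_A = NULL;
    // Assumption: 320 x 320 floats, mirroring MatrixA(320,320) in the log above.
    size_t bytes = 320 * 320 * sizeof(float);

    cudaError_t err = cudaMalloc((void **)&d_A, bytes);
    if (err != cudaSuccess) {
        // Print the numeric code together with the runtime's description of it.
        fprintf(stderr, "cudaMalloc d_A failed: %s (error code %d)\n",
                cudaGetErrorString(err), (int)err);
        return 1;
    }

    printf("cudaMalloc d_A succeeded (%lu bytes)\n", (unsigned long)bytes);
    cudaFree(d_A);
    return 0;
}
```

Building this with the same -m32 flags as above should show whether the failure is tied to the sample or to any 32-bit allocation on this device.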

Any suggestions? Thanks.

Nobody?

Maybe it is a device-related problem?
Try calling cudaDeviceReset() before any of the device code.
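
A rough sketch of what that could look like, assuming the reset goes at the very top of main(). The cudaFree(0) call is only a common idiom for forcing the runtime to create its context; it is used here as an assumption to help separate a context-creation failure from a failure in cudaMalloc itself.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Report a failed runtime call and tell the caller whether it succeeded.
static bool check(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s (error code %d)\n",
                what, cudaGetErrorString(err), (int)err);
        return false;
    }
    return true;
}

int main() {
    // Reset the device before any other device code runs.
    if (!check(cudaDeviceReset(), "cudaDeviceReset")) return 1;

    // Force context creation; freeing a null pointer does nothing else.
    if (!check(cudaFree(0), "context creation (cudaFree(0))")) return 1;

    printf("Device reset and context creation succeeded\n");
    return 0;
}
```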

cudaDeviceReset() doesn’t help. The problem remains.