Matrix multiplication fails (Tesla C2070, CUBLAS, Linux SLES 11sp1)

Hi, I was using a Tesla GPU for linear algebra half a year ago and it worked fine. I recently started using the GPU again, and now my application fails. For example, square matrix multiplication fails once the matrix size exceeds 1000x1000, while it works fine for small matrices.

Details: to find the root cause of the issue, I downloaded this sample:

I changed the matrix size to 320x320 or 1920x1920.

In the first case it works; in the latter case it fails with this message:
Failed to synchronize on the stop event (error code unknown error)!

I checked: no error is reported before the cudaEventRecord(stop, NULL) call.
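
Since kernel launches are asynchronous, an error raised while the kernel is running is only reported by the next synchronizing call, which may be why it first shows up at the stop event. Below is a minimal sketch of a stricter check, not the sample's actual code: the CHECK macro and scaleKernel are hypothetical placeholders standing in for the sample's error handling and multiplication kernel, and the 1920x1920 size is taken from the failing case above.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: print file/line and abort if a CUDA runtime call fails.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,          \
                    cudaGetErrorString(err_));                          \
            exit(EXIT_FAILURE);                                         \
        }                                                               \
    } while (0)

// Placeholder kernel standing in for the sample's multiplication kernel.
__global__ void scaleKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main()
{
    const int n = 1920 * 1920;   // element count of the failing matrix size
    float *d_data = NULL;
    cudaEvent_t start, stop;

    CHECK(cudaMalloc((void **)&d_data, n * sizeof(float)));
    CHECK(cudaMemset(d_data, 0, n * sizeof(float)));
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    CHECK(cudaEventRecord(start, NULL));
    scaleKernel<<<(n + 255) / 256, 256>>>(d_data, n);
    CHECK(cudaGetLastError());        // reports launch-configuration errors
    CHECK(cudaDeviceSynchronize());   // reports errors raised while the kernel ran
    CHECK(cudaEventRecord(stop, NULL));
    CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    printf("kernel time: %.3f ms\n", ms);

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(d_data));
    return 0;
}
```

The extra cudaDeviceSynchronize() is there for diagnosis only: it adds a little overhead to the measured time, but it pins a kernel failure to the kernel itself rather than to the later event synchronization.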

More details:
Linux core 2.6.32.24-0.2 SUSE Linux Enterprise Server 11 (x86_64) SP1
GPU Tesla C2070 (S/N: 0322211066837)
NVRM version: NVIDIA UNIX x86_64 Kernel Module 310.32 Mon Jan 14 14:41:13 PST 2013
GCC version: gcc version 4.3.4 [gcc-4_3-branch revision 152973] (SUSE Linux)
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2011 NVIDIA Corporation
Built on Thu_May_12_11:09:45_PDT_2011
Cuda compilation tools, release 4.0, V0.2.1221

Any help will be greatly appreciated!