CUBLAS problem

I do not know what happened, but CUBLAS has stopped working…

Whenever I try to run simpleCUBLAS, it starts and then stops responding. The same happens with my own code: whenever the application reaches a CUBLAS function, it stops working.

I have cublas.h included, and cublas.lib is added in the linker settings.

Any suggestions?
A.

Can you explain what you mean? Does your application crash?

When it reaches the point where a CUBLAS function should be called, it stops and all I see is the black console window of project_name.exe. If I try to close it, another window appears saying that the application has stopped and that I may break or continue.

For instance, when I run simpleCUBLAS, all I see is

simpleCUBLAS is running…

_

and nothing happens for another few minutes?!

I have already reinstalled the driver, toolkit and SDK for CUDA 3.1, but it did not help.

I have tested it once again… and it works… after 3-4 minutes!!!

Allocation, initialization and computation take almost no time, as opposed to

status = cublasInit();
if (status != CUBLAS_STATUS_SUCCESS) {
    fprintf(stderr, "!!!! CUBLAS initialization error\n");
    return EXIT_FAILURE;
}

which takes about 3-4 minutes every time. Any ideas why?

I found that if the project contains no CUBLAS functions there is no such 3-4 minute delay, but if it contains even a single CUBLAS function, the delay appears.

Y.

Please help me, because it drives me mad having to wait 3 minutes for every single run…

Help us help you: what operating system, driver version, hardware and host compiler are you using? It sounds a lot like driver-level recompilation of PTX is going on, which would explain why it is slow (especially if your host CPU isn't very fast or you don't have a lot of free memory). Are you running a Fermi GPU or something else?

Windows 7 x64
CPU: Intel Xeon 3.33 GHz (12 GB of RAM)
GPU: GeForce GTX 480
Already installed from http://developer.nvidia.com/object/cuda_3_1_downloads.html:
driver 257.21, toolkit and SDK

Device 0: “GeForce GTX 480”
CUDA Driver Version: 3.10
CUDA Runtime Version: 3.10
CUDA Capability Major revision number: 2
CUDA Capability Minor revision number: 0
Total amount of global memory: 1576468480 bytes
Number of multiprocessors: 15
Number of cores: 480
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Clock rate: 0.81 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: Yes
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)
Concurrent kernel execution: Yes
Device has ECC support enabled: No

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 3.10, CUDA Runtime Version = 3.10, NumDevs = 1, Device = GeForce GTX 480

I am betting it is the same problem discussed in this thread. Make sure that you are building all your code for sm_20 and that you haven’t got JIT compilation forced on.

Yes, indeed, that solved my problem. Thank you very much!

Y.
