Problem with cublasSetVector: wrong values of vector in CUDA memory

Hi All,

I have a problem with the cublasSetVector function. I hope someone can help me find my error.

In my code, I have to compute the following matrix product:

d_X4 = d_sW3 * d_X3


d_X4 (5 x 125,700), d_sW3 (5 x 512) and d_X3 (512 x 125,700) are allocated as 1D arrays in the following code fragment:

#define X4_SIZE (5 * 125700)
#define MEM_SIZE_X3 (125700 * 512 * sizeof(float))
#define SW3_SIZE (5 * 512)

float* d_X4;
status = cublasAlloc(X4_SIZE, sizeof(d_X4[0]), (void**)&d_X4);
if (status != CUBLAS_STATUS_SUCCESS) {
    fprintf (stderr, "!!!! Device memory allocation error (d_X4)\n");
    return EXIT_FAILURE;
}

float* d_X3;
CUDA_SAFE_CALL( cudaMalloc( (void**) &d_X3, MEM_SIZE_X3));

float* d_sW3;
status = cublasAlloc(SW3_SIZE, sizeof(d_sW3[0]), (void**)&d_sW3);
if (status != CUBLAS_STATUS_SUCCESS) {
    fprintf (stderr, "!!!! Device memory allocation error (d_sW3)\n");
    return EXIT_FAILURE;
}
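For reference, the buffer sizes work out as follows. This is a small sketch assuming 4-byte floats; the macro names X4_ELEMS, X3_ELEMS, SW3_ELEMS and CONSTANT_MEM_BYTES and the helper functions are hypothetical. d_X3 alone takes 257,433,600 bytes (about 245.5 MiB), while d_sW3 is only 10,240 bytes, small enough for the 64 KB of constant memory. Parenthesizing size macros is a cheap safeguard: without parentheses, an expression such as N / X4_SIZE would expand to N / 5 * 125700.

```c
#include <stddef.h>

/* Hypothetical parenthesized size macros (element counts). */
#define X4_ELEMS  ((size_t)5 * 125700)   /* d_X4:  5 x 125,700   */
#define X3_ELEMS  ((size_t)512 * 125700) /* d_X3:  512 x 125,700 */
#define SW3_ELEMS ((size_t)5 * 512)      /* d_sW3: 5 x 512       */

/* Total constant memory available to __constant__ variables (64 KB). */
#define CONSTANT_MEM_BYTES ((size_t)64 * 1024)

/* Bytes needed to store n single-precision values. */
size_t float_bytes(size_t n) { return n * sizeof(float); }

/* Nonzero if n floats fit in constant memory. */
int fits_in_constant_memory(size_t n) {
    return float_bytes(n) <= CONSTANT_MEM_BYTES;
}
```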



d_sW3 is initialized by copying data from the array sW3, declared as:

__constant__ float sW3[] = { ... };


sW3 has 2560 elements (1 x 2560).
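The thread does not say in which order the 2560 values are listed. cublasSgemm with lda = 5 reads d_sW3 in column-major order, so element (i, j) of the 5 x 512 matrix must sit at index i + 5*j. If the values were written row by row, they would need reordering before cublasSetVector, roughly as in this sketch. The function row_major_to_col_major is hypothetical; IDX2C has the same form as the macro of that name in the CUBLAS documentation.

```c
#include <stddef.h>

/* Column-major index of element (i, j) with leading dimension ld,
   the layout CUBLAS expects. */
#define IDX2C(i, j, ld) (((j) * (ld)) + (i))

/* Hypothetical helper: copy a rows x cols matrix stored row by row (src)
   into column-major order (dst), i.e. with leading dimension = rows. */
void row_major_to_col_major(const float *src, float *dst, int rows, int cols) {
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j)
            dst[IDX2C(i, j, rows)] = src[(size_t)i * cols + j];
}
```

For sW3 this would be row_major_to_col_major(src, dst, 5, 512), with dst then passed to cublasSetVector.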

The problem appears when I copy the data from sW3 to d_sW3 with:

// Load sW3 to CUDA memory for use with CUBLAS.
status = cublasSetVector(SW3_SIZE, sizeof(sW3[0]), sW3, 1, d_sW3, 1);
if (status != CUBLAS_STATUS_SUCCESS) {
    fprintf (stderr, "!!!! device access error (write A)\n");
    return EXIT_FAILURE;
}
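cublasSetVector(n, elemSize, x, incx, y, incy) copies n elements of elemSize bytes from host memory x to device memory y, reading every incx-th element of the source and writing every incy-th element of the destination. With both strides equal to 1, as here, it is a plain contiguous copy of 2560 * 4 = 10,240 bytes. The host-only sketch below mimics that indexing to make the parameters concrete; emulate_set_vector is hypothetical, assumes positive strides (CUBLAS rejects non-positive incx, incy and elemSize), and writes to host memory where the real call writes to the GPU.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical host-only emulation of cublasSetVector's indexing:
   element k of x (byte offset k*incx*elemSize) is copied to
   byte offset k*incy*elemSize of y. Assumes incx, incy, elemSize > 0. */
void emulate_set_vector(int n, int elemSize, const void *x, int incx,
                        void *y, int incy) {
    const unsigned char *src = (const unsigned char *)x;
    unsigned char *dst = (unsigned char *)y;
    for (int k = 0; k < n; ++k)
        memcpy(dst + (size_t)k * incy * elemSize,
               src + (size_t)k * incx * elemSize,
               (size_t)elemSize);
}
```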



d_sW3 always contains all zeros, and as a result the following call always produces zeros in d_X4:

cublasSgemm('n', 'n', 5, 125700, 512, 1.0, d_sW3, 5, d_X3, 512, 0.0, d_X4, 5);
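Every entry of d_X4 is a sum of products with entries of d_sW3, so an all-zero d_sW3 necessarily gives an all-zero d_X4, whatever d_X3 holds. To check GPU results on the host, a plain CPU version of the same column-major operation (C = alpha * A * B + beta * C, the 'n', 'n' case of cublasSgemm) can help. The sketch below is hypothetical (ref_sgemm_nn is not a CUBLAS function); as in BLAS, C is not read when beta is zero.

```c
#include <stddef.h>

/* Hypothetical CPU reference for cublasSgemm('n', 'n', m, n, k, alpha,
   A, lda, B, ldb, beta, C, ldc): C = alpha * A * B + beta * C, where
   A is m x k, B is k x n and C is m x n, all stored column-major. */
void ref_sgemm_nn(int m, int n, int k, float alpha,
                  const float *A, int lda, const float *B, int ldb,
                  float beta, float *C, int ldc) {
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            /* Dot product of row i of A with column j of B. */
            for (int p = 0; p < k; ++p)
                acc += A[i + (size_t)p * lda] * B[p + (size_t)j * ldb];
            float *c = &C[i + (size_t)j * ldc];
            /* When beta == 0 the old contents of C are ignored, as in BLAS. */
            *c = (beta == 0.0f) ? alpha * acc : alpha * acc + beta * *c;
        }
    }
}
```

For the call above this would be ref_sgemm_nn(5, 125700, 512, 1.0f, h_sW3, 5, h_X3, 512, 0.0f, h_X4, 5), with h_sW3, h_X3 and h_X4 as hypothetical host copies. The summation order differs from the GPU's, so compare with a small tolerance rather than exact equality.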


Why are you declaring sW3 as __constant__ float? That places the array in constant memory on the GPU.

Remove the __constant__ qualifier and see if it works.

Hi mfatica,

I removed the __constant__ declaration and ran the code again. The problem is still the same.

Can you post self-contained source code?

Hi mfatica,

I found the problem. There was nothing wrong with cublasSetVector(). The bug was in the code I used to copy the values of d_sW3 back to host memory to validate them.
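A host-side check of the kind described can be sketched as follows. The helper first_mismatch and its tolerance parameter are hypothetical; the copy back from the device would use the legacy CUBLAS call cublasGetVector(SW3_SIZE, sizeof(float), d_sW3, 1, h_check, 1), where h_check is a hypothetical host buffer of SW3_SIZE floats.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical check: compares n values copied back from the device with
   the values that were uploaded. Returns the index of the first element
   whose absolute difference exceeds tol (NaNs count as mismatches),
   or -1 if all elements agree. */
long first_mismatch(const float *expected, const float *actual,
                    size_t n, float tol) {
    for (size_t i = 0; i < n; ++i) {
        float diff = expected[i] - actual[i];
        if (diff < 0.0f)
            diff = -diff;
        if (!(diff <= tol)) { /* also true when diff is NaN */
            fprintf(stderr, "mismatch at %zu: expected %g, got %g\n",
                    i, (double)expected[i], (double)actual[i]);
            return (long)i;
        }
    }
    return -1;
}
```

Because cublasSetVector and cublasGetVector copy bytes unchanged, tol = 0 is appropriate for this round trip; a nonzero tolerance matters only when comparing computed results such as d_X4.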

However, could you explain why the code gave different results with and without __constant__? If it only changes where the array is allocated (graphics card memory or host memory), it should affect only performance, not the result of the computation.