Hi All,
I have a problem with the cublasSetVector function. I hope someone can help me find my error.
In my code, I have to compute an equation like this:
d_X4 = d_sW3 * d_X3
Here:
d_X4 (5 x 125,700), d_sW3 (5 x 512), and d_X3 (512 x 125,700) are allocated as 1D vectors in the following code fragment:
#define X4_SIZE (5 * 125700)
#define MEM_SIZE_X3 (125700 * 512 * sizeof(float))
#define SW3_SIZE (5 * 512)
float* d_X4;
status = cublasAlloc(X4_SIZE, sizeof(d_X4[0]), (void**)&d_X4);
if (status != CUBLAS_STATUS_SUCCESS) {
    fprintf (stderr, "!!!! Device memory allocation error (d_X4)\n");
    return EXIT_FAILURE;
}
float* d_X3;
CUDA_SAFE_CALL( cudaMalloc( (void**) &d_X3, MEM_SIZE_X3));
float* d_sW3;
status = cublasAlloc(SW3_SIZE, sizeof(d_sW3[0]), (void**)&d_sW3);
if (status != CUBLAS_STATUS_SUCCESS) {
    fprintf (stderr, "!!!! Device memory allocation error (d_sW3)\n");
    return EXIT_FAILURE;
}
d_sW3 is initialized by copying data from the sW3 array:
__constant__ float sW3[] = {
-0.01191230948139,0.17926046900097,0.11025187906035,0.05754000839781,-0.03893107139793,0.36864162244246,0.18421623579791,0.13214617171056,...};
sW3 has 2560 elements (1 x 2560).
The problem occurs when I try to copy the data from sW3 to d_sW3 with:
// Load sW3 to CUDA memory for use with CUBLAS.
status = cublasSetVector(SW3_SIZE, sizeof(sW3[0]), sW3, 1, d_sW3, 1);
if (status != CUBLAS_STATUS_SUCCESS) {
    fprintf (stderr, "!!!! device access error (write A)\n");
    return EXIT_FAILURE;
}
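One way to check what actually ended up in d_sW3 is to read it back into a temporary host buffer with cublasGetVector and inspect that buffer on the host. The sketch below shows only the host-side part in plain C. The helper names count_nonzero and max_abs_diff are made up for illustration and are not CUBLAS functions, and the read-back buffer (called h_check in the comments) is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical helper: number of entries in buf[0..n) that are not exactly zero.
 * Intended for a host buffer (e.g. h_check) filled by reading d_sW3 back
 * from the device with cublasGetVector. */
size_t count_nonzero(const float *buf, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (buf[i] != 0.0f) {
            ++count;
        }
    }
    return count;
}

/* Hypothetical helper: largest absolute element-wise difference between
 * two host arrays of length n (0 means the copy round-tripped exactly). */
float max_abs_diff(const float *a, const float *b, size_t n)
{
    float worst = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float d = a[i] - b[i];
        if (d < 0.0f) {
            d = -d;
        }
        if (d > worst) {
            worst = d;
        }
    }
    return worst;
}
```

If count_nonzero(h_check, SW3_SIZE) returns 0, the data never reached the device, which would point at the copy itself rather than at the later Sgemm call.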
I always get all-zero values in d_sW3. As a result, the output of the following call (d_X4) is also always 0:
cublasSgemm('n', 'n', 5, 125700, 512, 1.0, d_sW3, 5, d_X3, 512, 0.0, d_X4, 5);
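For comparison, the same product can be computed on the CPU for a few columns and checked against what comes back from the GPU. The following is a minimal reference sketch in plain C, assuming column-major storage (the convention CUBLAS uses) and no transposes, with the arguments in the same order as the call above minus the two 'n' flags. The function name sgemm_ref and the IDX2C macro are defined here only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Column-major index of element (row i, column j) with leading dimension ld. */
#define IDX2C(i, j, ld) ((size_t)(j) * (size_t)(ld) + (size_t)(i))

/* Reference C = alpha * A * B + beta * C, no transposes, all column-major.
 * A is m x k (lda >= m), B is k x n (ldb >= k), C is m x n (ldc >= m).
 * sgemm_ref is an illustrative name, not a CUBLAS function. */
void sgemm_ref(int m, int n, int k, float alpha,
               const float *A, int lda,
               const float *B, int ldb,
               float beta, float *C, int ldc)
{
    assert(lda >= m && ldb >= k && ldc >= m);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p) {
                acc += A[IDX2C(i, p, lda)] * B[IDX2C(p, j, ldb)];
            }
            /* As in BLAS, C is not read when beta is zero. */
            float prev = (beta == 0.0f) ? 0.0f : beta * C[IDX2C(i, j, ldc)];
            C[IDX2C(i, j, ldc)] = alpha * acc + prev;
        }
    }
}
```

With host copies of the inputs (say h_sW3 and h_X3, both hypothetical names), a call such as sgemm_ref(5, 4, 512, 1.0f, h_sW3, 5, h_X3, 512, 0.0f, h_X4, 5) would give the first four columns of the expected result.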
Khanh