So there are no issues with compiling for double precision from a 32-bit Linux distro, and I'm not using the wrong version of NVCC?

OK then, moving on to the more complicated stuff…

This is just a matrix-vector multiply of a sparse matrix stored in vectors. I allocated the vectors using cublasAlloc(…) with an element size of double, and initialized their values to 1 just for testing.

I read the vectors back using cublasGetVector(…), again with an element size of double, and everything checks out, so I'm assuming my vectors are stored correctly.
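For reference, the allocate / upload / read-back round trip described above looks roughly like this with the legacy CUBLAS helper API (this is my sketch of the pattern, not the actual code; the vector length `n` is made up):

```c
#include <stdio.h>
#include "cublas.h"   /* legacy CUBLAS v1 helper API */

int main(void) {
    const int n = 1024;            /* hypothetical vector length */
    double host[1024], back[1024];
    double *dev = NULL;

    for (int i = 0; i < n; i++) host[i] = 1.0;   /* init to 1 for testing */

    cublasInit();
    /* elemSize must be sizeof(double) everywhere, not sizeof(float) */
    cublasAlloc(n, sizeof(double), (void **)&dev);
    cublasSetVector(n, sizeof(double), host, 1, dev, 1);
    cublasGetVector(n, sizeof(double), dev, 1, back, 1);
    /* back[i] should now read 1.0 for every i */
    cublasFree(dev);
    cublasShutdown();
    return 0;
}
```

If the element size were accidentally left at sizeof(float) in any one of these calls, the round trip would still partially "work" but return garbage in the upper halves, so it's worth double-checking all three.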

Now, about the last line, where the result is stored back to b[index]: it used to be b[index] = res;, but that wasn't working, so I changed it to 3.0 just to make sure I could at least store 3 into the output vector elements.

```
__global__ void Mat1x1VecMultKernel ( double * matrix, unsigned int size_matrix,
                                      uint2 * rowptr, unsigned int size_rowptr,
                                      unsigned int * colind, unsigned int size_colind,
                                      double * x, double * b, unsigned int size_vec ) {
    // Thread index
    const unsigned int index = compute_thread_index ();
    if ( index < size_vec ) {
        // [x, y) bounds of this row's nonzeros in matrix/colind
        uint2 rowptr_bounds = rowptr[index];
        double res = 0.0;
        // for each block of the block_row, mult
        for ( unsigned int i = rowptr_bounds.x; i < rowptr_bounds.y; i++ ) {
            res += matrix[i] * x[colind[i]];
        }
        b[index] = 3.0; // res;
    }
}
```

(this kernel was not originally my own creation, but comes from the CNC number cruncher project)

What comes out is still all 1's in the vector. Any ideas? The code works fine in single precision, and I don't see anything fundamental that I changed in going to double precision.