Compiling with NVCC for Double Precision

I’m trying to change a kernel from single precision to double precision floating point, and I’m having trouble getting the right results into my output vector.

Let’s start off with the basics: I’m using 32-bit Linux (Ubuntu). When I type nvcc --version it says “Cuda compilation tools, release 2.2, V0.2.1221”. My graphics card is a GTX 285. Stop me here if this isn’t going to work.

Continuing on, I compile with the following:
nvcc -arch sm_13 -o my_cuda_blas.o -c -I/usr/local/cuda/include -I/usr/local/cuda/common/inc

Anything not look right so far?

Those compilation arguments should be correct.
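A GTX 285 is a compute capability 1.3 device, which is the minimum for double precision, so -arch sm_13 is the right flag. If you want to confirm what the runtime actually sees, a quick host-side check along these lines (a sketch using the standard CUDA runtime API; device 0 is assumed) will report the capability:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("Device: %s, compute capability %d.%d\n", prop.name, prop.major, prop.minor);
    // Double precision requires compute capability 1.3 or higher
    if (prop.major > 1 || (prop.major == 1 && prop.minor >= 3))
        printf("Double precision supported.\n");
    else
        printf("No double precision support.\n");
    return 0;
}
```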

So there are no issues with compiling for double precision from a 32-bit Linux distro, and I’m not using the wrong version of NVCC?

OK then, moving on to the more complicated stuff…

This is just a matrix-vector multiply of a sparse matrix stored in vectors. I allocated the vectors using cublasAlloc(…) with element sizes of double, and initialized their values to 1 just for testing.

I read the vectors back using cublasGetVector(…), again with double element sizes, and everything works OK, so I’m assuming my vectors are stored correctly.
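For reference, a minimal sketch of that host-side setup, assuming the legacy CUBLAS API (the vector length n is a placeholder; the kernel launch is elided):

```cuda
#include <cstdio>
#include <cublas.h>

int main() {
    const unsigned int n = 1024;   // hypothetical vector length
    double *h_x = new double[n];
    for (unsigned int i = 0; i < n; ++i) h_x[i] = 1.0;  // test value

    cublasInit();
    double *d_x = 0;
    // Note sizeof(double), not sizeof(float): the element size must match the type
    cublasAlloc(n, sizeof(double), (void **)&d_x);
    cublasSetVector(n, sizeof(double), h_x, 1, d_x, 1);

    // ... kernel launch would go here ...

    cublasGetVector(n, sizeof(double), d_x, 1, h_x, 1);
    printf("h_x[0] = %f\n", h_x[0]);

    cublasFree(d_x);
    cublasShutdown();
    delete[] h_x;
    return 0;
}
```

If any of those element sizes were left at sizeof(float) after the conversion, only half the data would be copied, which is one of the classic float-to-double migration mistakes.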

Now, about the last line, where I store back to b[index]: it used to be b[index] = res; but that wasn’t working, so I changed it to 3.0 just to make sure I could store 3 to the output vector elements.

__global__ void Mat1x1VecMultKernel ( double * matrix, unsigned int size_matrix,
									  uint2 * rowptr, unsigned int size_rowptr,
									  unsigned int * colind, unsigned int size_colind,
									  double * x, double * b, unsigned int size_vec ) {

	// Thread index
	const unsigned int index = compute_thread_index ();

	if ( index < size_vec ) {
		uint2 rowptr_bounds = rowptr[index];
		double res = 0.0;

		// for each block of the block_row, mult
		for ( unsigned int i=rowptr_bounds.x; i<rowptr_bounds.y; i++ ) {
			res += matrix[i]*x[colind[i]];
		}

		b[index] = 3.0;   // res;
	}
}


(This kernel was not originally my own creation; it comes from the CNC number cruncher project.)

What comes out is still all 1’s in the vector. Any ideas? The code works fine in single precision, and I don’t see anything fundamental that changed in going to double precision.

The kernel probably isn’t running at all. Do you have host-side error checking around the kernel launch and memory management? Remember that the size of every element doubles with 64-bit floating point, so your execution parameters might need to change, and accidental mistakes with type sizes get exposed when the type size increases from 4 bytes to 8.
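A minimal error-checking pattern looks something like this (the macro name, grid/block variables, and kernel arguments are placeholders; cudaThreadSynchronize is the synchronization call in the CUDA 2.x runtime):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Wraps a CUDA runtime call and prints any error it returns
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t e = (call);                                       \
        if (e != cudaSuccess)                                         \
            printf("CUDA error at %s:%d: %s\n",                       \
                   __FILE__, __LINE__, cudaGetErrorString(e));        \
    } while (0)

// After the launch:
//   Mat1x1VecMultKernel<<<grid, block>>>( /* ... */ );
//   CUDA_CHECK(cudaGetLastError());        // catches launch failures
//   CUDA_CHECK(cudaThreadSynchronize());   // catches execution failures
```

If the launch is silently failing, cudaGetLastError will typically report something like “invalid configuration argument” or “unspecified launch failure”, which narrows the problem down immediately instead of leaving the output buffer untouched with no explanation.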