Apparent conflict between cuBLAS and thrust::sort when using K20c

I recently upgraded my Windows 7 x64 development system by replacing 3 C2050 boards with 3 K20c boards. I left one C2050 in for display.

After doing so, I discovered a very strange error. When I attempt to run code that includes both cuBLAS and thrust::sort, cublasCreate() will hang for a very long time (minutes at least). This only occurs when thrust::sort is in the code - other thrust calls don’t cause problems. This happens regardless of whether I set the CUDA device to be a K20c or the C2050.

I did try replacing thrust::sort with cub::DeviceRadixSort but encountered the same issue, so perhaps thrust is using cub underneath?

A simple example of code that has this problem is below (other thrust calls that do not cause problems are commented out). I am using Visual Studio 2010 on Windows 7 x64 with Nsight 3.0, CUDA Toolkit 5.0, and NVIDIA Tesla driver version 320.49. I compiled with all of the default options for a CUDA 5.0 project created through Nsight.


#include "cublas_v2.h"

#include <thrust/device_vector.h>
#include <thrust/fill.h>
#include <thrust/scan.h>
#include <thrust/sort.h>
#include <thrust/transform.h>
#include <thrust/extrema.h>

#include <iostream>

int main()
{
    // cublasCreate() is the call that hangs when thrust::sort is present.
    cublasHandle_t h;
    cublasCreate(&h);

    thrust::device_vector<int> a(1000);
    thrust::fill( a.begin(), a.end(), 22 );
    //thrust::inclusive_scan( a.begin(), a.end(), a.begin() );
    //thrust::transform( a.begin(), a.end(), a.begin(), thrust::negate<int>() );
    thrust::sort( a.begin(), a.end() );

    std::cout << "all done\n" << std::endl;

    return 0;
}


Is there anyone out there with any idea what’s going on? Is this a known issue?