[URGENT] thrust::sort crashes on M5000

Hi,

A call to thrust::sort() fails (randomly but frequently) on an M5000 card with CUDA Toolkit 8.0 on Windows 7.
On a K20 it works.

void thrust::sort (RandomAccessIterator first, RandomAccessIterator last)
fails with the following data:
512512512 elements of type unsigned int
on a Quadro M5000 card (8 GB RAM; 5048 MB of GPU RAM available before the call, per cudaMemGetInfo())
(CPU RAM = 8 GB, 4.5 GB available at the time of the crash).

Code:
thrust::device_vector<unsigned int> Elements(512512512);
// The vector is populated with values.
thrust::sort(Elements.begin(), Elements.end());
// e.what() produces the string
“after cub_::DeviceRadixSort::SortKeys(1): unknown error”

Kindly provide some help.

Thanks,
Ram Prasad

Are you building a release x64 project?

Have you set the project properties to compile for the compute capability of the M5000 GPU?

Thanks for the prompt reply.
Yes, a 64-bit release build with
compute_50,sm_50;compute_52,sm_52

We are building for:
compute_20,sm_20;compute_30,sm_30;compute_35,sm_35;compute_50,sm_50;compute_52,sm_52

The M5000 has CC 5.2.
We also have another card in the system, a K600 (CC 3.0).

Old setup: the same code used to work without any problem.
K20 and K600: CUDA Toolkit 5.0
compute_20,sm_20;compute_30,sm_30;compute_35,sm_35;

New setup: the same code crashes randomly (roughly every other run).
M5000 and K600: CUDA Toolkit 8.0
compute_20,sm_20;compute_30,sm_30;compute_35,sm_35;compute_50,sm_50;compute_52,sm_52
The screen goes black momentarily and a pop-up notification appears:
“Display driver stopped responding and has recovered.”

OK, that is a Windows TDR timeout. It is covered in this forum thread:

https://devtalk.nvidia.com/default/topic/459869/cuda-programming-and-performance/-quot-display-driver-stopped-responding-and-has-recovered-quot-wddm-timeout-detection-and-recovery-/

Thanks for the link.

I get the following error in the pop-up notification:

“Display Driver stopped responding and has recovered
Display driver NVIDIA Windows Kernel Mode Driver, Version 354.74 stopped
responding and has successfully recovered.”

Is this the same error as the one discussed in the forum thread mentioned above?

Yes

I tried
Nvidia Control Panel -> 3D Settings -> Set PhysX configuration -> Select a PhysX processor -> M5000
and it still crashes.

I also tried
Nvidia Control Panel -> 3D Settings -> Set PhysX configuration -> Select a PhysX processor -> K600
and it still crashes.

I won’t be able to sort this out for you. You’ll need to read the online information carefully to learn how to disable the TDR timeout if you want to work around this issue. Changing the Set PhysX configuration option is not really the right way to do it, even though it is mentioned in that thread.

You can start by reading this:

Then search for “Windows TDR timeout” and find out how others have disabled or lengthened the timeout. The method varies by Windows version. You’ll need to edit the Windows registry, or else you can try Nsight VSE, which has a setting for this.
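
As a rough illustration of the registry route, here is a minimal sketch of a helper that generates the text of a .reg file raising the timeout. It assumes the TdrDelay value (a DWORD, in seconds) under the GraphicsDrivers key, as described in Microsoft's TDR documentation; neither the key name nor the 60-second figure comes from this thread, and make_tdr_reg is a hypothetical name. Check the documentation for your Windows version before importing anything, and expect to need a reboot.

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: builds the text of a .reg file that sets TdrDelay.
// ASSUMPTION: the key path and value name follow Microsoft's TDR
// documentation; they are not taken from this thread.
std::string make_tdr_reg(std::uint32_t delay_seconds) {
    std::ostringstream out;
    out << "Windows Registry Editor Version 5.00\r\n\r\n"
        << "[HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Control\\GraphicsDrivers]\r\n"
        // .reg files store DWORD values as eight hexadecimal digits.
        << "\"TdrDelay\"=dword:"
        << std::hex << std::setw(8) << std::setfill('0') << delay_seconds
        << "\r\n";
    return out.str();
}
```

For example, writing the result of make_tdr_reg(60) to a file such as tdr.reg (an illustrative name) and importing it as administrator would request a 60-second timeout. The value is only an example; pick one that comfortably exceeds your longest kernel.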

I increased the TDR timeout value and the crash no longer occurs.
Thanks for your help…

Does increasing the TDR timeout value have any disadvantages?