GTX Titan - Thrust - Compute 3.5

AlexanderAgathos · October 28, 2013, 10:28pm

Does anybody have problems with Thrust on the GTX Titan when compiling with the Compute=35, SM=35 flags? With compute capability 3.0 it runs perfectly fine, but with 3.5 it crashes. The crash happens in inclusive_scan. Any insights on what might go wrong?

pasoleatis · October 28, 2013, 10:37pm

Do you have the latest GPU driver? I had problems with the cuFFT library under some driver versions.

AlexanderAgathos · October 28, 2013, 10:41pm

I have the 327.23 version…

njuffa · October 29, 2013, 2:16am

What OS platform is this? I assume you are running CUDA 5.5 and the latest driver package?

How does it “crash”? Does the code fail to build when you compile it with -arch=sm_35? If so, what is the exact error message from the compiler? Does the compiler throw an internal error or segfault? Is this a debug or a release build?

If the code builds fine with -arch=sm_35 and there is a runtime error: What is the API-level error status (e.g. unspecified launch failure, out of resources, etc.)? Or does the app segfault, hang the machine (i.e. unresponsive after a 5-minute wait), blue screen, or kernel panic?

If you run the application under cuda-memcheck or inside the CUDA debugger, does it diagnose any issues, such as an out-of-bounds access or a race condition? If on Linux: When you run the code under valgrind, does it report any issues with the host code?

A bug in a CUDA component cannot be excluded based on the information known so far. After doing some due diligence, you may want to consider filing a bug, attaching the smallest self-contained code that reproduces the issue.
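To make the "API-level error status" question concrete, here is a minimal sketch of how that status can be captured around a kernel launch. The helper `checkCuda`, the kernel `fillKernel`, and the function `runAndCheck` are hypothetical names made up for illustration; only the CUDA runtime calls themselves are real API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints the CUDA error string and returns false on failure.
static bool checkCuda(cudaError_t status, const char* what)
{
    if (status != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(status));
        return false;
    }
    return true;
}

// Placeholder kernel standing in for the application's real kernels.
__global__ void fillKernel(int* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = i;
}

// Launches the kernel and reports both launch-time and execution-time errors.
bool runAndCheck(int n)
{
    int* d_data = nullptr;
    if (!checkCuda(cudaMalloc(&d_data, n * sizeof(int)), "cudaMalloc")) return false;

    fillKernel<<<(n + 255) / 256, 256>>>(d_data, n);

    // Launch errors (e.g. an invalid configuration) show up immediately.
    bool ok = checkCuda(cudaGetLastError(), "kernel launch");
    // Execution errors (e.g. an unspecified launch failure) show up at the next sync.
    ok = checkCuda(cudaDeviceSynchronize(), "kernel execution") && ok;

    ok = checkCuda(cudaFree(d_data), "cudaFree") && ok;
    return ok;
}
```

Building this with -arch=sm_30 and again with -arch=sm_35 and comparing the printed status is one way to tell a CUDA-side failure apart from a pure host-side segfault.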

AlexanderAgathos · October 29, 2013, 10:37am

You are right, I need to do some thorough profiling. It is a segfault, and I am on CUDA 5.5 with VS2012. Let's see what Nsight can do for me…

njuffa · October 29, 2013, 6:01pm

So the crash is a segfault at application runtime? That usually means there is a problem with the host code, which may or may not be related to anything that happens in CUDA. I would think the first order of business would be to find out where the segfault occurs and why (e.g. out-of-bounds access, null pointer), then take it from there. I have never used Thrust, so I am unable to provide specific pointers regarding inclusive_scan.
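As an illustration of how a Thrust call can segfault in host code, one well-known pitfall is passing a raw device pointer straight to a Thrust algorithm: Thrust then treats it as a host pointer and dereferences a device address on the CPU. Nothing in this thread confirms that this is the cause here, and it would not by itself explain a difference between SM 3.0 and SM 3.5. The sketch below assumes the data was allocated with cudaMalloc; `scanOnDevice` is a hypothetical name.

```cuda
#include <thrust/device_ptr.h>
#include <thrust/scan.h>
#include <cuda_runtime.h>

// Runs an in-place inclusive scan over n ints in device memory.
// d_data is assumed to come from cudaMalloc.
void scanOnDevice(int* d_data, int n)
{
    // Wrapping the raw pointer tells Thrust that the data lives on the device.
    thrust::device_ptr<int> first = thrust::device_pointer_cast(d_data);
    thrust::inclusive_scan(first, first + n, first);

    // By contrast, thrust::inclusive_scan(d_data, d_data + n, d_data) would be
    // dispatched to the host backend and read a device address on the CPU.
}
```

Using thrust::device_vector instead of raw pointers avoids the issue entirely, since its iterators already carry the device tag.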

AlexanderAgathos · October 29, 2013, 10:13pm

The interesting thing is that it happens when I change from SM=30 to SM=35. Then again, a segfault can also happen when something goes wrong in a kernel and a memcpy follows. I will do extensive profiling with Nsight. The pity is that it happens just now, when I want to introduce dynamic parallelism. Anyway, it may be Thrust. I usually build very good kernels, but this specific software is a beast, so I need to check it thoroughly…

pasoleatis · October 29, 2013, 11:51pm

Hello,

The important thing is to find out which function in your program gives the error. As far as I know, an error in a kernel is only reported at the next call that synchronizes the CPU and GPU. If you have a big, complex program, I would try to test the function separately, if possible, to make sure the error is really coming from the Thrust library.
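A rough sketch of that isolation step, assuming the data is held in a thrust::device_vector (the original code is not shown in this thread, and `isolatedScan` is a hypothetical name): synchronize before the scan so that an error from an earlier kernel is not blamed on Thrust, then catch anything the scan itself throws.

```cuda
#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/scan.h>
#include <thrust/system_error.h>
#include <cuda_runtime.h>

// Returns true only if the scan itself completed without a CUDA error.
bool isolatedScan(thrust::device_vector<int>& data)
{
    // Flush errors left over from earlier asynchronous work.
    cudaError_t before = cudaDeviceSynchronize();
    if (before != cudaSuccess) {
        std::fprintf(stderr, "error before scan: %s\n", cudaGetErrorString(before));
        return false;
    }

    try {
        // In-place inclusive scan on the device.
        thrust::inclusive_scan(data.begin(), data.end(), data.begin());
    } catch (const thrust::system_error& e) {
        std::fprintf(stderr, "scan threw: %s\n", e.what());
        return false;
    }

    // Catch errors that only surface once the scan's kernels have finished.
    cudaError_t after = cudaDeviceSynchronize();
    if (after != cudaSuccess) {
        std::fprintf(stderr, "error after scan: %s\n", cudaGetErrorString(after));
        return false;
    }
    return true;
}
```

If "error before scan" is printed, the problem lies in an earlier kernel or copy rather than in inclusive_scan.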