I prefer to take a working, CUDA-unaware program and convert it to CUDA code with a Perl/bash script. This keeps hand-introduced bugs out of the port. But very often a compiler-ran-out-of-registers bug stops me (setting Olimit appears to have no effect). I failed several times before I produced a variant that compiles and fits entirely in registers. Even then the code is not optimal: if the compiler worked properly, I could make it better.
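To give an idea of what such a scripted conversion can look like, here is a minimal sketch. It is written in Python rather than Perl/bash, and it is not my actual script: the function `to_cuda_kernel`, the sample `scale` routine, and the assumption that the host code has a single simple `for (int i = 0; i < n; i++)` loop are all hypothetical and chosen only for illustration.

```python
import re


def to_cuda_kernel(c_source: str, func_name: str) -> str:
    """Turn a plain C function with one simple counting loop into a CUDA kernel.

    Hypothetical helper: it only handles `void name(...)` definitions and
    loops of the exact form `for (int i = 0; i < n; i++)`.
    """
    # Mark the function definition as a kernel.
    out = re.sub(
        rf"\bvoid\s+{re.escape(func_name)}\s*\(",
        f"__global__ void {func_name}(",
        c_source,
        count=1,
    )
    # Replace the loop header with a per-thread index and a bounds check.
    out = re.sub(
        r"for\s*\(\s*int\s+(\w+)\s*=\s*0\s*;\s*\1\s*<\s*(\w+)\s*;"
        r"\s*(?:\1\+\+|\+\+\1)\s*\)",
        r"int \1 = blockIdx.x * blockDim.x + threadIdx.x;\n    if (\1 < \2)",
        out,
        count=1,
    )
    return out


# Sample CUDA-unaware input (assumed for the example).
HOST_CODE = """void scale(float *x, float a, int n)
{
    for (int i = 0; i < n; i++)
        x[i] = a * x[i];
}
"""

kernel = to_cuda_kernel(HOST_CODE, "scale")
print(kernel)
```

The point of doing it this way is that the host version stays the tested reference and the kernel text is always regenerated from it. A textual rewrite like this does nothing about register pressure, so the generated kernel can still hit the register problem described above.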