Where/how do I set --default-stream per-thread in the project properties section for the nvcc compiler options?
Seems easy, but I can’t find it.
I just put it in the “Command line pattern:” text box. That seems like the wrong place, but it works.
I can see the option on the compiler command line during the build, but it does not create a CUDA stream for each thread. :( Any ideas?
I am using Thrust and the latest release of all tools and libraries.
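The flag only affects translation units that nvcc actually compiles with it, so one way to confirm it reached a given .cu file (for example CIntegrate.cu) is a compile-time probe. This sketch assumes that nvcc defines `CUDA_API_PER_THREAD_DEFAULT_STREAM` for files built with `--default-stream per-thread`; that is the macro the CUDA headers key off, and running the same nvcc command with `--dryrun` should show whether it is passed. The function name is hypothetical.

```cuda
// Compile-time probe (hypothetical helper): reports whether this
// translation unit was built with per-thread default streams.
// Assumes nvcc defines CUDA_API_PER_THREAD_DEFAULT_STREAM when
// --default-stream per-thread is on the command line.
bool per_thread_default_stream_enabled()
{
#ifdef CUDA_API_PER_THREAD_DEFAULT_STREAM
    return true;   // default-stream work from this file uses the per-thread stream
#else
    return false;  // default-stream work from this file uses the legacy default stream
#endif
}
```

Print the result from each file that launches kernels or calls Thrust; a file compiled without the flag (or a host file built by a different compiler without the macro) keeps legacy default-stream behavior.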
I kind of feel like I am talking to myself, but here goes another crack at this question.
I gave up on --default-stream per-thread and just put stream handles in my objects, dispatching Thrust kernel launches to the correct stream. That works pretty well, except that I can’t figure out how to make the constructors of the thrust::device_vector objects I create inside my methods use that stream handle; they all go to the default stream. I added an allocator that does not initialize the vector with zeros, which eliminates the fill operation, but the constructor still launches a kernel on the default stream.
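One possible workaround for the constructor problem, sketched under assumptions: skip `device_vector` for these buffers, allocate with `cudaMalloc`, wrap the pointer in `thrust::device_ptr`, and pass `thrust::cuda::par.on(stream)` to every algorithm. This assumes the `device_vector` constructor in your Thrust version offers no stream argument. `fill_and_sum_on_stream` is a hypothetical name, and the fill/reduce pair only stands in for real work.

```cuda
#include <cstddef>
#include <cuda_runtime.h>
#include <thrust/device_ptr.h>
#include <thrust/fill.h>
#include <thrust/reduce.h>
#include <thrust/system/cuda/execution_policy.h>

// Hypothetical helper: allocates n floats, fills them with `value`,
// and writes their sum to *result. Every Thrust call runs on `stream`.
bool fill_and_sum_on_stream(std::size_t n, float value,
                            cudaStream_t stream, float* result)
{
    float* raw = nullptr;
    // Plain cudaMalloc reserves memory without launching any
    // initialization kernel.
    if (cudaMalloc(&raw, n * sizeof(float)) != cudaSuccess)
        return false;

    // Wrap the raw pointer so Thrust dispatches to the CUDA backend.
    thrust::device_ptr<float> first = thrust::device_pointer_cast(raw);
    auto on_stream = thrust::cuda::par.on(stream);

    // Both algorithms launch their kernels on `stream`.
    thrust::fill(on_stream, first, first + n, value);
    *result = thrust::reduce(on_stream, first, first + n, 0.0f);

    cudaFree(raw);  // cudaFree can implicitly synchronize the device
    return true;
}
```

Pass either a stream from `cudaStreamCreate` or `cudaStreamPerThread`. Since `cudaFree` can synchronize, a long-lived object would more likely allocate once and reuse the buffer than allocate on every call.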
Getting the per-thread default stream to work would solve my problems.
Here is what my compiler invocations look like:
Building file: …/src/CIntegrate.cu
Invoking: NVCC Compiler
/usr/local/cuda-11.0/bin/nvcc -O3 -gencode arch=compute_61,code=sm_61 -odir "src" -M -o "src/CIntegrate.d" "…/src/CIntegrate.cu"
/usr/local/cuda-11.0/bin/nvcc -O3 --compile --relocatable-device-code=false -gencode arch=compute_61,code=compute_61 -gencode arch=compute_61,code=sm_61 --default-stream per-thread -x cu -o "src/CIntegrate.o" "…/src/CIntegrate.cu"
Finished building: …/src/CIntegrate.cu
Does anyone see a problem with those options? If not, does anyone know how to direct the vector constructor to use a specified stream?
Just for fun, I set the stream handle I am using to cudaStreamPerThread instead of creating a stream, and my code behaves exactly the same way. Is it possible that my per-thread compiler setting is working, and that constructing a device vector inside a thread still launches its kernel on the default stream?
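`cudaStreamPerThread` is a special handle that names the calling host thread's per-thread default stream, even in files compiled without `--default-stream per-thread`. The sketch below, assuming C++11 `std::thread` and the hypothetical names `worker` and `run_workers`, has each host thread do its Thrust work through `thrust::cuda::par.on(cudaStreamPerThread)` on its own buffer, so none of it is issued on the legacy default stream.

```cuda
#include <cstddef>
#include <thread>
#include <vector>
#include <cuda_runtime.h>
#include <thrust/device_ptr.h>
#include <thrust/sequence.h>
#include <thrust/reduce.h>
#include <thrust/system/cuda/execution_policy.h>

// Hypothetical worker: fills 0..n-1 on this thread's per-thread stream
// and stores the sum in *out (-1 if allocation fails).
static void worker(std::size_t n, long long* out)
{
    int* raw = nullptr;
    if (cudaMalloc(&raw, n * sizeof(int)) != cudaSuccess) { *out = -1; return; }
    thrust::device_ptr<int> first = thrust::device_pointer_cast(raw);

    // Explicit handle: works regardless of the --default-stream setting.
    auto policy = thrust::cuda::par.on(cudaStreamPerThread);
    thrust::sequence(policy, first, first + n);            // 0, 1, ..., n-1
    *out = thrust::reduce(policy, first, first + n, 0LL);  // sum as long long
    cudaFree(raw);
}

// Hypothetical driver: runs num_threads workers and collects their sums.
std::vector<long long> run_workers(int num_threads, std::size_t n)
{
    std::vector<long long> results(num_threads, 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < num_threads; ++i)
        threads.emplace_back(worker, n, &results[i]);
    for (auto& t : threads) t.join();
    return results;
}
```

Add `-Xcompiler -pthread` to the nvcc command if your toolchain needs it for `std::thread`. The per-call `cudaMalloc`/`cudaFree` here is only for brevity; because `cudaFree` can synchronize the device, it can serialize the threads in real code.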