How to set --default-stream per-thread in Nsight

Where and how do I set --default-stream per-thread in the project properties for the nvcc compiler options?

It seems easy, but I can't find it.

For now I just put it in the "Command line pattern:" text box. That seems like the wrong place, but it works.

I can see the argument on the nvcc command line during the build, but it does not create a separate CUDA stream for each thread. :( Any ideas?

I am using Thrust and the latest release of all tools and libraries.

I feel a bit like I am talking to myself, but here is another attempt at this question.

I gave up on --default-stream per-thread and instead put stream handles in my objects and dispatched the Thrust kernel launches to the correct stream. That works pretty well, except that I can't figure out how to direct the constructors of the thrust::device_vector objects I create in my methods to that stream handle; they all go to the default stream. I added an allocator that does not initialize the vector with zeros, which eliminates the fill operation, but the constructor still launches a kernel on the default stream.
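One possible workaround, sketched here and not tested against the poster's code, is to skip the sized device_vector constructor for these temporaries entirely. The sketch allocates with cudaMalloc, which launches no kernel, wraps the pointer with thrust::device_pointer_cast, and issues every Thrust algorithm with thrust::cuda::par.on(stream), so the only kernels come from calls that explicitly name the object's stream. StreamBuffer and sequence_sum_on_stream are hypothetical names, and double is an assumed element type.

```cuda
#include <cassert>
#include <cstddef>
#include <cuda_runtime.h>
#include <thrust/device_ptr.h>
#include <thrust/reduce.h>
#include <thrust/sequence.h>
#include <thrust/system/cuda/execution_policy.h>

// Hypothetical helper: a device buffer that is only ever touched by
// Thrust calls issued on one stream (no constructor kernel, no fill).
struct StreamBuffer {
  thrust::device_ptr<double> data;
  std::size_t size = 0;
  cudaStream_t stream = nullptr;

  StreamBuffer(std::size_t n, cudaStream_t s) : size(n), stream(s) {
    double *raw = nullptr;
    // cudaMalloc allocates memory but does not launch an initialization kernel.
    cudaMalloc(&raw, n * sizeof(double));
    data = thrust::device_pointer_cast(raw);
  }
  ~StreamBuffer() { cudaFree(thrust::raw_pointer_cast(data)); }
  StreamBuffer(const StreamBuffer &) = delete;
  StreamBuffer &operator=(const StreamBuffer &) = delete;
};

// Write 0, 1, 2, ... into the buffer and sum it; both kernels are issued
// on buf.stream, and reduce waits for its result on that same stream.
double sequence_sum_on_stream(StreamBuffer &buf) {
  auto policy = thrust::cuda::par.on(buf.stream);
  thrust::sequence(policy, buf.data, buf.data + buf.size);
  return thrust::reduce(policy, buf.data, buf.data + buf.size, 0.0);
}
```

Usage: create a stream with cudaStreamCreate, construct StreamBuffer b(n, stream) inside the method, and pass the same policy to any other Thrust call on b. The trade-off is giving up device_vector conveniences such as resize and copy construction in exchange for control over which stream every launch uses.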

If I could get the per-thread default stream working, that would solve my problems.

Here is what my compiler invocations look like:
Building file: …/src/CIntegrate.cu
Invoking: NVCC Compiler
/usr/local/cuda-11.0/bin/nvcc -O3 -gencode arch=compute_61,code=sm_61 -odir "src" -M -o "src/CIntegrate.d" "…/src/CIntegrate.cu"
/usr/local/cuda-11.0/bin/nvcc -O3 --compile --relocatable-device-code=false -gencode arch=compute_61,code=compute_61 -gencode arch=compute_61,code=sm_61 --default-stream per-thread -x cu -o "src/CIntegrate.o" "…/src/CIntegrate.cu"
Finished building: …/src/CIntegrate.cu
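Seeing the flag in the build log does not by itself show that it reached the code that launches the kernels. A minimal compile-time check, assuming (as the CUDA documentation describes) that --default-stream per-thread defines CUDA_API_PER_THREAD_DEFAULT_STREAM and that cuda_runtime_api.h then defines __CUDART_API_PER_THREAD_DEFAULT_STREAM, could be compiled into CIntegrate.cu itself. The function names are hypothetical, and the result only describes the translation unit the check is compiled into.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// True when this translation unit was compiled with per-thread
// default-stream semantics (assumed to be signalled by the macros below).
constexpr bool per_thread_default_stream_enabled() {
#if defined(CUDA_API_PER_THREAD_DEFAULT_STREAM) || \
    defined(__CUDART_API_PER_THREAD_DEFAULT_STREAM)
  return true;
#else
  return false;
#endif
}

// Print the mode for this file, e.g. once at program start.
void report_default_stream_mode(const char *file) {
  std::printf("%s: %s default stream\n", file,
              per_thread_default_stream_enabled() ? "per-thread" : "legacy");
}
```

Calling report_default_stream_mode(__FILE__) from CIntegrate.cu should print "per-thread" if the option is taking effect there; "legacy" would suggest the flag is not reaching that compile step.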

Does anyone see a problem with those options? If not, does anyone know how to direct the vector constructor to use a specified stream?

Just for fun, I set the stream handle I am using to cudaStreamPerThread instead of creating a stream, and my code behaves exactly the same way. Is it possible that my per-thread compiler setting is working, but that constructing a device vector inside a thread still launches its kernel on the default stream?
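To separate "is the flag working" from "where do algorithm launches go", one experiment is to pass cudaStreamPerThread explicitly from several host threads. cudaStreamPerThread is a documented runtime handle for the calling thread's per-thread default stream and can be used whether or not the file was built with --default-stream per-thread. The sketch below is only an assumption about how such a test might look; sum_on_per_thread_stream and run_threads are hypothetical names, and a profiler timeline is one way to see which streams the kernels land on.

```cuda
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>
#include <cuda_runtime.h>
#include <thrust/device_ptr.h>
#include <thrust/reduce.h>
#include <thrust/sequence.h>
#include <thrust/system/cuda/execution_policy.h>

// Sum 0..n-1 using only the calling host thread's per-thread default stream.
// cudaStreamPerThread is passed explicitly, so this routing does not depend
// on the --default-stream compile flag.
double sum_on_per_thread_stream(std::size_t n) {
  double *raw = nullptr;
  cudaMalloc(&raw, n * sizeof(double));
  thrust::device_ptr<double> p = thrust::device_pointer_cast(raw);
  auto policy = thrust::cuda::par.on(cudaStreamPerThread);
  thrust::sequence(policy, p, p + n);
  double s = thrust::reduce(policy, p, p + n, 0.0);
  cudaFree(raw);
  return s;
}

// Run the same work from several host threads; each writes its own slot.
std::vector<double> run_threads(int num_threads, std::size_t n) {
  std::vector<double> results(num_threads, 0.0);
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t)
    workers.emplace_back([&results, t, n] { results[t] = sum_on_per_thread_stream(n); });
  for (auto &w : workers) w.join();
  return results;
}
```

Even with correct stream routing, a timeline may still show serialization: a sized device_vector constructor performs a device memory allocation, and the CUDA C++ Programming Guide lists device memory allocation among the host operations that can prevent commands in different streams from running concurrently. The sketch's own cudaMalloc and cudaFree calls may have the same effect.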