CUDA_PROFILER_CONFIG Can't get profile config to work

I created the CUDA_PROFILER_CONFIG variable and set it to “c:\cuda\cuda_profile_config.txt” (tried both with and without quotation marks), but it does not seem to work. The documentation txt file isn’t specific about how the config should be set up, but someone else on the forums said “one entry per line” (FYI, those docs could definitely use a second pass :). I’ve tried both one entry per line and comma-separated; nothing seems to work. Am I missing anything? I get this type of output no matter what:

method=[ _Z28computeNormalizedColorKernelPK6uchar3S1_PfP5uint3f ] gputime=[ 11991.425 ] cputime=[ 12004.890 ] occupancy=[ 0.500 ]

I really need to see the number of incoherent reads.
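
For anyone who wants to check programmatically which counters actually made it into the log, here is a minimal Python sketch that splits a line like the one above into its fields. The function name parse_profile_line is made up for illustration, and the "key=[ value ]" layout is assumed only from the single line quoted in this post, not from the profiler documentation.

```python
import re

# Matches "key=[ value ]" pairs as they appear in the log line quoted above.
FIELD_RE = re.compile(r"(\w+)=\[\s*(.*?)\s*\]")

def parse_profile_line(line):
    """Return a dict of the fields in one profiler log line.

    Values that look like numbers are converted to float; everything
    else (such as the mangled kernel name) is kept as a string.
    """
    fields = {}
    for key, raw in FIELD_RE.findall(line):
        try:
            fields[key] = float(raw)
        except ValueError:
            fields[key] = raw
    return fields

line = ("method=[ _Z28computeNormalizedColorKernelPK6uchar3S1_PfP5uint3f ] "
        "gputime=[ 11991.425 ] cputime=[ 12004.890 ] occupancy=[ 0.500 ]")
record = parse_profile_line(line)
print(sorted(record))              # field names present in the line
print("gld_incoherent" in record)  # whether the requested counter was logged
```

If the config file were being picked up, one would expect the extra counter names to show up as additional keys in the parsed record.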

What does the content of your cuda_profile_config.txt look like?

I’m using it like this:

    timestamp
    #---------
    #This option tells the profiler to log timestamps before kernel
    #launches and memory operations so the user can do timeline analysis.

    gld_incoherent
    #gld_coherent
    gst_incoherent
    #gst_coherent
    #--------------
    #These options tell the profiler to record information about whether global
    #memory loads/stores are coalesced (coherent) or non-coalesced (incoherent).

    #local_load
    #local_store
    #-----------
    #These options are used to record the number of local loads/stores
    #that take place during kernel execution.

    #branch
    #divergent_branch
    #----------------
    #These options tell the profiler to record the number of total branches
    #and divergent branches taken by threads executing a kernel.

    #instructions
    #------------
    #This option records the instruction count for a given kernel.

    #warp_serialize
    #--------------
    #This option records the number of thread warps that serialize on address
    #conflicts to either shared or constant memory.

    #cta_launched
    #------------
    #This option records the number of executed thread blocks.

Just remove the ‘#’ in front of an option if you want to profile it.

Remember that you can only use 4 options at once.
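
A quick way to see which options a config file actually enables is to list the lines that are neither blank nor commented out. The Python sketch below does that on a shortened copy of the file above. The helper name enabled_options and the MAX_OPTIONS constant are made up for illustration, the limit of 4 is simply the figure mentioned in this reply, and whether timestamp counts toward that limit is not stated here, so the sketch counts every enabled line.

```python
MAX_OPTIONS = 4  # limit quoted in the reply above, not checked against the docs

def enabled_options(config_text):
    """Return the option names on lines that are not blank or '#' comments."""
    options = []
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            options.append(stripped)
    return options

# Shortened copy of the config file posted above.
sample = """
    timestamp
    #---------
    gld_incoherent
    #gld_coherent
    gst_incoherent
    #gst_coherent
    #local_load
"""

active = enabled_options(sample)
print(active)
if len(active) > MAX_OPTIONS:
    print("too many options enabled:", len(active))
```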

That’s basically what mine looks like (I’ve tried many different formats; that was one of them). Is there anything else specific I have to do? Which system variables are the bare minimum required? Just the path? I put it in System Properties/Advanced/Environment Variables/System Variables; is that the correct place?

Mine are in /User Variables instead of /System Variables and are as follows:

CUDA_PROFILE 1
CUDA_PROFILE_CONFIG D:\profiler.conf
CUDA_PROFILE_CSV 0
CUDA_PROFILE_LOG D:\profiler.log
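
Note that the names in this list start with CUDA_PROFILE_, while the first post set a variable called CUDA_PROFILER_CONFIG, so the spelling may be worth double-checking. As a rough aid, the Python sketch below compares the current environment against the four names listed here. The function name check_profiler_env is made up for illustration, and treating exactly these four names as the expected set is an assumption taken from this post rather than from the documentation.

```python
import os

# Names copied from the list above; treating them as the full expected set
# is an assumption, not something confirmed in this thread.
EXPECTED = ("CUDA_PROFILE", "CUDA_PROFILE_CONFIG",
            "CUDA_PROFILE_CSV", "CUDA_PROFILE_LOG")

def check_profiler_env(env):
    """Return a list of human-readable problems found in the mapping `env`."""
    problems = []
    for name in EXPECTED:
        if name not in env:
            problems.append(f"{name} is not set")
    # Flag near-misses such as CUDA_PROFILER_CONFIG from the first post.
    for name in env:
        if name.startswith("CUDA_PROF") and name not in EXPECTED:
            problems.append(f"{name} is set but is not one of the listed names")
    config = env.get("CUDA_PROFILE_CONFIG")
    if config is not None and not os.path.isfile(config):
        problems.append(f"config file not found: {config}")
    return problems

# Report on the environment this script was started from.
for problem in check_profiler_env(os.environ):
    print(problem)
```

Run it from the same console or session that launches the CUDA application, since variables added through System Properties are usually only visible to processes started afterwards.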