Command-line profiler returns -1 for some events

Hi all,

I’m using the command-line profiler to profile some OpenCL applications (I couldn’t get any of NVIDIA’s other tools to work with OpenCL, which is probably user error). I’m profiling an application running on a Tesla C2075 (driver version 340.76) under Ubuntu 14.04 LTS. Here’s how I’m invoking it (I’m running the NPB compute benchmarks):

COMPUTE_PROFILE=1 COMPUTE_PROFILE_CONFIG=./config_file.txt OPENCL_DEVICE_TYPE=gpu ./cg.C.x ../CG

The configuration file for the profiler looks like the following:

gpustarttimestamp
gpuendtimestamp
gridsize3d
threadblocksize
memtransferdir
memtransfersize
memtransferhostmemtype
profilelogformat CSV
countermodeaggregate
sm_cta_launched
inst_executed
thread_inst_executed_0

Execution finishes and validates, meaning the application correctly executed on the Tesla. However, the profiler spits out the following in the log:

# OPENCL_PROFILE_LOG_VERSION 2.0
# OPENCL_DEVICE 0 Tesla C2075
# OPENCL_CONTEXT 1
# OPENCL_PROFILE_CSV 1
# TIMESTAMPFACTOR 13e938e16f3fbede
gpustarttimestamp,gpuendtimestamp,method,gputime,cputime,ndrangesizeX,ndrangesizeY,ndrangesizeZ,workgroupsizeX,workgroupsizeY,workgroupsizeZ,occupancy,sm_cta_launched,inst_executed,thread_inst_executed_0,memtransfersize,memtransferdir,memtransferhostmemtype
13eaabd92d360720,13eaabd92d3d8b80,init_mem_0,492.640,507.922,65535,1,1,256,1,1,1.000,66573,4718520,75496320
13eaabd92d4f30e0,13eaabd92d56b1c0,init_mem_0,491.744,500.705,65535,1,1,256,1,1,1.000,66398,4718520,75496320
13eaabd92d676c80,13eaabd92d699ea0,init_mem_0,143.904,152.535,18930,1,1,256,1,1,1.000,19057,1362960,21807360
13eaabd92d7c7600,13eaabd92d7c9e80,init_mem_0,10.368,19.303,586,1,1,256,1,1,1.000,581,42192,675072
13eaabd92d936bc0,13eaabd92da26540,init_mem_1,981.376,990.391,65535,1,1,256,1,1,1.000,65642,4718520,75496320
13eaabd92db33340,13eaabd92dc22580,init_mem_1,979.520,988.199,65535,1,1,256,1,1,1.000,65670,4718520,75496320
13eaabd92dd2e340,13eaabd92dd73f40,init_mem_1,285.696,294.233,18930,1,1,256,1,1,1.000,18980,1362960,21807360
13eaabd92de9b1c0,13eaabd92de9e640,init_mem_1,13.440,22.416,586,1,1,256,1,1,1.000,570,42192,675072
13eaabd92dfc4e00,13eaabd92dfc83e0,init_mem_1,13.792,22.596,586,1,1,256,1,1,1.000,591,42192,675072
13eaabd92e0ed4c0,13eaabd92e0f0980,init_mem_1,13.504,22.336,586,1,1,256,1,1,1.000,577,42192,675072
13eaabd92e218860,13eaabd92e21bb40,init_mem_1,13.024,21.936,586,1,1,256,1,1,1.000,588,42192,675072
13eaabd92e340ac0,13eaabd92e344220,init_mem_1,14.176,23.057,586,1,1,256,1,1,1.000,577,42192,675072
13eaabd92e4a7e40,13eaabdaa98afcc0,makea_0,6362791.500,6362633.000,32,1,1,32,1,1,0.167,35,5591398884,120749562225
13eaabdaa99d0140,13eaabdacc6e1a60,makea_1,584128.750,584124.250,32,1,1,32,1,1,0.167,35,1041618934,10416189120
13eaabdacc7f5160,13eaabdacc83b7c0,makea_2,288.352,297.113,32,1,1,32,1,1,0.167,35,76132,1646850
13eaabdacc94e660,13eaabdacc94f400,memcpyDtoHasync,3.488,17.465,,,,,,,,,,,4,2,0
13eaabdacc9ed240,13eaabdf8f0ee320,makea_3,20441994.000,20441462.000,1172,1,1,128,1,1,0.667,1179,-1,-1
13eaabdf8f2715a0,13eaabdf8f2c2f80,makea_4,334.304,348.643,32,1,1,32,1,1,0.167,35,-1,-1
13eaabdf8f42f760,13eaabdf8f475b40,makea_5,287.712,296.270,32,1,1,32,1,1,0.167,35,-1,-1
...

Notice that starting with the kernel “makea_3”, the profiler reports -1 for both the inst_executed and thread_inst_executed_0 events, while sm_cta_launched still gets a valid value. Here’s the strange part: NPB lets me run different problem sizes, and the smaller versions (e.g. cg.B.x) execute correctly AND report counter values > 0 for those two events in every kernel. I’m confused because execution seems correct, yet the profiler can’t produce event values for the larger problem size. Does anybody know what’s going on? Are the counters overflowing? Thanks for any help!
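
In case it helps anyone reproduce this, here is a minimal Python sketch of how I'd scan the CSV log for the affected kernels. The function names (find_missing_counters, exceeds_32bit) and the SAMPLE string are my own for illustration and are not part of the profiler; SAMPLE is just a few rows copied from the log above. The 32-bit comparison is only a rough sanity check on the overflow question, since I don't know the width of the hardware counters, so it doesn't establish a cause.

```python
import csv
import io

COUNTERS = ("inst_executed", "thread_inst_executed_0")

# A few rows copied from the log above (hypothetical variable name).
SAMPLE = """# OPENCL_PROFILE_LOG_VERSION 2.0
# OPENCL_DEVICE 0 Tesla C2075
gpustarttimestamp,gpuendtimestamp,method,gputime,cputime,ndrangesizeX,ndrangesizeY,ndrangesizeZ,workgroupsizeX,workgroupsizeY,workgroupsizeZ,occupancy,sm_cta_launched,inst_executed,thread_inst_executed_0,memtransfersize,memtransferdir,memtransferhostmemtype
13eaabd92d7c7600,13eaabd92d7c9e80,init_mem_0,10.368,19.303,586,1,1,256,1,1,1.000,581,42192,675072
13eaabd92e4a7e40,13eaabdaa98afcc0,makea_0,6362791.500,6362633.000,32,1,1,32,1,1,0.167,35,5591398884,120749562225
13eaabdacc7f5160,13eaabdacc83b7c0,makea_2,288.352,297.113,32,1,1,32,1,1,0.167,35,76132,1646850
13eaabdacc94e660,13eaabdacc94f400,memcpyDtoHasync,3.488,17.465,,,,,,,,,,,4,2,0
13eaabdacc9ed240,13eaabdf8f0ee320,makea_3,20441994.000,20441462.000,1172,1,1,128,1,1,0.667,1179,-1,-1
"""


def _rows(log_text):
    """Yield one dict per data row, skipping '#' comment lines and blanks."""
    lines = [ln for ln in log_text.splitlines()
             if ln.strip() and not ln.startswith("#")]
    # The first remaining line is the column header; short rows simply
    # leave the trailing columns as None.
    return csv.DictReader(io.StringIO("\n".join(lines)))


def find_missing_counters(log_text, counters=COUNTERS):
    """Return (method, counter) pairs where the log contains -1."""
    return [(row["method"], name)
            for row in _rows(log_text)
            for name in counters
            if row.get(name) == "-1"]


def exceeds_32bit(log_text, counters=COUNTERS):
    """Return (method, counter) pairs whose value is above 2**32 - 1."""
    hits = []
    for row in _rows(log_text):
        for name in counters:
            value = row.get(name)
            # Memory-copy rows have empty counter fields; skip those.
            if value and value.isdigit() and int(value) > 2**32 - 1:
                hits.append((row["method"], name))
    return hits


if __name__ == "__main__":
    print("counters logged as -1:", find_missing_counters(SAMPLE))
    print("values above 32 bits: ", exceeds_32bit(SAMPLE))
```

To run it on a real log, read the file the profiler wrote and pass its contents in place of SAMPLE. On the rows above, makea_0 already reports values larger than 2**32 - 1 for both counters while makea_3 reports -1 for both, so whatever is going on doesn't look like a simple cutoff at 32 bits in the reported values.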