clGetEventProfilingInfo yields zero on Tesla

Hey,

I’m having trouble with clGetEventProfilingInfo returning zero for most queries of CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END on a Tesla. The occasional non-zero values seem reasonable, with little variation between them for the same kernel. I see this behaviour not only with my own code but also with the SDK examples (I used the MatrixMul example for testing).
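To keep the zero readings from skewing timing averages, here is a minimal sketch of how the two timestamps could be checked before use. It only assumes what the OpenCL spec states, namely that both values are 64-bit counters in nanoseconds. The helper names profiling_elapsed_ms and profiling_mean_ms are hypothetical and not part of OpenCL or the SDK, and treating a zero timestamp as "invalid" is an assumption based on the symptom described above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an OpenCL call): converts one pair of profiling
 * timestamps, given in nanoseconds, into elapsed milliseconds.
 * Returns false if the pair looks invalid (a zero timestamp, or an end
 * that precedes the start) so the caller can skip that sample. */
static bool profiling_elapsed_ms(uint64_t start_ns, uint64_t end_ns,
                                 double *out_ms)
{
    if (start_ns == 0 || end_ns == 0 || end_ns < start_ns) {
        return false;
    }
    *out_ms = (double)(end_ns - start_ns) / 1.0e6;
    return true;
}

/* Hypothetical helper: averages only the valid samples out of n pairs.
 * Returns how many samples were used; *mean_ms is written only when
 * at least one sample was valid. */
static size_t profiling_mean_ms(const uint64_t *start_ns,
                                const uint64_t *end_ns,
                                size_t n, double *mean_ms)
{
    double sum = 0.0;
    size_t used = 0;

    for (size_t i = 0; i < n; ++i) {
        double ms;
        if (profiling_elapsed_ms(start_ns[i], end_ns[i], &ms)) {
            sum += ms;
            ++used;
        }
    }
    if (used > 0) {
        *mean_ms = sum / (double)used;
    }
    return used;
}
```

In host code the two inputs would come from clGetEventProfilingInfo, queried once the event has completed (for example after clWaitForEvents returns) on a queue created with CL_QUEUE_PROFILING_ENABLE. Checking the return code of each query is also worthwhile, since an error such as CL_PROFILING_INFO_NOT_AVAILABLE would leave the output value untouched.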

oclDeviceQuery yields:

OpenCL SW Info:

CL_PLATFORM_NAME: 	NVIDIA

 CL_PLATFORM_VERSION: 	OpenCL 1.0 

 OpenCL SDK Revision: 	7027912

OpenCL Device Info:

2 devices found supporting OpenCL:

---------------------------------

 Device Tesla T10 Processor

 ---------------------------------

  CL_DEVICE_NAME: 			Tesla T10 Processor

  CL_DEVICE_VENDOR: 			NVIDIA Corporation

  CL_DRIVER_VERSION: 			190.29

  CL_DEVICE_VERSION: 			OpenCL 1.0

  CL_DEVICE_TYPE:			CL_DEVICE_TYPE_GPU

  CL_DEVICE_MAX_COMPUTE_UNITS:		30

  CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS:	3

  CL_DEVICE_MAX_WORK_ITEM_SIZES:	512 / 512 / 64 

  CL_DEVICE_MAX_WORK_GROUP_SIZE:	512

  CL_DEVICE_MAX_CLOCK_FREQUENCY:	1296 MHz

  CL_DEVICE_ADDRESS_BITS:		32

  CL_DEVICE_MAX_MEM_ALLOC_SIZE:		1023 MByte

  CL_DEVICE_GLOBAL_MEM_SIZE:		4095 MByte

  CL_DEVICE_ERROR_CORRECTION_SUPPORT:	no

  CL_DEVICE_LOCAL_MEM_TYPE:		local

  CL_DEVICE_LOCAL_MEM_SIZE:		16 KByte

  CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE:	64 KByte

  CL_DEVICE_QUEUE_PROPERTIES:		CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE

  CL_DEVICE_QUEUE_PROPERTIES:		CL_QUEUE_PROFILING_ENABLE

  CL_DEVICE_IMAGE_SUPPORT:		1

  CL_DEVICE_MAX_READ_IMAGE_ARGS:	128

  CL_DEVICE_MAX_WRITE_IMAGE_ARGS:	8

  CL_DEVICE_SINGLE_FP_CONFIG:		INF-quietNaNs round-to-nearest 

CL_DEVICE_IMAGE <dim>			2D_MAX_WIDTH	 8192

					2D_MAX_HEIGHT	 8192

					3D_MAX_WIDTH	 2048

					3D_MAX_HEIGHT	 2048

					3D_MAX_DEPTH	 2048

CL_DEVICE_EXTENSIONS:			cl_khr_byte_addressable_store

					cl_nv_compiler_options

					cl_nv_device_attribute_query

					cl_khr_global_int32_base_atomics

					cl_khr_global_int32_extended_atomics

					cl_khr_local_int32_base_atomics

					cl_khr_local_int32_extended_atomics

CL_DEVICE_COMPUTE_CAPABILITY_NV:	1.3

  NUMBER OF MULTIPROCESSORS:		30

  NUMBER OF CUDA CORES:			240

  CL_DEVICE_REGISTERS_PER_BLOCK_NV:	16384

  CL_DEVICE_WARP_SIZE_NV:		32

  CL_DEVICE_GPU_OVERLAP_NV:		CL_TRUE

  CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV:	CL_FALSE

  CL_DEVICE_INTEGRATED_MEMORY_NV:	CL_FALSE

  CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t>	CHAR 1, SHORT 1, INT 1, LONG 1, FLOAT 1, DOUBLE 0

---------------------------------

 Device Tesla T10 Processor

 ---------------------------------

  CL_DEVICE_NAME: 			Tesla T10 Processor

  CL_DEVICE_VENDOR: 			NVIDIA Corporation

  CL_DRIVER_VERSION: 			190.29

  CL_DEVICE_VERSION: 			OpenCL 1.0

  CL_DEVICE_TYPE:			CL_DEVICE_TYPE_GPU

  CL_DEVICE_MAX_COMPUTE_UNITS:		30

  CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS:	3

  CL_DEVICE_MAX_WORK_ITEM_SIZES:	512 / 512 / 64 

  CL_DEVICE_MAX_WORK_GROUP_SIZE:	512

  CL_DEVICE_MAX_CLOCK_FREQUENCY:	1296 MHz

  CL_DEVICE_ADDRESS_BITS:		32

  CL_DEVICE_MAX_MEM_ALLOC_SIZE:		1023 MByte

  CL_DEVICE_GLOBAL_MEM_SIZE:		4095 MByte

  CL_DEVICE_ERROR_CORRECTION_SUPPORT:	no

  CL_DEVICE_LOCAL_MEM_TYPE:		local

  CL_DEVICE_LOCAL_MEM_SIZE:		16 KByte

  CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE:	64 KByte

  CL_DEVICE_QUEUE_PROPERTIES:		CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE

  CL_DEVICE_QUEUE_PROPERTIES:		CL_QUEUE_PROFILING_ENABLE

  CL_DEVICE_IMAGE_SUPPORT:		1

  CL_DEVICE_MAX_READ_IMAGE_ARGS:	128

  CL_DEVICE_MAX_WRITE_IMAGE_ARGS:	8

  CL_DEVICE_SINGLE_FP_CONFIG:		INF-quietNaNs round-to-nearest 

CL_DEVICE_IMAGE <dim>			2D_MAX_WIDTH	 8192

					2D_MAX_HEIGHT	 8192

					3D_MAX_WIDTH	 2048

					3D_MAX_HEIGHT	 2048

					3D_MAX_DEPTH	 2048

CL_DEVICE_EXTENSIONS:			cl_khr_byte_addressable_store

					cl_nv_compiler_options

					cl_nv_device_attribute_query

					cl_khr_global_int32_base_atomics

					cl_khr_global_int32_extended_atomics

					cl_khr_local_int32_base_atomics

					cl_khr_local_int32_extended_atomics

CL_DEVICE_COMPUTE_CAPABILITY_NV:	1.3

  NUMBER OF MULTIPROCESSORS:		30

  NUMBER OF CUDA CORES:			240

  CL_DEVICE_REGISTERS_PER_BLOCK_NV:	16384

  CL_DEVICE_WARP_SIZE_NV:		32

  CL_DEVICE_GPU_OVERLAP_NV:		CL_TRUE

  CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV:	CL_FALSE

  CL_DEVICE_INTEGRATED_MEMORY_NV:	CL_FALSE

  CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t>	CHAR 1, SHORT 1, INT 1, LONG 1, FLOAT 1, DOUBLE 0

---------------------------------

  2D Image Formats Supported (11)

  ---------------------------------

  #	 Channel Order   Channel Type		  

1	 CL_RGBA		 CL_FLOAT			  

  2	 CL_RGBA		 CL_HALF_FLOAT		 

  3	 CL_BGRA		 CL_UNORM_INT8		 

  4	 CL_RGBA		 CL_UNORM_INT8		 

  5	 CL_RGBA		 CL_UNORM_INT16		

  6	 CL_RGBA		 CL_SIGNED_INT8		

  7	 CL_RGBA		 CL_SIGNED_INT16	   

  8	 CL_RGBA		 CL_SIGNED_INT32	   

  9	 CL_RGBA		 CL_UNSIGNED_INT8	  

  10	CL_RGBA		 CL_UNSIGNED_INT16	 

  11	CL_RGBA		 CL_UNSIGNED_INT32	 

---------------------------------

  3D Image Formats Supported (11)

  ---------------------------------

  #	 Channel Order   Channel Type		  

1	 CL_RGBA		 CL_FLOAT			  

  2	 CL_RGBA		 CL_HALF_FLOAT		 

  3	 CL_BGRA		 CL_UNORM_INT8		 

  4	 CL_RGBA		 CL_UNORM_INT8		 

  5	 CL_RGBA		 CL_UNORM_INT16		

  6	 CL_RGBA		 CL_SIGNED_INT8		

  7	 CL_RGBA		 CL_SIGNED_INT16	   

  8	 CL_RGBA		 CL_SIGNED_INT32	   

  9	 CL_RGBA		 CL_UNSIGNED_INT8	  

  10	CL_RGBA		 CL_UNSIGNED_INT16	 

  11	CL_RGBA		 CL_UNSIGNED_INT32	 

oclDeviceQuery, Platform Name = NVIDIA, Platform Version = OpenCL 1.0 , SDK Revision = 7027912, NumDevs = 2, Device = Tesla T10 Processor, Device = Tesla T10 Processor

System Info: 

Local Time/Date =  19:31:59, 10/22/2010

 CPU Name: Quad-Core AMD Opteron(tm) Processor 8380 

 # of CPU processors: 16

 Linux version 2.6.18-194.17.1.el5 (mockbuild@builder10.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) #1 SMP Wed Sep 29 12:50:31 EDT 2010
