Hey,
I’m having trouble with clGetEventProfilingInfo: on a Tesla, most calls return zero for both CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END. The occasional non-zero values look reasonable and vary little between runs of the same kernel. I see this behaviour not only with my own code but also with the SDK examples (I tested with the MatrixMul example).
oclDeviceQuery yields:
OpenCL SW Info:
CL_PLATFORM_NAME: NVIDIA
CL_PLATFORM_VERSION: OpenCL 1.0
OpenCL SDK Revision: 7027912
OpenCL Device Info:
2 devices found supporting OpenCL:
---------------------------------
Device Tesla T10 Processor
---------------------------------
CL_DEVICE_NAME: Tesla T10 Processor
CL_DEVICE_VENDOR: NVIDIA Corporation
CL_DRIVER_VERSION: 190.29
CL_DEVICE_VERSION: OpenCL 1.0
CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
CL_DEVICE_MAX_COMPUTE_UNITS: 30
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 512 / 512 / 64
CL_DEVICE_MAX_WORK_GROUP_SIZE: 512
CL_DEVICE_MAX_CLOCK_FREQUENCY: 1296 MHz
CL_DEVICE_ADDRESS_BITS: 32
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 1023 MByte
CL_DEVICE_GLOBAL_MEM_SIZE: 4095 MByte
CL_DEVICE_ERROR_CORRECTION_SUPPORT: no
CL_DEVICE_LOCAL_MEM_TYPE: local
CL_DEVICE_LOCAL_MEM_SIZE: 16 KByte
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 64 KByte
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE
CL_DEVICE_IMAGE_SUPPORT: 1
CL_DEVICE_MAX_READ_IMAGE_ARGS: 128
CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 8
CL_DEVICE_SINGLE_FP_CONFIG: INF-quietNaNs round-to-nearest
CL_DEVICE_IMAGE <dim> 2D_MAX_WIDTH  8192
                      2D_MAX_HEIGHT 8192
                      3D_MAX_WIDTH  2048
                      3D_MAX_HEIGHT 2048
                      3D_MAX_DEPTH  2048
CL_DEVICE_EXTENSIONS: cl_khr_byte_addressable_store
                      cl_nv_compiler_options
                      cl_nv_device_attribute_query
                      cl_khr_global_int32_base_atomics
                      cl_khr_global_int32_extended_atomics
                      cl_khr_local_int32_base_atomics
                      cl_khr_local_int32_extended_atomics
CL_DEVICE_COMPUTE_CAPABILITY_NV: 1.3
NUMBER OF MULTIPROCESSORS: 30
NUMBER OF CUDA CORES: 240
CL_DEVICE_REGISTERS_PER_BLOCK_NV: 16384
CL_DEVICE_WARP_SIZE_NV: 32
CL_DEVICE_GPU_OVERLAP_NV: CL_TRUE
CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV: CL_FALSE
CL_DEVICE_INTEGRATED_MEMORY_NV: CL_FALSE
CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t> CHAR 1, SHORT 1, INT 1, LONG 1, FLOAT 1, DOUBLE 0
---------------------------------
Device Tesla T10 Processor
---------------------------------
CL_DEVICE_NAME: Tesla T10 Processor
CL_DEVICE_VENDOR: NVIDIA Corporation
CL_DRIVER_VERSION: 190.29
CL_DEVICE_VERSION: OpenCL 1.0
CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
CL_DEVICE_MAX_COMPUTE_UNITS: 30
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 512 / 512 / 64
CL_DEVICE_MAX_WORK_GROUP_SIZE: 512
CL_DEVICE_MAX_CLOCK_FREQUENCY: 1296 MHz
CL_DEVICE_ADDRESS_BITS: 32
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 1023 MByte
CL_DEVICE_GLOBAL_MEM_SIZE: 4095 MByte
CL_DEVICE_ERROR_CORRECTION_SUPPORT: no
CL_DEVICE_LOCAL_MEM_TYPE: local
CL_DEVICE_LOCAL_MEM_SIZE: 16 KByte
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 64 KByte
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE
CL_DEVICE_IMAGE_SUPPORT: 1
CL_DEVICE_MAX_READ_IMAGE_ARGS: 128
CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 8
CL_DEVICE_SINGLE_FP_CONFIG: INF-quietNaNs round-to-nearest
CL_DEVICE_IMAGE <dim> 2D_MAX_WIDTH  8192
                      2D_MAX_HEIGHT 8192
                      3D_MAX_WIDTH  2048
                      3D_MAX_HEIGHT 2048
                      3D_MAX_DEPTH  2048
CL_DEVICE_EXTENSIONS: cl_khr_byte_addressable_store
                      cl_nv_compiler_options
                      cl_nv_device_attribute_query
                      cl_khr_global_int32_base_atomics
                      cl_khr_global_int32_extended_atomics
                      cl_khr_local_int32_base_atomics
                      cl_khr_local_int32_extended_atomics
CL_DEVICE_COMPUTE_CAPABILITY_NV: 1.3
NUMBER OF MULTIPROCESSORS: 30
NUMBER OF CUDA CORES: 240
CL_DEVICE_REGISTERS_PER_BLOCK_NV: 16384
CL_DEVICE_WARP_SIZE_NV: 32
CL_DEVICE_GPU_OVERLAP_NV: CL_TRUE
CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV: CL_FALSE
CL_DEVICE_INTEGRATED_MEMORY_NV: CL_FALSE
CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t> CHAR 1, SHORT 1, INT 1, LONG 1, FLOAT 1, DOUBLE 0
---------------------------------
2D Image Formats Supported (11)
---------------------------------
#   Channel Order   Channel Type
1   CL_RGBA         CL_FLOAT
2   CL_RGBA         CL_HALF_FLOAT
3   CL_BGRA         CL_UNORM_INT8
4   CL_RGBA         CL_UNORM_INT8
5   CL_RGBA         CL_UNORM_INT16
6   CL_RGBA         CL_SIGNED_INT8
7   CL_RGBA         CL_SIGNED_INT16
8   CL_RGBA         CL_SIGNED_INT32
9   CL_RGBA         CL_UNSIGNED_INT8
10  CL_RGBA         CL_UNSIGNED_INT16
11  CL_RGBA         CL_UNSIGNED_INT32
---------------------------------
3D Image Formats Supported (11)
---------------------------------
#   Channel Order   Channel Type
1   CL_RGBA         CL_FLOAT
2   CL_RGBA         CL_HALF_FLOAT
3   CL_BGRA         CL_UNORM_INT8
4   CL_RGBA         CL_UNORM_INT8
5   CL_RGBA         CL_UNORM_INT16
6   CL_RGBA         CL_SIGNED_INT8
7   CL_RGBA         CL_SIGNED_INT16
8   CL_RGBA         CL_SIGNED_INT32
9   CL_RGBA         CL_UNSIGNED_INT8
10  CL_RGBA         CL_UNSIGNED_INT16
11  CL_RGBA         CL_UNSIGNED_INT32
oclDeviceQuery, Platform Name = NVIDIA, Platform Version = OpenCL 1.0 , SDK Revision = 7027912, NumDevs = 2, Device = Tesla T10 Processor, Device = Tesla T10 Processor
System Info:
Local Time/Date = 19:31:59, 10/22/2010
CPU Name: Quad-Core AMD Opteron(tm) Processor 8380
# of CPU processors: 16
Linux version 2.6.18-194.17.1.el5 (mockbuild@builder10.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) #1 SMP Wed Sep 29 12:50:31 EDT 2010