Commandline Profiling Trying to use the profiler via the command line

maringanti · August 1, 2011, 7:53am

Hi,

I am cross-posting this topic from the OpenCL forum, hoping to get more eyes on it and, with luck, a solution.

I am trying to use the command line profiler to profile my OpenCL code, following the instructions at this link:

http://developer.download.nvidia.com/compute/cuda/3_0/toolkit/docs/visual_profiler_opencl/OpenCL_Profiler_3.0.txt

(If you want to use the command line profiler for CUDA instead, replace OPENCL with CUDA in the environment variable names described there, e.g. OPENCL_PROFILE=1 becomes CUDA_PROFILE=1, and so on.)
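For anyone scripting this, here is a minimal Python sketch of the environment setup, assuming the variable names OPENCL_PROFILE, OPENCL_PROFILE_CONFIG, OPENCL_PROFILE_LOG and OPENCL_PROFILE_CSV (only the first two are mentioned in this thread; check the profiler text file for your toolkit version). The helper `profiler_env` is hypothetical; it just returns an environment that can be handed to `subprocess.run`, and the `api` argument applies the OPENCL to CUDA renaming rule described above.

```python
import os

def profiler_env(api, config_path, log_path, csv=True, base_env=None):
    """Return a copy of the environment with the command line profiler enabled.

    Hypothetical helper. Assumes the variables are named <API>_PROFILE,
    <API>_PROFILE_CONFIG, <API>_PROFILE_LOG and <API>_PROFILE_CSV, where
    <API> is OPENCL or CUDA; verify against the docs for your toolkit.
    """
    if api not in ("OPENCL", "CUDA"):
        raise ValueError("api must be 'OPENCL' or 'CUDA'")
    # Start from the current environment unless an explicit base is given.
    env = dict(os.environ if base_env is None else base_env)
    env[f"{api}_PROFILE"] = "1"                      # turn profiling on
    env[f"{api}_PROFILE_CONFIG"] = str(config_path)  # file listing options to collect
    env[f"{api}_PROFILE_LOG"] = str(log_path)        # where the log is written
    env[f"{api}_PROFILE_CSV"] = "1" if csv else "0"  # CSV vs. key=value log format
    return env
```

For example, `subprocess.run(["./my_app"], env=profiler_env("OPENCL", "profile_config.txt", "profile_log.csv"), check=True)`, where `./my_app` and both file names are placeholders.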

I used the latest version of the Visual Profiler and exported the data to a CSV file to see which profiler data can be obtained. This is the full list of parameters:

gpustarttimestamp,method,gputime,cputime,occupancy,ndrangesizeX,ndrangesizeY,ndrangesizeZ,workgroupsizeX,workgroupsizeY,workgroupsizeZ,stapmemperworkgroup,regperworkitem,streamID,localworkgroupsize,memTransferSize,memtransferhostmemtype,gld_inst_8bit,gld_inst_16bit,gld_inst_32bit,gld_inst_64bit,gld_inst_128bit,gst_inst_8bit,gst_inst_16bit,gst_inst_32bit,gst_inst_64bit,gst_inst_128bit,local_load,local_store,gld_request,gst_request,shared_load,shared_store,sm_cta_launched,l1_local_load_hit,branch,l1_local_load_miss,l1_local_store_hit,divergent_branch,l1_local_store_miss,l1_global_load_hit,inst_issued,l1_global_load_miss,inst_executed,uncached_global_load_transaction,global_store_transaction,warps_launched,threads_launched,l1_shared_bank_conflict,active_warps,active_cycles,l2 read requests,l2 read texture requests,l2 write requests,l2 read misses,l2 write misses,tex cache requests,tex cache misses,threads instruction executed,dram reads,dram writes

I then put the parameters from the list above into the file referenced by OPENCL_PROFILE_CONFIG and ran the executable multiple times, profiling a different subset of parameters on each run.
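Splitting the list into per-run config files is easy to script. A minimal sketch, assuming the config file takes one option name per line; `chunk_counters`, `config_text`, `write_configs`, the `profile_config_<n>.txt` naming and the `per_run` group size are all hypothetical, since how many counters one run can collect depends on the GPU and profiler version.

```python
from pathlib import Path

def chunk_counters(counters, per_run):
    """Split the counter names into consecutive groups of at most per_run names."""
    if per_run < 1:
        raise ValueError("per_run must be at least 1")
    return [counters[i:i + per_run] for i in range(0, len(counters), per_run)]

def config_text(group):
    """Config file body, assuming one profiler option per line."""
    return "\n".join(name.strip() for name in group) + "\n"

def write_configs(counters, per_run, out_dir):
    """Write profile_config_<n>.txt for each group and return the file paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, group in enumerate(chunk_counters(counters, per_run)):
        path = out / f"profile_config_{n}.txt"
        path.write_text(config_text(group))
        paths.append(path)
    return paths
```

Each returned path is then used as the OPENCL_PROFILE_CONFIG value for one run of the executable, with a separate log file per run.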

The problem is that I cannot obtain all of the parameters: for some of them, the profiler simply reports an invalid profiler option.

These are the ones for which I cannot obtain the data outright

I don’t want to use the Visual Profiler because I want to automate the whole procedure.

Does anyone have ideas on how to obtain this data? Specifically, what should be written in the config file for the profiler to collect it?
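Automating the procedure also means reading the per-run logs back and combining them. A minimal sketch, assuming CSV output is enabled, that the log's header lines begin with `#` and are followed by a single CSV header row, and that every run launches the same kernels in the same order; `parse_profiler_csv` and `merge_runs` are hypothetical helpers, not part of the profiler.

```python
import csv

def parse_profiler_csv(text):
    """Parse one CSV profiler log into a list of dicts, one per logged call.

    Assumes comment/header lines start with '#' and are skipped.
    """
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    return list(csv.DictReader(lines))

def merge_runs(runs):
    """Combine the per-call rows of several runs column by column.

    Assumes every run logged the same calls in the same order; columns that
    appear in several runs (e.g. gputime) keep the value from the last run.
    """
    if not runs:
        return []
    if any(len(rows) != len(runs[0]) for rows in runs):
        raise ValueError("runs logged different numbers of calls")
    merged = []
    for rows in zip(*runs):
        combined = {}
        for row in rows:
            # Refuse to merge if the same position holds a different kernel.
            if "method" in combined and row.get("method") != combined["method"]:
                raise ValueError("runs logged calls in a different order")
            combined.update(row)
        merged.append(combined)
    return merged
```

The order check is a cheap guard: if a run launches kernels differently, merging by position would silently mix up counters.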

thanh_tuan · August 2, 2011, 5:35am

Hi,
Are you using the NVIDIA runtime?
I had the same problem. Try using the CUDA keywords, which you can find in Compute_Profiler.txt, instead of the OpenCL ones.
For example, “gridsize” and “threadblocksize” stand in for “ndrangesizeX, ndrangesizeY, ndrangesizeZ, workgroupsizeX, workgroupsizeY, workgroupsizeZ”, or something along those lines.
Hope this helps.
Tuan
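The keyword swap described in this reply can be applied mechanically when turning the Visual Profiler's column list into a config file. A sketch with a deliberately partial, hypothetical mapping that holds only the two examples from this reply; since the reply itself says "or so", the exact keyword-to-column correspondence should be checked in Compute_Profiler.txt before relying on it.

```python
# Partial, unverified mapping based only on the two examples in this reply:
# one CUDA config keyword yields several OpenCL-named output columns.
COLUMN_TO_KEYWORD = {
    "ndrangesizeX": "gridsize",
    "ndrangesizeY": "gridsize",
    "ndrangesizeZ": "gridsize",
    "workgroupsizeX": "threadblocksize",
    "workgroupsizeY": "threadblocksize",
    "workgroupsizeZ": "threadblocksize",
}

def columns_to_keywords(columns):
    """Map CSV column names to config keywords.

    Keeps first-seen order and drops duplicates; names without a mapping
    are passed through unchanged.
    """
    keywords = []
    for column in columns:
        name = column.strip()
        keyword = COLUMN_TO_KEYWORD.get(name, name)
        if keyword not in keywords:
            keywords.append(keyword)
    return keywords
```

For example, applying `columns_to_keywords` to the header line from the first post split on commas gives a deduplicated candidate list for the config file; names the profiler still rejects then need a manual lookup in the docs.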

maringanti · August 2, 2011, 8:45am

Thanks. I had not actually looked in the doc folder of the Visual Profiler; all the information I needed was in the PDF there. Your suggestion prompted me to look in that folder.

maringanti · August 3, 2011, 6:56am

There is still one unresolved issue.

When I try to profile tex1_cache_sector_queries / tex1_cache_sector_misses, the profiler reports them as invalid config options. I am using a Tesla C2050 (Fermi, compute capability 2.0) with CUDA 4.0 and the latest OpenCL version. Is this an architecture limitation or something else?
