Get Parameter List for kernel

Hi, I’m working on some middleware for the CUDA driver API. According to the documentation for cuLaunchKernel, kernels compiled with CUDA 4.0+ contain their parameter list information. I need to find a way to access this information programmatically. Specifically, given a CUfunction handle, I need the number of parameters and the size of each parameter. In order for the middleware I’m working on to work properly for calls to cuLaunchKernel, it needs to be able to make a copy of the parameters to send to the cuLaunchKernel function. However, the kernelParams argument is just a double pointer which doesn’t include any information on how many pointers are in the array or the size of the buffers pointed to by each pointer. I need both of these in order to copy the parameters into the appropriate address space for the eventual call to the CUDA cuLaunchKernel function.

The information has always been available at the CUDA Runtime API level by intercepting cudaSetupArgument before the cudaLaunch. I’ll let someone else comment about the driver API.

Yes, prior to CUDA 4.0, there was a similar call in the driver API. However, in CUDA 4.0, those calls were deprecated in favor of the new cuLaunchKernel function which just takes an array of pointers to all of the arguments. Neither the size of the array nor the size of the individual arguments are passed in. According to the documentation for the cuLaunchKernel function, the driver API determines this information from metadata that is compiled into the kernel. I have not, however, seen a public API to access that information. Alternatively, the user can pass the parameter list to cuLaunchKernel with the “extra” parameter that does require the user to specify the size of the arguments buffer, but I need to be able to support either method of passing the parameters into cuLaunchKernel.

I have had to do this before, but it has always been part of a system that performs compiler analysis on the kernel before execution, so the argument information is always available from the compiler.

From what I can tell, there isn’t any specific metadata embedded in cuda binaries that encodes the parameter list explicitly. For example,

__global__ void kernel(int* v)
{
	*v = 1;
}
nvcc -arch sm_20 -fatbin main.cu
cuobjdump --dump-elf-symbols main.fatbin 

Fatbin elf code:
================
arch = sm_20
code version = [1,6]
producer = cuda
host = linux
compile_size = 64bit
identifier = main.cu

symbols:
STT_SECTION      STB_LOCAL    .text._Z6kernelPi
STT_SECTION      STB_LOCAL    .nv.constant0._Z6kernelPi
STT_FUNC         STB_GLOBAL   _Z6kernelPi

As far as I can tell, these are all of the symbols in the binary, and none of them
specify a parameter list.

cuobjdump --dump-sass main.fatbin 

Fatbin elf code:
================
arch = sm_20
code version = [1,6]
producer = cuda
host = linux
compile_size = 64bit
identifier = main.cu

	code for sm_20
		Function : _Z6kernelPi
	/*0000*/     /*0x00005de428004404*/ 	MOV R1, c [0x1] [0x100];
	/*0008*/     /*0x80009de428004000*/ 	MOV R2, c [0x0] [0x20];
	/*0010*/     /*0x9000dde428004000*/ 	MOV R3, c [0x0] [0x24];
	/*0018*/     /*0x04001de218000000*/ 	MOV32I R0, 0x1;
	/*0020*/     /*0x00201c8594000000*/ 	ST.E [R2], R0;
	/*0028*/     /*0x00001de780000000*/ 	EXIT;
		............................

Although I’m not sure if it works like this, it would be possible for the driver to look
at the first few instructions in the kernel and refer to the ABI to determine the number and type
of arguments. It would probably be easier to do this for PTX than SASS.

I may be wrong and someone else can correct me if there is some easier way to get this information.

Hmmm… that’s interesting. That seems to directly contradict the documentation for cuLaunchKernel:

“Note that to use cuLaunchKernel(), the kernel f must either have been compiled with toolchain version 3.2 or later so that it will contain kernel parameter information, or have no kernel parameters.”

Also, I looked through a PTX file last night and it did indeed contain the parameter list on each kernel in the PTX assembler source. That’s odd that it didn’t spit it out from your cuobjdump, though.

The PTX does include an argument list. I’m not sure if that information is stored in another format though. I suspect that it would probably be available in the debugging information as well (nvcc -G …), but that might be even harder to read than PTX, and you couldn’t count on it being there.

If I dump the entire elf structure (not only the symbols) of one of my cubin kernels, I get the entire list of arguments:

cuobjdump.exe -elf d:\test.cubin

(...)
.nv.info.testKernel
	
	Attribute:	EIATTR_SMEM_PARAM_SIZE
	Format:	EIFMT_HVAL
	Value:	0x34
	
	Attribute:	EIATTR_KPARAM_INFO
	Format:	EIFMT_SVAL
	Value:	Index : 0x0	Ordinal : 0x9	Offset  : 0x30	Size    : 0x4
		Pointee's logAlignment : 0x0	Space : 0x0	cbank : 0x1f	Parameter Space : SMEM	
	
	Attribute:	EIATTR_KPARAM_INFO
	Format:	EIFMT_SVAL
	Value:	Index : 0x0	Ordinal : 0x8	Offset  : 0x2c	Size    : 0x4
		Pointee's logAlignment : 0x0	Space : 0x0	cbank : 0x1f	Parameter Space : SMEM	
	
	Attribute:	EIATTR_KPARAM_INFO
	Format:	EIFMT_SVAL
	Value:	Index : 0x0	Ordinal : 0x7	Offset  : 0x28	Size    : 0x4
		Pointee's logAlignment : 0x0	Space : 0x0	cbank : 0x1f	Parameter Space : SMEM	
	
	Attribute:	EIATTR_KPARAM_INFO
	Format:	EIFMT_SVAL
	Value:	Index : 0x0	Ordinal : 0x6	Offset  : 0x20	Size    : 0x8
		Pointee's logAlignment : 0x0	Space : 0x0	cbank : 0x1f	Parameter Space : SMEM	
	
	Attribute:	EIATTR_KPARAM_INFO
	Format:	EIFMT_SVAL
	Value:	Index : 0x0	Ordinal : 0x5	Offset  : 0x1c	Size    : 0x4
		Pointee's logAlignment : 0x0	Space : 0x0	cbank : 0x1f	Parameter Space : SMEM	
	
	Attribute:	EIATTR_KPARAM_INFO
	Format:	EIFMT_SVAL
	Value:	Index : 0x0	Ordinal : 0x4	Offset  : 0x18	Size    : 0x4
		Pointee's logAlignment : 0x0	Space : 0x0	cbank : 0x1f	Parameter Space : SMEM	
	
	Attribute:	EIATTR_KPARAM_INFO
	Format:	EIFMT_SVAL
	Value:	Index : 0x0	Ordinal : 0x3	Offset  : 0x10	Size    : 0x8
		Pointee's logAlignment : 0x0	Space : 0x0	cbank : 0x1f	Parameter Space : SMEM	
	
	Attribute:	EIATTR_KPARAM_INFO
	Format:	EIFMT_SVAL
	Value:	Index : 0x0	Ordinal : 0x2	Offset  : 0x8	Size    : 0x4
		Pointee's logAlignment : 0x0	Space : 0x0	cbank : 0x1f	Parameter Space : SMEM	
	
	Attribute:	EIATTR_KPARAM_INFO
	Format:	EIFMT_SVAL
	Value:	Index : 0x0	Ordinal : 0x1	Offset  : 0x4	Size    : 0x4
		Pointee's logAlignment : 0x0	Space : 0x0	cbank : 0x1f	Parameter Space : SMEM	
	
	Attribute:	EIATTR_KPARAM_INFO
	Format:	EIFMT_SVAL
	Value:	Index : 0x0	Ordinal : 0x0	Offset  : 0x0	Size    : 0x4
		Pointee's logAlignment : 0x0	Space : 0x0	cbank : 0x1f	Parameter Space : SMEM	
	
	Attribute:	EIATTR_IMAGE_SLOT
	Format:	EIFMT_SVAL
	Value:	0x18 0x0 
	
	Attribute:	EIATTR_MAXREG_COUNT
	Format:	EIFMT_HVAL
	Value:	0x2
	
	Attribute:	EIATTR_SYNC_STACK
	Format:	EIFMT_SVAL
	Value:	0x230118 0x20