Intercept cudaMallocPitch

Hi,
I’m developing a hook library for some cuda calls to add extra features for GPU usage control.

I followed the sample released with CUDA toolkit (7_CUDALibraries/cuHook) to create customized versions of cuMemAlloc and cuMemFree, which can successfully intercept cudaMalloc and cudaFree calls.

However, for cudaMallocPitch (in cuda runtime library), I cannot find any corresponding functions to intercept in cuda driver library. I’ve tried to intercept cuMemAllocPitch, but when user calling cudaMallocPitch, the call was not being intercepted by hook library.

Below is the code I wrote to intercept cuMemAllocPitch:

CUresult CUDAAPI cuMemAllocPitch(CUdeviceptr* dptr, size_t* pPitch, size_t WidthInBytes, size_t Height, unsigned int  ElementSizeBytes) {
  printf("intercepted!\n");
  return CUDA_SUCCESS;
}

And I use the code below to test if the function call is intercepted:

#include <cuda_runtime.h>
#include <unistd.h>
int main() {
  int *ptr;
  size_t pitch;
  cudaMallocPitch((void**)&ptr, &pitch, 1000 * sizeof(int), 1000);
  return 0;
}

Nothing is printed on the standard output.

Trying to override cudaMallocPitch in the hook library will cause ‘multiple definition of cudaMallocPitch’ during linking time.

Testing environment:

  • OS: CentOS 7.6
  • Compiler: gcc-4.8.5
  • CUDA version: CUDA 10.0
  • GPU: NVIDIA Tesla V100

Thanks for reading and appreciate for your help!

Intercept cuMemAllocPitch_v2() ?

See what function parameters have changed compared to cuMemAllocPitch() by inspecting header files.

I bet that the cuda runtime is now exclusively using the newer version

In /usr/local/cuda-10.0/include/cuda.h, I found the following line (line 101):

#define cuMemAllocPitch                     cuMemAllocPitch_v2

And for the code in question, I’ve included <cuda.h>, so this define must be working.

I’ve tried intercepting cuMemAllocPitch_v2() too. Still not intercepted by my LD_PRELOAD library.

The function definition in cuda.h is (line 4702):

CUresult CUDAAPI cuMemAllocPitch(CUdeviceptr *dptr, size_t *pPitch, size_t WidthInBytes, size_t Height, unsigned int ElementSizeBytes);

Which matches my definition.

If I call cuMemAllocPitch directly, the call can be intercepted normally. So it must be cudaMallocPitch calling something neither cuMemAllocPitch nor cuMemAlloc (also intercepted by my code) in its implementation.

According to older CUDA documentation, the v1 (deprecated) version of this function used unsigned int in place of size_t

http://developer.download.nvidia.com/compute/cuda/2_3/toolkit/docs/online/group__CUMEM_gc114ed3ec34256d446b8a2bc43ab7e3e.html

So the deprecated function signature is the following

cuMemAllocPitch 	( 	CUdeviceptr *  	dptr,
		unsigned int *  	pPitch,
		unsigned int  	WidthInBytes,
		unsigned int  	Height,
		unsigned int  	ElementSizeBytes	 
	)

Maybe it is important to hook into this API call with the correct (deprecated) signature.

However this #define

#define cuMemAllocPitch                     cuMemAllocPitch_v2

would be in the way of doing things differently for both versions. So be sure to #undef this in the right locations.