Hi,
I’m developing a hook library for some cuda calls to add extra features for GPU usage control.
I followed the sample released with CUDA toolkit (7_CUDALibraries/cuHook) to create customized versions of cuMemAlloc and cuMemFree, which can successfully intercept cudaMalloc and cudaFree calls.
However, for cudaMallocPitch (in cuda runtime library), I cannot find any corresponding functions to intercept in cuda driver library. I’ve tried to intercept cuMemAllocPitch, but when user calling cudaMallocPitch, the call was not being intercepted by hook library.
Below is the code I wrote to intercept cuMemAllocPitch:
CUresult CUDAAPI cuMemAllocPitch(CUdeviceptr* dptr, size_t* pPitch, size_t WidthInBytes, size_t Height, unsigned int ElementSizeBytes) {
printf("intercepted!\n");
return CUDA_SUCCESS;
}
And I use the code below to test if the function call is intercepted:
#include <cuda_runtime.h>
#include <unistd.h>
int main() {
int *ptr;
size_t pitch;
cudaMallocPitch((void**)&ptr, &pitch, 1000 * sizeof(int), 1000);
return 0;
}
Nothing is printed on the standard output.
Trying to override cudaMallocPitch in the hook library will cause ‘multiple definition of cudaMallocPitch’ during linking time.
Testing environment:
- OS: CentOS 7.6
- Compiler: gcc-4.8.5
- CUDA version: CUDA 10.0
- GPU: NVIDIA Tesla V100
Thanks for reading and appreciate for your help!
Intercept cuMemAllocPitch_v2() ?
See what function parameters have changed compared to cuMemAllocPitch() by inspecting header files.
I bet that the cuda runtime is now exclusively using the newer version
In /usr/local/cuda-10.0/include/cuda.h, I found the following line (line 101):
#define cuMemAllocPitch cuMemAllocPitch_v2
And for the code in question, I’ve included <cuda.h>, so this define must be working.
I’ve tried intercepting cuMemAllocPitch_v2() too. Still not intercepted by my LD_PRELOAD library.
The function definition in cuda.h is (line 4702):
CUresult CUDAAPI cuMemAllocPitch(CUdeviceptr *dptr, size_t *pPitch, size_t WidthInBytes, size_t Height, unsigned int ElementSizeBytes);
Which matches my definition.
If I call cuMemAllocPitch directly, the call can be intercepted normally. So it must be cudaMallocPitch calling something neither cuMemAllocPitch nor cuMemAlloc (also intercepted by my code) in its implementation.
According to older CUDA documentation, the v1 (deprecated) version of this function used unsigned int in place of size_t
http://developer.download.nvidia.com/compute/cuda/2_3/toolkit/docs/online/group__CUMEM_gc114ed3ec34256d446b8a2bc43ab7e3e.html
So the deprecated function signature is the following
cuMemAllocPitch ( CUdeviceptr * dptr,
unsigned int * pPitch,
unsigned int WidthInBytes,
unsigned int Height,
unsigned int ElementSizeBytes
)
Maybe it is important to hook into this API call with the correct (deprecated) signature.
However this #define
#define cuMemAllocPitch cuMemAllocPitch_v2
would be in the way of doing things differently for both versions. So be sure to #undef this in the right locations.