Hey,
I’m using the Massif tool (part of the Valgrind toolchain) to profile host memory usage in my C++ code on Linux. According to the Valgrind report, creating a cuBLAS context alone consumes more than 400 MB of host RAM. Hence a couple of questions: Is this kind of memory footprint normal? If so, is there a SIMPLE way to reduce it significantly?
Here’s my test code and run instructions (save as context-tst.cpp):
#include <iostream>
#include <unistd.h>
#include <cuda.h>
#include <cublas_v2.h>

int main() {
    // Creating the handle is the only CUDA work this program does.
    cublasHandle_t cublasHandle;
    cublasStatus_t status = cublasCreate(&cublasHandle);
    if (CUBLAS_STATUS_SUCCESS == status) {
        std::cerr << "CUDA BLAS context creation succeeded" << std::endl;
    } else {
        std::cerr << "CUDA BLAS context creation failed, status: " << status
                  << std::endl;
        return 1;
    }
    sleep(5);  // keep the context alive long enough to show up in the profile
    cublasDestroy(cublasHandle);
    sleep(5);  // observe memory usage after the context is destroyed
    return 0;
}
To execute:
- Compile: nvcc -pg -lcublas context-tst.cpp -o context-tst
- Profile: valgrind --tool=massif ./context-tst
- Report: ms_print massif.out.<SUBSTITUTE_YOUR_PID_HERE> | less
The essence of the RAM allocation report is below:
MB
414.3^ :
| @:#::
| @:@:@:#::
| @@:@:@:#::
| ::::@@@:@:@:#::
| @:@@::::@@@:@:@:#::
| @@@@:@ ::::@@@:@:@:#::
| ::@@@@:@ ::::@@@:@:@:#::
| @@:::::@@@@:@ ::::@@@:@:@:#::
| @:@ : :::@@@@:@ ::::@@@:@:@:#::
| :::::@:@ : :::@@@@:@ ::::@@@:@:@:#::
| : :@: :: @:@ : :::@@@@:@ ::::@@@:@:@:#::
| ::::@: :: @:@ : :::@@@@:@ ::::@@@:@:@:#::
| @:@@:: :@: :: @:@ : :::@@@@:@ ::::@@@:@:@:#::
| @@@:::@ @ :: :@: :: @:@ : :::@@@@:@ ::::@@@:@:@:#::
| @@@@@ :: @ @ :: :@: :: @:@ : :::@@@@:@ ::::@@@:@:@:#::
| @::@@ @@@ :: @ @ :: :@: :: @:@ : :::@@@@:@ ::::@@@:@:@:#::
| :: :::@: @@ @@@ :: @ @ :: :@: :: @:@ : :::@@@@:@ ::::@@@:@:@:#::
| ::: ::: @: @@ @@@ :: @ @ :: :@: :: @:@ : :::@@@@:@ ::::@@@:@:@:#::
| @:: : ::: @: @@ @@@ :: @ @ :: :@: :: @:@ : :::@@@@:@ ::::@@@:@:@:#::
0 +---------------------------------------------------------------------->Gi
0 1.641
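As a cross-check of the Massif figure outside Valgrind, the resident set size could also be read from inside the process. The following is only a minimal sketch, assuming a Linux system where /proc/self/status exposes the VmRSS and VmHWM fields; readStatusKb and printRss are hypothetical helper names and are not part of the test program above.

```cpp
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

// Returns the value in kB of a field such as "VmRSS" or "VmHWM" from
// /proc/self/status, or -1 if the field is missing or unreadable.
// Linux-specific; assumes the usual "Name:   <number> kB" line layout.
long readStatusKb(const std::string& field) {
    std::ifstream status("/proc/self/status");
    const std::string prefix = field + ":";
    std::string line;
    while (std::getline(status, line)) {
        if (line.compare(0, prefix.size(), prefix) == 0) {
            std::istringstream rest(line.substr(prefix.size()));
            long kb = -1;
            if (rest >> kb) {
                return kb;
            }
            return -1;
        }
    }
    return -1;
}

// Hypothetical helper: prints the current and peak resident set size.
void printRss(const char* label) {
    std::cerr << label << ": VmRSS=" << readStatusKb("VmRSS")
              << " kB, VmHWM=" << readStatusKb("VmHWM") << " kB" << std::endl;
}
```

Calling printRss("before") just before cublasCreate and printRss("after") right after it would show how much resident memory the call adds in a normal run. The numbers need not match Massif's, since resident set size and Massif's heap accounting measure different things.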
I’m also seeing a number of warnings from Valgrind, and the run ends with a SIGPROF termination:
>
> ==1303== Massif, a heap profiler
> ==1303== Copyright (C) 2003-2017, and GNU GPL’d, by Nicholas Nethercote
> ==1303== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
> ==1303== Command: ./context-tst
> ==1303==
> ==1303== Warning: noted but unhandled ioctl 0x30000001 with no size/direction hints.
> ==1303== This could cause spurious value errors to appear.
> ==1303== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
> ==1303== Warning: noted but unhandled ioctl 0x27 with no size/direction hints.
> ==1303== This could cause spurious value errors to appear.
> ==1303== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
> ==1303== Warning: noted but unhandled ioctl 0x25 with no size/direction hints.
> ==1303== This could cause spurious value errors to appear.
> ==1303== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
> ==1303== Warning: noted but unhandled ioctl 0x17 with no size/direction hints.
> ==1303== This could cause spurious value errors to appear.
> ==1303== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
> ==1303== Warning: noted but unhandled ioctl 0x19 with no size/direction hints.
> ==1303== This could cause spurious value errors to appear.
> ==1303== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
> ==1303== Warning: noted but unhandled ioctl 0x49 with no size/direction hints.
> ==1303== This could cause spurious value errors to appear.
> ==1303== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
> ==1303== Warning: noted but unhandled ioctl 0x21 with no size/direction hints.
> ==1303== This could cause spurious value errors to appear.
> ==1303== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
> ==1303== Warning: noted but unhandled ioctl 0x1b with no size/direction hints.
> ==1303== This could cause spurious value errors to appear.
> ==1303== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
> ==1303== Warning: noted but unhandled ioctl 0x44 with no size/direction hints.
> ==1303== This could cause spurious value errors to appear.
> ==1303== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
> ==1303== Warning: noted but unhandled ioctl 0x48 with no size/direction hints.
> ==1303== This could cause spurious value errors to appear.
> ==1303== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
> CUDA BLAS context creation succeeded
> ==1303==
> ==1303== Process terminating with default action of signal 27 (SIGPROF)
> ==1303== at 0xD464F85: pthread_cond_timedwait@@GLIBC_2.3.2 (futex-internal.h:205)
> ==1303== by 0x1CB2B0EE: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.460.39)
> ==1303== by 0x1CC24C35: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.460.39)
> ==1303== by 0xD45E6DA: start_thread (pthread_create.c:463)
> ==1303==
> Profiling timer expired
Thank you in advance for your thoughts!