Hi, I have some legacy CUDA code with a few kernels and quite a few
large arrays. In most cases these arrays are shadowed, i.e. each array exists on
both the host and the GPU. This eases the organisation of transfers from the
host to the GPU and (in some cases) back again.
As more powerful GPUs have been introduced, we have been setting the size of
some of the buffers from the amount of free memory on the GPU. Mostly this
works. But recently the range of data sizes our users process has also grown,
and sometimes the buffer size calculation goes wrong. I guess we should just
fix the bug, but there are many arrays of different sizes, and keeping track of
how much memory we think they will take versus how much memory CUDA thinks is
free on the GPU is getting increasingly complicated (and so error prone).
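For reference, our sizing logic boils down to something like the sketch below
(heavily simplified: the single buffer, the names and the 256 MiB headroom
figure are just for illustration, not our real code):

```cpp
// Heavily simplified sketch of how we size a shadowed buffer from the free
// GPU memory reported by CUDA. Names and the headroom figure are made up.
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

int main() {
    size_t freeBytes = 0, totalBytes = 0;
    cudaError_t err = cudaMemGetInfo(&freeBytes, &totalBytes);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Keep some headroom for CUDA's own overheads, then size the buffer from
    // what is left (just one buffer here for brevity; we have many).
    const size_t kReserveBytes = 256u << 20;            // 256 MiB headroom (a guess)
    size_t budget = (freeBytes > kReserveBytes) ? freeBytes - kReserveBytes : 0;
    size_t nElems = budget / sizeof(float);

    std::vector<float> hostShadow(nElems);              // host copy of the array
    float* devBuf = nullptr;
    err = cudaMalloc(&devBuf, nElems * sizeof(float));  // matching device copy
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    std::printf("sized buffer to %zu MiB\n", (nElems * sizeof(float)) >> 20);
    cudaFree(devBuf);
    return 0;
}
```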
Has anyone else experienced this problem?
I was wondering about making our code more resilient, in the sense that instead
of aborting when CUDA says there is no memory left on the card, we would free
everything, halve the buffer size and try again.
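Something like this untested sketch is what I have in mind (in the real code we
would free all the other shadowed device arrays before each retry; the names
and sizes are made up):

```cpp
// Untested sketch of the "free everything, halve, retry" idea. Here there is
// only the one buffer being allocated, so nothing else needs freeing first.
#include <cuda_runtime.h>
#include <cstdio>

// Try to allocate *bytes on the device; on failure halve the request and
// retry until an allocation succeeds or we drop below a floor.
static float* allocWithBackoff(size_t* bytes, size_t minBytes) {
    while (*bytes >= minBytes) {
        float* devPtr = nullptr;
        if (cudaMalloc(&devPtr, *bytes) == cudaSuccess) {
            return devPtr;
        }
        cudaGetLastError();   // clear the error left by the failed cudaMalloc
        *bytes /= 2;          // back off and try a smaller buffer
    }
    return nullptr;
}

int main() {
    size_t bytes = 8ull << 30;                        // start by asking for 8 GiB
    float* devBuf = allocWithBackoff(&bytes, 64u << 20);
    if (devBuf == nullptr) {
        std::fprintf(stderr, "could not allocate even the minimum buffer\n");
        return 1;
    }
    std::printf("ended up with %zu MiB\n", bytes >> 20);
    cudaFree(devBuf);
    return 0;
}
```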
Can anyone see problems with this?
Will this be ok on virtual machines running in the cloud?
Will this fragment the GPU’s memory?
Should we also reset the GPU?
Any views on downsides of doing a reset? (Increasingly the GPU is in a remote
machine room.)
If we have to do the buffer calculation properly, does anyone have any advice
on the best way to organise our code so that maintaining it is not such
a nightmare?
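One idea I had is a small central "plan" that every shadowed buffer registers
with, so the will-it-fit arithmetic happens in exactly one place instead of
being scattered across the code. Just a sketch of the shape, with invented
names and sizes:

```cpp
// Rough idea: each shadowed buffer registers its element size and count, and
// a single check compares the planned total against what cudaMemGetInfo
// reports. Sketch only; names and sizes are invented.
#include <cuda_runtime.h>
#include <cstdio>
#include <string>
#include <vector>

struct BufferSpec {
    std::string name;
    size_t      elemSize;   // bytes per element
    size_t      count;      // number of elements
    size_t bytes() const { return elemSize * count; }
};

struct BufferPlan {
    std::vector<BufferSpec> specs;

    void add(const std::string& name, size_t elemSize, size_t count) {
        specs.push_back(BufferSpec{name, elemSize, count});
    }

    size_t totalBytes() const {
        size_t sum = 0;
        for (const BufferSpec& s : specs) sum += s.bytes();
        return sum;
    }

    // One place that answers "will everything fit?", with some headroom
    // left for CUDA's own allocations.
    bool fits(size_t headroomBytes) const {
        size_t freeBytes = 0, totalOnDevice = 0;
        if (cudaMemGetInfo(&freeBytes, &totalOnDevice) != cudaSuccess) return false;
        return totalBytes() + headroomBytes <= freeBytes;
    }
};

int main() {
    BufferPlan plan;
    plan.add("samples", sizeof(float),  200u << 20);   // example sizes only
    plan.add("results", sizeof(double),  50u << 20);

    if (!plan.fits(256u << 20)) {
        std::fprintf(stderr, "planned buffers (%zu MiB) will not fit\n",
                     plan.totalBytes() >> 20);
        return 1;
    }
    std::printf("planned total %zu MiB fits\n", plan.totalBytes() >> 20);
    return 0;
}
```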
Getting the buffer size right (i.e. as big as the GPU will support, but no
bigger) seems to give us a 10-20% performance boost, so it looks worth doing.
Any help or advice would be most welcome.
Bill