I managed to isolate the problem, and it looks like a bug to me.
Regardless of the size of the data transferred, calling clEnqueueWriteBuffer in asynchronous (non-blocking) mode leaks memory. This memory is freed only when the program terminates.
Here is a simple program that reproduces the behaviour:
#include <iostream>
#include <vector>
#include <time.h>
#include <unistd.h>
#include "CL/cl.hpp"
int main() {
    size_t MAX_SIZE = 10;
    ::clock_t start, finish;
    typedef double ScalarType;
    ScalarType* ram_vec1 = new ScalarType[MAX_SIZE];
    for (unsigned int i = 0; i < MAX_SIZE; ++i) {
        ram_vec1[i] = 1.;
    }
    cl_int err = CL_SUCCESS;
    std::vector<cl::Platform> platforms;
    cl::Platform::get(&platforms);
    if (platforms.size() == 0) {
        std::cout << "Platform size 0\n";
        return -1;
    }
    cl_context_properties properties[] = { CL_CONTEXT_PLATFORM, (cl_context_properties) (platforms[0])(), 0 };
    cl::Context context(CL_DEVICE_TYPE_GPU, properties);
    std::vector<cl::Device> devices = context.getInfo<CL_CONTEXT_DEVICES> ();
    cl::Event event;
    cl::CommandQueue queue(context, devices[0], 0, &err);
    cl::Buffer buffer(context, CL_MEM_WRITE_ONLY, sizeof(ScalarType) * MAX_SIZE, 0, &err);
    // Calling clEnqueueWriteBuffer in async mode a million times
    for (int i = 0; i < 1000000; i++) {
        start = clock();
        queue.enqueueWriteBuffer(buffer, CL_FALSE, 0, sizeof(ScalarType) * MAX_SIZE, ram_vec1, 0, &event);
        event.wait();
        finish = clock();
        std::cout << "[" << i << "] Time: " << (double(finish - start) / CLOCKS_PER_SEC) << "sec" << std::endl;
    }
    // using up to 1.5 GB of RAM on my computer now
    // the memory usage is constant in these 10 seconds
    sleep(10);
    std::cout << "Terminating now." << std::endl;
    // all the memory is freed upon exit
    return 0;
}
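To put numbers on the leak instead of watching a system monitor, the resident set size of the process could be sampled from inside the loop. Below is a minimal sketch, assuming Linux, where /proc/self/status contains a "VmRSS:" line. The names parse_vmrss_kb and current_rss_kb are made up for this illustration and are not part of OpenCL.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: extracts the VmRSS value (in kB) from text in the
// format of /proc/<pid>/status. Returns -1 if no VmRSS line is found.
long parse_vmrss_kb(std::istream& in) {
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, 6, "VmRSS:") == 0) {
            std::istringstream fields(line.substr(6));
            long kb = -1;
            fields >> kb;                 // skips leading whitespace
            return fields ? kb : -1;      // -1 if the number could not be read
        }
    }
    return -1;
}

// Hypothetical helper: resident set size of the current process in kB.
// Assumes Linux; returns -1 if /proc/self/status is missing or unreadable.
long current_rss_kb() {
    std::ifstream status("/proc/self/status");
    return parse_vmrss_kb(status);
}
```

Printing current_rss_kb() every 100000 iterations or so, next to the iteration counter, should show whether the memory grows steadily with the number of non-blocking writes and stays flat with the blocking variant.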
The problem disappears if I replace the lines
queue.enqueueWriteBuffer(buffer, CL_FALSE, 0, sizeof(ScalarType) * MAX_SIZE, ram_vec1, 0, &event);
event.wait();
with
queue.enqueueWriteBuffer(buffer, CL_TRUE, 0, sizeof(ScalarType) * MAX_SIZE, ram_vec1, 0, 0);
even though I’d like to have more or less the same behaviour.
Obviously, in my real program I don’t call event.wait() immediately after clEnqueueWriteBuffer.
Still, I’d like to use asynchronous transfers; the whole program is slow otherwise.
I’d really appreciate it if anybody could double-check this behaviour and possibly explain it to me.
Regards,
Federico