Portable pinned memory deallocation

Oxydius · January 22, 2010, 10:54pm

I’m implementing a pool of pinned host buffers to be shared by multiple GPU contexts. To do so, context 1 allocates a heap using cuMemHostAlloc with the CU_MEMHOSTALLOC_PORTABLE flag. When context 2 joins in and starts requesting buffers from the pool, the requests exceed the capacity of the heap, so context 2 replaces it with a larger heap, again allocated as portable pinned memory. If context 1 is not holding any reference to the first heap, context 2 then calls cuMemFreeHost on it, and that call returns CUDA_ERROR_UNKNOWN.

Is it possible that portable pinned memory can be used by multiple contexts, but still needs to be freed by the same context that allocated it? Would that be a bug or an undocumented limitation? Any ideas? I noticed this behavior with both the 195.62 and 196.21 drivers.
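To make the scenario concrete, the heap replacement step can be sketched roughly as follows. This is only a minimal sketch, not the actual pool: the names `pinned_pool`, `pool_init`, `pool_grow`, `pool_get`, `pool_put`, `pinned_alloc` and `pinned_free` are hypothetical, locking and alignment are left out, and the two helpers use `malloc`/`free` as stand-ins so the logic runs without a GPU. In the real code they would wrap `cuMemHostAlloc(&p, bytes, CU_MEMHOSTALLOC_PORTABLE)` and `cuMemFreeHost(p)`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-ins (assumption): real code would call
 * cuMemHostAlloc(&p, bytes, CU_MEMHOSTALLOC_PORTABLE) and cuMemFreeHost(p). */
static void *pinned_alloc(size_t bytes) { return malloc(bytes); }
static void pinned_free(void *p) { free(p); }

/* Hypothetical pool: one heap, a bump offset, and a count of live buffers. */
typedef struct {
    char  *heap;
    size_t capacity;
    size_t used;
    int    refs;   /* buffers currently handed out from this heap */
} pinned_pool;

static int pool_init(pinned_pool *pool, size_t capacity)
{
    pool->heap = pinned_alloc(capacity);
    pool->capacity = pool->heap ? capacity : 0;
    pool->used = 0;
    pool->refs = 0;
    return pool->heap ? 0 : -1;
}

/* Replace the heap with a larger one. Only allowed while no buffer
 * from the old heap is still referenced. */
static int pool_grow(pinned_pool *pool, size_t min_capacity)
{
    if (pool->refs != 0)
        return -1;
    char *new_heap = pinned_alloc(min_capacity);
    if (!new_heap)
        return -1;
    /* This free may run in a different context than the one that
     * allocated the old heap; that is the call that fails for me. */
    pinned_free(pool->heap);
    pool->heap = new_heap;
    pool->capacity = min_capacity;
    pool->used = 0;
    return 0;
}

/* Hand out a buffer, growing the heap first if it is too small. */
static void *pool_get(pinned_pool *pool, size_t bytes)
{
    if (bytes > pool->capacity - pool->used &&
        pool_grow(pool, bytes) != 0)
        return NULL;
    void *p = pool->heap + pool->used;
    pool->used += bytes;
    pool->refs++;
    return p;
}

/* Return a buffer; the heap becomes reusable once nothing is live. */
static void pool_put(pinned_pool *pool)
{
    assert(pool->refs > 0);
    if (--pool->refs == 0)
        pool->used = 0;
}
```

With this layout, whichever context happens to trigger `pool_grow` is the one that frees the old heap, regardless of which context allocated it.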

gonnet · January 26, 2010, 5:38pm

Hi everyone,

Interestingly, it seems that deallocation from another thread does work, but that CUDA is broken in one case: if the thread that allocated the memory is dead by the time you deallocate the buffer, you have a problem; otherwise it looks OK.

Apart from that issue (which we can quite easily avoid if we make sure that all threads are alive), I’m also interested in knowing whether it is “officially” legal to deallocate a buffer from any context. It’s quite important, because otherwise we’ll have to keep track of who allocated every piece of data. In a producer-consumer paradigm with multiple producers, for instance, that would make a big difference.
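To give an idea of what keeping track of who allocated each buffer would involve, here is a rough sketch. It is an assumption about one possible design, not something CUDA provides: every name in it (`tracked_hdr`, `tracked_alloc`, `tracked_free`, `drain_pending`, `MAX_OWNERS`) is hypothetical, and `malloc`/`free` stand in for `cudaHostAlloc` with `cudaHostAllocPortable` and `cudaFreeHost` so that it runs without a GPU. A buffer released by a thread other than its owner is queued, and the owner frees it later.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_OWNERS 8   /* hypothetical upper bound on allocating threads */

/* Header stored in front of every buffer (hypothetical layout). */
typedef struct tracked_hdr {
    int owner;                  /* id of the thread that allocated it */
    struct tracked_hdr *next;   /* link in the owner's pending-free list */
} tracked_hdr;

static tracked_hdr *pending[MAX_OWNERS];
static pthread_mutex_t pending_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-ins (assumption): real code would use cudaHostAlloc with
 * cudaHostAllocPortable, and cudaFreeHost. */
static void *raw_alloc(size_t n) { return malloc(n); }
static void raw_free(void *p) { free(p); }

/* Allocate a buffer and record which thread owns it. */
static void *tracked_alloc(int owner, size_t bytes)
{
    assert(owner >= 0 && owner < MAX_OWNERS);
    tracked_hdr *h = raw_alloc(sizeof *h + bytes);
    if (!h)
        return NULL;
    h->owner = owner;
    h->next = NULL;
    return h + 1;   /* payload starts right after the header */
}

/* Callable from any thread. Returns 1 if the buffer was freed now,
 * 0 if the free was deferred to the owning thread. */
static int tracked_free(int caller, void *buf)
{
    tracked_hdr *h = (tracked_hdr *)buf - 1;
    if (h->owner == caller) {
        raw_free(h);
        return 1;
    }
    pthread_mutex_lock(&pending_lock);
    h->next = pending[h->owner];
    pending[h->owner] = h;
    pthread_mutex_unlock(&pending_lock);
    return 0;
}

/* Called by the owner from time to time; returns how many buffers it freed. */
static int drain_pending(int owner)
{
    pthread_mutex_lock(&pending_lock);
    tracked_hdr *h = pending[owner];
    pending[owner] = NULL;
    pthread_mutex_unlock(&pending_lock);

    int n = 0;
    while (h) {
        tracked_hdr *next = h->next;
        raw_free(h);
        h = next;
        n++;
    }
    return n;
}
```

Every producer would then have to call `drain_pending` regularly and stay alive until its list is empty, which is exactly the extra bookkeeping we would like to avoid.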

Just my 2 cents,

Cédric

PS: I enclosed a small repro case (comment out the #define TRIGGER_CUDA_BUG line to make the problem disappear).

[codebox]
#include <cuda.h>
#include <cuda_runtime_api.h>
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>

/* Comment this out to keep the allocating thread alive during cudaFreeHost. */
#define TRIGGER_CUDA_BUG 1

float *buffer;
size_t len = 4096*4096;
int reached = 0;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

/* Allocates the portable pinned buffer on device 1, then signals main. */
void *alloc_thread(void *arg)
{
    cudaSetDevice(1);

    cudaError_t res;
    res = cudaHostAlloc((void **)&buffer, len, cudaHostAllocPortable);
    fprintf(stderr, "cudaHostAlloc returns %d\n", res);

    pthread_mutex_lock(&mutex);
    reached = 1;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&mutex);

#ifndef TRIGGER_CUDA_BUG
    /* Stay alive long enough for main to free the buffer. */
    sleep(10);
#endif

    return NULL;
}

int main(int argc, char **argv)
{
    pthread_t th;
    pthread_create(&th, NULL, alloc_thread, NULL);

    sleep(1);

    cudaSetDevice(0);

    /* Wait until the buffer has been allocated (loop guards against
       spurious wakeups). */
    pthread_mutex_lock(&mutex);
    while (!reached)
        pthread_cond_wait(&cond, &mutex);
    pthread_mutex_unlock(&mutex);

#ifdef TRIGGER_CUDA_BUG
    /* Make sure the allocating thread is dead before freeing. */
    void *ret;
    pthread_join(th, &ret);
#endif

    /* Free from a thread other than the one that allocated. */
    int res = cudaFreeHost(buffer);
    fprintf(stderr, "cudaFreeHost returns %d\n", res);

    return 0;
}
[/codebox]
