Creating a global context using the driver API

By default, contexts created using the driver API seem to be th…

The following is how I initialize my CUDA context:

The contexts seem to be bound to the creating thread and not to the entire process.

After reading http://forums.nvidia…howtopic=194860, I understand that this feature is supported in CUDA 4.0, but only for the runtime API. Is there a CTX flag in CUDA 4.0 for the driver API that makes the context global?

Also, by making the context global, will I be able to free mapped memory from a thread other than the one that allocated it? I tried using the flag CU_MEMHOSTALLOC_PORTABLE, but I couldn't get it to work.

After you create it with cuCtxCreate, the context is as global as it can get. It is also made current for the calling thread, so there is no need to call cuCtxAttach in your case. To unbind the context from the current thread, call cuCtxPopCurrent.

There's no way to do what the runtime API does with the driver API in 4.0. We ran out of time (we weren't happy with the APIs). It's coming in the next release.

I cannot unbind the context, since I would like the thread to continue running as usual. When I free one of my objects, it holds a pointer to CUDA mapped memory that needs to be freed as well, but the thread that frees the object is not the thread that runs the kernel. Using a mutex to synchronize the pop and push of a context will not work for me, although I find the approach interesting.

Ah, OK. Good to hear there is a scheduled release.

Sounds very weird.

First of all, do a pop right after creating the context. That way the newly created context is not active/current on any thread until you want it to be.

Later, push the context in the thread you want to use to run the kernel. Once the kernel is done and you want to leave the context, do a pop.

So the code should be:

// can probably be any thread:
// create context
// pop current context

// in thread you want to use for kernel:
// enter context, do a push
// use context.
// leave context, do a pop

// any thread:
// destroy context
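The outline above could look roughly like this with the driver API. This is only a sketch, not the original poster's code: the `CHECK` macro, the `worker` function, the use of pthreads, and the small `cuMemAlloc`/`cuMemFree` pair standing in for real kernel work are all assumptions made for illustration. It also assumes the classic three-argument `cuCtxCreate`; later toolkits may expose a different signature.

```cuda
#include <cuda.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: abort on any driver API error. */
#define CHECK(call)                                                    \
    do {                                                               \
        CUresult r_ = (call);                                          \
        if (r_ != CUDA_SUCCESS) {                                      \
            fprintf(stderr, "%s failed with CUresult %d\n", #call,     \
                    (int)r_);                                          \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

/* Thread that uses the context: push, work, pop. */
static void *worker(void *arg)
{
    CUcontext ctx = (CUcontext)arg;
    CUcontext popped;
    CUdeviceptr buf;

    CHECK(cuCtxPushCurrent(ctx));      /* enter context */

    /* Placeholder for module loading and kernel launches. */
    CHECK(cuMemAlloc(&buf, 1024));
    CHECK(cuMemFree(buf));

    CHECK(cuCtxPopCurrent(&popped));   /* leave context */
    return NULL;
}

int main(void)
{
    CUdevice dev;
    CUcontext ctx, popped;
    pthread_t tid;

    CHECK(cuInit(0));
    CHECK(cuDeviceGet(&dev, 0));

    /* cuCtxCreate makes the new context current on this thread... */
    CHECK(cuCtxCreate(&ctx, 0, dev));
    /* ...so pop it right away to leave it "floating". */
    CHECK(cuCtxPopCurrent(&popped));

    if (pthread_create(&tid, NULL, worker, ctx) != 0) {
        fprintf(stderr, "pthread_create failed\n");
        return EXIT_FAILURE;
    }
    pthread_join(tid, NULL);

    /* The context is not current on any thread at this point. */
    CHECK(cuCtxDestroy(ctx));
    return 0;
}
```

It needs to be linked against the driver library and pthreads (for example with `-lcuda -lpthread`) and run on a machine with at least one CUDA device.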

It’s not weird if you work with multiple devices from the same thread.

So? Create a context for each device…

However a thread can only have one current context at a time.

So the thread will need to switch contexts…

This is done via push and pop.

You push the context you want to work with, and when you are done you pop it…

And then you repeat that for the other contexts!
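As a rough illustration of that push/pop cycle, a single thread could visit one context per device as sketched below. The `CHECK` macro, the `MAX_DEVICES` cap, and the `cuMemGetInfo` query used as stand-in work are assumptions for the example, and it again assumes the three-argument `cuCtxCreate`.

```cuda
#include <cuda.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: abort on any driver API error. */
#define CHECK(call)                                                    \
    do {                                                               \
        CUresult r_ = (call);                                          \
        if (r_ != CUDA_SUCCESS) {                                      \
            fprintf(stderr, "%s failed with CUresult %d\n", #call,     \
                    (int)r_);                                          \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

#define MAX_DEVICES 16   /* arbitrary cap chosen for this sketch */

int main(void)
{
    CUcontext ctxs[MAX_DEVICES];
    CUcontext popped;
    int count = 0;

    CHECK(cuInit(0));
    CHECK(cuDeviceGetCount(&count));
    if (count > MAX_DEVICES)
        count = MAX_DEVICES;

    /* Create one context per device and leave each one floating. */
    for (int i = 0; i < count; ++i) {
        CUdevice dev;
        CHECK(cuDeviceGet(&dev, i));
        CHECK(cuCtxCreate(&ctxs[i], 0, dev));
        CHECK(cuCtxPopCurrent(&popped));
    }

    /* Same thread, one context at a time: push, work, pop. */
    for (int i = 0; i < count; ++i) {
        size_t free_bytes = 0, total_bytes = 0;

        CHECK(cuCtxPushCurrent(ctxs[i]));
        CHECK(cuMemGetInfo(&free_bytes, &total_bytes));
        printf("device %d: %zu of %zu bytes free\n",
               i, free_bytes, total_bytes);
        CHECK(cuCtxPopCurrent(&popped));
    }

    for (int i = 0; i < count; ++i)
        CHECK(cuCtxDestroy(ctxs[i]));

    return 0;
}
```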

So what is your problem?!

Contexts can “float”, which means they are not associated with any thread.

I think you think that a context belongs to a thread, while in reality it probably does not.

Also, what gave you the idea that a thread does not run in the same process space/virtual memory?!

At least on Windows, all threads of a process share the same virtual memory.

So I have absolutely no idea what you think the problem is…

I think the problem is in your head and probably not a real problem :)

Unless of course you can explain it better… you mentioned something about mapping memory?
