Hello, I am having some strange illegal access when using UM in a cooperative group kernel.
The launch is as follows:
__global__
void kern(void * mm) {
int * p = (int*)mm+0x3FFFF000; // last 4kb of the 1gb um buffer
*p = 0; // illegal access here
}
int main() {
...
void * mm;
cudaMallocManaged(&mm,1024*1024*1024);
...
// setup launch params
int i;
cudaLaunchParams * lparams = ...
void ** params = (void**)malloc(1*sizeof(void*));
params[0] = (void*)&mm;
// devices is 4 here
for(i=0;i<devices;i++) {
...
lparams[i].func = (void*)kern;
lparams[i].args = params;
}
cudaLaunchCooperativeKernelMultiDevice(lparams,devices);
...
}
It is odd, since if I initialize the last 4kb page in the host before launching the kernel I get no access violation. Am I missing something here for UM usage?
As far as I know, in the C++ order-of-operations precedence stack, the various kinds of type-casting occur before ordinary arithmetic. Therefore mm is first reinterpreted as a int pointer, and then it is offset by 0x3FFFF000 int quantities. That would extend out to something around 4 Gigabytes, because my calculator tells me that:
0x3FFFF000 = 1,073,737,728
so, when that offset is applied to a int pointer, the corresponding byte offset is something like 1,073,737,728*sizeof(int) = 4,294,950,912
Since you’ve not provided a complete test case, I can only offer comments based on what I see of what you have posted. This may or may not be related to observations, and I can’t explain the behavior of code you haven’t shown. I encourage people who want help to provide complete test cases.