I keep getting a “500” error from the Tsukuba server, so I had a go at it myself with the “mini-gmp” found in the sources. I renamed mini-gmp.cpp to mini-gmp.cu, decorated (almost) everything with __host__ __device__ and rewrote the realloc routine, as there’s no realloc in CUDA. The only compiler warning I get is: #20011-D: calling a __host__ function("abort") from a __host__ __device__ function("gmp_die") is not allowed. I don’t know how to get around that one (asm("trap;"); works only on the device and gives me a compile error), so I’m ignoring it for now.
Here’s my little test program, which uses unified memory (sm_86):
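For reference, here is roughly what those two changes could look like. This is only a sketch, not the code I actually have in mini-gmp.cu: sketch_die and sketch_realloc are made-up names, and the (old, old_size, new_size) signature is an assumption modelled on mini-gmp's realloc hook. The idea for the warning is that __CUDA_ARCH__ is defined only during the device compilation pass, so abort() and asm("trap;") can each be confined to the pass where they are legal. I have not verified that this silences #20011-D in the full mini-gmp build.

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Let the same file build with a plain C++ compiler for host-only testing.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical stand-in for gmp_die: choose the abort mechanism per
// compilation pass. __CUDA_ARCH__ is defined only when compiling device code.
__host__ __device__ static void sketch_die(const char *msg) {
#ifdef __CUDA_ARCH__
    printf("%s\n", msg);   // device-side printf
    asm("trap;");          // abort the kernel
#else
    std::fprintf(stderr, "%s\n", msg);
    std::abort();
#endif
}

// Hypothetical realloc substitute: allocate a new block, copy the
// overlapping bytes, free the old block. Works with host malloc on the
// host and with the device heap inside a kernel.
__host__ __device__ static void *sketch_realloc(void *old, size_t old_size,
                                                size_t new_size) {
    void *p = malloc(new_size);
    if (p == nullptr)
        sketch_die("sketch_realloc: out of memory");
    if (old != nullptr) {
        memcpy(p, old, old_size < new_size ? old_size : new_size);
        free(old);
    }
    return p;
}
```

Note that a block allocated this way on the host lives on the host heap and one allocated inside a kernel lives on the device heap, so the two must not be mixed.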
#include "mini-gmp_.cu"

__device__ __managed__ mpz_t t;

__global__ void test() {
    printf("tst: 0x%lx\n", t);
    printf("tst: %d\n", t->_mp_alloc);
    printf("tst: %d\n", t->_mp_size);
    printf("tst: 0x%lx\n", t->_mp_d);
    printf("tst: %ld\n", t->_mp_d[0]);
}

int main() {
    cudaError err;
    mpz_init_set_ui(t, 10);
    printf("t: %ld, 0x%lx\n", mpz_get_ui(t), t);
    printf("_mp_d: 0x%lx\n", t->_mp_d);
    test<<<1, 1>>>();
    err = cudaDeviceSynchronize();
    printf("sync: %s\n", cudaGetErrorString(err));
    cudaFree(t);
    return 0;
}
mpz_t is defined like this in mini-gmp.h:
typedef struct
{
  int _mp_alloc;     /* Number of *limbs* allocated and pointed
                        to by the _mp_d field. */
  int _mp_size;      /* abs(_mp_size) is the number of limbs the
                        last field points to. If _mp_size is
                        negative this is a negative number. */
  mp_limb_t *_mp_d;  /* Pointer to the limbs. */
} __mpz_struct;
typedef __mpz_struct mpz_t[1];
“mp_limb_t” is unsigned long.
It gives me this output:
t: 10, 0x7f3916000000
_mp_d: 0x211ca70
tst: 0x7f3916000000
tst: 1
tst: 1
tst: 0x211ca70
sync: an illegal memory access was encountered
The exact addresses will vary between runs, of course. And compute-sanitizer:
========= COMPUTE-SANITIZER
t: 10, 0x7f3916000000
_mp_d: 0x211ca70
tst: 0x7f3916000000
tst: 1
tst: 1
tst: 0x211ca70
========= Invalid global read of size 8 bytes
========= at 0xa40 in /home/fiddler/gmp-cu/test.cu:11:test()
========= by thread (0,0,0) in block (0,0,0)
========= Address 0x211ca70 is out of bounds
========= and is 8.653.780.368 bytes before the nearest allocation at 0x205e00000 of size 8.388.864 bytes
So on the CPU everything is in order, and the mpz_t struct gets copied to the GPU as well, including the pointer to the limb array (_mp_d), but not the array itself: t->_mp_d has the correct value, but reading t->_mp_d[0] is out of bounds on the GPU. What’s going on here?
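To make the observation concrete, here is a minimal host-only sketch (plain C++, no CUDA calls) of what I suspect is happening. toy_mpz, toy_init_set_ui and shallow_copy are made-up names and not part of mini-gmp. Copying the struct duplicates the pointer value but not the limb array behind it, so the copy is only usable where the original allocation is reachable. If the limbs come from a plain host malloc, which the low address 0x211ca70 seems to suggest, that might explain why only t->_mp_d[0] faults. This is a guess, not a diagnosis.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for __mpz_struct; not the mini-gmp type itself.
struct toy_mpz {
    int alloc;          // number of limbs allocated
    int size;           // number of limbs in use
    unsigned long *d;   // the limbs live in a separate allocation
};

// Allocate a single limb with plain malloc, roughly what a default
// host allocator would do for a small value.
static toy_mpz toy_init_set_ui(unsigned long v) {
    toy_mpz z;
    z.alloc = 1;
    z.size = (v != 0) ? 1 : 0;
    z.d = static_cast<unsigned long *>(std::malloc(sizeof(unsigned long)));
    assert(z.d != nullptr);
    z.d[0] = v;
    return z;
}

// Byte-wise copy of the struct only. The limb array is NOT duplicated,
// so both structs end up pointing at the same allocation.
static toy_mpz shallow_copy(const toy_mpz &src) {
    toy_mpz dst;
    std::memcpy(&dst, &src, sizeof dst);
    return dst;
}
```

After `toy_mpz b = shallow_copy(a);` the fields b.alloc and b.size are independent copies, while b.d is the same address as a.d, which matches what the kernel prints before it faults.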