Problem with some OpenCL programs on Linux

First, the environment: Debian GNU/Linux with kernel 3.1.0-1-amd64, Nvidia drivers 290.10-1 from the Debian distribution (nvidia-opencl-icd, nvidia-glx, xserver-xorg-video-nvidia and so on), xorg 1:7.6+9. Hardware is an Asus K8N-DL, 2x AMD Opteron 285, 12 GB memory, Quadro 600.

All the samples from the NVIDIA GPU Computing SDK work fine. The problem is that neither JOCL nor aparapi works: clGetPlatformIDs returns -1001 (CL_PLATFORM_NOT_FOUND_KHR). Without the source code of the drivers, the best I could do was to run strace on both a working program and a failing one. The only difference I could see is in an mmap call some time after the libcuda.so library is loaded. First, here is a strace fragment from a successful execution of oclDeviceQuery:

22713 sysinfo({uptime=164643, loads=[5152, 8832, 12704] totalram=12633800704, freeram=4460183552, sharedram=0, bufferram=242556928} totalswap=24330104832, freeswap=24287424512, procs=331}) = 0
22713 getrlimit(RLIMIT_AS, {rlim_cur=RLIM_INFINITY, rlim_max=RLIM_INFINITY}) = 0
22713 mmap(0x200000000, 25769803776, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, 0, 0) = 0x200000000

Then the strace at the same point in a failed execution of the Mandel example in aparapi:

19408 sysinfo({uptime=163360, loads=[3872, 14496, 18880] totalram=12633800704, freeram=4475822080, sharedram=0, bufferram=238653440} totalswap=24330104832, freeswap=24287416320, procs=349}) = 0
19408 getrlimit(RLIMIT_AS, {rlim_cur=RLIM_INFINITY, rlim_max=RLIM_INFINITY}) = 0
19408 mmap(0x200000000, 25769803776, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, 0, 0) = 0x7fe0a696b000
19408 munmap(0x7fe0a696b000, 25769803776) = 0
19408 munmap(0x7fe0a696b000, 25769803776) = 0

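The driver does not seem to cope when the address returned differs from the address requested: in the failing run the kernel returns 0x7fe0a696b000 instead of 0x200000000, and the driver immediately unmaps the region.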
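The mmap behaviour itself can be reproduced outside the driver. The following is only a minimal sketch, assuming a 64-bit Linux system with glibc: it calls mmap through ctypes with the same flags as in the strace (no MAP_FIXED), first with the address seen in the trace as a hint, then with a hint that is known to be occupied. The `reserve` helper and the 1 MiB length are illustrative choices, not anything the driver does; the length is kept far smaller than the 24 GiB in the trace so the test is harmless.

```python
import ctypes
import mmap

# Call the C library's mmap directly so that a hint address can be passed
# (Python's own mmap module does not expose one).
libc = ctypes.CDLL(None, use_errno=True)
libc.mmap.restype = ctypes.c_void_p
libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                      ctypes.c_int, ctypes.c_int, ctypes.c_long]

PROT_NONE = 0                                # same protection as in the strace
MAP_FAILED = ctypes.c_void_p(-1).value       # (void *) -1
FLAGS = mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS  # note: no MAP_FIXED

def reserve(hint, length):
    """Hypothetical helper: reserve address space, treating `hint` as a hint only."""
    addr = libc.mmap(hint, length, PROT_NONE, FLAGS, -1, 0)
    if addr is None or addr == MAP_FAILED:
        raise OSError(ctypes.get_errno(), "mmap failed")
    return addr

HINT = 0x200000000   # address requested in the strace
LENGTH = 1 << 20     # 1 MiB, an assumption to keep the experiment small

first = reserve(HINT, LENGTH)    # may or may not land on HINT
second = reserve(first, LENGTH)  # hint is now occupied, so the kernel must move it
print(hex(first), hex(second))
```

Without MAP_FIXED the kernel is free to ignore the hint when the range is already in use, which matches what the failing strace shows.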

At this point I am running out of ideas for fixing this. Any suggestions?

I found a workaround, which confirms the symptom seen. The problem is that Java by default creates some of its memory pools in the same address range that libcuda later requests (0x200000000-0x800000000). Changing the heap size changes where Java allocates that memory, so the workaround in my case is to add a -Xmx1G parameter to the java command line (but YMMV).
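To check whether a given -Xmx value actually moves the JVM out of the way, one option is to look at the process's /proc/<pid>/maps. The following is a rough sketch, assuming the standard Linux maps format (each line starting with `start-end` in hexadecimal); the function name `overlapping_mappings` is made up for illustration. The range comes from the strace: start 0x200000000, length 25769803776 bytes.

```python
# Range requested by libcuda in the strace:
# start 0x200000000, length 25769803776 bytes (0x600000000).
CUDA_START = 0x200000000
CUDA_END = CUDA_START + 25769803776  # exclusive end of the range

def overlapping_mappings(maps_text, start=CUDA_START, end=CUDA_END):
    """Return (lo, hi, rest) for every mapping that intersects [start, end).

    `maps_text` is the content of a /proc/<pid>/maps file.
    """
    hits = []
    for line in maps_text.splitlines():
        fields = line.split(None, 1)
        if not fields:
            continue  # skip blank lines
        lo_s, _, hi_s = fields[0].partition("-")
        lo, hi = int(lo_s, 16), int(hi_s, 16)
        # Two half-open intervals overlap when each starts before the other ends.
        if lo < end and hi > start:
            hits.append((lo, hi, fields[1] if len(fields) > 1 else ""))
    return hits
```

For a running JVM this could be called as `overlapping_mappings(open("/proc/<pid>/maps").read())`, with `<pid>` replaced by the process ID; an empty result would suggest the range is still free at that moment.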