Hi, I am trying to speed up some scientific computation in Mathematica with CUDALink.
Mathematica's CUDALink just passes the code along to Visual Studio for compilation.
My problem is with memory allocation. I tried some examples from NVIDIA's website, for instance the prefix sum calculation code.
It fails to run as is, with what seems to be a segfault caused by
extern __shared__ float temp;
So I guess I have to allocate the shared memory when I call the kernel.
In Mathematica I do this by passing in a parameter along with the run (I haven't found how to allocate shared memory directly at the kernel call in CUDALink), so that I have
extern __shared__ float temp[BLOCK_DIM];
but now I get the compiler error
error: __local__ and __shared__ variables cannot have external linkage
Once I remove extern, things seemingly work, but I get wrong results from the prefix sum code.
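For comparison, here is a minimal sketch of a work-efficient exclusive prefix sum with a statically sized shared array (no extern), written for plain CUDA C outside CUDALink. It assumes a single block in which each thread handles two elements, so the shared array needs 2 * BLOCK_DIM floats; if the code I copied works the same way, an array of only BLOCK_DIM floats would be too small, which might be one cause of wrong results. The names scan_static, exclusive_scan and SCAN_N are made up for this illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK_DIM 4                // threads per block (illustrative value)
#define SCAN_N (2 * BLOCK_DIM)     // assumption: each thread handles two elements

// Exclusive prefix sum of SCAN_N floats inside a single block.
__global__ void scan_static(float *out, const float *in)
{
    __shared__ float temp[SCAN_N]; // statically sized, so no extern and no launch-time size
    int tid = threadIdx.x;
    int offset = 1;

    temp[2 * tid]     = in[2 * tid];
    temp[2 * tid + 1] = in[2 * tid + 1];

    // Up-sweep: build partial sums in place.
    for (int d = SCAN_N >> 1; d > 0; d >>= 1) {
        __syncthreads();
        if (tid < d) {
            int ai = offset * (2 * tid + 1) - 1;
            int bi = offset * (2 * tid + 2) - 1;
            temp[bi] += temp[ai];
        }
        offset *= 2;
    }

    if (tid == 0) temp[SCAN_N - 1] = 0.0f;  // clear the last element

    // Down-sweep: push the partial sums back down the tree.
    for (int d = 1; d < SCAN_N; d *= 2) {
        offset >>= 1;
        __syncthreads();
        if (tid < d) {
            int ai = offset * (2 * tid + 1) - 1;
            int bi = offset * (2 * tid + 2) - 1;
            float t  = temp[ai];
            temp[ai] = temp[bi];
            temp[bi] += t;
        }
    }
    __syncthreads();

    out[2 * tid]     = temp[2 * tid];
    out[2 * tid + 1] = temp[2 * tid + 1];
}

// Host helper (hypothetical name): scans exactly SCAN_N floats.
void exclusive_scan(const float *h_in, float *h_out)
{
    float *d_in, *d_out;
    cudaMalloc(&d_in,  SCAN_N * sizeof(float));
    cudaMalloc(&d_out, SCAN_N * sizeof(float));
    cudaMemcpy(d_in, h_in, SCAN_N * sizeof(float), cudaMemcpyHostToDevice);
    scan_static<<<1, BLOCK_DIM>>>(d_out, d_in);   // no third launch argument needed
    cudaMemcpy(h_out, d_out, SCAN_N * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
}

int main()
{
    float in[SCAN_N] = {1, 2, 3, 4, 5, 6, 7, 8};
    float out[SCAN_N];
    exclusive_scan(in, out);
    for (int i = 0; i < SCAN_N; ++i) printf("%g ", out[i]);  // 0 1 3 6 10 15 21 28
    printf("\n");
    return 0;
}
```

With the static form, the array size has to be a compile-time constant and has to match the number of elements the kernel actually touches, not just the number of threads.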
The code was taken from here
I also had this same problem with another example I found online. Could someone help me understand this? If I am able to assign the memory at the kernel call, can I then just declare it as extern, or was this example written for a different version of CUDA?
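As far as I can tell from the CUDA documentation, the extern form is meant to be an unsized array whose byte count is supplied as the third launch parameter, which would explain why giving it a size such as BLOCK_DIM triggers the linkage error. Below is a minimal sketch of that pattern in plain CUDA C, assuming direct access to the launch configuration (which is exactly what I have not found in CUDALink). The kernel just reverses a small array; reverse_dynamic and reverse_on_gpu are made-up names for this illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Reverses n floats in place using dynamically sized shared memory.
__global__ void reverse_dynamic(float *data, int n)
{
    extern __shared__ float temp[];   // unsized: byte count comes from the launch
    int t = threadIdx.x;
    if (t < n) temp[t] = data[t];
    __syncthreads();
    if (t < n) data[t] = temp[n - 1 - t];
}

// Host helper (hypothetical name): n must fit in a single block.
void reverse_on_gpu(float *h_data, int n)
{
    float *d_data;
    size_t bytes = n * sizeof(float);
    cudaMalloc(&d_data, bytes);
    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);
    // Third launch argument: bytes of dynamic shared memory per block.
    reverse_dynamic<<<1, n, bytes>>>(d_data, n);
    cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_data);
}

int main()
{
    float data[4] = {1, 2, 3, 4};
    reverse_on_gpu(data, 4);
    for (int i = 0; i < 4; ++i) printf("%g ", data[i]);  // 4 3 2 1
    printf("\n");
    return 0;
}
```

If the launch does not request any dynamic shared memory, the unsized array has zero bytes behind it, and reading or writing it is out of bounds, which would be consistent with the crash I saw on the first attempt.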