I have a small problem with a piece of code, and I would like to ask whether it is an nvcc compiler bug or a hardware limitation. I already have a workaround! :)
If I create an array in constant memory and point to it with a pointer that is also stored in constant memory, the generated PTX places the array in GLOBAL memory and only the pointer in CONSTANT memory, even though I declared both of them in CONSTANT memory.
The same thing happens if I create an array of pointers, each of which points to an array.
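The original code was not posted, so the following is only a hedged reconstruction of the pattern described above: two arrays declared in constant memory, plus a table of pointers to them that is also declared in constant memory. All names (arr0, arr1, table, read_via_table, read_element) are hypothetical, and the pointer table is filled from the host with cudaGetSymbolAddress, which is an assumption about how the pointers were set up. Where the compiler actually places the arrays depends on the toolkit version and target architecture; checking the PTX produced with -keep is the way to confirm it.

```cuda
#include <cuda_runtime.h>

constexpr int kLen = 4;  // hypothetical array length

// Two arrays declared in constant memory.
__constant__ float arr0[kLen];
__constant__ float arr1[kLen];

// A table of pointers to those arrays, also declared in constant memory.
__constant__ const float *table[2];

// Reads element i of array `which` through the pointer table.
// The pointer itself is fetched from constant memory; the element is then
// loaded through that pointer as an ordinary (generic) address.
__global__ void read_via_table(int which, int i, float *out)
{
    *out = table[which][i];
}

// Host helper: fills both arrays, stores their device addresses in the
// table, runs the kernel, and returns the value read (-1.0f on any error).
float read_element(int which, int i)
{
    const float h0[kLen] = {0.f, 1.f, 2.f, 3.f};
    const float h1[kLen] = {10.f, 11.f, 12.f, 13.f};
    void *p0 = nullptr;
    void *p1 = nullptr;
    float *d_out = nullptr;
    float result = -1.0f;

    bool ok = cudaMemcpyToSymbol(arr0, h0, sizeof(h0)) == cudaSuccess
           && cudaMemcpyToSymbol(arr1, h1, sizeof(h1)) == cudaSuccess
           && cudaGetSymbolAddress(&p0, arr0) == cudaSuccess
           && cudaGetSymbolAddress(&p1, arr1) == cudaSuccess;

    const float *h_table[2] = {static_cast<const float *>(p0),
                               static_cast<const float *>(p1)};
    ok = ok
      && cudaMemcpyToSymbol(table, h_table, sizeof(h_table)) == cudaSuccess
      && cudaMalloc(&d_out, sizeof(float)) == cudaSuccess;

    if (ok) {
        read_via_table<<<1, 1>>>(which, i, d_out);
        // cudaMemcpy waits for the kernel and also surfaces launch errors.
        if (cudaMemcpy(&result, d_out, sizeof(float),
                       cudaMemcpyDeviceToHost) != cudaSuccess) {
            result = -1.0f;
        }
    }
    cudaFree(d_out);
    return result;
}
```

Calling read_element(1, 2) from host code selects the second array through the table and returns its third element.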
I use a C2070 Fermi card and compile from the Bash command line with the options "-arch=sm_20 -keep".
It isn't a matter of running out of constant memory. What could it be?
I only want to report the problem, if it really is a problem. :)
Thank you in advance!
Fermi-class devices use "generic addresses", which means they no longer have a separate address space for each memory type, as previous architectures did. Instead, the memory type is deduced from the address itself. You are probably seeing just this, which should not be a problem at all.
EDIT: Sorry, my reply was complete nonsense. I forgot the crucial part that Norbert mentions below: generic addressing is not implemented for constant memory.
OK! But this strange behavior hurts performance, which drops from about 5.05 MSample/s to about 4.67 MSample/s. This is because the data is accessed in global memory instead of constant memory.
In C/C++ pointers are generic addresses, there is no concept of address spaces. The Fermi architecture supports this by implementing generic addressing, and ways of converting address-space specific addresses to generic addresses (see the CVTA instruction in the PTX specification). Unfortunately, conversion of constant space addresses to generic addresses hit a snag, and so when you take the address of a constant memory object to get a generic pointer, the object currently needs to be placed into global memory instead. The obvious workaround is not to take the address of the constant space object, if the 10% performance difference you observe is crucial to your application.
Yes, in fact my solution is to organize the arrays as a single matrix that holds all of them, so that no pointer to constant memory is needed.
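A minimal sketch of that kind of workaround, assuming the arrays all have the same length: they are stored as the rows of one constant-memory matrix and indexed directly, so the address of a constant-memory object is never taken. The names (all_arrays, read_direct, read_element_flat) and the sizes are hypothetical, not taken from the original code.

```cuda
#include <cuda_runtime.h>

constexpr int kNumArrays = 2;  // hypothetical number of arrays
constexpr int kCols = 4;       // hypothetical length of each array

// All arrays stored as the rows of a single matrix in constant memory.
__constant__ float all_arrays[kNumArrays][kCols];

// Indexes the constant matrix directly; no pointer to it is formed.
__global__ void read_direct(int which, int i, float *out)
{
    *out = all_arrays[which][i];
}

// Host helper: fills the matrix, runs the kernel, and returns the value
// read (-1.0f on any error).
float read_element_flat(int which, int i)
{
    const float h_all[kNumArrays][kCols] = {{0.f, 1.f, 2.f, 3.f},
                                            {10.f, 11.f, 12.f, 13.f}};
    float *d_out = nullptr;
    float result = -1.0f;

    bool ok = cudaMemcpyToSymbol(all_arrays, h_all, sizeof(h_all)) == cudaSuccess
           && cudaMalloc(&d_out, sizeof(float)) == cudaSuccess;

    if (ok) {
        read_direct<<<1, 1>>>(which, i, d_out);
        // cudaMemcpy waits for the kernel and also surfaces launch errors.
        if (cudaMemcpy(&result, d_out, sizeof(float),
                       cudaMemcpyDeviceToHost) != cudaSuccess) {
            result = -1.0f;
        }
    }
    cudaFree(d_out);
    return result;
}
```

Arrays of different lengths would need padding to the longest row, or a single flat array with a small table of integer offsets instead of pointers.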
Thus it doesn’t seem to be a bug.
Thanks all !!!