Constant memory alignment?

Hey all,

Is it possible to tell CUDA how to align constant memory?

In my case, I need to run a 2D copy between global device memory and constant memory. Because constant memory can’t be dynamically allocated at runtime, there’s no real way to know how it is aligned, if at all, which makes it impossible to do an aligned 2D memory copy between global device memory and constant memory.

Needless to say, unaligned 2D device->device memory copies are quite slow.

Am I right to assume it’s currently impossible to do what I need to? Or am I missing something?
(The Programming Guide, Google, and the forums don’t make any reference to this problem.)

Cheers,