I’m seeing some unusual behavior from the PTX instructions cvta and isspacep on a GeForce GTX 470.
cvta is a PTX instruction for converting pointers in .global, .shared, and .local state spaces to generic addresses. Each of the segmented address spaces is mapped to non-overlapping regions in a single unified address space.
cvta.to.{space} converts a generic pointer back to the indicated state space, with undefined results if the pointer does not refer to that state space. The instruction isspacep.{space} sets a predicate register if a generic pointer belongs to the given state space, so PTX programs can query generic pointers before converting them.
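To make my reading of these semantics concrete, here is a small host-side C++ model of it. This is only a sketch of my interpretation: the window bases and sizes are invented for illustration and are not the values any real GPU uses, and the names Space, Window, windowOf, cvta, cvtaTo and isspacep are hypothetical helpers, not part of any CUDA API.

```cpp
#include <cassert>
#include <cstdint>

// Conceptual model only. The windows below are made-up numbers chosen so
// that the three state spaces occupy non-overlapping regions of a single
// generic address space. Real hardware may map them differently.
enum class Space { Global, Shared, Local };

struct Window {
    std::uint64_t base;  // first generic address of the window
    std::uint64_t size;  // number of bytes covered by the window
};

// Hypothetical window for each state space.
inline Window windowOf(Space s) {
    switch (s) {
        case Space::Shared: return Window{0x01000000ull, 0x00010000ull};
        case Space::Local:  return Window{0x02000000ull, 0x00010000ull};
        default:            return Window{0x10000000ull, 0x10000000ull};
    }
}

// Models cvta for one state space: space-relative address -> generic address.
inline std::uint64_t cvta(Space s, std::uint64_t addr) {
    return windowOf(s).base + addr;
}

// Models isspacep for one state space: true if the generic address falls
// inside that space's window.
inline bool isspacep(Space s, std::uint64_t generic) {
    const Window w = windowOf(s);
    return generic >= w.base && generic - w.base < w.size;
}

// Models cvta.to for one state space: generic address -> space-relative
// address. Only meaningful when isspacep(s, generic) is true; otherwise the
// result is garbage, mirroring the "undefined results" wording above.
inline std::uint64_t cvtaTo(Space s, std::uint64_t generic) {
    return generic - windowOf(s).base;
}
```

Under this model, a pointer converted with cvta for a given space always tests positive with isspacep for that same space and negative for the other two, which is the behavior I expected from the hardware.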
A simple test program illustrates my interpretation of these instructions (attached to this post, also available at http://code.google.com/p/gpuocelot/source/…a/test/driver):
generic.cpp [host program]
generic.ptx [PTX kernel]
This uses the CUDA driver API to load a PTX kernel from a file. The host program prints resulting predicate values from each permutation of:
{global ptr, local ptr, shared ptr} x {isspacep.global, isspacep.local, isspacep.shared}
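The pass criterion I have in mind for this 3 x 3 grid can be sketched on the host as follows. This is not the code in generic.cpp; the Matrix layout (rows are the pointer's real state space, columns are the queried space, both in the order global, local, shared) and the helper name diagonalFailures are assumptions made for this illustration. It only checks the matched pairs, since mismatched pairs testing TRUE could be legitimate if address spaces overlap.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// observed[p][q] is the predicate produced by isspacep for queried space q
// applied to a generic pointer that really refers to space p.
// Index order (an assumption of this sketch): 0 = global, 1 = local, 2 = shared.
using Matrix = std::array<std::array<bool, 3>, 3>;

// Returns a description of every matched pair (the diagonal of the grid)
// whose predicate was FALSE. An empty result means all matched pairs passed.
inline std::vector<std::string> diagonalFailures(const Matrix& observed) {
    static const char* const names[3] = {"global", "local", "shared"};
    std::vector<std::string> failures;
    for (std::size_t i = 0; i < 3; ++i) {
        if (!observed[i][i]) {
            failures.push_back(std::string("isspacep.") + names[i] +
                               " on ." + names[i] + " pointer");
        }
    }
    return failures;
}
```

A host program could fill the matrix from the predicate values the kernel writes back and report FAIL whenever diagonalFailures returns a non-empty list.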
When I run this program on a GeForce GTX 470, I see several unusual results. Specifically, generic pointers to .shared test positive with isspacep.local, and generic pointers to .local test positive with isspacep.shared. More disturbingly, .local pointers do not test positive with isspacep.local, and .shared pointers do not test positive with isspacep.shared.
You should be able to compile this program using:
$ g++ generic.cpp -o GenericMemoryNative -lcuda
Output on my machine (Ubuntu 10.04 x64, CUDA 3.1, GeForce GTX470) with GenericMemoryNative:
$ ./GenericMemoryNative
%p1 - 0
%p2 - 0
%p4 - 1004
%p8 - 1008
Test: FAIL
$
The actual lines of PTX with the strange behavior are:
[codebox]mov.u64 %rd2, %rd1; // .global address
mov.u64 %rd3, $rs; // .shared address
mov.u64 %rd4, $rl; // .local address
cvta.global.u64 %rd2, %rd2;
cvta.shared.u64 %rd3, %rd3;
cvta.local.u64 %rd4, %rd4;
isspacep.global %p0, %rd2; // expect TRUE - program yields TRUE
isspacep.shared %p1, %rd3; // expect TRUE - program yields FALSE - ERROR??
isspacep.local %p2, %rd4; // expect TRUE - program yields FALSE - ERROR??
[/codebox]
Some of the other isspacep tests return TRUE for queries in mismatched address spaces, but those could be correct if address spaces overlap [i.e. .local mapped to .shared or .global]. The above cases, though, return FALSE, and this might result in incorrect program behavior. By my interpretation, isspacep.shared %p1, %rd3; and isspacep.local %p2, %rd4; should return TRUE.
Does anyone see any problems with my interpretation of these instructions? Do you consider the test program to be valid?
Thanks for your help.
generic.zip (1.93 KB)