Does the driver JIT support PTX 1.3/2.0 binaries on older cards? If so, how?

I’m trying to run some PTX binaries compiled for sm_13 on an 8800GTS, which only supports sm_10. Here is a simple test case that reproduces the error:

.version 1.4
.target sm_13
.entry increment
(
	.param .u64 memory
)
{
.reg .u64 %lr<1>;
.reg .u32 %r<2>;
Entry:
	ld.param.u64 %lr0, [memory];
	ld.global.u32 %r0, [%lr0 + 0];
	add.u32 %r1, %r0, 1;
	st.global.u32 [%lr0 + 0], %r1;
Exit:
	exit;
}

edit: I know that this doesn’t use any sm_13 features, but what if it did?

Here is how I am trying to load it:

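// Ask the JIT to write its error messages into a caller-supplied buffer.
CUjit_option options[] = {CU_JIT_ERROR_LOG_BUFFER,
	CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES};
const int errorLogSize = 2048;
int errorLogActualSize = errorLogSize;
char errorLogBuffer[errorLogSize];
memset(errorLogBuffer, 0, errorLogSize);
// The buffer size is passed by value in the pointer slot; cast through
// size_t so the conversion is the right width on 64-bit hosts.
void* optionValues[2] = {(void*)errorLogBuffer,
	(void*)(size_t)errorLogActualSize};
CUresult result = driver::cuModuleLoadDataEx(&_handle,
	stream.str().c_str(), 2, options, optionValues);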

This works fine on, say, a C1060, but I run into the following error on an 8800GTS:

ptxas fatal   : SM version specified by .target is higher than default SM version assumed
error   : Ptx compilation failed: gpu='sm_10', device code='cuModuleLoadDataEx_92'

Isn’t the whole point of PTX to be backward/forward compatible with previous/future GPUs? Am I missing something?


I thought it was only forward compatible. As far as I know, the idea is that you compile for the SM level your code actually uses. Future generations (like 1.2- and 1.3-capable cards) can then compile that code optimally for their platform (using twice the registers, for example). If you are declaring that your code needs sm_13, how can it ever run on a 1.0-capable card?
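To make the rule above concrete, here is a minimal sketch of a pre-load check that compares the `.target` in a PTX string against the device's compute capability. The helper names `ptxTargetSm` and `deviceCanJit` are hypothetical, not part of the CUDA API, and the parsing is deliberately naive (it takes the first `sm_` after the first `.target` and ignores comments). It assumes the capability is passed in as major/minor integers, which in a real program would come from a driver API device query.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: returns the number from the PTX ".target sm_XY"
// directive (e.g. 13 for sm_13), or -1 if no such directive is found.
int ptxTargetSm(const std::string& ptx) {
    const std::string key = ".target";
    std::string::size_type pos = ptx.find(key);
    if (pos == std::string::npos) return -1;
    pos = ptx.find("sm_", pos + key.size());
    if (pos == std::string::npos) return -1;
    pos += 3;  // skip past "sm_"
    int sm = 0;
    bool sawDigit = false;
    while (pos < ptx.size() &&
           std::isdigit(static_cast<unsigned char>(ptx[pos]))) {
        sm = sm * 10 + (ptx[pos] - '0');
        sawDigit = true;
        ++pos;
    }
    return sawDigit ? sm : -1;
}

// Hypothetical helper: true if the PTX target does not exceed the device's
// compute capability (forward compatibility only, as discussed above).
bool deviceCanJit(const std::string& ptx, int major, int minor) {
    const int sm = ptxTargetSm(ptx);
    return sm >= 0 && sm <= major * 10 + minor;
}
```

With the test case from the first post, `deviceCanJit(ptx, 1, 0)` is false for the 8800GTS and `deviceCanJit(ptx, 1, 3)` is true for a C1060, which matches the behaviour reported.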

Yeah, you are probably right about that. It would be hard or impossible to emulate atomics or selective barriers on previous generations. I was hoping for some emulation layer, but I guess I should really only expect forward compatibility.

Yeah, I am 99% sure it’s forward compatible only.

By the way, you can set compiler options to build multiple PTX variants for different architectures and include all of them in the program. Sorry, I realize you are using the driver API, so that may not apply directly.
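With the driver API, one way to approximate the multi-variant approach is to ship several PTX strings yourself and pick one at load time. The sketch below is only an illustration under that assumption: `PtxVariant` and `pickVariant` are hypothetical names, and `deviceSm` is assumed to be the device's compute capability encoded as major * 10 + minor (10 for an 8800GTS, 13 for a C1060).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container for one embedded PTX image and the sm level
// named in its .target directive.
struct PtxVariant {
    int sm;           // e.g. 10 for sm_10, 13 for sm_13
    std::string ptx;  // PTX text built for that target
};

// Returns the index of the variant with the highest sm level that does not
// exceed deviceSm, or -1 if every variant targets a newer architecture.
int pickVariant(const std::vector<PtxVariant>& variants, int deviceSm) {
    int best = -1;
    for (std::size_t i = 0; i < variants.size(); ++i) {
        if (variants[i].sm > deviceSm) continue;  // device cannot JIT this one
        if (best < 0 || variants[i].sm > variants[best].sm) {
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

The PTX of the chosen variant would then be passed to `cuModuleLoadDataEx` as in the original post. A return value of -1 means no variant is old enough for the device, and the caller has to handle that case.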