Self-modifying code

Is there, or will there be, any way to write self-modifying code in CUDA?
It could help reduce the number of registers a kernel uses, and thereby increase the number of threads that can run at the same time.

I’d like to add a related question: is there likely to be a way to generate code dynamically?

I am very interested in the new work being done by Alan Kay and many of the other Smalltalk luminaries, along with some newer people like Ian Piumarta. Anyone interested in truly amazing software should have a look at the Viewpoints Research Institute, and especially its Writings publications page.

Specifically, the paper Steps Toward the Reinvention of Programming by A. Kay, I. Piumarta, K. Rose, D. Ingalls, D. Amelang, T. Kaehler, Y. Ohshima, C. Thacker, S. Wallace, A. Warth, and T. Yamamiya.

In a couple of projects (JitBlt, Gezira), they are writing powerful (admittedly 2D) graphics systems in only a few hundred lines of code, and getting good performance by JIT-compiling machine code. I believe the technique has been migrated back into Cairo because it generates very fast rendering code. Very cool.

The analogue here is dynamically generating the code for a kernel for a specific data set. I am thinking that I don’t want to generate CUDA source and go via nvopencc, but instead generate a form of PTX into a memory buffer and hand that over to the driver.

I’d like to know if, and when, we’ll be able to implement direct, fast PTX generation on the host side for immediate loading and execution on the GPU.

GB

I don’t see any reason why you couldn’t generate PTX on the host side, compile it to a shared library of some sort using nvcc/ptxas, and then use dlopen()/dlsym()/dlclose() on Unix, or LoadLibrary() and friends on Windows, to use the dynamically generated code. People have been doing things like this for years as a somewhat portable means of generating native machine code. It’s not as fast as banging the metal and writing to the text segment while code is running, but that sort of thing has become more difficult as the security policies enforced by modern operating systems have begun to frown on it. Generating PTX, linking it into a shared library, and then loading it won’t be very fast, but you can at least prototype your code if it’s that important to you.
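For what it’s worth, here is a minimal sketch of that route on a Unix host. Everything in it is illustrative: the generated kernel, the wrapper name run_kernel, and the library name are made up, and it assumes nvcc is on the PATH.

```c
/* Sketch only: generate CUDA source, build it into a shared
 * library with nvcc, and load it with dlopen()/dlsym().
 * Link the host program with -ldl. */
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>

int main(void)
{
    /* 1. Emit the generated CUDA source for this data set. */
    FILE *f = fopen("generated.cu", "w");
    fprintf(f,
        "__global__ void scale(float *d, float s)\n"
        "{ d[threadIdx.x] *= s; }\n"
        "extern \"C\" void run_kernel(float *d, float s)\n"
        "{ scale<<<1, 256>>>(d, s); }\n");
    fclose(f);

    /* 2. Compile it into a shared library. */
    if (system("nvcc --shared -Xcompiler -fPIC "
               "-o libgenerated.so generated.cu") != 0)
        return 1;

    /* 3. Load the library and look up the host wrapper. */
    void *lib = dlopen("./libgenerated.so", RTLD_NOW);
    if (!lib) { fprintf(stderr, "%s\n", dlerror()); return 1; }
    void (*run_kernel)(float *, float) =
        (void (*)(float *, float)) dlsym(lib, "run_kernel");

    /* 4. Call the freshly built code; d_data would come from
     *    cudaMalloc() in a real program, so the call is elided. */
    (void) run_kernel;
    /* run_kernel(d_data, 2.0f); */

    dlclose(lib);
    return 0;
}
```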

John

Yes, I’ve done this… the generating-CUDA-code-on-the-fly thing, that is. It’s not very different from generating shaders on the fly, as is usual in graphics programming. Just print out PTX or C code to a string, feed it to the compiler, and then load it using the CUDA API.
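To make that concrete, a minimal driver-API loader might look roughly like this. It is only a sketch: the hand-written PTX is a do-nothing kernel for illustration (in practice the buffer would come from your generator, or from running nvcc -ptx on generated C source), and most error checking is omitted. Passing PTX text to cuModuleLoadData() relies on the driver’s JIT; older drivers may want a real cubin image.

```c
/* Sketch only: load dynamically generated PTX from a host
 * buffer with the driver API and launch it. */
#include <stdio.h>
#include <cuda.h>

int main(void)
{
    CUdevice   dev;
    CUcontext  ctx;
    CUmodule   mod;
    CUfunction fn;

    /* Illustrative hand-written PTX for a kernel that does
     * nothing; normally emitted by your code generator. */
    const char *ptx =
        ".version 1.4\n"
        ".target sm_10\n"
        ".entry noop\n"
        "{\n"
        "    exit;\n"
        "}\n";

    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);

    /* Hand the in-memory image straight to the driver. */
    if (cuModuleLoadData(&mod, ptx) != CUDA_SUCCESS) {
        fprintf(stderr, "module load failed\n");
        return 1;
    }
    cuModuleGetFunction(&fn, mod, "noop");

    /* Old-style launch: block shape, parameter size, grid. */
    cuFuncSetBlockShape(fn, 1, 1, 1);
    cuParamSetSize(fn, 0);
    cuLaunchGrid(fn, 1, 1);
    cuCtxSynchronize();

    cuCtxDestroy(ctx);
    return 0;
}
```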

I’m quite sure self-modification is not possible. CUDA programs can read the memory where their code is hosted, but writing to it doesn’t do anything. (Also, remember that CUDA is meant to be massively parallel, and all blocks execute the same code, so it would always be a mess.)

Wumpus, thanks, that sounds right.

I assume this works by creating a cubin ‘file’ on the fly, then calling cuModuleLoad() or cuModuleLoadData()?

Can you explain what cuModuleLoadData() does? I don’t understand the documentation, and I can’t find an example or any mention on the forums (or by Googling), but it looks like I can pass a faked-up cubin file as a string in the ‘image’ parameter.

I would want this to be pretty quick (milliseconds). Is the method you’ve tried in that ballpark?

TIA

GB

I guess you mean using cuModuleLoad(), cuLaunchGrid(), etc. Is it possible to mix these driver API functions and kernel launches with my original code that uses the runtime API, or do I have to rewrite all of my code to use the driver API only?

/Lars

I think I read somewhere that you cannot mix the two. See Section 4.5, “Host Runtime Component”, of the programming guide.
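For what it’s worth, the rewrite mostly amounts to swapping each runtime call for its driver-API counterpart. A rough sketch, with the runtime calls shown as comments above their replacements (the kernel name and signature are illustrative, and error checking is omitted):

```c
#include <cuda.h>

/* Sketch: driver-API equivalents of common runtime calls. */
void driver_api_path(const void *host_src, unsigned int bytes,
                     CUmodule mod)
{
    CUdeviceptr d_buf;
    CUfunction  fn;

    /* cudaMalloc(&d_buf, bytes); */
    cuMemAlloc(&d_buf, bytes);

    /* cudaMemcpy(d_buf, host_src, bytes, cudaMemcpyHostToDevice); */
    cuMemcpyHtoD(d_buf, host_src, bytes);

    /* kernel<<<1, 256>>>(d_buf); */
    cuModuleGetFunction(&fn, mod, "kernel");
    cuFuncSetBlockShape(fn, 256, 1, 1);
    cuParamSetv(fn, 0, &d_buf, sizeof(d_buf));
    cuParamSetSize(fn, sizeof(d_buf));
    cuLaunchGrid(fn, 1, 1);

    /* cudaFree(d_buf); */
    cuMemFree(d_buf);
}
```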