How are kernels embedded into the executable, and how can this be mimicked in other languages/tools?


My question is:

How are the CUDA kernels embedded into the executable produced by Visual Studio 2010 / Visual C/C++ 2010 / NVCC?

I can imagine different techniques:

  1. Embedding the PTX as some kind of resource string, or as some other kind of string.
    (If this string technique is used, how is the string compiled?)

  2. Some kind of binary equivalent, stored as a resource.
    (How is this loaded? Probably through some load-image API?)

What techniques are used storage-wise and loading/executing/API-wise?

And which techniques and compiler options are available to do the same for other languages like Pascal/Delphi?


I probably found the answer to my question in the GPU Computing SDK 4.0.

The example in the folder “ptxjit” shows how to store the PTX file in a string and load it into the device.

This is probably what Visual Studio does.

It probably compiles the kernels to PTX, stores the PTX as a string somewhere in the executable, and then loads it just like the example ;)

The api to use is:


This is pretty cool ! ;) =D