Using PTX actively

Hi,
I need to generate a .ptx file using nvcc's -ptx flag, modify that .ptx, and then build a GPU executable from it.
Has anyone done this before?

Thanks!

Yes, see this project: http://code.google.com/p/gpuocelot

The approach that we take is to parse a PTX file, perform various transformations on it, emit another PTX program stored in host memory, load it as a module using the CUDA driver API, and then launch kernels within it using the CUDA driver API.
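Here is a minimal C++ sketch of the "transform PTX held in host memory" step. It assumes a simple text-level pass instead of Ocelot's real parser and IR. `ptxEntryNames` and `replaceOpcode` are hypothetical helpers: the first lists the kernels declared with `.entry`, and the second swaps one opcode for another (for example `add.f32` to `sub.f32`). The resulting string is what you would then hand to the driver API, as the closing comment notes.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: list kernel names declared with ".entry" in PTX text.
// This is only a line-by-line scan, not a real PTX parser.
std::vector<std::string> ptxEntryNames(const std::string& ptx) {
    std::vector<std::string> names;
    std::istringstream in(ptx);
    std::string line;
    while (std::getline(in, line)) {
        std::size_t pos = line.find(".entry");
        if (pos == std::string::npos) continue;
        // The kernel name is the first token after the 6-character directive.
        std::size_t start = line.find_first_not_of(" \t", pos + 6);
        if (start == std::string::npos) continue;
        std::size_t end = line.find_first_of(" \t({", start);
        names.push_back(line.substr(start, end - start));  // substr clamps if end == npos
    }
    return names;
}

// Hypothetical transformation pass: replace every whole-token occurrence of
// one opcode (e.g. "add.f32") with another in the PTX text.
std::string replaceOpcode(const std::string& ptx, const std::string& from,
                          const std::string& to) {
    assert(!from.empty());  // an empty pattern would match everywhere
    auto isSep = [](char c) { return c == ' ' || c == '\t' || c == '\n' || c == '\r'; };
    std::string out;
    std::size_t i = 0;
    while (i < ptx.size()) {
        std::size_t hit = ptx.find(from, i);
        if (hit == std::string::npos) {
            out.append(ptx, i, std::string::npos);  // copy the unmatched tail
            break;
        }
        std::size_t after = hit + from.size();
        bool startOk = hit == 0 || isSep(ptx[hit - 1]);
        bool endOk = after == ptx.size() || isSep(ptx[after]);
        out.append(ptx, i, hit - i);               // text before the match
        out += (startOk && endOk) ? to : from;     // only rewrite whole tokens
        i = after;
    }
    return out;
}

// Next step (not shown, needs cuda.h and a GPU): pass the transformed text to
// cuModuleLoadData(&module, ptx.c_str()), look up each kernel with
// cuModuleGetFunction(&fn, module, name.c_str()), and launch it.
```

A text-level rewrite like this breaks easily (register declarations, predicates, several kernels per module), so a real tool parses the PTX first, as Ocelot does.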

I tried to compile Ocelot and got this error:

In file included from /usr/include/boost/dynamic_bitset.hpp:15,
from ./ocelot/executive/interface/CTAContext.h:12,
from ./ocelot/executive/interface/EmulatedKernel.h:21,
from ocelot/executive/implementation/Executive.cpp:17:
/usr/include/boost/dynamic_bitset/dynamic_bitset.hpp: In member function ‘size_t boost::dynamic_bitset<Block, Allocator>::count() const’:
/usr/include/boost/dynamic_bitset/dynamic_bitset.hpp:1021: error: ‘mode’ cannot appear in a constant-expression
/usr/include/boost/dynamic_bitset/dynamic_bitset.hpp:1021: error: template argument 1 is invalid
/usr/include/boost/dynamic_bitset/dynamic_bitset.hpp:1021: error: expected ‘>’ before ‘*’ token
/usr/include/boost/dynamic_bitset/dynamic_bitset.hpp:1021: error: expected ‘(’ before ‘*’ token
/usr/include/boost/dynamic_bitset/dynamic_bitset.hpp:1021: error: expected primary-expression before ‘>’ token
make[1]: *** [libOcelotExecutive_la-Executive.lo] Error 1
make[1]: Leaving directory `/home/pcslara/Desktop/ocelot-0.4.72'
make: *** [install] Error 2

I'm going to read the documentation on this and will post back here afterwards.

That is a known problem with gcc 4.3 and Boost 1.37, the versions Red Hat and Ubuntu have been shipping. See here, for example. There is a patch for it floating around on the Boost mailing lists.

EDIT: Red Hat’s bug report has the patch I applied to fix it here.

I have done that, I think…

First, enable NVCC's verbose option (-v) and note down every step it runs to produce the executable.
Then modify your PTX and rerun the commands that NVCC runs after the PTX is generated :). I think it worked for me.
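One way to capture those steps, as a sketch: run nvcc with -v (or --dryrun, which prints the commands without running them) plus --keep so the intermediate .ptx stays on disk, then keep only the commands from the first ptxas invocation on, since ptxas is the tool that assembles the PTX. `stepsAfterPtx` is a hypothetical helper, and it assumes each printed command line starts with the `#$ ` prefix.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: given the text nvcc prints with -v or --dryrun,
// return the commands from the first ptxas invocation onward, i.e. the
// steps that consume the .ptx file and must be rerun after editing it.
// Assumes each command line starts with the "#$ " prefix.
std::vector<std::string> stepsAfterPtx(const std::string& nvccLog) {
    const std::string prefix = "#$ ";
    std::vector<std::string> steps;
    bool seenPtxas = false;
    std::istringstream in(nvccLog);
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, prefix.size(), prefix) != 0) continue;  // not a command line
        std::string cmd = line.substr(prefix.size());
        if (!seenPtxas && cmd.find("ptxas") != std::string::npos) seenPtxas = true;
        if (seenPtxas) steps.push_back(cmd);
    }
    return steps;
}
```

After editing the .ptx, run the returned commands in order. Depending on the nvcc version, the variable assignments printed before ptxas may also be needed.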

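In dynamic_bitset.hpp, change line 1022 to:

    // Hard-code the template argument instead of deriving it from 'mode',
    // which gcc 4.3 rejects in a constant expression.
    return do_count(m_bits.begin(), num_blocks(), Block(0),
                    static_cast<value_to_type<true> *>(0));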

This change fixed the compilation error.