I need to write a toy Monte Carlo in a Python application, but want to reuse the Thrust deviceside RNG functions. It turns out this is not so hard. In case someone is curious, here’s a quick example application showing how:
import pycuda.autoinit, pycuda.driver
from pycuda.compiler import SourceModule
import numpy
import time
mod = SourceModule('''
#include <thrust/random.h>
extern "C" {
__global__ void test_rand(int numRand, float *samples)
{
int threadID = blockIdx.x * gridDim.x + threadIdx.x;
thrust::default_random_engine rng;
rng.discard(numRand * threadID);
thrust::uniform_real_distribution<float> rand01(0,1);
float acc=0.0;
for (int i=0; i < numRand; i++) {
float r = rand01(rng);
acc += r;
}
samples[threadID] = acc/numRand; // Normalize back to range [0,1)
}
}
''', no_extern_c=True, include_dirs=['/home/seibert/tmp/'])
test_rand = mod.get_function('test_rand')
### Compute many sums of random numbers
numRand = 512
threads_per_block = 256
blocks = 512
threads = threads_per_block * blocks
print 'Sampling sum of', numRand, 'random numbers', threads, 'times.'
samples = numpy.zeros(threads, dtype=numpy.float32)
start = time.time()
test_rand(numpy.int32(numRand), pycuda.driver.Out(samples), grid=(blocks,1), block=(threads_per_block,1,1))
stop = time.time()
print 'Execution+copy time = %1.1f msec' % ( (stopstart) * 1000.0,)
### Look at results in a lame textonly way
### Squint hard! Central Limit Theorem says this is approximately a Gaussian.
hist, bin_edges = numpy.histogram(samples, bins=20, range=(0.3, 0.7))
print 'Distribution:', hist
Notes:

You need no_extern_c=True, otherwise PyCUDA will wrap your Thrust #includes with extern “C” and break compilation.

Then you need to manually wrap your kernel in extern “C”, otherwise the name will be mangled and you won’t be able to look it up later with get_function().

If you put thrust in a nonstandard location, include_dirs is required. (Remember to go up a level if you put “thrust” in the #include path)

Buyerbeware on the use of the default linear congruence generator. Unfortunately, taus88 performs terribly in this example because there is no efficient implementation of the discard() function. I’ll have to see if hashing the threadID for a seed, as in one of the thrust examples, is acceptable.