I have started using thrust::sort_by_key, which has increased my compile time by an order of magnitude (and the compiler's memory requirements along with it). I understand that this is a known issue with Thrust. However, would it not be possible to simply run several instances of ptxas in parallel on a multi-CPU machine? I don’t know exactly what’s going on in there, but watching the output with --ptxas-options=-v set, it seems that ptxas is just compiling one kernel after another, sequentially.
I would break things up into different files and let make do the work of parallelizing the build, but the time-consuming Thrust code is all in a single host function.
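To make the situation concrete, here is a minimal sketch of the pattern described above. The file name, function name, and element types are hypothetical and chosen only for illustration; the real code is larger, but the shape is the same: one host function whose Thrust call expands into the kernels that ptxas then compiles one after another.

```cuda
// sort_pairs.cu (hypothetical file name, for illustration only)
// Compile with verbose ptxas output to watch the kernels go by:
//   nvcc --ptxas-options=-v -c sort_pairs.cu -o sort_pairs.o
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <vector>

// Hypothetical host function: sorts `values` by `keys` on the GPU
// and writes the sorted data back into the two host vectors.
void sort_pairs(std::vector<int>& keys, std::vector<float>& values)
{
    // Copy the inputs to device memory.
    thrust::device_vector<int>   d_keys(keys.begin(), keys.end());
    thrust::device_vector<float> d_values(values.begin(), values.end());

    // The expensive-to-compile call: sort the keys in ascending order
    // and apply the same permutation to the values.
    thrust::sort_by_key(d_keys.begin(), d_keys.end(), d_values.begin());

    // Copy the results back to the host.
    thrust::copy(d_keys.begin(), d_keys.end(), keys.begin());
    thrust::copy(d_values.begin(), d_values.end(), values.begin());
}
```

Keeping a function like this in its own .cu file at least means the rest of the project does not include the Thrust sort headers and can be built in parallel by make, but this one translation unit still goes through ptxas serially, which is the part I am asking about.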
I’m using a dual-core AMD processor with 6 GB of RAM (used to be 2 GB!), a Quadro 5000, and Ubuntu 10.10 Server.