Excessive Kernel Launches on Context Creation

I recently began extending a heavily Boost-dependent project to use CUDA for its innermost loop, and I’ve been seeing some odd behaviour that seemed worth posting about. Simply including certain Boost headers causes my first CUDA call to launch a large number of kernels.

If I compile and debug the following code:
simplestCase.cu

#include <boost/thread.hpp>

int main(int argc, char **argv){
	int *myInt;
	cudaMalloc(&myInt, sizeof(int));
	return 0;
}

I get the following debugger messages when cudaMalloc executes. The behaviour is the same if I launch a kernel I’ve defined, so it seems that anything that triggers context creation will trigger this:
[Launch of CUDA Kernel 0 (memset32_post<<<(1,1,1),(64,1,1)>>>) on Device 0]
[Launch of CUDA Kernel 1 (memset32_post<<<(1,1,1),(64,1,1)>>>) on Device 0]
[Launch of CUDA Kernel 2 (memset32_post<<<(1,1,1),(64,1,1)>>>) on Device 0]
[Launch of CUDA Kernel 3 (memset32_post<<<(1,1,1),(64,1,1)>>>) on Device 0]
[Launch of CUDA Kernel 4 (memset32_post<<<(1,1,1),(64,1,1)>>>) on Device 0]
[Launch of CUDA Kernel 5 (memset32_post<<<(1,1,1),(64,1,1)>>>) on Device 0]
[Launch of CUDA Kernel 6 (memset32_post<<<(1,1,1),(64,1,1)>>>) on Device 0]
[Launch of CUDA Kernel 7 (memset32_post<<<(1,1,1),(64,1,1)>>>) on Device 0]
[Launch of CUDA Kernel 8 (memset32_post<<<(1,1,1),(64,1,1)>>>) on Device 0]
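
One way to check whether these launches belong to context creation rather than to cudaMalloc itself is to force context creation with a separate call first. The following is only a minimal sketch, assuming the common cudaFree(0) idiom is enough to create the context on this setup; the printed labels are my own and not part of any CUDA output:

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <boost/thread.hpp>  // header under suspicion; remove it to compare runs

int main(int argc, char **argv){
	// cudaFree(0) frees nothing, but it forces the runtime to create its
	// context (assumption: this is what triggers the extra launches).
	cudaError_t err = cudaFree(0);
	std::printf("context creation: %s\n", cudaGetErrorString(err));

	// The context already exists at this point, so any launches the
	// debugger reports from here on belong to cudaMalloc itself.
	int *myInt = NULL;
	err = cudaMalloc(&myInt, sizeof(int));
	std::printf("cudaMalloc: %s\n", cudaGetErrorString(err));

	cudaFree(myInt);
	return 0;
}
```

Stepping over the two calls separately in the debugger should show which one the memset32_post launches are attached to.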

So far I have identified two headers that cause the problem:
boost/thread.hpp
boost/mpi.hpp

Here’s a bit of info that may be useful in replicating the problem:
IDE: Nsight Eclipse Edition
OS: Ubuntu 12.04 x64
GPU: GeForce GTX 580 / Tesla K20 (tried both, happens on both; I believe my GeForce GT 520 is being used by my OS)
Boost version: 1.52
cat /proc/driver/nvidia/version:
-> NVRM version: NVIDIA UNIX x86_64 Kernel Module 310.32 Mon Jan 14 14:41:13 PST 2013
-> GCC version: gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)

project settings:
Properties->Build->CUDA->DeviceLinkerMode = Separate Compilation
Properties->Build->CUDA->GenerateGPUCode = 2.0
Properties->Build->Settings->ToolSettings->NVCCLinker->Libraries = boost_system
Properties->Name = simplest_case_example

It seems odd to me that these specific includes generate extra kernel launches, particularly since I don’t use anything from those headers and I don’t see how they could affect my interaction with CUDA. Is this expected behaviour? In the project I’m working on now I see over 100 kernels launched, even though the only CUDA-related code in it is a single cudaMalloc at the program’s entry point.

Edit: I recently updated to driver version 319.23. The behaviour described above is unchanged, although the update did fix a number of debugger malfunctions I was having in larger programs.