Set CUDA_VISIBLE_DEVICES to run kernels on a specific MIG instance

I am trying to run a CUDA program with multiple kernels on specific MIG instances. I call setenv() to change CUDA_VISIBLE_DEVICES and so switch the MIG instance on which each kernel is launched, as shown in the snippet below. However, every kernel runs on the MIG instance that was made visible by the first setenv() call, even when I later pass in the UUID of another MIG instance. How do I get around this limitation? Thanks.

void runKernelOnMigPartition(const char* migUuid, int size, float* h_output, int kernelValue) {
    // Set the MIG partition via CUDA_VISIBLE_DEVICES
    if (setenv("CUDA_VISIBLE_DEVICES", migUuid, 1) != 0) {
        std::cerr << "Error setting CUDA_VISIBLE_DEVICES for MIG partition: " << migUuid << std::endl;
        return;
    }

    std::cout << "Running on MIG Partition: " << migUuid << std::endl;

    // Allocate device memory
    float* d_output;
    cudaMalloc(&d_output, size * sizeof(float));

    // Launch the kernel
    int threadsPerBlock = 256;
    int blocksPerGrid = (size + threadsPerBlock - 1) / threadsPerBlock;
    longRunningKernel<<<blocksPerGrid, threadsPerBlock>>>(d_output, size, kernelValue);
    cudaDeviceSynchronize();

    // Copy results back to host
    cudaMemcpy(h_output, d_output, size * sizeof(float), cudaMemcpyDeviceToHost);

    // Free device memory
    cudaFree(d_output);
}

By the time an application that uses the CUDA runtime API has initialized CUDA, one and only one MIG instance has already been assigned to the process. After that, subsequent changes to the environment variable have no effect, which is why every kernel in your snippet runs on the instance named in the first setenv() call.

You must set the environment variable before your application initializes the CUDA runtime. For runtime API applications, this generally means setting it before the application is launched.
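One way to set the variable "before app launch" without a shell script is a small launcher process. The following is only a sketch, assuming a POSIX system; launchOnMigInstance is a hypothetical helper name, not part of CUDA, and the launcher itself makes no CUDA calls. It forks, sets CUDA_VISIBLE_DEVICES in the child, and then replaces the child with the real CUDA program, so that program starts with the variable already in place.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: run `argv` as a child process whose environment names
// a single MIG instance. Returns the child's exit code, or -1 on failure.
int launchOnMigInstance(const std::string& migUuid,
                        const std::vector<std::string>& argv) {
    if (argv.empty()) return -1;
    assert(!migUuid.empty());  // an empty value would hide every device

    // Build the null-terminated argument array that execvp() expects.
    std::vector<char*> args;
    for (const std::string& a : argv) args.push_back(const_cast<char*>(a.c_str()));
    args.push_back(nullptr);

    pid_t pid = fork();
    if (pid < 0) return -1;

    if (pid == 0) {
        // Child: set the variable first, then start the target program, so
        // the variable is already present when that program initializes CUDA.
        if (setenv("CUDA_VISIBLE_DEVICES", migUuid.c_str(), 1) != 0) _exit(127);
        execvp(args[0], args.data());
        _exit(127);  // reached only if execvp() failed
    }

    // Parent: wait for the child and report how it exited.
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling this once per MIG UUID, with argv naming your CUDA executable, runs that executable against one instance at a time. The launcher's own environment is left unchanged.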

Although I haven’t tried it, a driver API application may be able to set the environment variable programmatically, once and only once, before any call to cuInit(). I wouldn’t expect any later change to take effect. Carrying this one step further, it might be possible to build a multi-process application with the driver API in which no parent or child process calls cuInit() until the environment variable for each child process has been set programmatically (and the parent process does not use CUDA at all). That might allow one “application launch” to target multiple MIG instances. Again, I haven’t explored this.
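The multi-process idea could be structured roughly as follows. This is an untested sketch under stated assumptions: a POSIX system, a parent that never calls into CUDA, and hypothetical names (MigWorker, runOnMigInstances) that are not part of any CUDA API. The sketch contains no CUDA calls itself; the worker callback is where each child would call cuInit() (or make its first runtime API call) and launch its kernels.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical worker signature: performs all CUDA work for one MIG instance
// and returns 0 on success. It only ever runs inside a child process.
using MigWorker = std::function<int(const std::string& migUuid)>;

// Fork one child per MIG UUID. Each child sets CUDA_VISIBLE_DEVICES before
// the worker runs, i.e. before that child's first CUDA call.
// Returns one exit code per UUID (-1 if the child could not be run).
std::vector<int> runOnMigInstances(const std::vector<std::string>& migUuids,
                                   const MigWorker& worker) {
    assert(worker != nullptr);

    std::vector<pid_t> children;
    for (const std::string& uuid : migUuids) {
        pid_t pid = fork();
        if (pid == 0) {
            // Child: choose the MIG instance, then hand over to the worker.
            if (setenv("CUDA_VISIBLE_DEVICES", uuid.c_str(), 1) != 0) _exit(127);
            _exit(worker(uuid));
        }
        children.push_back(pid);  // holds -1 if fork() failed
    }

    // Parent: collect exit codes in the same order as the UUIDs.
    std::vector<int> exitCodes;
    for (pid_t pid : children) {
        int status = 0;
        if (pid < 0 || waitpid(pid, &status, 0) < 0) {
            exitCodes.push_back(-1);
            continue;
        }
        exitCodes.push_back(WIFEXITED(status) ? WEXITSTATUS(status) : -1);
    }
    return exitCodes;
}
```

The worker would hold roughly the body of runKernelOnMigPartition() without the setenv() call. Because each child has its own address space, anything it writes to h_output is not visible to the parent; results would have to come back through a pipe, shared memory, or a file.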