About the cuRAND Host API function curandGenerateUniform()

I have a question about the curandGenerateUniformDouble() function, which came up while using the GPU to accelerate my program. Does this API run on the GPU or the CPU? If it runs on the GPU using a kernel, how does it determine the launch configuration? If I use this API to generate random numbers on the host, are the numbers first generated on the GPU and then transferred to host memory?
Thank you.

Does this API run on GPU or CPU?

Both, depending on how the generator is created. See the following sample, which creates one generator of each kind.

#include <cuda_runtime.h>
#include <curand.h>

#include <iostream>

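int main() {
  using namespace std;
  curandGenerator_t genGPU;  // device generator: output buffer must be in device memory
  curandGenerator_t genCPU;  // host generator: output buffer must be in host memory

  // Create both generators with the same RNG type and the same seed.
  curandCreateGenerator(&genGPU, CURAND_RNG_PSEUDO_MTGP32);
  curandSetPseudoRandomGeneratorSeed(genGPU, 1234ULL);
  curandCreateGeneratorHost(&genCPU, CURAND_RNG_PSEUDO_MTGP32);
  curandSetPseudoRandomGeneratorSeed(genCPU, 1234ULL);

  const int n = 10;
  double CPU[n];
  double GPU[n];

  double* d_GPU;
  cudaMalloc(&d_GPU, n*sizeof(double));

  // The host generator writes into a host buffer.
  curandGenerateUniformDouble(genCPU, CPU, n);
  // The device generator writes into a device buffer, which is then copied back.
  curandGenerateUniformDouble(genGPU, d_GPU, n);
  cudaMemcpy(GPU, d_GPU, n*sizeof(double), cudaMemcpyDeviceToHost);

  // Print the host and device results side by side.
  for ( int i = 0; i < n; ++i ) {
    cout << CPU[i] << ' ' << GPU[i] << endl;
  }

  // Release the generators and the device buffer.
  curandDestroyGenerator(genGPU);
  curandDestroyGenerator(genCPU);
  cudaFree(d_GPU);
  return 0;
}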

Thank you for answering my questions. I understand that the Host API can generate numbers on either side. My question is whether generation runs entirely on the CPU when the generator is created with curandCreateGeneratorHost(). That is, if I disable my GPU, will it stop working? The reason I ask is that I find it extremely slow when generating a short sequence (0.16 s for size = 100). The manual says that a kernel is needed to generate random numbers, so I am asking to confirm my conclusion. Thank you.

Run your test code under the profiler. It will be immediately obvious whether the GPU is being used. You can also observe any host/device data transfers, including their size and direction.
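
As a rough complement to the profiler, it may also help to time several consecutive generate calls separately, to see whether the 0.16 s is paid on every call or only on the first one. The sketch below is plain C++ and does not call cuRAND itself. The helper name time_calls is made up for illustration, and the callable passed to it is assumed to wrap the generate call from the sample above.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of cuRAND): runs `generate` `repeats` times
// and returns the wall-clock duration of each call, in seconds.
// `generate` is any callable taking no arguments. In this thread's setting it
// is assumed to wrap something like
//   curandGenerateUniformDouble(genCPU, CPU, n);
template <typename F>
std::vector<double> time_calls(F&& generate, std::size_t repeats) {
  using clock = std::chrono::steady_clock;  // monotonic, suited to intervals
  std::vector<double> seconds;
  seconds.reserve(repeats);
  for (std::size_t i = 0; i < repeats; ++i) {
    const auto start = clock::now();
    generate();
    const auto stop = clock::now();
    seconds.push_back(std::chrono::duration<double>(stop - start).count());
  }
  return seconds;
}
```

For example, auto t = time_calls([&] { curandGenerateUniformDouble(genCPU, CPU, n); }, 5); and then compare t[0] with the later entries. If only the first entry is large, the cost is more likely one-time setup than per-number generation, though that is only a hypothesis to check against the profiler output. For the device generator, the callable should probably also call cudaDeviceSynchronize(), since the generation kernel is launched asynchronously and the host timer would otherwise stop too early.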