Question about BlackScholes_nvrtc

Hi
I have a question about one of the examples in the SDK. For BlackScholes_nvrtc, I see the following lines for block and grid sizes:

    dim3 cudaBlockSize( 128, 1, 1);
    dim3 cudaGridSize(DIV_UP(OPT_N/2, 128),1,1);

    float risk = RISKFREE;
    float volatility = VOLATILITY;
    int optval = OPT_N;

    void *arr[] = { (void *)&d_CallResult, (void *)&d_PutResult, (void *)&d_StockPrice,
        (void *)&d_OptionStrike, (void *)&d_OptionYears, (void *)&risk, (void *)&volatility, (void *)&optval };

    for (i = 0; i < NUM_ITERATIONS; i++)
    {

        checkCudaErrors(cuLaunchKernel(kernel_addr,
                                            cudaGridSize.x, cudaGridSize.y, cudaGridSize.z, /* grid dim */
                                            cudaBlockSize.x, cudaBlockSize.y, cudaBlockSize.z, /* block dim */
                                            0,0, /* shared mem, stream */
                                            &arr[0], /* arguments */
                                            0));

    }

The default cudaBlockSize( 128, 1, 1) works fine. However, if I change that to 256, I get this error:

checkCudaErrors() Driver API error = 0001 "invalid argument" from file <BlackScholes.cpp>, line 176.

I don’t know if the 128 in cudaGridSize(DIV_UP(OPT_N/2, 128),1,1) is also coupled to the block size of 128 or not. I also tried

cudaBlockSize( 256, 1, 1)
cudaGridSize(DIV_UP(OPT_N/2, 256),1,1)

But got the same error. I don’t see any reason for the failure. Any idea about that?

Grid size and block size typically depend on the kernel, so you should first check how the kernel works to understand how the two are related.

If you take a look at the kernel in BlackScholes_kernel.cuh, you will see that each thread processes two elements, so a total of OPT_N/2 threads is required. The 128 in the grid-size expression is therefore the block size: if the block size is 256, you need at least DIV_UP(OPT_N/2, 256) blocks of size 256.
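To make the coupling concrete, here is a minimal host-side sketch of the grid-size arithmetic. The names `div_up` and `grid_blocks` are hypothetical helpers, not part of the sample; `div_up` is assumed to behave like the sample's DIV_UP macro (integer ceiling division), and the option count is assumed to be even, as the OPT_N/2 in the original code implies.

```cpp
#include <cassert>

// Integer ceiling division; assumed to match what the sample's DIV_UP does.
constexpr unsigned div_up(unsigned a, unsigned b) {
    return (a + b - 1) / b;
}

// Hypothetical helper: blocks needed when each thread processes two options.
// opt_n is assumed to be even, as in the sample's OPT_N/2.
constexpr unsigned grid_blocks(unsigned opt_n, unsigned block_size) {
    const unsigned threads_needed = opt_n / 2;   // two elements per thread
    return div_up(threads_needed, block_size);   // round up to whole blocks
}
```

With an illustrative count of 4,000,000 options, `grid_blocks(4000000, 128)` gives 15625 blocks and `grid_blocks(4000000, 256)` gives 7813, so the same block-size value has to appear in both the block and the grid expressions.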

Your specific "invalid argument" error occurs because the kernel is declared with __launch_bounds__(128), which allows a maximum block size of 128 threads. A block size of 256 exceeds that limit, so the launch is rejected.
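A pre-launch check along the following lines could catch this before cuLaunchKernel fails. This is only a sketch: `block_size_ok` is a hypothetical helper, and the limit is passed in as a plain integer. In the driver API that limit can presumably be obtained with `cuFuncGetAttribute` and `CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK` on the kernel handle, which I would expect to reflect the launch bounds, but I have not verified that against this sample.

```cpp
#include <cassert>

// Hypothetical helper: returns true if the requested block shape fits within
// the per-kernel thread limit. max_threads_per_block is assumed to come from
// something like:
//   int limit = 0;
//   cuFuncGetAttribute(&limit, CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK, kernel_addr);
// (shown as a comment so this block compiles without the CUDA headers).
inline bool block_size_ok(unsigned bx, unsigned by, unsigned bz,
                          int max_threads_per_block) {
    if (max_threads_per_block <= 0) {
        return false;  // no usable limit available
    }
    // Use a 64-bit product so large dimensions cannot overflow.
    const unsigned long long total = 1ULL * bx * by * bz;
    return total > 0 &&
           total <= static_cast<unsigned long long>(max_threads_per_block);
}
```

With a limit of 128, `block_size_ok(128, 1, 1, 128)` is true and `block_size_ok(256, 1, 1, 128)` is false, which corresponds to the two launches described in the question.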
