Zero Copy on Jetson Xavier and Setting __constant__ variables from a Host Function

I am developing for a Jetson Xavier and am using zero copy, since the GPU and CPU share memory. For zero copy, I allocate the host memory with cudaHostAlloc(…, cudaHostAllocMapped) and then call cudaHostGetDevicePointer() to get the device pointer for that memory, so I don't have to call cudaMemcpyAsync to transfer between host memory and GPU memory.

I have kernels that use constants calculated in host functions. I was hoping to use __constant__ variables instead of passing those values as kernel arguments. When not using zero copy, one could use cudaMemcpyToSymbol() to set a constant variable. Is it possible to set constant variables from a host function while using zero copy?

The process of setting constant values using cudaMemcpyToSymbol shouldn’t be affected by the nature of the host allocation, whether it is an ordinary allocation (e.g. malloc) or an allocation provided by cudaHostAlloc. It should work similarly.
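To illustrate the point, here is a minimal single-file sketch, assuming a device that supports mapped pinned memory (as the Xavier does). The names demo_window_size, read_constant, run_demo, and CHECK are made up for this example and are not from the original post. The source value for cudaMemcpyToSymbol lives in a cudaHostAlloc'd buffer, and the kernel writes the constant it sees into a second zero-copy buffer so the host can read it back without any cudaMemcpy.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical constant for this sketch. It is defined in the same .cu file
// as the kernel that reads it and the host code that sets it.
__constant__ int demo_window_size;

// Minimal error check: print the CUDA error string and abort on failure.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            fprintf(stderr, "%s failed: %s\n", #call,                   \
                    cudaGetErrorString(err_));                          \
            exit(EXIT_FAILURE);                                         \
        }                                                               \
    } while (0)

// Each thread stores the constant into the mapped output buffer.
__global__ void read_constant(int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = demo_window_size;
}

// Sets the constant from pinned, mapped host memory and returns the value
// the kernel observed.
int run_demo(int window_size) {
    const int n = 64;
    int* h_src = nullptr;  // pinned + mapped source for the constant
    int* h_out = nullptr;  // pinned + mapped (zero-copy) output buffer
    int* d_out = nullptr;  // device-side alias of h_out

    CHECK(cudaHostAlloc((void**)&h_src, sizeof(int), cudaHostAllocMapped));
    CHECK(cudaHostAlloc((void**)&h_out, n * sizeof(int), cudaHostAllocMapped));
    *h_src = window_size;

    // The source pointer here is a cudaHostAlloc allocation, not malloc.
    CHECK(cudaMemcpyToSymbol(demo_window_size, h_src, sizeof(int)));

    CHECK(cudaHostGetDevicePointer((void**)&d_out, h_out, 0));
    read_constant<<<1, n>>>(d_out, n);
    CHECK(cudaGetLastError());       // reports launch errors, no sync needed
    CHECK(cudaDeviceSynchronize());  // wait so h_out is safe to read

    int seen = h_out[n - 1];
    CHECK(cudaFreeHost(h_src));
    CHECK(cudaFreeHost(h_out));
    return seen;
}

int main() {
    printf("kernel saw demo_window_size = %d\n", run_demo(4000));
    return 0;
}
```

Note that everything sits in one translation unit here, so there is only one instance of the constant.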

By the time I run the kernel, it behaves as though the constant was never set. This is what I'm trying to do:

In sliding_window.cuh:

#ifndef SLIDING_WINDOW_CUH
#define SLIDING_WINDOW_CUH

__device__ __constant__ int const_window_size;

#endif // #ifndef SLIDING_WINDOW_CUH


#include "sliding_window.cuh"

__global__ void sliding_window_sums( float2* sums, const float2* vals,
      const int num_sums/*, const int window_size*/ ) {

   int global_index = blockIdx.x * blockDim.x + threadIdx.x;
   // TODO: Take this out when finished debugging!
   if (global_index == 0) {
      // The Problem: When run, this says that const_window_size is 0
      printf( "From sliding_window_sums(): const_window_size = %d\n", const_window_size );
   }
   // ...
}


#include "sliding_window.cuh"

int main() {
   int window_size = 4000;
   cudaError_t cerror;

   printf( "%s(): Trying to set constant const_window_size to %d\n", __func__, window_size );
   // try_cuda_func() is a #define macro that checks whether cerror is cudaSuccess.
   // If it is not cudaSuccess, it prints an error message and exits.
   try_cuda_func( cerror, cudaMemcpyToSymbol( const_window_size, &window_size, sizeof(int) ) );
   try_cuda_func( cerror, cudaDeviceSynchronize() );

   sliding_window_sums<<<num_thread_blocks, threads_per_block, 0, non_default_stream>>>( sums,
         vals, num_sums/*, window_size*/ );
   // Side Question: Should I check the error status of the kernel launch here too?
   // Wouldn't that require a cudaDeviceSynchronize()? (I run another kernel after this one though.)
   // ...
}

That is obviously not a valid line of code. It would not compile.

Oops, typo. I edited my comment to fix it.

Unless you're linking this properly, you can't put the constant variable in one module and the call to cudaMemcpyToSymbol in another.

Right now you appear to have 2 separate instances of the constant variable, because the header that defines it is included in both source files. One of those needs to be marked with extern, and you need to use proper linking (relocatable device code, e.g. compiling with nvcc -rdc=true). This requirement for cross-module linking is covered in many posts on the internet if you look for it.
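If you would rather not set up cross-module device linking, one alternative (a sketch under stated assumptions, not the extern approach described above) is to keep the definition of the constant, the kernel that reads it, and the cudaMemcpyToSymbol call in the same .cu file, and expose an ordinary host function that other files can call. The helpers set_window_size, get_window_size, and print_window_size below are hypothetical names used only for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Defined in exactly one .cu file, next to the kernel that reads it.
// Other files never include this definition; they only call the setter.
__constant__ int const_window_size;

// Hypothetical host-side setter. Because it is compiled in the same
// translation unit as the definition, it writes to the same instance
// that the kernel below reads.
cudaError_t set_window_size(int window_size) {
    return cudaMemcpyToSymbol(const_window_size, &window_size, sizeof(int));
}

// Hypothetical host-side getter, handy for checking the value while debugging.
cudaError_t get_window_size(int* out) {
    return cudaMemcpyFromSymbol(out, const_window_size, sizeof(int));
}

// Stand-in for the real kernel: prints the constant from the device.
__global__ void print_window_size() {
    printf("kernel sees const_window_size = %d\n", const_window_size);
}

int main() {
    if (set_window_size(4000) != cudaSuccess) return 1;

    print_window_size<<<1, 1>>>();
    // cudaGetLastError() reports launch failures without a synchronize.
    if (cudaGetLastError() != cudaSuccess) return 1;
    // Synchronize so the device-side printf output is flushed before exit.
    if (cudaDeviceSynchronize() != cudaSuccess) return 1;

    int readback = 0;
    if (get_window_size(&readback) != cudaSuccess) return 1;
    printf("host reads back const_window_size = %d\n", readback);
    return 0;
}
```

In a multi-file project, only the prototypes of the host functions would go in the shared header, so including that header from several files no longer creates extra copies of the constant.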

Here is one example: