Hi, there is something here I can't quite understand. The SDK sample called SimpleGL uses extern "C" to wrap the kernel launch so it compiles with C linkage, but I'm not sure how host values can be used directly when calling the kernel function. Below is the code.
/*
* Copyright 1993-2010 NVIDIA Corporation. All rights reserved.
*
* Please refer to the NVIDIA end user license agreement (EULA) associated
* with this source code for terms and conditions that govern your use of
* this software. Any use, reproduction, disclosure, or distribution of
* this software and related documentation outside the terms of the EULA
* is strictly prohibited.
*
*/
/* This example demonstrates how to use the Cuda OpenGL bindings with the
* runtime API.
* Device code.
*/
#ifndef _SIMPLEGL_KERNEL_H_
#define _SIMPLEGL_KERNEL_H_
///////////////////////////////////////////////////////////////////////////////
//! Simple kernel to modify vertex positions in sine wave pattern
//! @param data data in global memory
///////////////////////////////////////////////////////////////////////////////
__global__ void kernel(float4* pos, unsigned int width, unsigned int height, float time)
{
    unsigned int x = blockIdx.x*blockDim.x + threadIdx.x;
    unsigned int y = blockIdx.y*blockDim.y + threadIdx.y;

    // calculate uv coordinates
    float u = x / (float) width;
    float v = y / (float) height;
    u = u*2.0f - 1.0f;
    v = v*2.0f - 1.0f;

    // calculate simple sine wave pattern
    float freq = 4.0f;
    float w = sinf(u*freq + time) * cosf(v*freq + time) * 0.5f;

    // write output vertex
    pos[y*width+x] = make_float4(u, w, v, 1.0f);
}

// Wrapper for the __global__ call that sets up the kernel call
extern "C" void launch_kernel(float4* pos, unsigned int mesh_width, unsigned int mesh_height, float time)
{
    // execute the kernel
    dim3 block(8, 8, 1);
    dim3 grid(mesh_width / block.x, mesh_height / block.y, 1);
    kernel<<<grid, block>>>(pos, mesh_width, mesh_height, time);
}
#endif // #ifndef _SIMPLEGL_KERNEL_H_
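From what I can tell, the host side (a .cpp file) only sees a plain C declaration of launch_kernel and calls it like an ordinary function, passing normal host variables. This is just my rough sketch of that call site, not code copied from the SDK (the names, sizes, and the buffer-mapping part are my assumption):

#include <cuda_runtime.h>
#include <cuda_gl_interop.h>

// Sketch of the host-side call (my assumption, not the actual SDK .cpp file).
// The .cpp file never sees __global__ or <<< >>>; it only needs this C declaration.
extern "C" void launch_kernel(float4* pos, unsigned int mesh_width,
                              unsigned int mesh_height, float time);

void runCuda(struct cudaGraphicsResource** vbo_resource, float anim_time)
{
    float4* dptr = 0;
    size_t num_bytes = 0;
    unsigned int mesh_width = 256, mesh_height = 256;   // placeholder sizes

    // Map the OpenGL VBO and get a device pointer into it.
    cudaGraphicsMapResources(1, vbo_resource, 0);
    cudaGraphicsResourceGetMappedPointer((void**)&dptr, &num_bytes, *vbo_resource);

    // Ordinary C-style call: the scalars are plain host values,
    // dptr is a device pointer obtained from the mapping above.
    launch_kernel(dptr, mesh_width, mesh_height, anim_time);

    cudaGraphicsUnmapResources(1, vbo_resource, 0);
}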
Does anybody know how this works? Is there a case where I can use host memory directly instead of sending values to the GPU with cudaMalloc and cudaMemcpy? Thank you.
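For comparison, here is a rough sketch of the pattern I normally use to get data onto the GPU (the names are made up, not from the SDK):

#include <cuda_runtime.h>

// My usual pattern (sketch): allocate device memory and copy the host data in
// before launching a kernel that reads it.
void copy_to_device_example(const float* host_data, size_t n)
{
    float* dev_data = 0;
    cudaMalloc((void**)&dev_data, n * sizeof(float));                            // device allocation
    cudaMemcpy(dev_data, host_data, n * sizeof(float), cudaMemcpyHostToDevice);  // host -> device copy

    // ... launch a kernel that reads dev_data here ...

    cudaFree(dev_data);
}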
Regards,
Masterkiten