OpenCL and Ubuntu 10.10

Hi

I’m new to OpenCL, so I have a lot of questions and hope to get some answers. I develop on a laptop with an Intel GMA card, but I want to run the program on a PC with an Intel quad-core CPU and an NVIDIA card (8600GT). I installed the CUDA toolkit on my laptop, but not the NVIDIA driver, because of the Intel GMA card; on the PC I installed the DevDriver. After installing the CUDA toolkit I set the environment variables PATH and LD_LIBRARY_PATH to the installation directory of the CUDA toolkit. But when I try to compile a program, the linker says “cannot find -lOpenCL”. I compile with gcc: gcc helloworld.c -o helloworld -lOpenCL. Is there something I forgot to do?

Thanks for all answers.

Best regards
Harald

The OpenCL runtime ships with the NVIDIA driver. If you don’t have the driver installed, you don’t have the library. You can download a driver bundle and extract its contents (which include the CUDA and OpenCL runtime libraries) without actually installing the driver, and then put the libraries somewhere local in your build environment to link against.

I thought the driver was only needed to execute an OpenCL program, so that it can use the graphics card. But I will try to extract the dev driver. Does this also mean that on a laptop with no NVIDIA driver installed, an OpenCL program cannot run on the CPU unless there is a driver with OpenCL support for the CPU?

Best regards

Harald

True, but that is really a separate question. You are not asking about the driver, you are asking about the runtime library. But the runtime support ships with the driver and to compile and link an executable containing API code you need to have the runtime library.

If you don’t have an OpenCL runtime library installed which is valid for the hardware you are trying to use, you can’t run OpenCL programs. That applies equally to CPU and GPU targets.
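One way to check whether a machine has any OpenCL runtime at all is to try loading the library at run time instead of linking against it. The following is only a minimal sketch, assuming a Linux system with dlopen available. The helper name count_opencl_platforms is made up for illustration, and the library name you pass in (typically libOpenCL.so.1 on Linux, but that is an assumption about your installation) depends on how the runtime was installed.

```c
#include <dlfcn.h>    /* dlopen, dlsym, dlclose (POSIX) */
#include <stddef.h>
#include <stdint.h>

/* Signature of clGetPlatformIDs written with plain C types, so this
 * file compiles without the OpenCL headers: cl_int and cl_uint are
 * 32-bit integers and cl_platform_id is an opaque pointer. */
typedef int32_t (*get_platform_ids_fn)(uint32_t, void **, uint32_t *);

/* Hypothetical helper: returns the number of OpenCL platforms reported
 * by the runtime library `soname`, or -1 if the library cannot be
 * loaded, lacks the entry point, or reports an error. */
int count_opencl_platforms(const char *soname)
{
    void *lib = dlopen(soname, RTLD_NOW);
    if (lib == NULL)
        return -1;                       /* no such runtime library */

    get_platform_ids_fn get_ids;
    *(void **)(&get_ids) = dlsym(lib, "clGetPlatformIDs");

    int result = -1;
    if (get_ids != NULL) {
        uint32_t n = 0;
        /* num_entries = 0 and platforms = NULL: only ask for the count */
        if (get_ids(0, NULL, &n) == 0)   /* 0 is CL_SUCCESS */
            result = (int)n;
    }
    dlclose(lib);
    return result;
}
```

Calling count_opencl_platforms("libOpenCL.so.1") from a small main and printing the result should tell you whether a runtime is present before you try to build anything against it. On older glibc versions you may need to add -ldl to the gcc command line.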

The easiest way is probably to use AMD’s OpenCL implementation for development on the laptop and NVIDIA’s for the desktop machine. The same program should be able to run on both of them.

So now I use AMD Stream on my laptop. Compilation works fine and I can start the program, but I get the following error and I don’t know how to solve it:

Id of the platform: 1

OpenCL demo application started!
clCreateContext
clCreateCommandQueue
clCreateProgramWithSource
Error: Failed to build program executable!
/tmp/OCLSIKDpf.cl(2): warning: explicit type is missing ("int" assumed)
__kernel square(
       ^

/tmp/OCLSIKDpf.cl(2): error: kernel must return void
__kernel square(
       ^

1 error detected in the compilation of "/tmp/OCLSIKDpf.cl".

Best regards

Harald

//
// File:       hello.c
//
// Abstract:   A simple "Hello World" compute example showing basic usage of OpenCL which
//             calculates the mathematical square (X[i] = pow(X[i],2)) for a buffer of
//             floating point values.
//
////////////////////////////////////////////////////////////////////////////////

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <CL/opencl.h>

////////////////////////////////////////////////////////////////////////////////

// Use a static data size for simplicity
//
#define DATA_SIZE (1024)

////////////////////////////////////////////////////////////////////////////////

// Simple compute kernel which computes the square of an input array
//
const char *KernelSource = "\n" \
"__kernel square(                                                       \n" \
"   __global float* input,                                              \n" \
"   __global float* output,                                             \n" \
"   const unsigned int count)                                           \n" \
"{                                                                      \n" \
"   int i = get_global_id(0);                                           \n" \
"   if(i < count)                                                       \n" \
"       output[i] = input[i] * input[i];                                \n" \
"}                                                                      \n" \
"\n";

////////////////////////////////////////////////////////////////////////////////

int main(int argc, char** argv)
{
    int err;                            // error code returned from api calls

    float data[DATA_SIZE];              // original data set given to device
    float results[DATA_SIZE];           // results returned from device
    unsigned int correct;               // number of correct results returned

    size_t global;                      // global domain size for our calculation
    size_t local;                       // local domain size for our calculation

    cl_platform_id platform_id;
    cl_uint num_id;

    cl_device_id device_id;             // compute device id
    cl_context context;                 // compute context
    cl_command_queue commands;          // compute command queue
    cl_program program;                 // compute program
    cl_kernel kernel;                   // compute kernel

    cl_mem input;                       // device memory used for the input array
    cl_mem output;                      // device memory used for the output array

    // Fill our data set with random float values
    //
    int i = 0;
    unsigned int count = DATA_SIZE;
    for(i = 0; i < count; i++)
        data[i] = rand() / (float)RAND_MAX;

    // Connect to a compute device
    //
    int gpu = 1;

    err = clGetPlatformIDs(1, &platform_id, &num_id);
    if(err != CL_SUCCESS)
    {
        printf("Failed to get the ID of the platform (%i)\n", num_id);
        return EXIT_FAILURE;
    }
    printf("Id of the platform: %i\n",num_id);

    err = clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_ALL, 1, &device_id, NULL); // gpu ? CL_DEVICE_TYPE_GPU : CL_DEVICE_TYPE_CPU
    if (err != CL_SUCCESS)
    {
        if(err == CL_INVALID_PLATFORM)
            printf("CL_INVALID_PLATFORM\n");
        if(err == CL_INVALID_DEVICE_TYPE)
            printf("CL_INVALID_DEVICE_TYPE\n");
        if(err == CL_INVALID_VALUE)
            printf("CL_INVALID_VALUE\n");
        if(err == CL_DEVICE_NOT_FOUND)
            printf("CL_DEVICE_NOT_FOUND\n");
        printf("Error: Failed to create a device group!\n");
        return EXIT_FAILURE;
    }
    printf("\nOpenCL demo application started!\n");

    // Create a compute context
    //
    context = clCreateContext(0, 1, &device_id, NULL, NULL, &err);
    if (!context)
    {
        printf("Error: Failed to create a compute context!\n");
        return EXIT_FAILURE;
    }

    printf("clCreateContext\n");

    // Create a command commands
    //
    commands = clCreateCommandQueue(context, device_id, 0, &err);
    if (!commands)
    {
        printf("Error: Failed to create a command commands!\n");
        return EXIT_FAILURE;
    }

    printf("clCreateCommandQueue\n");

    // Create the compute program from the source buffer
    //
    program = clCreateProgramWithSource(context, 1, (const char **) & KernelSource, NULL, &err);
    if (!program)
    {
        printf("Error: Failed to create compute program!\n");
        return EXIT_FAILURE;
    }

    printf("clCreateProgramWithSource\n");

    // Build the program executable
    //
    err = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);
    if (err != CL_SUCCESS)
    {
        size_t len;
        char buffer[2048];

        printf("Error: Failed to build program executable!\n");
        clGetProgramBuildInfo(program, device_id, CL_PROGRAM_BUILD_LOG, sizeof(buffer), buffer, &len);
        printf("%s\n", buffer);
        exit(1);
    }

    printf("clBuildProgram\n");

    // Create the compute kernel in the program we wish to run
    //
    kernel = clCreateKernel(program, "square", &err);
    if (!kernel || err != CL_SUCCESS)
    {
        printf("Error: Failed to create compute kernel!\n");
        exit(1);
    }

    printf("clCreateKernel\n");

    // Create the input and output arrays in device memory for our calculation
    //
    input = clCreateBuffer(context,  CL_MEM_READ_ONLY,  sizeof(float) * count, NULL, NULL);
    output = clCreateBuffer(context, CL_MEM_WRITE_ONLY, sizeof(float) * count, NULL, NULL);
    if (!input || !output)
    {
        printf("Error: Failed to allocate device memory!\n");
        exit(1);
    }

    printf("clCreateBuffer\n");

    // Write our data set into the input array in device memory
    //
    err = clEnqueueWriteBuffer(commands, input, CL_TRUE, 0, sizeof(float) * count, data, 0, NULL, NULL);
    if (err != CL_SUCCESS)
    {
        printf("Error: Failed to write to source array!\n");
        exit(1);
    }

    printf("clEnqueueWriteBuffer\n");

    // Set the arguments to our compute kernel
    //
    err = 0;
    err  = clSetKernelArg(kernel, 0, sizeof(cl_mem), &input);
    err |= clSetKernelArg(kernel, 1, sizeof(cl_mem), &output);
    err |= clSetKernelArg(kernel, 2, sizeof(unsigned int), &count);
    if (err != CL_SUCCESS)
    {
        printf("Error: Failed to set kernel arguments! %d\n", err);
        exit(1);
    }

    // Get the maximum work group size for executing the kernel on the device
    //
    err = clGetKernelWorkGroupInfo(kernel, device_id, CL_KERNEL_WORK_GROUP_SIZE, sizeof(local), &local, NULL);
    if (err != CL_SUCCESS)
    {
        printf("Error: Failed to retrieve kernel work group info! %d\n", err);
        exit(1);
    }

    // Execute the kernel over the entire range of our 1d input data set
    // using the maximum number of work group items for this device
    //
    global = count;
    err = clEnqueueNDRangeKernel(commands, kernel, 1, NULL, &global, &local, 0, NULL, NULL);
    if (err)
    {
        printf("Error: Failed to execute kernel!\n");
        return EXIT_FAILURE;
    }

    // Wait for the command commands to get serviced before reading back results
    //
    clFinish(commands);

    // Read back the results from the device to verify the output
    //
    err = clEnqueueReadBuffer( commands, output, CL_TRUE, 0, sizeof(float) * count, results, 0, NULL, NULL );
    if (err != CL_SUCCESS)
    {
        printf("Error: Failed to read output array! %d\n", err);
        exit(1);
    }

    // Validate our results
    //
    correct = 0;
    for(i = 0; i < count; i++)
    {
        if(results[i] == data[i] * data[i])
            correct++;
    }

    // Print a brief summary detailing the results
    //
    printf("Computed '%d/%d' correct values!\n", correct, count);

    // Shutdown and cleanup
    //
    clReleaseMemObject(input);
    clReleaseMemObject(output);
    clReleaseProgram(program);
    clReleaseKernel(kernel);
    clReleaseCommandQueue(commands);
    clReleaseContext(context);

    return 0;
}

The answer is right in the error message:

/tmp/OCLSIKDpf.cl(2): error: kernel must return void
__kernel square(

OpenCL kernel functions must return void. So declare your function like this:

__kernel void square(__global float* input, __global float* output, const unsigned int count)
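Applied to hello.c, this is a one-word change in the KernelSource string. Here is a sketch of the corrected definition, keeping the names from the posted code. As far as the posted build log shows, this is the only error the compiler reported, so the rest of the host program should not need to change for it.

```c
/* Corrected kernel source for hello.c: the only change from the posted
 * version is the explicit `void` return type after `__kernel`. */
const char *KernelSource =
    "__kernel void square(                      \n"
    "   __global float* input,                  \n"
    "   __global float* output,                 \n"
    "   const unsigned int count)               \n"
    "{                                          \n"
    "   int i = get_global_id(0);               \n"
    "   if (i < count)                          \n"
    "       output[i] = input[i] * input[i];    \n"
    "}                                          \n";
```

The missing return type also explains the warning in the log: without `void` the compiler assumed `int`, and a kernel that returns `int` is then rejected.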

I’m new here, but don’t you need to pass information about the target device you want to run on in the second and third parameters?