Get better performance

Hi guys

I have the following question.

I have to evaluate the behavior of a system which is composed of many objects.
The system must be evaluated with different parameters; in this case I'm testing 100000 different parameters.

What I have done is copy one system instance to the GPU's global memory, and each thread gets a different parameter.

The problem is that whenever the threads have to access the system instance, they all access the same memory locations, and for that reason I'm getting lower performance (the execution time is higher).

I cannot copy the system object/instance to constant memory because constant memory is smaller than my system.

Can I use pitched memory to copy objects that are not of a basic type?
Can someone tell me the best solution to this problem, so that the reads do not block each other?

Thanks in advance.
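
On the pitched-memory question above: as far as I know, cudaMallocPitch and cudaMemcpy2D work in bytes, so the element type does not have to be a basic type. Below is a minimal sketch, assuming the objects are plain structs with no pointers and no virtual functions (a bytewise copy would not be valid otherwise). The Body struct, the readMass kernel and pitchedStructDemo are hypothetical names made up for illustration, not anything from the original system.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical element type standing in for one object of the system.
struct Body {
    float mass;
    float pos[3];
};

// Each thread copies the mass of one element out of the pitched 2D array.
__global__ void readMass(const char* base, size_t pitch,
                         int width, int height, float* out)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        // Rows start every `pitch` bytes, so step in bytes and then cast.
        const Body* row = reinterpret_cast<const Body*>(base + y * pitch);
        out[y * width + x] = row[x].mass;
    }
}

// Returns true if every struct survived the trip through pitched memory.
bool pitchedStructDemo()
{
    const int width = 5, height = 3;
    std::vector<Body> host(width * height);
    for (int i = 0; i < width * height; ++i) {
        host[i].mass = static_cast<float>(i);
        host[i].pos[0] = host[i].pos[1] = host[i].pos[2] = 0.0f;
    }

    char* dev = nullptr;
    float* devOut = nullptr;
    size_t pitch = 0;
    if (cudaMallocPitch(reinterpret_cast<void**>(&dev), &pitch,
                        width * sizeof(Body), height) != cudaSuccess)
        return false;
    if (cudaMalloc(reinterpret_cast<void**>(&devOut),
                   width * height * sizeof(float)) != cudaSuccess) {
        cudaFree(dev);
        return false;
    }

    // Host rows are tightly packed; device rows are `pitch` bytes apart.
    cudaMemcpy2D(dev, pitch, host.data(), width * sizeof(Body),
                 width * sizeof(Body), height, cudaMemcpyHostToDevice);

    dim3 block(8, 8);
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);
    readMass<<<grid, block>>>(dev, pitch, width, height, devOut);

    std::vector<float> out(width * height, -1.0f);
    cudaError_t err = cudaMemcpy(out.data(), devOut,
                                 out.size() * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dev);
    cudaFree(devOut);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < width * height; ++i)
        if (out[i] != host[i].mass) return false;
    return true;
}
```

Calling pitchedStructDemo() from main should return true on a working device. Note that pitched allocation only changes how rows are padded; by itself it is not expected to change how many threads read the same address.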

Use const __restrict__ on compute capability 3.5 devices for the memory that holds your system instance?
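
To make that suggestion concrete, here is a rough sketch of a kernel whose read-only inputs are declared const __restrict__. On compute capability 3.5 and later this allows the compiler to route those loads through the read-only data cache, though it is not guaranteed to do so. The flat float layout of the system, the dummy evaluation formula and the names evaluate and evaluateDemo are all assumptions for illustration; the real system layout is not shown in this thread.

```cuda
#include <cuda_runtime.h>
#include <vector>

// One thread per parameter; every thread reads the whole (shared) system.
// const + __restrict__ tells the compiler the inputs are read-only and
// do not alias the output.
__global__ void evaluate(const float* __restrict__ sysData, int systemSize,
                         const float* __restrict__ params,
                         float* __restrict__ results, int numParams)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numParams) return;

    float acc = 0.0f;
    for (int k = 0; k < systemSize; ++k)
        acc += sysData[k] * params[i];   // placeholder for the real evaluation
    results[i] = acc;
}

// Runs the kernel on a tiny made-up system and checks the results.
bool evaluateDemo()
{
    const int systemSize = 4;
    const int numParams = 100000;        // same count as in the question
    std::vector<float> sys = {1.0f, 2.0f, 3.0f, 4.0f};
    std::vector<float> params(numParams), results(numParams, -1.0f);
    for (int i = 0; i < numParams; ++i) params[i] = static_cast<float>(i);

    float *dSys = nullptr, *dParams = nullptr, *dResults = nullptr;
    cudaMalloc(reinterpret_cast<void**>(&dSys), systemSize * sizeof(float));
    cudaMalloc(reinterpret_cast<void**>(&dParams), numParams * sizeof(float));
    cudaMalloc(reinterpret_cast<void**>(&dResults), numParams * sizeof(float));
    cudaMemcpy(dSys, sys.data(), systemSize * sizeof(float),
               cudaMemcpyHostToDevice);
    cudaMemcpy(dParams, params.data(), numParams * sizeof(float),
               cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (numParams + threads - 1) / threads;
    evaluate<<<blocks, threads>>>(dSys, systemSize, dParams, dResults,
                                  numParams);

    // This copy also reports any error from the kernel launch above.
    cudaError_t err = cudaMemcpy(results.data(), dResults,
                                 numParams * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dSys);
    cudaFree(dParams);
    cudaFree(dResults);
    if (err != cudaSuccess) return false;

    // The system sums to 10, so result i should be exactly 10 * i.
    for (int i = 0; i < numParams; ++i)
        if (results[i] != 10.0f * static_cast<float>(i)) return false;
    return true;
}
```

The qualifiers should still compile for older devices, but the read-only cache path is only available from compute capability 3.5 onward, so no speedup should be expected there from this change alone.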

Hi cbuchner1

Thanks for your answer

My card is a Tesla C1060. I'm not using that kind of parameter qualifier.

These are the specifications:

Name: Tesla C1060
Compute capability: 1.3
Clock rate: 1296000
Can map memory: True
Device copy overlap: Enabled
Kernel Execution Timeout: Disabled

---------- Memory Information for Device 0 ----------
Total global memory: 4294770688
Total constant memory: 65536
Max memory pitch: 2147483647
Texture Alignment: 256
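
For anyone who wants to compare their own card, a listing like the one above can be obtained with the CUDA runtime API. This is only a sketch of one way to do it, using cudaDeviceGetAttribute and cudaMemGetInfo. The helper names fitsInConstantMemory and printDeviceSummary are made up here, and it prints only a subset of the fields shown above.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical helper: true if `systemBytes` would fit in the constant
// memory of `device` (false as well if the query itself fails).
bool fitsInConstantMemory(size_t systemBytes, int device)
{
    int constBytes = 0;
    if (cudaDeviceGetAttribute(&constBytes, cudaDevAttrTotalConstantMemory,
                               device) != cudaSuccess)
        return false;
    return systemBytes <= static_cast<size_t>(constBytes);
}

// Prints a few of the fields from the listing above for `device`.
void printDeviceSummary(int device)
{
    int major = 0, minor = 0, constBytes = 0, maxPitch = 0, texAlign = 0;
    cudaDeviceGetAttribute(&major, cudaDevAttrComputeCapabilityMajor, device);
    cudaDeviceGetAttribute(&minor, cudaDevAttrComputeCapabilityMinor, device);
    cudaDeviceGetAttribute(&constBytes, cudaDevAttrTotalConstantMemory, device);
    cudaDeviceGetAttribute(&maxPitch, cudaDevAttrMaxPitch, device);
    cudaDeviceGetAttribute(&texAlign, cudaDevAttrTextureAlignment, device);

    // cudaMemGetInfo reports on the current device, so select it first.
    size_t freeBytes = 0, totalBytes = 0;
    cudaSetDevice(device);
    cudaMemGetInfo(&freeBytes, &totalBytes);

    std::printf("Compute capability: %d.%d\n", major, minor);
    std::printf("Total global memory: %zu\n", totalBytes);
    std::printf("Total constant memory: %d\n", constBytes);
    std::printf("Max memory pitch: %d\n", maxPitch);
    std::printf("Texture Alignment: %d\n", texAlign);
}
```

With the 65536 bytes of constant memory reported above, fitsInConstantMemory(systemBytes, 0) returns false for any system larger than 64 KB, which matches the limitation described in the first post.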