Subarray usage in copy clauses

I tried to use

 copyin(arr[0:total/2], arr[total/2:total/2])

instead of

copyin(arr[0:total])

because I have a large amount of data and need to split it.
But the answer is correct with the second form and wrong with the first.
I don't know what happened.
Thanks in advance.

I wrote a small test like this:

#include <stdio.h>

int main(void) {
        int arr[100];
        for (int i = 0; i < 100; ++i) {
                arr[i] = i*2;
        }
        int ans[100];

        /* copy arr to the device in two halves */
        #pragma acc kernels copyin(arr[0:50], arr[50:50]), copyout(ans[0:100])
        for (int i = 0; i < 100; ++i) {
                ans[i] = arr[i];
        }
        for (int i = 0; i < 100; ++i) {
                fprintf(stderr, "%d\t", ans[i]);
        }
        return 0;
}

and found that arr[0:50] is not actually copied into device memory.

What should I do to copyin two subsets of an array?
Thanks.

copyin(temp2) followed by copy[in](temp2); copyin attribute ignored

As the message says, the copyin attribute is ignored.

What can I do… :(

Deeply sad. -.-

I don't think I have found a real solution; what I have is very ugly…

int *_arr = arr;

so I can use _arr instead of arr where the warning occurs…
But it is very slow…
I was afraid to look at the output of the time command…
Is there any other way to do this?

Hi luxuia,

No, you can’t put the same variable more than once in a data copy clause. There has to be a single base host address from which to map the host and device data.

Though, I question why you need to break apart the array this way. Can you please clarify what you mean by “as i have big data” and why you believe it needs to be split?

If you don’t have enough memory on the device to hold the entire array, breaking it apart in the data clause won’t help. In this case, you need to strip-mine the loop into blocks that fit on the device. Something like:

% cat test_block.c 
#include <stdio.h>

int main () {
        const int size = 100;
        int arr[size];
        int ans[size];
        int block_size = 50;
        for (int i = 0; i <  size; ++i) {
                arr[i] = i*2;
        }

        for (int b=0; b < size; b+=block_size) {  /* one block per kernel launch */
#pragma acc kernels copyin(arr[b:block_size]), copyout(ans[b:block_size])
                for (int i = b; i < b+block_size; ++i) {
                        ans[i] = arr[i];
                }
        }
        for (int i = 0; i < size; ++i) {
                printf("%d: %d\n", i, ans[i]);
        }
        return 0;
}

sb-colgrove:/local/home/colgrove% pgcc test_block.c -acc -Minfo=accel
main:
     12, Generating copyout(ans[b:block_size])
         Generating copyin(arr[b:block_size])
     13, Generating NVIDIA code
         Generating compute capability 1.0 binary
         Generating compute capability 2.0 binary
         Generating compute capability 3.0 binary
     14, Loop is parallelizable
         Accelerator kernel generated
         14, #pragma acc loop gang, vector(64) /* blockIdx.x threadIdx.x */
% setenv PGI_ACC_NOTIFY 1
% a.out
launch CUDA kernel  file=../test_block.c function=main line=14 device=0 grid=1 block=64
launch CUDA kernel  file=../test_block.c function=main line=14 device=0 grid=1 block=64
0: 0
1: 2
2: 4
3: 6
4: 8
5: 10
6: 12
7: 14
8: 16
....
94: 188
95: 190
96: 192
97: 194
98: 196
99: 198

Hope this helps,
Mat