I am trying to use
copyin(arr[0:total/2], arr[total/2:total/2])
instead of
copyin(arr[0:total])
because my data set is large and I have to split it up.
However, the result is correct with the second form and wrong with the first.
I don't know what is going on.
Thanks in advance.
I ran a small test like this:
    int arr[100];
    for (int i = 0; i < 100; ++i) {
        arr[i] = i*2;
    }
    int ans[100];
    #pragma acc kernels copyin(arr[0:50], arr[50:50]), copyout(ans[100])
    for (int i = 0; i < 100; ++i) {
        ans[i] = arr[i];
    }
    for (int i = 0; i < 100; ++i) {
        fprintf(stderr, "%d\t", ans[i]);
    }
and found that arr[0:50] is not actually copied into device memory.
What should I do to copy in two subsets of one array?
Thanks.
copyin(temp2) followed by copy[in](temp2); copyin attribute ignored
That is the message the compiler gives me.
What can I do? :(
I am really sad about this.
I think I found a workaround, but it is very ugly:
    int *_arr = arr;
so I can use _arr in place of arr wherever the warning occurs.
But it is very slow. I was afraid to look at the output of the time command.
Is there any other way to do this?
Hi luxuia,
No, you can’t put the same variable more than once in a data copy clause. There has to be a single base host address from which to map the host and device data.
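To illustrate the single-base-address point: when the two pieces are adjacent, as they are in the sample test above, they can be written as one contiguous section. This is only a minimal sketch, assuming the whole array fits on the device; the function name copy_whole and the restrict qualifiers are my own additions, not something from the original post.

```c
#include <assert.h>

/* Hypothetical helper (name is illustrative, not from the original post):
 * copies arr into ans on the device using ONE section per array.
 * restrict is an assumption that the two arrays do not overlap. */
void copy_whole(const int *restrict arr, int *restrict ans, int n) {
    assert(n >= 0);
    /* arr[0:n] means "start at index 0, n elements". For even n it covers
     * the same range that arr[0:n/2] and arr[n/2:n/2] were meant to cover,
     * but names the variable only once in the clause. */
    #pragma acc kernels copyin(arr[0:n]) copyout(ans[0:n])
    for (int i = 0; i < n; ++i) {
        ans[i] = arr[i];
    }
}
```

Without an OpenACC compiler the pragma is simply ignored and the loop runs on the host, so the function can be checked either way.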
Though, I question why you need to break apart the array this way. Can you please clarify what you mean by “as i have big data” and why you believe it needs to be split?
If you don't have enough memory on the device to hold the entire array, breaking it apart in the data clause won't help. In this case, you need to strip-mine the loop into blocks of a size that fits on the device. Something like:
% cat test_block.c
#include <stdio.h>
int main () {
    const int size = 100;
    int arr[size];
    int ans[size];
    int block_size = 50;
    for (int i = 0; i < size; ++i) {
        arr[i] = i*2;
    }
    for (int b=0; b < size; b+=block_size) {
        #pragma acc kernels copyin(arr[b:block_size]), copyout(ans[b:block_size])
        for (int i = b; i < b+block_size; ++i) {
            ans[i] = arr[i];
        }
    }
    for (int i = 0; i < size; ++i) {
        printf("%d: %d\n", i, ans[i]);
    }
    return 0;
}
sb-colgrove:/local/home/colgrove% pgcc test_block.c -acc -Minfo=accel
main:
12, Generating copyout(ans[b:block_size])
Generating copyin(arr[b:block_size])
13, Generating NVIDIA code
Generating compute capability 1.0 binary
Generating compute capability 2.0 binary
Generating compute capability 3.0 binary
14, Loop is parallelizable
Accelerator kernel generated
14, #pragma acc loop gang, vector(64) /* blockIdx.x threadIdx.x */
% setenv PGI_ACC_NOTIFY 1
% a.out
launch CUDA kernel file=../test_block.c function=main line=14 device=0 grid=1 block=64
launch CUDA kernel file=../test_block.c function=main line=14 device=0 grid=1 block=64
0: 0
1: 2
2: 4
3: 6
4: 8
5: 10
6: 12
7: 14
8: 16
....
94: 188
95: 190
96: 192
97: 194
98: 196
99: 198
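One caveat with the blocked loop above: it assumes size is an exact multiple of block_size. If that does not hold, the last block would run past the end of the arrays. A possible variant that clamps the final block is sketched below; the function name copy_in_blocks, the len variable, and the restrict qualifiers are assumptions of mine, so treat it as a starting point rather than the definitive way to do it.

```c
#include <assert.h>

/* Hypothetical helper: processes arr in device-sized blocks, shortening
 * the last block when size is not a multiple of block_size.
 * restrict is an assumption that the two arrays do not overlap. */
void copy_in_blocks(const int *restrict arr, int *restrict ans,
                    int size, int block_size) {
    assert(size >= 0 && block_size > 0);
    for (int b = 0; b < size; b += block_size) {
        /* Number of elements in this block; smaller for the tail. */
        int len = (size - b < block_size) ? (size - b) : block_size;
        /* Each pass names arr and ans once, with a section of len
         * elements starting at index b. */
        #pragma acc kernels copyin(arr[b:len]) copyout(ans[b:len])
        for (int i = b; i < b + len; ++i) {
            ans[i] = arr[i];
        }
    }
}
```

For example, copy_in_blocks(arr, ans, 100, 30) would run four passes of 30, 30, 30 and 10 elements.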
Hope this helps,
Mat