I have a kernel whose job it is to fill out the values in two arrays, as part of a larger function:
//host code only above here
dim3 bpg;
bpg.x = 256;
bpg.y = 16;
dim3 tpb;
tpb.x = 256;
tpb.y = 256;
long2* d_H_pos;
cuDoubleComplex* d_H_vals;
status1 = cudaMalloc(&d_H_pos, dim*stridepos*sizeof(long2));
status2 = cudaMalloc(&d_H_vals, dim*strideval*sizeof(cuDoubleComplex));
if ( (status1 != CUDA_SUCCESS) || (status2 != CUDA_SUCCESS) ){
cout << "Memory allocation for device Hamiltonian failed! Error: " << cudaGetErrorString( cudaPeekAtLastError() ) << endl;
return 1;
}
//dim = 256*256
SetFirst<<<256, 256>>>(d_H_pos, stridepos, dim, 1); //count the diagonal element
cudaThreadSynchronize();
FillSparse<<<bpg, tpb>>>(d_basis_Position, d_basis, dim, d_H_vals, d_H_pos, d_Bond, lattice_Size, JJ);
//function continues
When I run cuda-gdb, SetFirst launches with <<<(256,1,1),(256,1,1)>>> but FillSparse launches with <<<(1,1,1),(1,1,1)>>>. What's going on here? I don't get any segfaults or memory allocation problems when I create the arrays or run SetFirst. I'm running CUDA 4.0 on a GTX 460 (with 2GB RAM), on 64-bit Ubuntu 10.10.
Thou shalt check return codes for errors.
Fermi supports no more than 1024 threads per block, and your tpb requests 256 × 256 = 65,536, so the FillSparse launch is rejected with an invalid-configuration error and never runs with the configuration you asked for. You should check for errors after the kernel launch too: call cudaGetLastError() immediately after the <<<...>>> line, and check the return value of the synchronize call as well.
Wow, I feel dumb now. Thanks!