I am a newbie to CUDA. Here is a problem I ran into recently.

I have a vector v and a matrix m in device memory. The length of v is dim, and m is a square dim x dim matrix.

Here is the pseudocode I want to implement with CUDA:

[codebox]for(int i=0; i<dim; ++i) {
    if(v[i] == 0) {
        // set column i and row i of matrix m to 0, except the diagonal element
    }
}[/codebox]

Here is the CUDA code I wrote to implement this:

[codebox]__global__ void zero_row(float* matrix, int dim, int i)
{
    unsigned int x = blockIdx.x*blockDim.x + threadIdx.x;
    if(x >= dim)
        return;
    else
        matrix[i+x*dim] = 0;
}

__global__ void zero_column(float* matrix, int dim, int i)
{
    unsigned int x = blockIdx.x*blockDim.x + threadIdx.x;
    if(x >= dim)
        return;
    else
        matrix[i*dim+x] = 0;
}

__global__ void make_matrix(const float* v, int dim, float* matrix)
{
    unsigned int x = blockIdx.x*blockDim.x + threadIdx.x;
    if(x >= dim)
        return;
    if(v[x] == 0) {
        float temp = matrix[x*dim+x];  // save the diagonal element
        // these two nested launches are what the compiler rejects
        zero_row<<<(dim+nthreads-1)/nthreads, nthreads>>>(matrix, dim, x);
        zero_column<<<(dim+nthreads-1)/nthreads, nthreads>>>(matrix, dim, x);
        matrix[x*dim+x] = temp;  // restore the diagonal element
    }
}

// launched from the host
make_matrix<<<(dim+nthreads-1)/nthreads, nthreads>>>(v, dim, matrix);
[/codebox]

The compiler gave me an error saying:

“calling a __global__ function from a __global__ function is not allowed”.

One way to work around this is to make zero_row() and zero_column() __device__ functions, but then I will not be able to set the rows and columns in parallel. Could someone give some suggestions about how to solve this problem? Thanks a lot!
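One direction that might avoid nested launches altogether, as a minimal sketch rather than a definitive solution: the loop in the pseudocode zeroes an off-diagonal element (row, col) exactly when v[row] or v[col] is 0, so a single kernel with one thread per matrix element could test that condition directly. The names mask_matrix and run_mask_matrix are hypothetical, the 16 x 16 block size is an arbitrary choice, and the host wrapper exists only to make the sketch testable (in my case v and m are already in device memory, so only the kernel launch would be needed). Error checking is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// One thread per matrix element. An off-diagonal element (row, col) is
// cleared when v[row] or v[col] is zero; diagonal elements are never written.
// The condition is symmetric in row and col, so it gives the same result for
// row-major and column-major storage.
__global__ void mask_matrix(const float* v, int dim, float* matrix)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row >= dim || col >= dim)
        return;
    if (row != col && (v[row] == 0.0f || v[col] == 0.0f))
        matrix[row * dim + col] = 0.0f;
}

// Hypothetical host wrapper, for illustration only: copies v and m to the
// device, runs the kernel once, and returns the modified matrix.
// CUDA error checking is omitted to keep the sketch short.
std::vector<float> run_mask_matrix(const std::vector<float>& v,
                                   std::vector<float> m)
{
    const int dim = static_cast<int>(v.size());
    assert(m.size() == v.size() * v.size());

    const size_t v_bytes = v.size() * sizeof(float);
    const size_t m_bytes = m.size() * sizeof(float);

    float* d_v = nullptr;
    float* d_m = nullptr;
    cudaMalloc(&d_v, v_bytes);
    cudaMalloc(&d_m, m_bytes);
    cudaMemcpy(d_v, v.data(), v_bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_m, m.data(), m_bytes, cudaMemcpyHostToDevice);

    // 2D launch covering the whole dim x dim matrix.
    dim3 block(16, 16);
    dim3 grid((dim + block.x - 1) / block.x, (dim + block.y - 1) / block.y);
    mask_matrix<<<grid, block>>>(d_v, dim, d_m);

    // This copy runs after the kernel on the default stream.
    cudaMemcpy(m.data(), d_m, m_bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_v);
    cudaFree(d_m);
    return m;
}
```

Because every thread writes at most its own element, no thread depends on another thread's result, and the diagonal does not need to be saved and restored.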