Hi, I am new to CUDA.

I have 100 groups, each of which contains a 1000x100 matrix and a 1000x1 vector. For each group, I would like to multiply every row of the matrix by the corresponding element of the vector (row i of the matrix is scaled by element i of the vector). Scaled down, the operation would look like this:

```
1     1 2 3      1  2  3
2  x  4 5 6  =   8 10 12
3     7 8 9     21 24 27
```

This needs to happen 100 times (once for each group), using the respective vector and matrix from each group.
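To make the operation concrete, here is a plain CPU sketch of what I mean. It is not CUDA code. It assumes the matrices are stored row-major in flat arrays, and the names `scale_rows` and `scale_all_groups` are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: scales row r of a row-major rows x cols matrix
// by vec[r], in place. Row-major storage is an assumption.
void scale_rows(std::vector<float>& mat, const std::vector<float>& vec,
                std::size_t rows, std::size_t cols) {
    assert(mat.size() == rows * cols);
    assert(vec.size() == rows);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            mat[r * cols + c] *= vec[r];
        }
    }
}

// Each group is independent: one matrix and one vector per group
// (100 groups of 1000x100 and 1000x1 in my case).
void scale_all_groups(std::vector<std::vector<float>>& mats,
                      const std::vector<std::vector<float>>& vecs,
                      std::size_t rows, std::size_t cols) {
    assert(mats.size() == vecs.size());
    for (std::size_t g = 0; g < mats.size(); ++g) {
        scale_rows(mats[g], vecs[g], rows, cols);
    }
}
```

Put another way, each group computes diag(v) * M, where v is the group's vector and M is its matrix.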

Does it make sense to use cuBLAS for this, and if so, which functions should I look into? If not, can anyone suggest a better CUDA approach?

Thanks