I’m working on parallelizing an existing project that uses a lot of SLATEC functions for nonlinear least-squares calculations, and I can’t find any port of SLATEC to CUDA. Specifically, it would be nice if there were at least a CUDA implementation of dcov, dnls1e and deform, which are heavily used.
Hi,
I haven’t heard of any canned implementations, and it looks like the Levenberg-Marquardt loop these routines build on is a pretty hard problem that has not yet been ported to GPUs (see elsewhere on the forums). I couldn’t find any projects out there that list it, but we could help in either of two ways: we could provide support for you to build it with some of the linear algebra in ArrayFire, or you could just hire us outright to build it for you. Let us know if we can help.
Best,
James
I think that, considering the effort needed to port those routines, the cheaper way is to see what performance I get by parallelizing at a higher level. Anyway, I’ll take a look at ArrayFire to see if there is any Fortran routine I could swap in.
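A minimal sketch of that higher-level approach, assuming the project runs many fits that are independent of each other: leave the serial solver untouched and spread whole fits across workers. Everything here is illustrative. `fit_exponential` is a hypothetical stand-in for the existing serial routine (a tiny Levenberg-Marquardt fit of y = a * exp(b * x)), not a port of dnls1e, and `fit_many` is a made-up helper name. Threads are used only to keep the example short; pure-Python work like this will not speed up under threads, so in practice each worker would call into the compiled Fortran code or run in its own process.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def fit_exponential(data, a=1.0, b=0.0, lam=1e-3, iters=200):
    """Serial fit of y = a * exp(b * x) with a small damped least-squares loop.

    Hypothetical stand-in for the existing serial solver; not the SLATEC API.
    """
    xs, ys = data

    def cost(a_, b_):
        return sum((a_ * math.exp(b_ * x) - y) ** 2 for x, y in zip(xs, ys))

    current = cost(a, b)
    for _ in range(iters):
        if current < 1e-20:
            break
        # Accumulate J^T J and J^T r for the two parameters.
        jaa = jab = jbb = ga = gb = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(b * x)
            r = a * e - y
            da, db = e, a * x * e
            jaa += da * da
            jab += da * db
            jbb += db * db
            ga += da * r
            gb += db * r
        # Damped normal equations: (J^T J + lam * diag(J^T J)) * step = -J^T r
        m11 = jaa * (1.0 + lam)
        m22 = jbb * (1.0 + lam)
        det = m11 * m22 - jab * jab
        if det == 0.0:
            break
        sa = (-ga * m22 + gb * jab) / det
        sb = (-gb * m11 + ga * jab) / det
        try:
            trial = cost(a + sa, b + sb)
        except OverflowError:
            trial = math.inf
        if trial < current:
            # Accept the step and relax the damping.
            a, b, current = a + sa, b + sb, trial
            lam = max(lam / 10.0, 1e-12)
        else:
            # Reject the step and increase the damping.
            lam = min(lam * 10.0, 1e12)
    return a, b


def fit_many(datasets, workers=4):
    """Run one independent serial fit per dataset, spread across workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fit_exponential, datasets))
```

The point of the design is that the solver itself stays exactly as it is; only the outer loop over datasets changes, which is far less work than porting the Levenberg-Marquardt iteration to the GPU.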