Greetings! I found an implementation in CuPy that fits my needs, but I haven’t been able to find a similar library for C++. Does anyone know of a C++ library with comparable functionality?
The description:
Multi-dimensional array distributed across multiple CUDA devices. This class implements some elementary operations that cupy.ndarray provides. The array content is split into chunks, contiguous arrays corresponding to slices of the original array. Note that one device can hold multiple chunks.
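To make the chunk layout concrete, here is a rough host-only C++ sketch of the bookkeeping I have in mind. It only plans which contiguous row range goes to which device and allocates no GPU memory. The names `Chunk` and `plan_chunks` are hypothetical, and the round-robin assignment is my own assumption, not necessarily how CuPy places chunks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record for one chunk: a contiguous row range [begin, end)
// of the logical array, plus the index of the device that would own it.
struct Chunk {
    std::size_t begin;  // first row, inclusive
    std::size_t end;    // last row, exclusive
    int device;         // owning device index
};

// Split `rows` rows into `num_chunks` nearly equal contiguous slices and
// assign them to `num_devices` devices round-robin (an assumed policy).
// With num_chunks > num_devices, one device ends up holding several chunks.
std::vector<Chunk> plan_chunks(std::size_t rows,
                               std::size_t num_chunks,
                               int num_devices) {
    assert(num_chunks > 0 && num_devices > 0);

    std::vector<Chunk> plan;
    plan.reserve(num_chunks);

    const std::size_t base = rows / num_chunks;
    const std::size_t extra = rows % num_chunks;  // first `extra` chunks get one more row
    const std::size_t devices = static_cast<std::size_t>(num_devices);

    std::size_t begin = 0;
    for (std::size_t i = 0; i < num_chunks; ++i) {
        const std::size_t len = base + (i < extra ? 1 : 0);
        plan.push_back({begin, begin + len, static_cast<int>(i % devices)});
        begin += len;
    }
    return plan;
}
```

For example, `plan_chunks(10, 4, 2)` yields the row ranges [0, 3), [3, 6), [6, 8), [8, 10) on devices 0, 1, 0, 1, so each device holds two chunks. In a real implementation each chunk would presumably be backed by a per-device allocation (for instance `cudaSetDevice` followed by `cudaMalloc`), and I am looking for a library that handles that part and the operations on top of it.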
Best regards,