Latest GPU-Accelerated Libraries topics

Topic	Tags	Replies	Views	Activity
Equivalent of NVreg_EnableStreamMemOPs and NVreg_InitializeSystemMemoryAllocations for Windows		0	117	March 18, 2024
Batch transforms in cuFFT-Regent	cufft	2	144	March 18, 2024
Cuda function to convert P010le to NV12		0	93	March 18, 2024
GDS / CUDA install on Ubuntu 22.04 - Forced to nvidia-kernel=source-550-open no matching cuda-drivers-550	cuda, ubuntu, gds	0	411	March 15, 2024
cufftMP slow plan creation and execution on multiple nodes	cufft	1	203	March 14, 2024
How to use negative leading dimension in cuBLASLt matmul interface?	cublas	0	124	March 13, 2024
Recreating cuDSS matrix causes access violation reading location error	cudss	2	199	March 13, 2024
GEMM stage on ampere	cutlass	0	163	March 12, 2024
How to understand "CU_FILE_RDMA_REGISTER"?	gds	6	194	March 12, 2024
cuBLAS Level-1 amax execution error	cublas	1	151	March 11, 2024
Sparse cusolver inside loop .................. factorization at every call?	cusparse	8	1154	March 9, 2024
Multi-GPU FFT own memory allocation	cufft	4	746	March 8, 2024
cuFFT guru interface	cufft	0	145	March 8, 2024
Large % of time in cuBLAS calls spent in clock_gettime	cublas	3	163	March 6, 2024
cuSolverSP module	cusolver	1	120	March 6, 2024
Multinode NCCL test hangs after Init COMPLETE	nccl	0	171	March 6, 2024
Minor bugs in header file "cublasmp.h" of cuBLASMp	cublas	1	212	March 5, 2024
Segfault using cuda-gdb 12 with cusparseCreate() in a thread	cusparse	2	121	March 5, 2024
Can not compile cublas file in windows10	cublas	3	276	March 19, 2024
Why are CuNumeric's Discrete Fourier Transform functions slower than Numpy's?	python	1	152	March 4, 2024
Undefined symbol: cufftExecC2R after installing cmake python library	cuda, python, cufft	2	208	March 4, 2024
CUDA 12 - Sparse Triangular Matrix Solver	cusparse	4	230	March 2, 2024
Batched multiplication with sparse matrices and dense vectors	cusparse	4	202	March 15, 2024
Failure in installation of nvshmem	cuda, nvshmem	5	237	March 13, 2024
Signature Error in GDS cuFileReadAsync and cuFileWriteAsync Documentation	gds	0	123	February 28, 2024
CUTENSOR_OP_POW2 op(x) = x*x?	cutensor	0	125	February 28, 2024
Understanding Read and Write Op Counts in Async GDS Operations	gds	0	134	February 27, 2024
Stripmining matmul for bandwidth optimization host-to-gpu for LLM computation	cublas	2	181	February 26, 2024
cusolverSp QR runs much slower on V100 than on T4	cusolver	0	144	February 23, 2024
GPUDirect Storage	gds	6	1380	February 25, 2024

