@fanzh Python seems to be working perfectly fine with CUDA shared memory and dlpack, using torch. Am I wrong here?
Attaching the relevant section of the Triton Python backend README:
For PyTorch 2.0:
Thanks!