Incorrect result from cublasLtMatmul with CUBLASLT_EPILOGUE_RELU when the input contains NaN

It is expected that ReLU(nan) == nan (e.g. torch.relu(nan) == nan). However, cublasLtMatmul with CUBLASLT_EPILOGUE_RELU does not behave this way: when an input element is NaN, the corresponding element of the result is 0 instead of NaN.

P.S. Without CUBLASLT_EPILOGUE_RELU, cublasLtMatmul correctly returns NaN for that element.
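
Below is a minimal sketch of the kind of call that shows the behavior (not the exact code that hit the problem; the FP32 data type, 2x2 sizes, identity B matrix, and omission of error checking are illustrative assumptions). With the epilogue line present, the NaN element of D comes back as 0; with it removed, the NaN propagates.

```cpp
// repro.cu — build with: nvcc repro.cu -lcublasLt -o repro
// Minimal sketch, not the original reporter's code; sizes/dtypes are assumptions.
#include <cstdio>
#include <cmath>
#include <limits>
#include <vector>
#include <cuda_runtime.h>
#include <cublasLt.h>

int main() {
    const int m = 2, n = 2, k = 2;
    const float nan = std::numeric_limits<float>::quiet_NaN();
    // Column-major A (m x k) with one NaN entry; B is the identity, so D = ReLU(A).
    std::vector<float> hA = {1.0f, -1.0f, nan, 2.0f};
    std::vector<float> hB = {1.0f, 0.0f, 0.0f, 1.0f};
    std::vector<float> hD(m * n, 0.0f);

    float *dA, *dB, *dD;
    cudaMalloc(&dA, hA.size() * sizeof(float));
    cudaMalloc(&dB, hB.size() * sizeof(float));
    cudaMalloc(&dD, hD.size() * sizeof(float));
    cudaMemcpy(dA, hA.data(), hA.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), hB.size() * sizeof(float), cudaMemcpyHostToDevice);

    cublasLtHandle_t lt;
    cublasLtCreate(&lt);

    cublasLtMatmulDesc_t op;
    cublasLtMatmulDescCreate(&op, CUBLAS_COMPUTE_32F, CUDA_R_32F);
    // Remove these two lines to see the plain matmul propagate the NaN.
    cublasLtEpilogue_t epi = CUBLASLT_EPILOGUE_RELU;
    cublasLtMatmulDescSetAttribute(op, CUBLASLT_MATMUL_DESC_EPILOGUE, &epi, sizeof(epi));

    cublasLtMatrixLayout_t layA, layB, layD;
    cublasLtMatrixLayoutCreate(&layA, CUDA_R_32F, m, k, m);
    cublasLtMatrixLayoutCreate(&layB, CUDA_R_32F, k, n, k);
    cublasLtMatrixLayoutCreate(&layD, CUDA_R_32F, m, n, m);

    float alpha = 1.0f, beta = 0.0f;
    // algo == NULL lets cuBLASLt pick a heuristic; no workspace is provided.
    cublasStatus_t st = cublasLtMatmul(lt, op, &alpha, dA, layA, dB, layB,
                                       &beta, dD, layD, dD, layD,
                                       nullptr, nullptr, 0, 0);
    cudaDeviceSynchronize();
    cudaMemcpy(hD.data(), dD, hD.size() * sizeof(float), cudaMemcpyDeviceToHost);

    printf("status = %d\n", (int)st);
    for (int i = 0; i < m * n; ++i)
        printf("D[%d] = %f (isnan = %d)\n", i, hD[i], (int)std::isnan(hD[i]));
    // Expected (to match torch.relu): the element fed by NaN stays NaN.
    // Observed: with CUBLASLT_EPILOGUE_RELU that element is 0.

    cublasLtMatrixLayoutDestroy(layA);
    cublasLtMatrixLayoutDestroy(layB);
    cublasLtMatrixLayoutDestroy(layD);
    cublasLtMatmulDescDestroy(op);
    cublasLtDestroy(lt);
    cudaFree(dA); cudaFree(dB); cudaFree(dD);
    return 0;
}
```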