CUDA: v11.1.0, cuDNN: v8.0.4.30
OS: Windows 10 version 2004 (build 19041.572)
GPU: GeForce RTX 3090
Driver version: 456.71
Using PyTorch 1.8.0 nightly; the same error occurs with the just-released PyTorch 1.7.
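For reference, the versions above can be double-checked from Python with something like the following (a minimal sketch; nothing here is specific to WaveGlow):

import torch

# Versions this PyTorch build was compiled against
print(torch.__version__)                    # 1.8.0 nightly / 1.7.0
print(torch.version.cuda)                   # expected: 11.1
print(torch.backends.cudnn.version())       # expected: 8004 for cuDNN 8.0.4

# GPU model and compute capability (GA102 = sm_86)
print(torch.cuda.get_device_name(0))        # GeForce RTX 3090
print(torch.cuda.get_device_capability(0))  # expected: (8, 6)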
I’m running into an NVRTC compiler error when trying to run WaveGlow from NVIDIA’s own GitHub repository: https://github.com/NVIDIA/waveglow (“A Flow-based Generative Network for Speech Synthesis”).
The GA102 architecture should be fully supported in CUDA 11.1, correct? At least the release notes say so.
Yet the compiler invoked here apparently cannot handle the architecture in this case.
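One way to see the mismatch directly is to compare the device’s compute capability with the architectures the installed PyTorch binary was built for (a hedged sketch; torch.cuda.get_arch_list() may not be available in every version):

import torch

# GA102 should report compute capability (8, 6)
print("device capability:", torch.cuda.get_device_capability(0))

# sm_* targets this PyTorch binary was compiled for; if sm_86 (or a PTX
# fallback the driver can JIT from) is missing, JIT/NVRTC paths can fail
print("build arch list:", torch.cuda.get_arch_list())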
Other projects, such as Tacotron2 and the Hugging Face transformers library, work flawlessly for both training and inference workloads.
WaveGlow, on the other hand, fails to run, with NVRTC throwing an error:
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-5-8b32f5b07435> in <module>
      4 for k in waveglow.convinv:
      5     k.float()
----> 6 denoiser = Denoiser(waveglow)

K:\tts\AMP_TACOTRON2\tacotron2\waveglow\denoiser.py in __init__(self, waveglow, filter_length, n_overlap, win_length, mode)
     28
     29         with torch.no_grad():
---> 30             bias_audio = waveglow.infer(mel_input, sigma=0.0).float()
     31             bias_spec, _ = self.stft.transform(bias_audio)
     32

K:\tts\AMP_TACOTRON2\tacotron2\waveglow\glow.py in infer(self, spect, sigma)
    274             audio_1 = audio[:,n_half:,:]
    275
--> 276             output = self.WN[k]((audio_0, spect))
    277
    278             s = output[:, n_half:, :]

k:\tts\amp_tacotron2\venv\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
    742             result = self._slow_forward(*input, **kwargs)
    743         else:
--> 744             result = self.forward(*input, **kwargs)
    745         for hook in itertools.chain(
    746             _global_forward_hooks.values(),

K:\tts\AMP_TACOTRON2\tacotron2\waveglow\glow.py in forward(self, forward_input)
    164                 self.in_layers[i](audio),
    165                 spect[:,spect_offset:spect_offset+2*self.n_channels,:],
--> 166                 n_channels_tensor)
    167
    168             res_skip_acts = self.res_skip_layers[i](acts)

RuntimeError: nvrtc: error: invalid value for --gpu-architecture (-arch)
nvrtc compilation failed:
#define NAN __int_as_float(0x7fffffff)
#define POS_INFINITY __int_as_float(0x7f800000)
#define NEG_INFINITY __int_as_float(0xff800000)
template<typename T>
__device__ T maximum(T a, T b) {
return isnan(a) ? a : (a > b ? a : b);
}
template<typename T>
__device__ T minimum(T a, T b) {
return isnan(a) ? a : (a < b ? a : b);
}
#define __HALF_TO_US(var) *(reinterpret_cast<unsigned short *>(&(var)))
#define __HALF_TO_CUS(var) *(reinterpret_cast<const unsigned short *>(&(var)))
#if defined(__cplusplus)
struct __align__(2) __half {
__host__ __device__ __half() { }
protected:
unsigned short __x;
};
/* All intrinsic functions are only available to nvcc compilers */
#if defined(__CUDACC__)
/* Definitions of intrinsics */
__device__ __half __float2half(const float f) {
__half val;
asm("{ cvt.rn.f16.f32 %0, %1;}\n" : "=h"(__HALF_TO_US(val)) : "f"(f));
return val;
}
__device__ float __half2float(const __half h) {
float val;
asm("{ cvt.f32.f16 %0, %1;}\n" : "=f"(val) : "h"(__HALF_TO_CUS(h)));
return val;
}
#endif /* defined(__CUDACC__) */
#endif /* defined(__cplusplus) */
#undef __HALF_TO_US
#undef __HALF_TO_CUS
typedef __half half;
extern "C" __global__
void func_1(half* t0, half* t1, half* aten_mul_flat) {
{
float v = __half2float(t1[(512 * blockIdx.x + threadIdx.x) % 2816 + 2816 * (((512 * blockIdx.x + threadIdx.x) / 2816) % 256)]);
float v_1 = __half2float(t0[(512 * blockIdx.x + threadIdx.x) % 2816 + 2816 * (((512 * blockIdx.x + threadIdx.x) / 2816) % 256)]);
aten_mul_flat[512 * blockIdx.x + threadIdx.x] = __float2half((tanhf(v)) * (1.f / (1.f + (expf(0.f - v_1)))));
}
}
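For what it’s worth, the kernel above is produced by the TorchScript fusion pass, which compiles it with NVRTC using an -arch value derived from the device. As a possible (unverified) workaround, GPU fusion can be switched off before calling infer, so the scripted forward falls back to the unfused path instead of invoking NVRTC; note these are private torch._C hooks that may change between releases, and disabling fusion costs performance:

import torch

# Disable the TensorExpr fuser and GPU fusion (private/internal APIs)
torch._C._jit_set_texpr_fuser_enabled(False)
torch._C._jit_override_can_fuse_on_gpu(False)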
This is not mentioned as a known issue anywhere I can find, so I wanted to ask whether it is a known problem that will be addressed in an upcoming release, or whether there is something I’m overlooking that could fix it.
Reproduction is trivial: just clone the code from GitHub and run it; you’ll encounter the same compiler error every time.
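For reference, a minimal reproduction along the lines of the repository’s inference example (the checkpoint filename is a placeholder):

import torch
from denoiser import Denoiser  # from the waveglow repository

# Load a published WaveGlow checkpoint (placeholder filename)
waveglow = torch.load('waveglow_256channels.pt')['model']
waveglow = waveglow.remove_weightnorm(waveglow)
waveglow.cuda().eval()
for k in waveglow.convinv:
    k.float()

denoiser = Denoiser(waveglow)  # fails here with the NVRTC error above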