Hi,
It seems that your input consists of two buffers:
input = [video_data, audio_data.to("cpu")]
Please note that mixing CPU-memory and GPU-memory buffers in the same input is not supported.
You can use either CPU buffers (host-to-device copy) or GPU buffers (device-to-device copy),
but video_data and audio_data must both reside in the same type of memory.
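As a rough sketch of what aligning the two buffers could look like, assuming video_data and audio_data are PyTorch tensors (the .to("cpu") call suggests this, but it is an assumption). The helper name align_inputs and the tensor shapes are hypothetical and only for illustration:

```python
import torch

def align_inputs(video_data, audio_data, device=None):
    """Return both tensors on one device so the input list is not mixed.

    Hypothetical helper: if no device is given, the device of
    video_data is used for both tensors.
    """
    if device is None:
        device = video_data.device
    return [video_data.to(device), audio_data.to(device)]

# Placeholder tensors; the real shapes depend on your model.
video_data = torch.zeros(1, 3, 8, 8)
audio_data = torch.zeros(1, 16)
if torch.cuda.is_available():
    # Mimic the mixed case: video on the GPU, audio on the CPU.
    video_data = video_data.cuda()

# Both entries now live in the same memory type.
inputs = align_inputs(video_data, audio_data)

# Alternatively, force both onto the CPU.
cpu_inputs = align_inputs(video_data, audio_data, torch.device("cpu"))
```

Either choice should work as long as both entries end up on the same device; keeping both on the GPU avoids an extra host-to-device copy if the data is already there.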
Thanks.