Solutions to ChatWithRTX YouTube Non-English Transcript Download

I was trying to download transcripts for Chinese YouTube videos, but the command line printed a list of languages along with the message “Not all videos transcripts are downloaded and processed”.

It also shows a URL pointing to the jdepoix/youtube-transcript-api project on GitHub. This is a Python API that gets the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles, and it requires neither an API key nor a headless browser, unlike Selenium-based solutions.
That page is the official home of the transcript download API, and it shows how to download transcripts in different languages:

YouTubeTranscriptApi.get_transcript(video_id, languages=['de', 'en'])

In ChatWithRTX, the corresponding call is in:

~\ChatWithRTX\RAG\trt-llm-rag-windows-main\app.py
line 229

Initially it was:

transcript = YouTubeTranscriptApi.get_transcript(video_id)

Following the project's GitHub page, I added the languages parameter and changed this line to:

transcript = YouTubeTranscriptApi.get_transcript(video_id, languages=['en', 'zh'])

Now ChatWithRTX will first try to download the English transcript. If that transcript doesn’t exist, it will try the Chinese one.
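
To make the fallback order concrete, here is a small plain-Python sketch of the idea. It is only an illustration of "first match in priority order wins", not the library's actual code; pick_language and the sample language codes are hypothetical names I made up for the example.

```python
from typing import Iterable, Optional, Sequence


def pick_language(available: Iterable[str], preferred: Sequence[str]) -> Optional[str]:
    """Return the first code in `preferred` that is also in `available`.

    Hypothetical helper: it mimics the priority-order behaviour described
    in the post and does not call youtube-transcript-api.
    """
    available_set = set(available)
    for code in preferred:
        if code in available_set:
            return code
    # None of the preferred languages has a transcript.
    return None


# A video that only has Chinese and Japanese transcripts falls through
# 'en' and lands on 'zh'.
print(pick_language(["zh", "ja"], ["en", "zh"]))
```

If none of the listed languages is available, this sketch returns None; in ChatWithRTX that is presumably the case where the language list and the "Not all videos transcripts" message show up.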

You can also change the list to whichever language codes you need.
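
If you would rather not hard-code the list, something like the following might work as a rough sketch. language_priority and download_transcript are hypothetical helpers, not part of ChatWithRTX or the library. The sketch assumes the bundled youtube-transcript-api version still exposes get_transcript as used in app.py, since newer releases may have changed that interface. Also note that YouTube may label Chinese tracks with codes such as zh-Hans or zh-Hant rather than plain zh; that is an assumption worth checking against the language list the command line prints.

```python
from typing import Callable, List, Optional, Sequence


def language_priority(preferred: str, fallbacks: Sequence[str] = ("en",)) -> List[str]:
    """Put the preferred code first, then the fallbacks, without duplicates."""
    ordered: List[str] = []
    for code in (preferred, *fallbacks):
        if code and code not in ordered:
            ordered.append(code)
    return ordered


def download_transcript(video_id: str, preferred: str,
                        fetch: Optional[Callable] = None):
    """Fetch a transcript, trying `preferred` first and English second.

    `fetch` can be injected for testing. By default it is assumed to be
    YouTubeTranscriptApi.get_transcript, as in the post.
    """
    if fetch is None:
        # Imported lazily so the helper can be exercised without the package.
        from youtube_transcript_api import YouTubeTranscriptApi
        fetch = YouTubeTranscriptApi.get_transcript
    return fetch(video_id, languages=language_priority(preferred))


print(language_priority("zh"))
```

With this in place, the line in app.py could become something like transcript = download_transcript(video_id, "zh"), which tries Chinese first and falls back to English.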

I was also having problems with YouTube video transcripts in my native language. It works now. Thank you.