Do we have a text to speech feature?

Andrey1984 · June 1, 2017, 4:50am

Let's try reverting to the default voice:

cat /etc/festival.scm.backup
cat: cat: No such file or directory
;; WARNING: It is inherently insecure to run a festival instance as a
;; server, mainly because it exposes the whole system to exploits which
;; can be easily used by attackers to gain access to your
;; computer. This is because of the inherent design of the festival
;; server. Please use it only in a situation where you are sure that
;; you will not be subjected to such an attack, or have adequate
;; security precautions.

;; This file has been provided as an example file for your use, should
;; you wish to run festival as a server.

; Maximum number of clients on the server
;(set! server_max_clients 10)

; Server port
;(set! server_port 1314)

; Server password:
;(set! server_passwd "password")

; Log file location
;(set! server_log_file "/var/log/festival/festival.log")

; Server access list (hosts)
; Example:
; (set! server_access_list '("[^.]+" "127.0.0.1" "localhost.*" "192.168.*"))
; Secure default:
;(set! server_access_list '("[^.]+" "127.0.0.1" "localhost"))

; Server deny list (hosts)

;; Debian-specific: Use aplay to play audio
(Parameter.set 'Audio_Command "aplay -q -c 1 -t raw -f s16 -r $SR $FILE")
(Parameter.set 'Audio_Method 'Audio_Command)

Andrey1984 · June 1, 2017, 4:52am

Now it speaks with the default voice:
echo "This is a test." | festival --tts

It seems that something in how the default voice gets set up is broken.

Andrey1984 · June 1, 2017, 5:02am

festival

Festival Speech Synthesis System 2.4:release December 2014
Copyright (C) University of Edinburgh, 1996-2010. All rights reserved.

clunits: Copyright (C) University of Edinburgh and CMU 1997-2010
clustergen_engine: Copyright (C) Carnegie Mellon University 2005-2014
hts_engine: 
The HMM-Based Speech Synthesis Engine "hts_engine API"
hts_engine API version 1.07 (http://hts-engine.sourceforge.net/)
Copyright (C) The HMM-Based Speech Synthesis Engine "hts_engine API"
Version 1.07 (http://hts-engine.sourceforge.net/)
Copyright (C) 2001-2012 Nagoya Institute of Technology
              2001-2008 Tokyo Institute of Technology
All rights reserved.

All rights reserved.
For details type `(festival_warranty)'
festival> (voice.list)
(kal_diphone)
festival>

/usr/share/festival/voices/english$ ll
total 127016
drwxr-xr-x  5 root   root        4096 Jun  1 04:59 ./
drwxr-xr-x  3 root   root        4096 May 31 01:12 ../
drwxr-xr-x 13 nvidia nvidia      4096 Jun  1 04:34 cmu_us_awb_arctic/
lrwxrwxrwx  1 root   root          17 Jun  1 04:36 cmu_us_awb_arctic_clunits -> cmu_us_awb_arctic/
drwxr-xr-x 27   7945 users       4096 Jun  6  2006 cmu_us_clb_arctic/
-rw-r--r--  1 root   root   130040259 Jun  7  2006 cmu_us_clb_arctic-0.95-release.tar.bz2
lrwxrwxrwx  1 root   root          17 Jun  1 04:44 cmu_us_clb_arctic_clunits -> cmu_us_clb_arctic/
drwxr-xr-x  4 root   root        4096 May 31 01:12 kal_diphone/

Andrey1984 · June 1, 2017, 5:15am

However, I managed to install the HTS voices;
it seems they need a .scm file in order to be installed.

For details type `(festival_warranty)'
festival> (voice.list)
(nitech_us_awb_arctic_hts kal_diphone)

Andrey1984 · June 1, 2017, 5:18am

However, there are still issues:

All rights reserved.
For details type `(festival_warranty)'
festival> (voice.list)
(nitech_us_awb_arctic_hts kal_diphone)
festival> (voice_nitech_us_awb_arctic_hts)
nitech_us_awb_arctic_hts
festival> hello
SIOD ERROR: unbound variable : hello
festival> (SayText "Hello World")

Warning: HTS_fopen: Cannot open hts/htsvoice.
aplay: main:593: bad speed value 0
#<Utterance 0x7f753f3250>
festival>

For now I would rather use the Google text-to-speech API :)

Andrey1984 · June 1, 2017, 6:15am

What is noticeable is that a voice which is not high quality, http://festvox.org/packed/festival/2.4/voices/festvox_cmu_us_fem_cg.tar.gz ,
does install and play with festival:

All rights reserved.
For details type `(festival_warranty)'
festival> (voice.list)
(nitech_us_awb_arctic_hts cmu_us_fem_cg kal_diphone)
festival> (voice_cmu_us_fem_cg)
cmu_us_fem_cg
festival> (SayText "Hello World")
#<Utterance 0x7f90deabf0>
festival>
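
The interactive steps above can also be run without the prompt. This is only a minimal sketch, assuming festival's batch mode (-b) evaluates parenthesised command-line arguments in order; the helper names scheme_string and say_with_voice are made up for this example and are not part of festival.

```shell
# Quote text as a Scheme string literal: escape backslashes and double quotes.
scheme_string() {
    printf '"%s"' "$(printf '%s' "$1" | sed 's/[\\"]/\\&/g')"
}

# Select a voice and speak one sentence in a single batch-mode call.
# $1 is the voice name as shown by (voice.list), $2 is the text to speak.
say_with_voice() {
    voice=$1
    text=$2
    festival -b "(voice_${voice})" "(SayText $(scheme_string "$text"))"
}

# Example (needs festival and the voice installed):
#   say_with_voice cmu_us_fem_cg "Hello World"
```

Trying each entry from (voice.list) this way makes it quick to see which voices actually produce sound and which only print warnings.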

Andrey1984 · June 1, 2017, 6:53am

Some findings and references:
http://www.cstr.ed.ac.uk/projects/festival/onlinedemo.html

http://htk.eng.cam.ac.uk/
http://hts.sp.nitech.ac.jp/
http://hts-engine.sourceforge.net/
http://sp-tk.sourceforge.net/
http://open-jtalk.sourceforge.net/
http://www.sinsy.jp/

Andrey1984 · June 1, 2017, 7:23am

I am just wondering if the following would be fine for htk 3.4.1 compilation:

CC      = /usr/local/cuda-8.0/bin/nvcc
CFLAGS  := -m64 -ccbin gcc -gencode arch=compute_30,code=sm_30 -D'ARCH="aarch64$
LDFLAGS = -L/usr/local/cuda-8.0/lib -Wl,-rpath=/usr/local/cuda-8.0/lib -L/usr/l$
RANLIB = ranlib

Andrey1984 · June 1, 2017, 7:26am

though “lib” there should be replaced with “lib64”

Andrey1984 · June 1, 2017, 7:32am

Everything ends in errors:
make -f MakefileNVCC /usr/local/cuda-8.0/bin/nvcc -m64 -ccbin gcc -gencode arc - Pastebin.com

linuxdev · June 1, 2017, 2:53pm

The missing values come from the voice package. Basically, festival has the speech software and can then plug in different voices from different locations, e.g., a male or female voice, a British accent, an Australian accent, a US Midwest accent, and so on. I’m not sure how the system determines the default voice, but what is your default locale?

If you look at “/usr/share/festival/voices/” you’ll see “english”, and if you’ve installed other languages, you’ll see that too. If you then cd to “english” (or any other), type “tree” to see what is there (the “.scm” files seem to be the voices…you may need to “sudo apt-get install tree”).

If you run “apt search festival” you’ll see voices are packages starting with “festvox-”. There are also dictionaries. I see “festlex-cmu” is a CMU dictionary. Try “apt-get install festlex-cmu”.
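
As a rough way to see which voices festival could pick up from that directory, the sketch below walks a voices folder and prints every language/voice pair whose festvox folder holds a .scm file named after the voice directory. That naming rule is an assumption about how festival discovers voices, and list_voices is a hypothetical helper, not something festival ships.

```shell
# Print "<language>/<voice>" for each voice directory that looks installable.
# $1 is the voices root; it defaults to the path mentioned in this thread.
list_voices() {
    root=${1:-/usr/share/festival/voices}
    for scm in "$root"/*/*/festvox/*.scm; do
        [ -f "$scm" ] || continue            # glob matched nothing
        voice_dir=${scm%/festvox/*}          # .../<language>/<voice>
        voice=${voice_dir##*/}
        # Assumption: only <voice>/festvox/<voice>.scm defines the voice.
        [ "${scm##*/}" = "$voice.scm" ] || continue
        lang_dir=${voice_dir%/*}
        printf '%s/%s\n' "${lang_dir##*/}" "$voice"
    done
}

# Example:
#   list_voices /usr/share/festival/voices
```

If that assumption holds, a folder called cmu_us_clb_arctic would only be listed when its festvox folder contains cmu_us_clb_arctic.scm, which may be why the folder gets symlinked under a _clunits name elsewhere in this thread.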

Andrey1984 · June 1, 2017, 3:39pm

Hi linuxdev, thank you for your response;

what is your default voice<<<<<<<<<<<

festival> (voice_default)
SIOD ERROR: unbound variable : voice_cmu_us_slt_arctic_clunits

by default it uses the "

kal_diphone
    ├── festvox
    │   ├── kal_diphone.scm
    │   └── kaldurtreeZ.scm

"
it was the only one installed with apt-get by default;
Later I managed to add “http://festvox.org/packed/festival/2.4/voices/festvox_cmu_us_fem_cg.tar.gz”

It seems to recognize it: once added to the /usr/share/festival/voices/us folder, it shows up in:

festival> (voice.list)
(nitech_us_awb_arctic_hts cmu_us_fem_cg kal_diphone)

I think it will recognize, and let me use, any of the voices from the “Index of /packed/festival/2.4/voices” link right away and automatically.
A voice can be selected for use by executing:

festival> (voice_kal_diphone)
kal_diphone

A default voice is supposed to be set by adding the line
“(set! voice_default voice_cmu_us_slt_arctic_hts)” to /etc/festival.scm; however, that won’t work in my case.
Maybe I should try one of the other ways of setting the default voice described on the “Festival - ArchWiki” page.
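
For experimenting with that config line, here is a small sketch that adds a line to a file only once and keeps a one-time backup next to it. The helper name append_line_once and the .backup suffix are choices made for this example; which form of the set! line festival actually expects (with or without a leading quote on the voice symbol) is left to the caller, since both forms appear in this thread.

```shell
# Append a line to a config file unless the exact line is already there.
# $1 is the file, $2 is the line to add.
append_line_once() {
    file=$1
    line=$2
    # Nothing to do if the exact line is already present.
    if [ -f "$file" ] && grep -qxF -- "$line" "$file"; then
        return 0
    fi
    # Keep one pristine backup; never overwrite an existing one.
    if [ -f "$file" ] && [ ! -e "$file.backup" ]; then
        cp -p "$file" "$file.backup" || return 1
    fi
    printf '%s\n' "$line" >> "$file"
}

# Example (writing to /etc needs root):
#   append_line_once /etc/festival.scm "(set! voice_default voice_cmu_us_slt_arctic_hts)"
```

Running it twice leaves a single copy of the line, so repeated attempts do not pile up conflicting defaults in the file.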

It also recognises the HTS voices available from Japan, but unfortunately they won’t work right away:

festival> (voice_nitech_us_awb_arctic_hts)
nitech_us_awb_arctic_hts
festival> (SayText "hello")

Warning: HTS_fopen: Cannot open hts/htsvoice.
aplay: main:593: bad speed value 0
#<Utterance 0x7f7c16fef0>
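
One thing that may be worth checking, as a guess based on the wording of the warning rather than a confirmed diagnosis, is whether the voice folder contains any file ending in .htsvoice at all. The sketch below does only that; has_htsvoice_model is a hypothetical helper name and the example path is a guess at where the voice was unpacked.

```shell
# Succeed if any *.htsvoice file exists under the given voice directory.
# Symlinked voice folders are followed (-L).
has_htsvoice_model() {
    voice_dir=$1
    found=$(find -L "$voice_dir" -type f -name '*.htsvoice' 2>/dev/null | head -n 1)
    [ -n "$found" ]
}

# Example (path is an assumption):
#   has_htsvoice_model /usr/share/festival/voices/us/nitech_us_awb_arctic_hts \
#       || echo "no .htsvoice file in this voice folder"
```

If no such file turns up, the voice package and the installed hts_engine may simply not match, and the later aplay complaint about a speed value of 0 could just be a follow-on effect of the voice failing to load.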

Try “apt-get install festlex-cmu”<<<<<<<<<<<<<<<<

sudo apt-get install festlex-cmu
[sudo] password for nvidia: 
Reading package lists... Done
Building dependency tree       
Reading state information... Done
festlex-cmu is already the newest version (1.4.0-8).
The following packages were automatically installed and are no longer required:
  apt-clone archdetect-deb dmeventd dmraid dpkg-repack gir1.2-timezonemap-1.0 gir1.2-xkl-1.0
  grub-common kpartx kpartx-boot libdebian-installer4 libdevmapper-event1.02.1 libdmraid1.0.0.rc16
  liblockfile-bin liblockfile1 liblvm2app2.2 liblvm2cmd2.02 libparted-fs-resize0 libreadline5
  lockfile-progs lvm2 os-prober pmount python3-icu python3-pam rdate ubiquity-casper
  ubiquity-ubuntu-artwork ubuntu-core-launcher
Use 'sudo apt autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 6 not upgraded.

/usr/share/festival/voices/english/ :

nvidia@tegra-ubuntu:/usr/share/festival/voices/english$ ll
total 20
drwxr-xr-x  5 root   root   4096 Jun  1 06:23 ./
drwxr-xr-x  4 root   root   4096 Jun  1 05:09 ../
drwxrwxrwx 13 nvidia nvidia 4096 Jun  1 06:30 cmu_us_awb_arctic_clunits/
drwxrwxrwx 27   7945 users  4096 Jun  6  2006 cmu_us_clb_arctic_clunits/
drwxr-xr-x  4 root   root   4096 May 31 01:12 kal_diphone/

│   │   ├── arctic_b0536.wav
│   │   ├── arctic_b0537.wav
│   │   ├── arctic_b0538.wav
│   │   └── arctic_b0539.wav
│   └── wrd
└── kal_diphone
    ├── festvox
    │   ├── kal_diphone.scm
    │   └── kaldurtreeZ.scm
    └── group
        └── kallpc16k.group

linuxdev · June 1, 2017, 4:08pm

I am in en_US locale with the following relevant packages installed on a TX1 (sorry, TX2 is in another room, I can look later on the TX2 if this doesn’t help):

# dpkg --list | egrep -i fest
ii  festival                                      1:2.4~release-2                                     arm64        General multi-lingual speech synthesis system
ii  festlex-cmu                                   1.4.0-8                                             all          CMU dictionary for Festival
ii  festlex-poslex                                1.4.0-6                                             all          Part of speech lexicons and ngram from English
ii  festvox-kallpc16k                             1.4.0-6                                             all          American English male speaker for festival, 16khz sample rate

If I run this command from a terminal natively running in the GUI it works:

echo "hello" | festival --tts

If I run that command from a remote system and set DISPLAY to “:0” using the same login name as GUI and remote ssh it also works (but make sure you don’t “ssh -Y”, just plain ssh). Are you running from a terminal logged in to the GUI?

Also, I am using USB headphones. I had to go into sound settings and make sure headphones were the output device before it worked correctly.
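
To tell a synthesis problem apart from an audio-output problem, one option is to render the speech to a WAV file first and play it in a separate step. A minimal sketch, assuming the text2wave script that comes with festival is on the PATH and takes -o for the output file; tts_to_wav is a made-up helper name.

```shell
# Render text to a WAV file without touching the sound device.
# $1 is the text, $2 is the output file.
tts_to_wav() {
    text=$1
    out=$2
    if ! command -v text2wave >/dev/null 2>&1; then
        echo "text2wave not found; is festival installed?" >&2
        return 127
    fi
    tmp=$(mktemp) || return 1
    printf '%s\n' "$text" > "$tmp"
    text2wave -o "$out" "$tmp"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}

# Example:
#   tts_to_wav "hello" /tmp/hello.wav && aplay /tmp/hello.wav
```

If the WAV file is produced but aplay stays silent, the output device selection is the more likely culprit; if no usable file is produced, the problem is on the festival side.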

Andrey1984 · June 1, 2017, 4:18pm

Yes,
the default locale I am using seems to be en_US.
Yes, it does play with

echo "hello" | festival --tts

In my case the package list looks like this:

dpkg --list | egrep -i fest
ii  festival                                          1:2.4~release-2                               arm64        General multi-lingual speech synthesis system
ii  festival-freebsoft-utils                          0.10-4                                        all          Festival extensions and utilities
ii  festlex-cmu                                       1.4.0-8                                       all          CMU dictionary for Festival
ii  festlex-poslex                                    1.4.0-6                                       all          Part of speech lexicons and ngram from English
ii  festvox-kallpc16k                                 1.4.0-6                                       all          American English male speaker for festival, 16khz sample rate
ii  speech-dispatcher-festival                        0.8.3-1ubuntu3                                arm64        Festival support for Speech Dispatcher

In your case it most likely uses the default voice, which is “kal_diphone”.
What does festival say if you execute

festival
festival> (voice_default)

I think it would be kal_diphone.
That voice is only so-so, and there are add-on packs of high-quality voices.
There are two branches: A, the HTS branch, and B, the CMU Arctic clunits voices.
Both packs are said to be of much better quality than the default voice.
Examples of these voice packs are available online at the Festival Online Demo.
Unfortunately, the festival installed on the Jetson TX2 does not support either branch out of the box.
The HTS voices do not play, although they are recognized and listed among the installed voices; the CMU Arctic voices neither play nor are recognized as installed after downloading them and adding them to the voices folder.

Andrey1984 · June 1, 2017, 4:26pm

The default festival voice doesn’t seem good enough, it seems to me, to read a book aloud and listen to it comfortably.

I have an idea that HTS or CMU could turn out to be better in that respect.

So far I have managed to use translate.google.com to read books with a reasonably good voice. But that, as Snarky has mentioned, requires being online.

The CMU and Japanese high-quality voices seem promising for offline use.

The finest voices I have found so far are perhaps these
: http://sinsy.sp.nitech.ac.jp/temp/20161230024942_7698.wav
: http://sinsy.sp.nitech.ac.jp/temp/20161230025219_8927.wav
: http://sinsy.sp.nitech.ac.jp/temp/20161230025359_8926.wav
But they are in Japanese and musical :)

Andrey1984 · June 1, 2017, 4:29pm

From that point of view http://hts.sp.nitech.ac.jp/ seems promising,
but it looks like a complex task that involves using http://htk.eng.cam.ac.uk/ ,
which I can’t get working on the Jetson yet; hopefully their mailing list can provide answers.

Andrey1984 · June 1, 2017, 5:14pm

E.g. a good-quality synthesized voice: 【Sinsy】にじいろ【歌わせてみた】 - ニコニコ動画
but that is a singing system rather than one for reading.

linuxdev · June 1, 2017, 5:26pm

On my desktop:

festival> (voice_default)
SIOD ERROR: unbound variable : nitech_us_slt_arctic_hts

…my working TX1 (I can’t test TX2 right now) festival says the same thing, except “kal_diphone” instead of “nitech_us_slt_arctic_hts” (your guess was correct). It seems that this is a variable needing to be set, but that this will not prevent success…the system seems to just search for the next possible value and use the one it finds.

FYI, I do see that there are different sample rates available for some of the voices…either 8khz or 16khz. This could help clarity, but I suspect the quality you are speaking of is not a sample rate issue.

I see there is a “--language” option (“man festival”). Perhaps when this is explicitly called some errors for alternate voices might go away.

mehmetmertyildiran · October 9, 2017, 9:24pm

Most of the voices are now broken thanks to the package maintainers: Bug #1637567 “Festival 2.4 Regression” : Bugs : festival package : Ubuntu

I’m not sure whether it’s down to the Ubuntu maintainers or the Debian maintainers.

If you compile festival from source then it works perfectly. I looked at http://archive.ubuntu.com/ubuntu/pool/universe/f/festival/festival_2.4~release-2.debian.tar.xz but couldn’t find the source of the problem.

The best voice is cmu_us_clb_arctic, which you would install with this command:

cd /usr/share/festival/voices/english/ && wget -c http://www.speech.cs.cmu.edu/cmu_arctic/packed/cmu_us_clb_arctic-0.95-release.tar.bz2 && tar jxf cmu_us_clb_arctic-0.95-release.tar.bz2 && ln -s cmu_us_clb_arctic cmu_us_clb_arctic_clunits && cp /etc/festival.scm /etc/festival.scm.backup && chmod o+w /etc/festival.scm && echo "(set! voice_default 'voice_cmu_us_clb_arctic_clunits)" >> /etc/festival.scm

But that’s not working anymore with Ubuntu’s repositories :/ (I can only confirm that)
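
The symlink step in that one-liner can be tried on its own, which makes it easier to see whether the voice then shows up in (voice.list). This is a sketch only; link_voice_alias is a hypothetical helper, and the idea that festival wants the folder name to carry the _clunits suffix is inferred from the command above, not verified here.

```shell
# Create <voices_dir>/<alias> as a relative symlink to <voices_dir>/<voice>.
# $1 is the voices directory, $2 the unpacked voice folder, $3 the alias name.
link_voice_alias() {
    voices_dir=$1
    voice=$2
    alias_name=$3
    if [ ! -d "$voices_dir/$voice" ]; then
        echo "no such voice directory: $voices_dir/$voice" >&2
        return 1
    fi
    # Already linked (or something else is in the way): leave it alone.
    if [ -e "$voices_dir/$alias_name" ] || [ -L "$voices_dir/$alias_name" ]; then
        return 0
    fi
    # Relative target, so the link survives moving the whole voices tree.
    ( cd "$voices_dir" && ln -s "$voice" "$alias_name" )
}

# Example (needs root for /usr/share):
#   link_voice_alias /usr/share/festival/voices/english cmu_us_clb_arctic cmu_us_clb_arctic_clunits
```

Keeping this separate from the config edit also avoids making /etc/festival.scm world-writable just to append one line.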
