Jetson Nano 4GB can't deal with locale

I have a Jetson Nano (4GB version) and I have a problem reading accented characters and umlauts (e.g. ä, ö, ü). My use case is mounting an external hard drive where those letters are sometimes part of a file path. As a result, the paths are read incorrectly.

As an example, I have the following string in a file name: Öga for Öga. My Raspberry Pi, with the locale set to en_GB.UTF-8, reads it just fine. The Jetson Nano’s locale is set to en_US.UTF-8, and it reads the name as '$'\326''ga for '$'\326''ga. On the other hand, when I create a test file on the Jetson with touch "Öga for öga", the Jetson reads it fine, but the Raspberry Pi reads it as 'Ã'$'\302\226''ga for öga'. My laptop appears to agree with the encoding on the Raspberry Pi and also reads the Jetson’s strings as gibberish.

I have tried:

  1. generating all locales using dpkg-reconfigure locales
  2. installing the locales-all package
  3. changing the Jetson Nano language (using update-locale or by changing the variables shown by the locale command)

all to no success.

Below is the output of the locale command

LANG=en_US.UTF-8
LANGUAGE=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8

I am running JetPack 4.6.1 and have installed all updates as of the time of posting. Do you know how I can either align the encoding between the Jetson and the Raspberry Pi or, preferably, change the encoding on the Jetson so that it reads umlauts and accented letters correctly?

It won’t matter which flavor of Linux you use, nor the hardware, nor even if it is another operating system entirely. Whenever a file name, a path, or file content is written and later read back, the locale at retrieval time must match the one used at creation time (though many locales share most of their character set, so there is a lot of overlap). I doubt this is anything specific to the Jetson, and I don’t know everything that is installed at your end, but I suspect you just need to set the correct environment variable before reading or writing (set it to whatever matches between your RPi and Jetson, if matching them is what you want).

You probably already have the correct software. Do you see both encodings for en_US and en_GB from:
locale -a | egrep '(en_US|en_GB)'

On the Jetson, what does your user see from:
echo $LANG

On the RPi, what does your user see from:
echo $LANG

Do they still differ? If so, then a different display is expected for any character that differs between the two character sets: when the active character set has no match for a byte sequence, you see the escaped byte value (e.g., \326) instead of the correct glyph.

From there, on the Jetson, assuming the RPi differs in “$LANG”, and assuming the RPi is “en_GB.UTF-8”, try this experiment (I will use file name “testing.txt”, but I suggest using a file name from your RPi which has non-en_US characters in its name…my example file name is because I’m replying to you from a computer which is en_US.UTF-8):

  1. Verify the file name shows special characters correctly on the RPi.
  2. Copy that file directly from the RPi to the Jetson, e.g., use scp.
  3. Verify the “echo $LANG” on the Jetson is still en_US.UTF-8.
  4. Verify the file name does not correctly display special characters.
  5. Now, on the Jetson, do this:
    export LANG=en_GB.UTF-8
  6. Now display the file name again on the Jetson. Do the special characters now show up correctly?
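
If it helps, the comparison in steps 4 and 6 can be made independent of how the terminal renders the name by looking at the raw bytes. This is only a minimal sketch: the function names name_hex and list_name_bytes are made up for illustration and are not standard tools.

```shell
# Print the raw bytes of a string (e.g. a file name) as hex.
# The result does not depend on $LANG.
name_hex() {
    printf '%s' "$1" | od -An -v -tx1 | tr -d ' \n'
    echo
}

# List every entry of a directory (default: current directory)
# together with the byte sequence of its name.
list_name_bytes() {
    dir=${1:-.}
    for path in "$dir"/*; do
        [ -e "$path" ] || continue
        name=${path##*/}
        printf '%s  %s\n' "$(name_hex "$name")" "$name"
    done
}
```

Run list_name_bytes on the same directory from both machines. If the hex for the same file differs, the name itself is being changed somewhere on the way; if the hex is identical and only the rendering differs, it is a display setting.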

Thank you for your response. Now I am even more confused than I was before. Using your procedure, I transferred a file with scp from the RPi to the Jetson, and the Jetson displays the glyph correctly. Running the locale command and grepping for the en_ locales gives en_GB.UTF-8 on the RPi and en_US.UTF-8 on the Jetson. So now I have no clue why a file with umlauts or accents created on the external hard drive by one system is read as gibberish on the other, whereas a file copied directly from one system to the other (bypassing the external hard drive) is displayed correctly. Could it be that I need to mount the hard drive with some extended character support? The fstab entries on both systems are identical:

//IP/path cifs username=username,password=password,vers=1.0,defaults,nofail,x-systemd.mount-timeout=30s 0 0

Hi, a quick update for whoever happens to find this. Having narrowed the problem down to the same mounted volume displaying characters differently on two systems (Jetson and Raspberry Pi), I ended up finding the thread “Mounting network share” (post #13 by linuxdev). It appears that the Jetson kernel is compiled without some character set support, which shows up when using Samba. Mounting the hard drive over NFS instead resolved the problem for me.
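
For anyone who wants to check their own board before switching to NFS, here is a rough sketch. It assumes the running kernel exposes its build configuration at /proc/config.gz (not every kernel does) and that the missing piece is the UTF-8 NLS table (kernel option CONFIG_NLS_UTF8), which is my reading of the linked thread and not something I verified. The function name has_nls_utf8 is made up for illustration.

```shell
# Succeeds if the kernel config text on stdin enables the UTF-8 NLS table,
# either built in (=y) or as a module (=m).
has_nls_utf8() {
    grep -Eq '^CONFIG_NLS_UTF8=(y|m)$'
}

# Assumption: the kernel was built with /proc/config.gz support.
if [ -r /proc/config.gz ]; then
    if zcat /proc/config.gz | has_nls_utf8; then
        echo "UTF-8 NLS table available: iocharset=utf8 may be worth trying"
    else
        echo "UTF-8 NLS table missing: iocharset=utf8 will probably be rejected"
    fi
fi
```

If the table is available, adding iocharset=utf8 to the cifs options in fstab tells the client to convert file names to UTF-8. If it is missing, NFS (as above) or a rebuilt kernel are the options I know of.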

Some background might help if you are still curious about this…

Every character set is basically a table of indices corresponding to particular characters. This table is independent of the graphics (glyphs) used to represent each character. One character set can work with any number of fonts/glyphs. UTF-8 is capable of containing the special characters you are interested in, but the display graphics determines what you actually see when that character’s index is used.

When you see a locale name like “en_US.UTF-8”, the “en_US” part selects the language and regional conventions (messages, date and number formats, sorting), and the “UTF-8” part names the character encoding, i.e. how each index is stored as bytes. The same bytes mean the same character in en_US.UTF-8 and en_GB.UTF-8; they only mean something different under a different encoding such as ISO-8859-1. If a byte sequence is decoded with the encoding it was written in, then it works. If you copy a file in binary format, then the bytes remain constant.

Once you have those bytes, the environment variable “$LANG” (or LC_CTYPE or LC_ALL, which take precedence when set) tells programs such as ls which encoding to decode them with. Because en_US.UTF-8 and en_GB.UTF-8 both decode as UTF-8, file names display identically under either one. Display goes wrong only when the bytes were written in a different encoding than the one the locale names.

Installing or generating locales makes them available for selection. This does not cause them to be used, but it does make correct display possible once $LANG names an encoding that matches the data. The file does not change if $LANG is wrong, but character display does change. Installing a font or locale is not what causes this behavior; it is merely a requirement for $LANG to be able to switch. The RPi and the Jetson both need a locale with the same encoding (here UTF-8) before a given file name displays the same on both; en_US versus en_GB does not matter for this.

$LANG is not “smart”: it does not detect which encoding a file name was actually written in. File sharing software is different. With SAMBA/cifs, file names are normally sent over the network as Unicode, and the client converts them to and from a local character set (the cifs “iocharset” mount option; if it is not given, the default chosen when the client kernel was built is used). If the client kernel lacks the UTF-8 table, names get converted to some other character set, and a UTF-8 terminal then shows them as escaped bytes. scp and NFS pass the name bytes through unchanged, which is why those worked. Tell a computer the wrong encoding, and the Jetson (and any computer) will happily display the wrong characters.
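
The two garbled names from the first post can be reproduced by hand, which hints at what is happening on the share. This is a rough sketch, assuming iconv is installed and assuming (a guess from the byte values, not something confirmed in this thread) that the Jetson's cifs mount converts names to ISO-8859-1 while the RPi's mount uses UTF-8. The helper name latin1_to_utf8 is made up for illustration.

```shell
# Reinterpret stdin as ISO-8859-1 (Latin-1) and re-encode it as UTF-8.
latin1_to_utf8() {
    iconv -f ISO-8859-1 -t UTF-8
}

# \326 is the single Latin-1 byte for "Ö", which is what the Jetson showed.
# Converted to UTF-8 it displays correctly on a UTF-8 terminal.
printf '\326ga\n' | latin1_to_utf8

# \303\226 is the UTF-8 encoding of "Ö", as written by touch on the Jetson.
# If those two bytes are mistaken for two Latin-1 characters and encoded
# again, the result is "Ã" (\303\203) followed by \302\226, which matches
# what the RPi showed. Print the resulting bytes in octal:
printf '\303\226ga' | latin1_to_utf8 | od -An -v -to1
```

In both directions the name picks up exactly one Latin-1/UTF-8 mix-up, which fits a character set conversion on one client and not a problem with $LANG.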
