Troubleshooting Large File Loading Issues in Omniverse Nucleus on AWS Cloud

Hi all,
I am from the NTT DATA team. We run our cloud infrastructure on AWS with a dedicated machine for Omniverse Nucleus, and we are experiencing issues when loading USD scenes composed of many (and large) files. For the upload we use the omni.client library, calling copy_async from inside a REST server. Partway through the upload, when a large file (1-2 GB) is reached, the call fails, and the only response we receive is “Result.Error”.
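
For reference, this is roughly what each upload call looks like (a simplified sketch: upload_one and the paths are placeholders, client initialization and the surrounding REST handler are omitted):

import asyncio

import omni.client

async def upload_one(local_path: str, nucleus_url: str) -> None:
    # copy_async returns an omni.client.Result; anything other than
    # Result.OK is treated as a failure here.
    result = await omni.client.copy_async(local_path, nucleus_url)
    if result != omni.client.Result.OK:
        raise RuntimeError(f"copy of {local_path} failed: {result}")

# asyncio.run(upload_one("/tmp/extracted/scene.usd",
#                        "omniverse://<nucleus-host>/Projects/scene.usd"))

Our questions: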

  1. How can we obtain more detailed error logs from the omni.client library? “Result.Error” doesn’t provide much information. (A sketch of the log hooks we have been trying is below, after the traceback.)

  2. After the first error, all Nucleus services stop working. Restarting the instance doesn’t change anything; we have to recreate the server from scratch to get it working again.

  3. We cannot determine which container logs we should be checking, or whether there is a log level we can raise. The only error we found in the logs is in the nucleus_thumbnails container, and it is as follows:

File "/omni/create_thumbnails.py", line 428, in handle_task
  await create_thumbnails_cached(connection, file_transfer, file_path=path, file_hash=lm.hash_value,
File "/omni/prometheus_utils.py", line 97, in func_wrapper
  return await func(*args, **kwargs)
File "/omni/create_thumbnails.py", line 341, in create_thumbnails_cached
  system_thumb_hash = await upload_thumb(conn, file_transfer, system_thumb_path, thumbnail_data)
File "/omni/create_thumbnails.py", line 319, in upload_thumb
  async with await file_transfer.create(path=system_thumb_path,
File "/omni/_deps/omniverse_connection/omni/lft.py", line 69, in __aexit__
  await self.end()
File "/omni/_deps/omniverse_connection/omni/lft.py", line 165, in end
  raise FileTransferException(str(result.status))
omni.lft.FileTransferException: (INTERNAL_ERROR)
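
Regarding question 1, what we have been experimenting with so far is routing the client log through a callback, roughly as below. We are not sure that set_log_callback / set_log_level are the right hooks, that VERBOSE is the most detailed level, or that the callback signature is exactly this, so corrections are welcome:

import sys

import omni.client

def _on_log(thread_name, component, level, message):
    # Mirror the client library's internal log to stderr so that a failing
    # copy_async reports more than just the Result value.
    print(f"[{level}] {component}: {message}", file=sys.stderr)

omni.client.set_log_callback(_on_log)
omni.client.set_log_level(omni.client.LogLevel.VERBOSE)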

To provide more context: we are extracting files from a zip archive (for testing purposes this archive weighs 4 GB and extracts to approximately 20 GB), and the extracted files are then sent from the REST server to Nucleus. As explained above, after a certain number of files have been uploaded, Nucleus stops working. The upload always gets stuck on the same file (size: 1.8 GB). Uploading that file individually works without any problems; the issue only occurs when we upload all the files together. Some of the tests we have conducted are listed below, none of which yielded improvements (a simplified sketch of the bulk-upload loop follows the list):

  1. We scaled the containers responsible for transfers (nucleus-lft) from 1 to 3.
  2. We used an instance with more RAM and CPU.
  3. We allocated 5 to 8 GB of RAM to the nucleus-lft containers.
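
In case it helps to reproduce, this is roughly the loop that sends the extracted files. The asyncio.Semaphore throttle is something we added while testing (the limit of 2 is arbitrary), and upload_tree / MAX_PARALLEL are just names from this simplified sketch:

import asyncio
from pathlib import Path

import omni.client

MAX_PARALLEL = 2  # arbitrary throttle; we also tested without it

async def upload_tree(local_root: str, nucleus_root: str) -> None:
    sem = asyncio.Semaphore(MAX_PARALLEL)

    async def upload_one(path: Path) -> None:
        rel = path.relative_to(local_root).as_posix()
        async with sem:  # cap concurrent transfers hitting nucleus-lft
            result = await omni.client.copy_async(str(path), f"{nucleus_root}/{rel}")
            if result != omni.client.Result.OK:
                raise RuntimeError(f"{rel}: {result}")

    files = [p for p in Path(local_root).rglob("*") if p.is_file()]
    await asyncio.gather(*(upload_one(p) for p in files))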

Thanks for your support!