Hi all,
I am from the NTT DATA team. We run a cloud infrastructure on AWS with a dedicated machine for Omniverse Nucleus, and we are experiencing issues when loading USD scenes composed of many (and large) files. For the upload we use the omni.client library with the copy_async function, called from within a REST server. During the transfer, after some files have been uploaded successfully, a large file (1-2 GB) fails, and the only response we receive is "Result.Error". So:
- How can we obtain more detailed error logs from the omni.client library? "Result.Error" on its own doesn't tell us much (see the logging sketch after the traceback below).
- After the first error, all Nucleus services stop working. Restarting the instance doesn't change anything; we have to recreate the server from scratch to get it working again.
- We are unable to determine which container logs need to be checked, or whether there is a log level we can raise. The only error we found in the logs is in the nucleus_thumbnails container, and it is as follows:
File "/omni/create_thumbnails.py", line 428, in handle_task
await create_thumbnails_cached(connection, file_transfer, file_path=path, file_hash=lm.hash_value,
File "/omni/prometheus_utils.py", line 97, in func_wrapper
return await func(*args, **kwargs)
File "/omni/create_thumbnails.py", line 341, in create_thumbnails_cached
system_thumb_hash = await upload_thumb(conn, file_transfer, system_thumb_path, thumbnail_data)
File "/omni/create_thumbnails.py", line 319, in upload_thumb
async with await file_transfer.create(path=system_thumb_path,
File "/omni/_deps/omniverse_connection/omni/lft.py", line 69, in __aexit__
await self.end()
File "/omni/_deps/omniverse_connection/omni/lft.py", line 165, in end
raise FileTransferException(str(result.status))
omni.lft.FileTransferException: (INTERNAL_ERROR)
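On the first question: would registering a log callback be the right way to get more detail out of omni.client? Something along these lines is what we have in mind (a sketch only; we are assuming set_log_callback / set_log_level and the callback signature behave as described in the client library docs, which may differ by version):

```python
import omni.client

# Sketch: route omni.client's internal log messages to our own handler
# so a failing copy reports more than just "Result.Error".
def on_log(thread, component, level, message):
    print(f"[{level}] {component} ({thread}): {message}")

omni.client.set_log_callback(on_log)
omni.client.set_log_level(omni.client.LogLevel.DEBUG)  # assumed enum value
```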
To provide more context: what we are doing is extracting files from a zip archive (for testing purposes, this archive weighs 4 GB and is approximately 20 GB when extracted) and then sending the extracted files from the REST server to Nucleus. However, as explained above, after a certain number of files have been uploaded, Nucleus stops working. An additional detail: the upload process always gets stuck on the same file (size: 1.8 GB). I tried uploading this file individually and didn't encounter any problems, but when I try to upload all the files together, the issue occurs. Some of the tests I've conducted include the following, but none have yielded improvements (a simplified sketch of the upload loop follows the list):
- I scaled the containers responsible for transfer (nucleus-lft) from 1 to 3.
- I used an instance with more RAM and CPU.
- I allocated 5 to 8 GB of RAM to the nucleus-lft containers.
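To make the flow concrete, here is a minimal sketch of what the upload step does (not our exact code: the REST layer is omitted, the destination URL is a placeholder, and we assume copy_async returns a single Result value, which is how we use it):

```python
import asyncio
import os
import zipfile

import omni.client

# Placeholder destination on our Nucleus instance
NUCLEUS_DEST = "omniverse://<nucleus-host>/Projects/test"

async def upload_extracted(zip_path: str, work_dir: str) -> None:
    # Extract the archive locally, then push every file to Nucleus.
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(work_dir)

    for root, _dirs, files in os.walk(work_dir):
        for name in files:
            src = os.path.join(root, name)
            rel = os.path.relpath(src, work_dir).replace(os.sep, "/")
            result = await omni.client.copy_async(src, f"{NUCLEUS_DEST}/{rel}")
            if result != omni.client.Result.OK:
                # This is where we only ever see "Result.Error" for the 1.8 GB file.
                print(f"copy_async failed for {src}: {result}")

asyncio.run(upload_extracted("scene.zip", "/tmp/extracted"))
```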
Thanks for your support.