Rtx 4090 based farm agent issue - it renders but for some reason it will not spit out completed frames

Hi, we have been doing our best to set up and rely on the Omniverse Farm system for 3D rendering with Create over the past 6 months or so. It’s a great system and has been an overall positive journey that I remain optimistic about.

There is an issue, though, with a system we recently added that uses an RTX 4090 GPU. No matter what we try, we can’t get this system to write the frames it renders to the desired folder location. Every other machine on our network is performing correctly; most are set up with 3090 cards and some with 2080 cards.

My instinct tells me that this is probably due to a compatibility issue with the newer 40 series cards, because this system was working just fine with a 2080 GPU before we upgraded to the 4090. We have tested just about everything we could think of to rule out a fundamental mistake in our configuration:

  • Completely reinstalling Omniverse on this system: deleting all Omniverse files and running the cleaner before doing a fresh install.

  • Setting up a separate Nucleus server on a separate machine to ensure that this is not an issue with the Nucleus server we have been using for a while.

  • Ensuring everything is updated to the latest farm agent and farm manager releases, and making sure we are using the latest non-beta release of Create.

  • Experimenting with various recent Studio and recommended NVIDIA driver versions, as outlined in the logs of the farm queue tasks.

  • Running many tests with jobs set to use both path trace and Iray settings, including the most basic of scenes with just a single object.

  • Testing and confirming that local rendering on this 4090 system works just fine when rendering with the movie maker in Create.

Again, my gut feeling says that we are not making any mistakes, as we have had a good deal of experience learning how to set up and use this system, so I still believe that this is a 40 series compatibility issue. The 4090 system’s farm agent works insofar as it accepts its tasks and clearly renders its frame sequence, but once a frame completes, it does not write the frame to its temp folder, and consequently does not copy the file to its set destination. It just moves on to the next frame. Other machines post a line in the logs that goes something like this:

 [omni.services.render.file_manager] Processed C:/Users/#######/AppData/Local/Temp/x1d00.0\tmpm02ri6pw\capsule_frames\capsule.0062.png. Was task successful: True

The 4090 system simply does not post this step to its logs. For whatever reason it skips it. All the non-4090 systems we are using do post it.

I am hoping someone at NVIDIA can test this and confirm whether they are able to use a 4090 based farm agent to contribute to farm queue job tasks. If my diagnosis is correct, could we then look into updating the farm agent and/or manager so that 40 series cards are supported? If I can provide any more detail that would help get to the bottom of this, please let me know.

I am having a similar issue on a 4090 I just installed. It seems like the processed samples are not being saved into the temp files that are created on my 4090 system. The same render runs fine on my 3090ti through the farm queue.

Working on 3090ti

2022-12-30 19:23:37 [495,206ms] [Info] [omni.kit.capture.viewport.extension] Capturing C:/Users/####/AppData/Local/Temp/xtv0.0\tmpjfmfgiy7\euclid_test_iray128_frames\euclid_test_iray128.0331.png
2022-12-30 19:23:41 [498,885ms] [Info] [omni.services.render.file_manager] Processed C:/Users/####/AppData/Local/Temp/xtv0.0\tmpjfmfgiy7\euclid_test_iray128_frames\euclid_test_iray128.0331.png. Was task successful: True
2022-12-30 19:23:44 [502,088ms]

Not working on 4090

2022-12-30 19:21:02 [397,954ms] [Info] [omni.services.render.tracker] Processed samples 119 for frame 6. Elapsed frame time: 48.09s, Average frame time: 43.38s. Estimated time remaining: 3h 49m 6.03s
2022-12-30 19:21:06 [401,751ms] [Info] [omni.kit.capture.viewport.extension] Capturing C:/Users/####/AppData/Local/Temp/x94s.0\tmpftgshkdk\euclid_test_iray128_frames\euclid_test_iray128.0006.png
2022-12-30 19:21:07 [403,285ms] [Info] [omni.services.render.tracker] Processed samples 2 for frame 7. Elapsed frame time: 1.41s, Average frame time: 44.64s. Estimated time remaining: 3h 55m 51.28s
2022-12-30 19:21:12 [408,371ms]

It just skips the Processed (file path)… step of the log and does not save anything into the temp folder. Hope there is an easy solution that is found for this.
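
The comparison above (a `Capturing` line that is never followed by a matching `Processed ... Was task successful` line) can be checked automatically across a whole log. The following is a minimal sketch, assuming the log lines look like the excerpts quoted in this thread and that frame paths contain no spaces; the function name `find_unprocessed_frames` is hypothetical and not part of any Omniverse tool.

```python
import re

# Patterns modeled on the log excerpts quoted in this thread (an assumption:
# other Farm or Create versions may word these lines differently).
CAPTURE_RE = re.compile(
    r"\[omni\.kit\.capture\.viewport\.extension\] Capturing (?P<path>\S+)"
)
PROCESSED_RE = re.compile(
    r"\[omni\.services\.render\.file_manager\] Processed (?P<path>\S+?)\. "
    r"Was task successful: (?P<ok>True|False)"
)


def find_unprocessed_frames(log_text):
    """Return captured frame paths that never received a 'Processed' line."""
    captured = []      # frame paths in the order they were captured
    processed = set()  # frame paths the file manager reported on
    for line in log_text.splitlines():
        match = CAPTURE_RE.search(line)
        if match:
            captured.append(match.group("path"))
            continue
        match = PROCESSED_RE.search(line)
        if match:
            processed.add(match.group("path"))
    return [path for path in captured if path not in processed]
```

Reading an agent log file and passing its text to `find_unprocessed_frames` would return an empty list on a healthy machine and one entry per missing frame on an affected one, which makes it easier to state exactly which frames were skipped when reporting the problem.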

I’m having the same problem on a 4090. Can anybody help? It’s really urgent.

As I have by coincidence also purchased a 4090 for my home machine recently, I’ll go ahead and set this up with a few test jobs using the latest farm queue system. I’ll have a chance to try this out in a few days and report back with the results.

While I’m sure it’s a pain to not have the full organized automation of the farm queue, if you can keep your job moving by rendering manually with the movie maker, I can confirm this does work with the 4090. I haven’t tested older versions of the farm manager/agents, and there is a small chance those could perform correctly, but I wouldn’t personally stake much on that. Best of luck.

Thank you for your response and I will look forward to your report as well. At the moment, I’m doing all the rendering manually, which is a huge pain to be honest, but I have no choice since I’m on a tight deadline. I hope omniverse developers give this bug some attention soon and I assume it wouldn’t be too difficult to fix?

Anyway, I wish everyone a happy new year.

Hello everyone! Thank you for posting your issues about this. I have sent this over to the dev team to evaluate and fix!

A development ticket was created from this post. OM-77396: Rtx 4090 based farm agent issue - it renders but for some reason it will not spit out completed frames

@deeplerning @aliaguero1616 Very sorry for the inconvenience! Could you please upload a full log so that we can take a closer look at the problem? Thank you!

@quchen Is there a way to send logs privately? There is a lot of information in these logs that could be sensitive, like system build info and whatnot. If not, I could provide specific portions of the log on request.
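
One partial option before sharing a log is to mask the Windows user name in profile paths, the way the excerpts in this thread use `####`. The following is a minimal sketch under that assumption; `redact_usernames` is a hypothetical helper, and it only covers user names inside `C:\Users\...` style paths. Other details such as machine names or system build info would still need a manual review.

```python
import re

# Matches the folder name that follows "<drive>:\Users\" or "<drive>:/Users/",
# case-insensitively, with either slash style.
USER_PATH_RE = re.compile(r"(?i)([a-z]:[\\/]users[\\/])[^\\/\s]+")


def redact_usernames(log_text, placeholder="####"):
    """Replace the user-name segment of Windows profile paths with a placeholder."""
    # A function is used as the replacement so that backslashes in the matched
    # prefix are copied literally instead of being read as escape sequences.
    return USER_PATH_RE.sub(lambda m: m.group(1) + placeholder, log_text)
```

Running the whole log text through `redact_usernames` and saving the result to a new file keeps the original log untouched while producing a copy that is safer to upload.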

OK, I’ll work on getting the logs for you. In the meantime, I also wanted to report that I ran another test on my home system as planned and am getting the exact same phenomenon: the 4090 takes the job and renders, but frames never get generated and sent to the temp location.

If it’s OK, can you please upload it somewhere and give me access to download it, then remove it after I have downloaded it? Thanks!

4090_logs_home_machine.rtf (1.6 MB)
4090_logs_home_machine.docx (829.7 KB)

Here we go. This is the same info posted in two file formats, docx and rtf.

@quchen Let me know when you have them. [WeTransfer link]

@cmilne I’ve downloaded the zip file and got the two log files. Thanks very much! I will look into it and update my findings here.

@deeplerning Thank you very much too for the log files. I will also look into them.

From your logs I found that you’re still on Farm Agent 102.5, while the latest one that works with Create 2022.3.1 should be 104.0. Could you please try upgrading your Farm Agent and see if it helps? Thank you!

@quchen From the logs I submitted, I see farm agent 104.0.0. I could be looking in the wrong place in the logs possibly. I also double checked the installation and I have 104.0 installed.

I cannot speak for the others, but unless it is reverting to a previous installation without my knowledge, I am and have been running 104.0

2022-12-30 19:14:24 [5ms] [Info] [omni.ext.plugin] [ext: omni.services.facilities.base-1.0.2] registered (path: c:/users/####/appdata/local/ov/pkg/farm-agent-104.0.0/jobs/create-render/exts-job.omni.farm.render/omni.services.facilities.base-1.0.2)

I ran another test just now with a complete fresh install of all Omniverse tools (queue, agent, Create). I did notice one reference to 102.5 in the first line of the log, but the third line references the correct version, 104.0.

[Info] [carb] Logging to file: C:/Users/####/.nvidia-omniverse/logs/Kit/omni.farm.agent/102.5/create-render_render.run_7d882f23-d357-4bf1-a0bb-0e6cbdb49918_20230104_162534.log
2023-01-05 00:25:57 [0ms] [Info] [omni.structuredlog.plugin] successfully loaded the settings file 'C:/Users/####/.nvidia-omniverse/config/privacy.toml' into the settings registry.
2023-01-05 00:25:57 [0ms] [Info] [omni.kit.app.plugin] App Name: 'Create.Next', App Version: '2022.3.1-rc.25', Kit Version: '104.1+release.387.3b4671f3.tc'
2023-01-05 00:25:57 [0ms] [Info] [omni.kit.app.plugin] Argv: [C:\Users\####\AppData\Local\ov\pkg\create-2022.3.1\kit\kit.exe, C:\Users\####\AppData\Local\ov\pkg\create-2022.3.1\apps/omni.create.kit, --/log/level=Info, --/log/fileLogLevel=Info, --/log/outputStreamLevel=Info, --/log/flushStandardStreamOutput=true, --/app/python/logSysStdOutput=true, --/app/python/interceptSysStdOutput=false, --/plugins/carb.scripting-python.plugin/logScriptErrors=true, --/app/fastShutdown=true, --merge-config=C:\Users\####\AppData\Local\ov\pkg\farm-agent-104.0.0/jobs\create-render\job.omni.farm.render.kit, --enable, omni.services.render, --/app/file/ignoreUnsavedOnExit=true, --/app/extensions/excluded/0=omni.kit.window.privacy, --/app/hangDetector/enabled=0, --/app/asyncRendering=false, --/rtx/materialDb/syncLoads=true, --/omni.kit.plugin/syncUsdLoads=true, --/rtx/hydra/materialSyncLoads=true, --/rtx-transient/resourcemanager/texturestreaming/async=false, --/rtx-transient/resourcemanager/enableTextureStreaming=false, --/exts/omni.kit.window.viewport/blockingGetViewportDrawable=true, --ext-folder, C:\Users\####\AppData\Local\ov\pkg\farm-agent-104.0.0/jobs\create-render/exts-job.omni.farm.render, --/exts/omni.services.farm.agent.runner/assigned_task=7d882f23-d357-4bf1-a0bb-0e6cbdb49918, --/log/file=C:/Users/####/.nvidia-omniverse/logs/Kit/omni.farm.agent/102.5/create-render_render.run_7d882f23-d357-4bf1-a0bb-0e6cbdb49918_20230104_162534.log, --enable, omni.services.farm.agent.runner, --no-window, --/exts/omni.services.farm.agent.runner/controller=http://localhost:8223/agent]
2023-01-05 00:25:57 [1ms]

Respectfully, I can fully confirm that this is not the case. 102.5 is several versions back and I have only ever installed the 104.0.0 version on my home system. I have also double checked that I’m using the latest farm queue manager, and it is most certainly the latest version.

I would speculate that the tools, as currently programmed, still reference 102.5 for one reason or another and that this has not been fully ironed out. Also keep in mind that everything works fine with all the non-4090 systems in our tests.
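
In the log lines quoted above, 102.5 appears only in the log directory name (`logs/Kit/omni.farm.agent/102.5/`), while the package paths point to `farm-agent-104.0.0`. The following is a minimal sketch for pulling both numbers out of a log so they can be reported side by side; `agent_versions` is a hypothetical helper, and whether the log directory name reflects the installed agent version at all is not established in this thread.

```python
import re

# Version-like segment in the agent's log directory, as seen in the
# "Logging to file" line and the --/log/file argument quoted above.
LOG_DIR_VERSION_RE = re.compile(r"logs/Kit/omni\.farm\.agent/(\d+(?:\.\d+)+)/")
# Version suffix of the installed farm-agent package folder.
PACKAGE_VERSION_RE = re.compile(r"pkg[\\/]farm-agent-(\d+(?:\.\d+)+)")


def agent_versions(log_text):
    """Collect the distinct version strings found in each kind of path."""
    return {
        "log_directory": sorted(set(LOG_DIR_VERSION_RE.findall(log_text))),
        "package_path": sorted(set(PACKAGE_VERSION_RE.findall(log_text))),
    }
```

If `package_path` lists only 104.0.0 while `log_directory` lists 102.5, that is consistent with the agent being up to date and the log folder simply carrying a different number.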