How to set up a multi-machine Render Farm (a personal-size one)?

Hello everyone !

I have been searching for quite a long time for a way to correctly set up a personal Render Farm (meaning one that uses only my personal machines, laptops or tower computers). I just couldn’t make it work properly, and I thought that having a clear guide here on how to do it would be a good idea!

Let’s say we want to set up a rendering pipeline between Create, Farm Queue and Farm Agent.
After installing the three applications on all the machines that will make up the Render Farm, first question:
Is it necessary to have a dedicated Nvidia account per machine? Or is it possible to use the same account on all the machines that will be used for the Render Farm?

Second question: Do you absolutely have to use Nucleus to be able to render with the Render Farm? Or is it possible, for example, to work on the Create project on the laptop, queue it in Farm Queue (managed by the laptop), and have the rendered images stored on the laptop’s hard drive?
If it is possible to proceed this way, what must be done on a Windows system (10 or 11) so that the different machines can access both the jobs to be carried out and the rendering destination folders?

Third question: I noticed that if the firewall doesn’t have any specific rules, it completely prevents the Render Farm system from working. What rules need to be applied to make the system fully work? Which processes should be allowed? Which ports should be opened or forwarded? Etc.

Fourth question: Are there any specific environment variables to create for a local Render Farm such as this one to work?

I don’t have any other questions in mind for the moment, but I wanted to keep the questions as broad as possible in order to cover as many topics as possible, which is why some of them may seem a little "stupid".
Not having found all these answers on the Internet (and not having been able to make the Render Farm work), I thought that gathering as much advice as possible in this topic could help as many people as possible!

Feel free to add even more advice to help build a personal Render Farm in this thread. I probably haven’t covered everything, and you probably have some tips to share with other creators!

Thank you in advance to all of you for your contribution.

Best Regards,
BillyJohn.

Hi @billyjohn1! Glad to hear you are playing with the Omniverse Farm! The first place I want to direct you, if you haven’t already been there, is to our documentation here: Omniverse Farm — Omniverse Farm Agent and Farm Queue documentation

I have asked the development team to take a look at this post so that they can help me answer all of your questions! I will post back here when I have more information!

Hi @billyjohn1 !

@WendyGram kindly reached out to let us know that you were trying out the rendering capabilities of Omniverse Farm on your own resources. That’s great! Sorry to hear you have run into some issues while getting set up.

It should be possible to run your own personal render farm using the resources you mentioned, provided that:

  • The machines are all present on the network
  • The machines that will perform the renders all have Create installed
  • The machines that will perform the renders additionally have Farm Agent installed
  • One single machine has Farm Queue installed

You should be able to use the same account on all the machines that are part of your in-house farm. While we recommend using Nucleus as the storage solution for content to be rendered for the convenience it offers in terms of sharing, you should also be able to specify a network location or mapped drive where the scene to be rendered is located (provided the drive is mapped to the same letter on all render nodes, and network locations are all mounted).

Regarding the firewall potentially blocking access to instances, you may wish to validate that:

  • The kit.exe process is not restricted
  • Inbound and outbound traffic is allowed on ports 8111, 8222 and 8223.
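As a rough way to verify the port rule from another machine, here is a minimal Python sketch (standard library only) that attempts a TCP connection to each of the three ports listed above. The function names and the example address are assumptions made for illustration, and a "not reachable" result can also simply mean that nothing is listening on that port on the target machine at that moment.

```python
import socket

# Ports named in the list above.
FARM_PORTS = (8111, 8222, 8223)


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_farm_ports(host: str, ports=FARM_PORTS) -> dict:
    """Map each port to True (connection accepted) or False (blocked or not listening)."""
    return {port: port_is_open(host, port) for port in ports}


# Example call (192.168.1.10 is a hypothetical address, replace it with your own):
# print(check_farm_ports("192.168.1.10"))
```

Running this from a render node against the machine hosting Farm Queue should give a quick hint as to whether the firewall is the likely culprit.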

Beyond these settings, there should be no need to set any environment variables for rendering to be supported out-of-the-box.

To potentially help you diagnose what may be affecting your configuration, could you:

  1. Launch Farm Queue on the machine that will be hosting the instance of the task queue
  2. Note the machine’s internal IP address (often in the form of 192.168.1.x)
  3. From a machine that will be performing rendering, open a web browser and attempt to navigate to the IP noted above, at the following location: http://<IP address>:8222/queue/management/ui

You should be presented with a page listing any tasks that were previously submitted to the Farm Queue. If you are unable to access the page from machines where Farm Agents are running, this might indicate an issue with network connectivity.
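If opening a browser on every render node is inconvenient, the same check can be sketched in Python. This is only an illustration of the steps above: the function names are made up for this example, and it assumes the page answers a plain HTTP GET with status 200 when the queue is reachable.

```python
import urllib.request


def queue_ui_url(ip_address: str, port: int = 8222) -> str:
    """Build the Job Management UI address quoted in the steps above."""
    return f"http://{ip_address}:{port}/queue/management/ui"


def queue_ui_reachable(ip_address: str, port: int = 8222, timeout: float = 5.0) -> bool:
    """Return True if the page answers with HTTP 200, False on any network or HTTP error."""
    try:
        with urllib.request.urlopen(queue_ui_url(ip_address, port), timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError are subclasses of OSError, as are socket timeouts.
        return False


# Example call (hypothetical address of the machine running Farm Queue):
# print(queue_ui_reachable("192.168.1.10"))
```

A `False` result from a render node, combined with a `True` result on the machine hosting the queue itself, would point toward a network or firewall issue rather than a Farm configuration issue.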

The brief video introduction to Omniverse Farm at the top of this page runs through a similar workflow, and you may be interested to follow along to see if there are any particular steps at which you may encounter issues: Omniverse Farm — Omniverse Farm Agent and Farm Queue documentation

I hope this provides you with the information you were looking for!

Hi @WendyGram ! Hi @psawicki !

Thank you for your quick (and kind) answers !

I’ll try this advice to set up the Render Farm. I still have one (maybe stupid) question about the part where you wrote:

you should also be able to specify a network location or mapped drive where the scene to be rendered is located (provided the drive is mapped to the same letter on all render nodes, and network locations are all mounted).

Here is my question: if, on the computer managing the render (the laptop, in my previous example), the destination for rendered files is set to “d:/Render”, and on the other machines that will make up the Render Farm that destination is mapped to “x:/Render” (mainly because D is generally already used by a local drive), will that make it impossible to work, since the drive letter is not the same?

Best Regards.
BillyJohn.

Hi @billyjohn1 !

Using the example you provided, where the rendering task is defined to use D:/Render as the output path when submitting a job from the Movie Capture tool in Create, all machines performing rendering will also attempt to write frames to their own local D:/Render folder.

In cases where that location is unavailable, or if this location is write-protected on local machines, this would prevent rendering. This would also mean that you would then have to connect to each machine that performed rendering in order to copy frames back to one final location. There might also be data management challenges, such as local frames from other rendering tasks being deleted if the “Overwrite files” option has been selected when submitting the render task, as frames from one task may overwrite ones from a previous job.

This is one of the motivations for using Nucleus as a shared data storage solution over the network, as it facilitates collaborative work between Users and machines.

The alternative would be to map the network share to the same drive letter on all machines (e.g. using a Z: drive everywhere). While possible, this requires additional manual steps and may not scale as elegantly over time if you add machines to, or remove machines from, your local farm.
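To confirm that a given output path is actually usable on each render node before submitting a job, a small check like the following could be run on every machine. It is a sketch under stated assumptions: the function name is invented for this example and `Z:/Render` is only the illustrative drive letter mentioned above.

```python
import os
import tempfile


def output_dir_is_writable(path: str) -> bool:
    """Return True if `path` is an existing directory in which a file can be created."""
    if not os.path.isdir(path):
        return False
    try:
        # Create a throwaway file; it is deleted automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
        return True
    except OSError:
        # Permission problems, read-only shares, disconnected drives, etc.
        return False


# Example call (run on each render node with the path used in the render job):
# print(output_dir_is_writable("Z:/Render"))
```

If this returns `False` on any node, that node would most likely fail to write frames for a job that uses the same output path.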

I hope this sheds some light on the questions you had.

Hi @psawicki !

I tried all the tricks you shared earlier…without success !
I managed to queue a job, still from the laptop, and I managed to connect the desktop computer to Farm Queue through Farm Agent. The job gets assigned to the desktop computer (I can see it on the management page on the laptop, and on the desktop computer the job’s status in Farm Agent is “Processing”).

However, after a few seconds, the status of the job on Farm Agent on the desktop computer changes to “Idle” and, on the management page, on the laptop, the job changes to “Errored”.

It happens every time I try. These are the scenarios I tried:

1 - I created a D:\Render folder on the 2 machines, so that the desktop computer can write the images resulting from the rendering in its folder, locally.

2 - I used Nucleus (on the laptop) to create a shared slot (and mounted to O: using Omniverse Drive), then I mounted that same slot, on the desktop computer, with Omniverse Drive, with the same letter O:, using the laptop’s local IP address. I then copied the whole project to this location, because I was wondering if, in scenario 1, the desktop computer could simply read the data necessary for the scene, thus hosted on the local disk of the laptop, in order to proceed with the rendering.

In each case, I re-created the job… and the job failed.

I added rules in my firewall, one for the ports to allow, and one per occurrence of “kit.exe”:
C:\Users\<USERNAME>\AppData\Local\ov\pkg\create-2022.1.1\kit\kit.exe
C:\Users\<USERNAME>\AppData\Local\ov\pkg\farm-queue-103.1.0\kit\kit.exe
C:\Users\<USERNAME>\AppData\Local\ov\pkg\farm-agent-103.1.1\kit\kit.exe
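For anyone wanting to reproduce rules of this kind, here is a hedged Python sketch that only builds the corresponding `netsh advfirewall firewall add rule` command lines as text. It does not run anything. The rule names and helper functions are hypothetical, the `<USERNAME>` placeholder is kept as-is, and the resulting commands would need to be reviewed and run from an elevated prompt.

```python
def inbound_port_rule(name: str, ports) -> str:
    """Build a netsh command allowing inbound TCP traffic on the given local ports."""
    port_list = ",".join(str(p) for p in ports)
    return (
        f'netsh advfirewall firewall add rule name="{name}" '
        f"dir=in action=allow protocol=TCP localport={port_list}"
    )


def program_rule(name: str, program_path: str, direction: str = "in") -> str:
    """Build a netsh command allowing traffic for one executable ("in" or "out")."""
    if direction not in ("in", "out"):
        raise ValueError("direction must be 'in' or 'out'")
    return (
        f'netsh advfirewall firewall add rule name="{name}" '
        f'dir={direction} action=allow program="{program_path}" enable=yes'
    )


# Example: print the commands instead of executing them (rule names are made up).
# print(inbound_port_rule("Omniverse Farm ports", [8111, 8222, 8223]))
# print(program_rule(
#     "Farm Agent kit.exe",
#     r"C:\Users\<USERNAME>\AppData\Local\ov\pkg\farm-agent-103.1.1\kit\kit.exe",
# ))
```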

What can I pass on to you so you can look for what’s blocking?

I can recreate the job (tell me which scenario would be best, or even a completely different scenario), then attempt once to assign the job to the desktop computer, let the job fail, and return the log to you.

Would that be useful?

For information, when I put a job in queue on the laptop, and I connect this same laptop via Farm Agent, everything works correctly, and the laptop performs the different renderings listed in Farm Queue.

Best Regards,
BillyJohn.

Hi @billyjohn1 ,

You may wish to consult the job’s logs for additional information regarding the error reported by the Agent. These can be accessed by clicking the “Log” icon in the “Actions” column for your task on the Job Management UI web page, and inspecting the log for potential errors or warnings regarding the task.
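If the log is long, a quick filter can help surface the relevant lines once the text has been copied out of the web page. The sketch below makes no assumption about the exact log format: it simply keeps any line containing the word "error" or "warning" (case-insensitive), and the function name is made up for this example.

```python
import re

# Match the whole words "error" or "warning", in any letter case.
_PROBLEM_PATTERN = re.compile(r"\b(error|warning)\b", re.IGNORECASE)


def find_problem_lines(log_text: str):
    """Return (line_number, line) pairs for lines that mention an error or a warning."""
    return [
        (number, line.strip())
        for number, line in enumerate(log_text.splitlines(), start=1)
        if _PROBLEM_PATTERN.search(line)
    ]


# Example call, assuming the log was saved to a local text file named job_log.txt:
# with open("job_log.txt", encoding="utf-8") as handle:
#     for number, line in find_problem_lines(handle.read()):
#         print(number, line)
```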

Otherwise, I would suggest simplifying the workflow to the strict minimum before attempting a more elaborate farm configuration. From this perspective, the video tutorial on the following page walks through a configuration example you may wish to replicate to confirm that your networking infrastructure is set up as expected: Omniverse Farm — Omniverse Farm Agent and Farm Queue documentation

I hope this simple example can be of valuable guidance for more elaborate workflows later on.

I had a huge project to complete in graduate school for my MFA thesis.

I needed to render out about a million animation frames and bake out maps for each frame. This took 3 days. That’s 3 days of sitting in front of the computer, waiting for the renders to finish. To make matters worse, at no point did I know how far along they were in the process; they could have been 50% done at the start, or they could have been 25% done. And that was just the beginning of the misery; once all those frames were rendered, I had to wait again while they baked and then wait some more while they uploaded to YouTube. Not so long ago, I discovered https://forgehub.net, which optimized this process for me. You can also use it for a render farm.