Morpheus bug with _srf_executor and C++ issue

Since this is a new account, I can only link one image in this post, which is why all the output logs are in a single image and referenced by number here:

Our group's current goal is to get the basic pass-thru pipeline working consistently before moving on to the phishing model. I was able to build a pipeline and read in example data as shown in (1).
A screenshot of the pipeline outputting successfully is shown in (2).

I am now encountering two issues: a Python-based one that occurs while building the pipeline, and a C++-based one that occurs after the pipeline is built.
First, the Python issue. I have yet to find a consistent cause for it; sometimes the entire pipeline builds as anticipated, and other times it breaks as shown in (3).

In the init function of the Pipeline class, I see that _srf_executor is initially set to None. It is then meant to be set to the value returned by the Executor’s init call in the Pipeline.build function, but it appears to remain None. I am inclined to believe that this is a byproduct of the runtime error: “intersection between user_cpuset and topo_cpuset is null”.
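The reasoning above can be illustrated with a minimal sketch. FailingExecutor and SketchPipeline are hypothetical stand-ins written for this illustration, not Morpheus or SRF code. The point is only that when a constructor raises, the assignment on that line never completes, so the attribute keeps its initial None and any later method call on it raises an AttributeError.

```python
class FailingExecutor:
    """Hypothetical stand-in for an executor whose constructor fails."""

    def __init__(self, options):
        # Mimic the error message reported in the post.
        raise RuntimeError("intersection between user_cpuset and topo_cpuset is null")


class SketchPipeline:
    """Hypothetical, heavily simplified pipeline (not the Morpheus implementation)."""

    def __init__(self):
        self._executor = None

    def build(self):
        # If the constructor raises, this assignment never happens,
        # so self._executor keeps its initial value of None.
        self._executor = FailingExecutor(options={})

    def join(self):
        # Calling a method on None raises AttributeError.
        return self._executor.join()


def demonstrate():
    """Run build() and join(), collecting the exception types raised."""
    pipeline = SketchPipeline()
    errors = []
    try:
        pipeline.build()
    except RuntimeError as exc:
        errors.append(type(exc).__name__)
    try:
        pipeline.join()
    except AttributeError as exc:
        errors.append(type(exc).__name__)
    return pipeline._executor, errors


if __name__ == "__main__":
    print(demonstrate())
```

If this reading is right, the None value is a symptom rather than a separate bug, and the RuntimeError is the error worth chasing.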

Next, the C++ issue we are encountering is documented in the error file (4), taken from one of the jobs we ran when the Python issue noted above did not occur.

The issue appears to be that the CUDA DataFrame (cuDF) package is unable to query the file size of the starter data.
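One way to narrow this down is to check, from inside the same job and container, whether the input file can be sized and read at all before cuDF touches it. The sketch below uses only the Python standard library; check_input_file is a hypothetical helper name, and the check does not reproduce whatever cuDF does internally.

```python
import os


def check_input_file(path):
    """Return (ok, detail) describing whether `path` can be sized and read."""
    try:
        # os.stat fails if the path does not exist or is not visible here,
        # for example when a directory is not mounted into the container.
        size = os.stat(path).st_size
    except OSError as exc:
        return False, f"cannot stat {path}: {exc}"
    if not os.access(path, os.R_OK):
        return False, f"{path} exists but is not readable by this user"
    return True, f"{path} is readable, {size} bytes"


if __name__ == "__main__":
    # Hypothetical path for illustration; substitute the real input file.
    print(check_input_file("email_with_addresses.jsonlines"))
```

If the check fails inside the job but succeeds on the login node, the file is probably not visible from where the pipeline actually runs.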

Any help on either of these problems is greatly appreciated. If there is any more information that is needed to assist here, let me know and I will provide it ASAP.

Hi there! Apologies for the delay in replying; the team was busy getting the Morpheus 22.11 release out the door. On that note, have you been able to reproduce your issue with the 22.11 release?

If you’re still seeing a problem, could you please file a detailed issue here: Issues · nv-morpheus/Morpheus · GitHub

At that point, our team can take a closer look at your issue and possibly reproduce it.

Thanks,
\Pete

When I last worked on it, the problem persisted. Would you recommend updating to the 22.11 release and testing again to see whether that solves the problem, or would staying on the 22.09 release be a better option? Additionally, is there anywhere I can look to see what changed between the 22.09 and 22.11 releases?

Thank you for your response!

If you could try again with 22.11 that would be great. Are you introducing any new code to the example at Basic Pass Through? If not, we will try to reproduce here also.

@pmackinnon

I have attempted to re-run the pass_thru.py example after updating my shell scripts to use the 22.11 runtime.

For context, we are students attempting to learn how to use Morpheus for a senior design project. Since we are on the school’s HPC, we are not given Docker-level access and are using Singularity instead. We have been translating the GitHub Docker commands into Singularity commands, which I figured was worth noting. Admittedly, I am not too well versed in the differences between the two.

A quick overview of my workflow:

I created a shell script passthru_runner.sh that exports the MORPHEUS_ROOT and runs the run_passthru.py file noted below:

#!/bin/bash
#SBATCH --gpus=1

export MORPHEUS_ROOT="/data/sdp/cybersecurity_ai/sif/morpheus-22.11-runtime.sif"
. /opt/conda/etc/profile.d/conda.sh

python /data/sdp/cybersecurity_ai/files/pass_thru/run_passthru.py

Below is the run_passthru.py file.

import logging
import os

from morpheus.config import Config
from morpheus.pipeline import LinearPipeline
from morpheus.stages.general.monitor_stage import MonitorStage
from morpheus.stages.input.file_source_stage import FileSourceStage
from morpheus.utils.logger import configure_logging

from pass_thru import PassThruStage


def run_pipeline():
    #print("DEBUG: " + MonitorStage.__path__)
    # Enable the Morpheus logger
    configure_logging(log_level=logging.DEBUG) 

    root_dir = os.environ['MORPHEUS_ROOT'] 
    # input_file = os.path.join(root_dir, 'examples/data/email_with_addresses.jsonlines')
    input_file = "/data/sdp/cybersecurity_ai/files/pass_thru/email_with_addresses.jsonlines"

    config = Config()

    # Create a linear pipeline object
    pipeline = LinearPipeline(config)

    # Set source stage
    pipeline.set_source(FileSourceStage(config, filename=input_file, iterative=False))

    # Add our own stage
    pipeline.add_stage(PassThruStage(config))

    # Add monitor to record the performance of our new stage
    pipeline.add_stage(MonitorStage(config))

    # Run the pipeline (This is where it fails based on the traceback)
    pipeline.run()

if __name__ == "__main__":
    run_pipeline()

Lastly, my pass_thru.py file:

import typing

import srf

from morpheus.cli.register_stage import register_stage
from morpheus.pipeline.single_port_stage import SinglePortStage
from morpheus.pipeline.stream_pair import StreamPair


@register_stage("pass-thru")
class PassThruStage(SinglePortStage):
    """
    A Simple Pass Through Stage
    """

    @property
    def name(self) -> str:
        return "pass-thru"

    def accepted_types(self) -> typing.Tuple:
        return (typing.Any, )

    def supports_cpp_node(self) -> bool:
        return False

    def on_data(self, message: typing.Any):
        # Return the message for the next stage
        return message

    def _build_single(self, builder: srf.Builder, input_stream: StreamPair) -> StreamPair:
        node = builder.make_node(self.unique_name, self.on_data)
        builder.make_edge(input_stream[0], node)

        return node, input_stream[1]

I make sure a GPU is allocated for me to run on, and then run conda activate morpheus to ensure we are using the container. From there, I run the passthru_runner.sh script, which produces this output:

====Registering Pipeline====
Error occurred during Pipeline.build(). Exiting.
Traceback (most recent call last):
  File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/morpheus/pipeline/pipeline.py", line 277, in build_and_start
    self.build()
  File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/morpheus/pipeline/pipeline.py", line 175, in build
    self._srf_executor = srf.Executor(self._exec_options)
RuntimeError: intersection between user_cpuset and topo_cpuset is null
Traceback (most recent call last):
  File "/data/sdp/cybersecurity_ai/files/pass_thru/run_passthru.py", line 40, in <module>
Exception occurred in pipeline. Rethrowing
Traceback (most recent call last):
  File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/morpheus/pipeline/pipeline.py", line 251, in join
    await self._srf_executor.join_async()
AttributeError: 'NoneType' object has no attribute 'join_async'
====Pipeline Complete====
    run_pipeline()
  File "/data/sdp/cybersecurity_ai/files/pass_thru/run_passthru.py", line 37, in run_pipeline
    pipeline.run()
  File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/morpheus/pipeline/pipeline.py", line 517, in run
    asyncio.run(self._do_run())
  File "/opt/conda/envs/morpheus/lib/python3.8/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/envs/morpheus/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/morpheus/pipeline/pipeline.py", line 495, in _do_run
    await self.join()
  File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/morpheus/pipeline/pipeline.py", line 251, in join
    await self._srf_executor.join_async()
AttributeError: 'NoneType' object has no attribute 'join_async'

I am attaching an image of the same output above, as in my terminal it is color coded and that does not translate to text here:

Any help is appreciated, as getting this basic python stage functional would be pivotal in allowing us to attempt to create our own custom modules. If you need any more information, let me know and I’ll get it to you ASAP. Thank you for your help.

Hi, I can’t reproduce this, but after consulting with our Engineering team we have a theory about what may be happening. I’ve created this issue based on that: [FEA]: Improve pipeline cpuset logic · Issue #551 · nv-morpheus/Morpheus · GitHub

We suspect you are running in an environment that has one or more cpusets. Currently, Morpheus doesn’t account for that scenario, thus the issue I filed on your behalf.
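To check whether a job is running under a restricted CPU set, the process affinity can be compared with the node's CPU count from inside the job. This is a minimal sketch assuming Linux and only the Python standard library. describe_cpu_affinity is a hypothetical helper name, and the sketch does not reproduce the check that SRF performs.

```python
import os


def describe_cpu_affinity():
    """Summarize which CPUs this process may run on."""
    total = os.cpu_count()
    if hasattr(os, "sched_getaffinity"):
        # Set of CPU indices the current process (pid 0 = self) is allowed to use.
        allowed = sorted(os.sched_getaffinity(0))
    else:
        allowed = None  # sched_getaffinity is not available on this platform
    return {"cpu_count": total, "allowed_cpus": allowed}


if __name__ == "__main__":
    info = describe_cpu_affinity()
    print(f"CPUs on the node: {info['cpu_count']}")
    print(f"CPUs this process may use: {info['allowed_cpus']}")
```

If the allowed list is much smaller than the node's CPU count, or varies from job to job, that would be consistent with the cpuset theory and with the error appearing only intermittently.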

Thank you for creating the issue [FEA]: Improve pipeline cpuset logic · Issue #551 · nv-morpheus/Morpheus · GitHub