Error accessing S3 bucket with Parabricks

Hi, first time using Parabricks and I'm hitting an error when running a deepvariant_germline pipeline configured to write its output files to an S3 bucket. I have Parabricks running on an AWS EC2 instance (launched from the AWS Marketplace), and I can run the pipeline successfully on the instance using the sample data provided in the tutorials.

Command I am running:

pbrun deepvariant_germline \
  --ref parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
  --in-fq parabricks_sample/Data/sample_1.fq.gz parabricks_sample/Data/sample_2.fq.gz \
  --out-bam s3://TEST_BUCKET/output.bam \
  --out-variants s3://TEST_BUCKET/variants.vcf

The error:

FileNotFoundError: [Errno 2] No such file or directory: '/opt/parabricks/aws_assets/aws_check_output_file': '/opt/parabricks/aws_assets/aws_check_output_file'

Traceback:

Traceback (most recent call last):
  File "/usr/bin/pbrun", line 10, in <module>
    runArgs = pbargs.getArgs()
  File "/opt/parabricks/pbargs.py", line 2645, in getArgs
    return PBRun(sys.argv)
  File "/opt/parabricks/pbargs.py", line 892, in __init__
    self.runArgs = getattr(self, args.command)(argList)
  File "/opt/parabricks/pbargs.py", line 1972, in deepvariant_germline
    args = deepvariant_germline_parser.parse_args(argList[2:])
  File "/usr/lib/python3.6/argparse.py", line 1743, in parse_args
    args, argv = self.parse_known_args(args, namespace)
  File "/usr/lib/python3.6/argparse.py", line 1775, in parse_known_args
    namespace, args = self._parse_known_args(args, namespace)
  File "/usr/lib/python3.6/argparse.py", line 1981, in _parse_known_args
    start_index = consume_optional(start_index)
  File "/usr/lib/python3.6/argparse.py", line 1921, in consume_optional
    take_action(action, args, option_string)
  File "/usr/lib/python3.6/argparse.py", line 1833, in take_action
    argument_values = self._get_values(action, argument_strings)
  File "/usr/lib/python3.6/argparse.py", line 2274, in _get_values
    value = self._get_value(action, arg_string)
  File "/usr/lib/python3.6/argparse.py", line 2303, in _get_value
    result = type_func(arg_string)
  File "/opt/parabricks/pbutils.py", line 136, in IsFileStreamWritable
    return IsS3FileWriteable(outputName)
  File "/opt/parabricks/pbutils.py", line 172, in IsS3FileWriteable
    statusCode = subprocess.check_call([scriptDir + "/aws_assets/aws_check_output_file", outputName])
  File "/usr/lib/python3.6/subprocess.py", line 306, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/usr/lib/python3.6/subprocess.py", line 287, in call
    with Popen(*popenargs, **kwargs) as p:
  File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.6/subprocess.py", line 1364, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)

The EC2 instance has an IAM role granting it access to the bucket, and I have confirmed that I can copy files from the instance to the S3 bucket with the AWS CLI. However, the Parabricks command does not accept S3 paths for the output files, even though the documentation indicates an S3 output path can be used.
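
For reference, the check I did from the instance was something like this (TEST_BUCKET is a placeholder for the actual bucket name):

aws s3 cp /tmp/test.txt s3://TEST_BUCKET/test.txt

which completes successfully, so the IAM role itself seems to be working.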

If anyone has suggestions of what I am missing please let me know, thanks!

Hey @alec8,

Unfortunately this is an error in the documentation: we no longer support S3 paths directly as output paths. I'd recommend writing to local EC2 instance storage and uploading to S3 as a separate step, or you can try EFS storage, which is better suited for these reads and writes.
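
A minimal sketch of that two-step workflow, reusing the command from your post (TEST_BUCKET and the sample paths are placeholders):

# Step 1: write outputs to local instance storage
pbrun deepvariant_germline \
  --ref parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
  --in-fq parabricks_sample/Data/sample_1.fq.gz parabricks_sample/Data/sample_2.fq.gz \
  --out-bam output.bam \
  --out-variants variants.vcf

# Step 2: upload the results to S3 with the AWS CLI
aws s3 cp output.bam s3://TEST_BUCKET/output.bam
aws s3 cp variants.vcf s3://TEST_BUCKET/variants.vcf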
