Hi, this is my first time using Parabricks and I'm hitting an error when running the deepvariant_germline pipeline configured to write its output files to an S3 bucket. I have Parabricks running on an AWS EC2 instance (launched from the AWS Marketplace image), and I can run the pipeline successfully on the instance using the sample data provided in the tutorials.
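For reference, the run that succeeds is the same command with the outputs written to the instance's local filesystem; I don't have the exact invocation saved, but it was essentially:

pbrun deepvariant_germline \
--ref parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
--in-fq parabricks_sample/Data/sample_1.fq.gz parabricks_sample/Data/sample_2.fq.gz \
--out-bam output.bam \
--out-variants variants.vcf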
The command that fails (same inputs, outputs pointed at S3):
pbrun deepvariant_germline \
--ref parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
--in-fq parabricks_sample/Data/sample_1.fq.gz parabricks_sample/Data/sample_2.fq.gz \
--out-bam s3://TEST_BUCKET/output.bam \
--out-variants s3://TEST_BUCKET/variants.vcf
The error:
FileNotFoundError: [Errno 2] No such file or directory: '/opt/parabricks/aws_assets/aws_check_output_file': '/opt/parabricks/aws_assets/aws_check_output_file'
Traceback:
Traceback (most recent call last):
File "/usr/bin/pbrun", line 10, in <module>
runArgs = pbargs.getArgs()
File "/opt/parabricks/pbargs.py", line 2645, in getArgs
return PBRun(sys.argv)
File "/opt/parabricks/pbargs.py", line 892, in __init__
self.runArgs = getattr(self, args.command)(argList)
File "/opt/parabricks/pbargs.py", line 1972, in deepvariant_germline
args = deepvariant_germline_parser.parse_args(argList[2:])
File "/usr/lib/python3.6/argparse.py", line 1743, in parse_args
args, argv = self.parse_known_args(args, namespace)
File "/usr/lib/python3.6/argparse.py", line 1775, in parse_known_args
namespace, args = self._parse_known_args(args, namespace)
File "/usr/lib/python3.6/argparse.py", line 1981, in _parse_known_args
start_index = consume_optional(start_index)
File "/usr/lib/python3.6/argparse.py", line 1921, in consume_optional
take_action(action, args, option_string)
File "/usr/lib/python3.6/argparse.py", line 1833, in take_action
argument_values = self._get_values(action, argument_strings)
File "/usr/lib/python3.6/argparse.py", line 2274, in _get_values
value = self._get_value(action, arg_string)
File "/usr/lib/python3.6/argparse.py", line 2303, in _get_value
result = type_func(arg_string)
File "/opt/parabricks/pbutils.py", line 136, in IsFileStreamWritable
return IsS3FileWriteable(outputName)
File "/opt/parabricks/pbutils.py", line 172, in IsS3FileWriteable
statusCode = subprocess.check_call([scriptDir + "/aws_assets/aws_check_output_file", outputName])
File "/usr/lib/python3.6/subprocess.py", line 306, in check_call
retcode = call(*popenargs, **kwargs)
File "/usr/lib/python3.6/subprocess.py", line 287, in call
with Popen(*popenargs, **kwargs) as p:
File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
restore_signals, start_new_session)
File "/usr/lib/python3.6/subprocess.py", line 1364, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
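If I'm reading the traceback right, the S3 output paths are validated at argument-parsing time by IsS3FileWriteable in pbutils.py, which shells out to a bundled helper script at /opt/parabricks/aws_assets/aws_check_output_file, and that script is what can't be found (or its interpreter can't be). A quick way to check on the instance, using nothing Parabricks-specific:

# Check whether the bundled S3 helper script exists and is executable
ls -l /opt/parabricks/aws_assets/
ls -l /opt/parabricks/aws_assets/aws_check_output_file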
The EC2 instance has an IAM role granting it access to the bucket, and I have confirmed that I can copy files from the instance to the S3 bucket with the AWS CLI (the check I ran is below). Despite that, pbrun fails while validating the S3 output paths, even though the documentation indicates an S3 output path can be used.
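The write test was just a plain AWS CLI copy using the instance's role; something along these lines (TEST_BUCKET and the test file name are placeholders):

# Create a small test file and push it to the output bucket via the instance's IAM role
echo "s3 write test" > /tmp/s3_write_test.txt
aws s3 cp /tmp/s3_write_test.txt s3://TEST_BUCKET/s3_write_test.txt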
If anyone has suggestions for what I might be missing, please let me know. Thanks!