Change tokenizer from pipeline NLP - Nvidia Morpheus

Dear Nvidia,
I would like to change the tokenizer of the NLP pipeline.
I am using other models (RoBERTa and BERTIN) for Spanish NLP, and I want to switch to their tokenizer. When I run the Helm chart for nlp-pipeline, we can declare a file that contains hashes (right now it holds a BERT uncased tokenizer).
I would like to know how I can generate this file for another model and how I can use it when I call the Helm chart.
Thanks!

Hi there! We’ve taken note of your requirement, and Engineering will evaluate the level of effort for this. It turns out that this request is already being tracked in the RAPIDS cuDF project:

Morpheus currently uses only the cuDF GPU-accelerated BERT tokenizer. However, as a proof of concept, it may be possible to do this with a CPU-based BPE tokenizer like this one:
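To give an idea of what a CPU-based BPE tokenizer does, here is a minimal Python sketch of the BPE merge loop. It is not part of Morpheus or cuDF. The merge table is a hypothetical toy example, and the helper names `bpe_encode_word` and `tokenize` are illustrative only. Real RoBERTa-style tokenizers load their merges from a `merges.txt` file and use byte-level encoding, which this sketch skips.

```python
from typing import Dict, List, Tuple

# Hypothetical toy merge table, listed in the order the merges were learned.
# A real RoBERTa-style tokenizer loads thousands of these from merges.txt.
MERGES: List[Tuple[str, str]] = [
    ("h", "o"), ("l", "a"), ("ho", "la"),
    ("m", "u"), ("n", "d"), ("nd", "o"),
]
# Earlier merges get lower ranks and therefore higher priority.
RANKS: Dict[Tuple[str, str], int] = {pair: rank for rank, pair in enumerate(MERGES)}


def bpe_encode_word(word: str, ranks: Dict[Tuple[str, str], int]) -> List[str]:
    """Split one word into BPE sub-tokens using the given merge ranks."""
    symbols = list(word)  # start from single characters
    while len(symbols) > 1:
        # Adjacent pairs that have a learned merge.
        candidates = [p for p in zip(symbols, symbols[1:]) if p in ranks]
        if not candidates:
            break  # nothing left to merge
        best = min(candidates, key=lambda p: ranks[p])
        # Merge every non-overlapping occurrence of the best pair, left to right.
        merged: List[str] = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


def tokenize(text: str, ranks: Dict[Tuple[str, str], int]) -> List[str]:
    """Whitespace pre-tokenization followed by per-word BPE."""
    tokens: List[str] = []
    for word in text.split():
        tokens.extend(bpe_encode_word(word, ranks))
    return tokens


if __name__ == "__main__":
    print(tokenize("hola mundo", RANKS))  # ['hola', 'mu', 'ndo']
```

A full version would also map each sub-token to an id through the model's `vocab.json` and add special tokens. This sketch does not cover how to plug such a tokenizer into the Morpheus pipeline.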


Thank you!