I would like to change the tokenizer of the NLP pipeline.
I am using other models (RoBERTa and BERTIN) for Spanish NLP, but I want to change the tokenizer. When I deploy the Helm chart for nlp-pipeline, I can specify a file that contains hashes (currently it is for a bert-uncased tokenizer).
I would like to know how I can generate this file for another model and how I can use it when deploying with Helm.
Hi there! We've taken note of your requirement, and Engineering will be evaluating the level of effort for this. It turns out that this request is already being tracked in the RAPIDS cuDF project:
Morpheus currently supports only the cuDF GPU-accelerated BERT tokenizer. However, as a proof of concept, it may be possible to do this with a CPU-based BPE tokenizer like this one:
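For illustration only (this is not the implementation referenced above, and it is not a Morpheus or cuDF API), here is a minimal sketch of what a CPU-side BPE encode step does, in plain Python. The merge table `TOY_MERGES` and the helper names `bpe_encode` and `tokenize` are hypothetical; real RoBERTa-style tokenizers use byte-level BPE with a learned merges file shipped with the model, which this toy does not load.

```python
from typing import Dict, List, Tuple

# Hypothetical toy merge table, ordered by priority (earlier = applied first).
# A real model ships thousands of learned merges; these are made up.
TOY_MERGES: List[Tuple[str, str]] = [("h", "o"), ("l", "a"), ("ho", "la")]
MERGE_RANKS: Dict[Tuple[str, str], int] = {
    pair: rank for rank, pair in enumerate(TOY_MERGES)
}


def bpe_encode(word: str, merge_ranks: Dict[Tuple[str, str], int]) -> List[str]:
    """Split a word into characters, then greedily apply the best-ranked merge."""
    symbols = list(word)
    while len(symbols) > 1:
        # Collect all adjacent symbol pairs in the current segmentation.
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        # Pick the pair with the lowest rank; unknown pairs rank as infinity.
        best = min(pairs, key=lambda p: merge_ranks.get(p, float("inf")))
        if best not in merge_ranks:
            break  # no known merge applies any more
        merged: List[str] = []
        i = 0
        while i < len(symbols):
            # Merge every occurrence of the chosen pair, left to right.
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


def tokenize(text: str) -> List[str]:
    """Whitespace-split the text and BPE-encode each word on the CPU."""
    return [tok for word in text.split() for tok in bpe_encode(word, MERGE_RANKS)]


if __name__ == "__main__":
    print(tokenize("hola hol"))
```

In a proof of concept, a step like this would run on the CPU before inference and produce token IDs from the model's own vocabulary; how that output would be wired into the pipeline is not covered here.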