I ran this notebook: NeMo-Curator/tutorials/synthetic-data-hello-world/Synthetic Data Generation - Hello World Examples.ipynb (NVIDIA/NeMo-Curator on GitHub, main branch). Why is the length of the output list 374 instead of 20?
Here are the relevant screenshots.
Hi @ambo. When applied to a string, Python's len() function returns the number of characters in that string, hence 376. If you want to count the words in the string, try len(open_qa_questions.split()).
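Here is a minimal sketch of the difference. It assumes open_qa_questions is a single string with one numbered question per line; the sample text is made up for illustration and is not the notebook's actual output.

```python
# Hypothetical stand-in for a raw model response: a single string
# holding several numbered questions separated by newlines.
open_qa_questions = "1. What is AI?\n2. Why is the sky blue?\n3. How do tides work?"

# len() on a str counts characters, not questions.
num_chars = len(open_qa_questions)

# str.split() with no argument splits on any run of whitespace, so this counts words.
num_words = len(open_qa_questions.split())

# splitlines() splits on line boundaries, so this counts one entry per line.
questions = open_qa_questions.splitlines()
num_lines = len(questions)

print(num_chars, num_words, num_lines)
```

If the goal is to count questions rather than words, splitting on lines is usually closer, but only when each question actually sits on its own line.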
Oh, my bad, sorry for not reading this fully. Let me take a look now! Thanks for your patience.
Hey!
It's certainly not supposed to generate this many openlines.
Would you be able to post a snippet of the output?
It looks like the model may have been overzealous in generating topics or subtopics.
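One way to check whether the model over-generated is to parse the numbered items out of the raw response and compare the count with what was requested. This is a sketch under assumptions: parse_numbered_list and check_count are hypothetical helpers, not part of the NeMo Curator API, and the sample response is invented.

```python
import re

# Hypothetical pattern: matches numbered items such as "1. Math" or "2) Physics"
# and captures the text after the number.
ITEM_PATTERN = re.compile(r"^\s*\d+[.)]\s+(.*\S)\s*$")


def parse_numbered_list(response: str) -> list[str]:
    """Hypothetical helper: return the text of each numbered line in a response."""
    items = []
    for line in response.splitlines():
        match = ITEM_PATTERN.match(line)
        if match:
            items.append(match.group(1))
    return items


def check_count(items: list[str], expected: int) -> list[str]:
    """Hypothetical helper: warn on a count mismatch, then keep at most `expected` items."""
    if len(items) != expected:
        print(f"warning: expected {expected} items, got {len(items)}")
    return items[:expected]


# Usage with an invented response that contains one item too many.
response = "Here are some topics:\n1. Math\n2. Physics\n3. Chemistry\n4. Biology"
parsed = parse_numbered_list(response)
topics = check_count(parsed, expected=3)
print(topics)
```

Truncating hides the symptom rather than fixing it, so the printed warning is the useful part when diagnosing a verbose model.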
Thanks,
Chris
Thanks @ambo. What do you get when you print open_qa_questions?
I believe this is an artifact of model verbosity. Can you confirm you're using Meta's Llama 405B model?