I stopped trying to make txt2kg generate triples for me

I am new to this. I never had an ML-capable machine until I got my Spark, so this will probably only help other newcomers.

I am scraping captions from videos, and I believed txt2kg could generate the triples for me. That was probably my first wrong idea; I think it is the wrong tool for this.

For me it is much easier to break the transcripts into chunks no larger than the model's context window. Every time I refresh the page (to reset the context), I feed it this prompt:

Extract factual relationships from spoken instructional text.

Output format:

Subject || Relation || Object

Rules:

Ignore teaching language and repetition

Resolve references

Do not explain

Text:
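Roughly, the chunking step looks like this. This is just a sketch: the four-characters-per-token heuristic and the token budget are guesses I picked for illustration, not measured values.

```python
def chunk_transcript(text, max_tokens=8000, chars_per_token=4):
    """Split a transcript into chunks that stay under a rough context budget.

    max_tokens and chars_per_token are assumed placeholder values,
    not tuned numbers -- adjust for whatever model you paste into.
    """
    budget = max_tokens * chars_per_token  # budget in characters
    chunks, current, size = [], [], 0
    for line in text.splitlines():
        # flush the current chunk before it would exceed the budget
        if size + len(line) > budget and current:
            chunks.append("\n".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line) + 1  # +1 for the newline
    if current:
        chunks.append("\n".join(current))
    return chunks
```

Each chunk then gets pasted after the prompt above in a fresh page.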

Then, in NVIDIA AI Workbench, I cloned the RAPIDS container to do bulk processing on the previous output, which I had saved into text files:

  • Robust parsing (skips markdown headers)

  • Normalization + dedup across files

  • Provenance preserved (sources, source_count, occurrences)

  • Outputs: nodes/edges CSVs, NetworkX exports, cuGraph analytics (if available), Neo4j Browser Cypher.
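The parsing and dedup steps could be sketched roughly like this. The `||` splitting matches my prompt's output format, but the file handling, field names (`sources`, `source_count`, `occurrences`), and lower-casing normalization here are my assumptions for illustration, not the actual notebook code.

```python
import csv
from collections import defaultdict

def parse_triples(paths):
    """Parse 'Subject || Relation || Object' lines from saved LLM output.

    Skips blank lines and markdown headers, dedupes across files,
    and keeps provenance (which files, how many occurrences).
    """
    seen = defaultdict(lambda: {"sources": set(), "occurrences": 0})
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("#"):  # skip blanks / markdown headers
                    continue
                parts = [p.strip() for p in line.split("||")]
                if len(parts) != 3 or not all(parts):
                    continue  # not a well-formed triple
                key = tuple(p.lower() for p in parts)  # normalize for dedup
                seen[key]["sources"].add(path)
                seen[key]["occurrences"] += 1
    return seen

def write_edges_csv(triples, out_path):
    """Emit an edges CSV that NetworkX or Neo4j can load."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        w = csv.writer(f)
        w.writerow(["subject", "relation", "object", "source_count", "occurrences"])
        for (s, r, o), meta in sorted(triples.items()):
            w.writerow([s, r, o, len(meta["sources"]), meta["occurrences"]])
```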

And I can visualize with a local installation of Neo4j Browser. It will export JSON files. That is good enough for me for now. I'm saving the raw text in case, in the future, I want to try something else.
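As a standalone sketch, the JSON export could look something like this: turn an edges CSV into a nodes/links JSON file. The column names and the `source`/`target`/`relation` keys here are my choices for illustration, not a fixed schema.

```python
import csv
import json

def edges_csv_to_json(edges_csv, out_json):
    """Convert an edges CSV into a simple nodes/links JSON graph file.

    Assumes the CSV has 'subject', 'relation', 'object' columns;
    adjust the names to whatever your pipeline actually writes.
    """
    nodes, links = set(), []
    with open(edges_csv, encoding="utf-8") as f:
        for row in csv.DictReader(f):
            nodes.add(row["subject"])
            nodes.add(row["object"])
            links.append({"source": row["subject"],
                          "target": row["object"],
                          "relation": row["relation"]})
    with open(out_json, "w", encoding="utf-8") as f:
        json.dump({"nodes": [{"id": n} for n in sorted(nodes)],
                   "links": links}, f, indent=2)
```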