Match strings on a huge dataset

Hi all,

I want to match two lists of strings in an extremely large dataset (30 GB), but I have no idea how to do it efficiently. Would anyone be willing to point me to some relevant material on how to do this?

Sorry if this is a silly question. I am a beginner in Fortran, and I searched online but could not find an answer.

Best,
Zejin

Hi Zejin,

Fortran isn’t really a good language for string matching, which is probably why you can’t find any good examples.

I’d recommend using Python if this is all you need to do. Or, if this is part of a larger Fortran application, you can write the string comparison portion in C and have the Fortran code call the C routine.
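As a rough sketch of what the Python route could look like: this assumes "matching" means exact equality, that each file holds one string per line, and that the smaller of the two lists fits in memory as a set. The function names (load_keys, stream_matches) and the file layout are made up for illustration. The larger file is read one line at a time, so it never has to be loaded whole.

```python
import os
import tempfile


def load_keys(path):
    """Read the smaller list into a set, one string per line (assumed layout)."""
    with open(path, encoding="utf-8") as f:
        return {line.rstrip("\n") for line in f}


def stream_matches(big_path, keys):
    """Yield each line of the big file that also appears in keys."""
    with open(big_path, encoding="utf-8") as f:
        for line in f:  # iterates line by line, not the whole file at once
            s = line.rstrip("\n")
            if s in keys:  # average O(1) set lookup
                yield s


if __name__ == "__main__":
    # Tiny stand-in files; the real inputs would be the two lists on disk.
    with tempfile.TemporaryDirectory() as tmp:
        small = os.path.join(tmp, "small.txt")
        big = os.path.join(tmp, "big.txt")
        with open(small, "w", encoding="utf-8") as f:
            f.write("apple\ncherry\n")
        with open(big, "w", encoding="utf-8") as f:
            f.write("banana\napple\ndurian\ncherry\napple\n")
        for match in stream_matches(big, load_keys(small)):
            print(match)
```

If neither list fits in memory, or if you need fuzzy rather than exact matching, this approach would not be enough as written; sorting both files and merging them, or loading them into a database, would be the usual alternatives.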

-Mat

Hi Mat,

Thank you very much for your suggestion!

I am going to write a completely new program, so I had better just use Python or another language that is good at handling strings.

Thanks,
Zejin