Lexicographic Sorting - Poor performance

Yes, I have a GTX 470 downstairs. If your code is self-contained and will compile on Linux, I’m more than happy to try it out.

As of now, the swapping takes place between the pointers and not the entire words themselves.

As a correction: the ptr array holds the offsets to the start of each word in the Words array. So it’s not storing an address, just an offset of type unsigned int.
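To make the layout concrete, here is a minimal CUDA sketch, not the actual code from this thread. It assumes the words sit back to back in `words`, each terminated by '\0' (the thread does not say how word ends are marked), and that `ptr[i]` is the unsigned int offset of word i. The names `compareWords`, `orderPair` and `oddEvenPhase` are hypothetical, and odd-even transposition is used only because it is short; the thread does not say which sorting scheme is actually in use.

```cuda
// Hypothetical comparator: lexicographic order of two '\0'-terminated words
// stored in `words`, addressed by their start offsets a and b.
// Returns <0, 0 or >0, like strcmp.
__host__ __device__ int compareWords(const char *words, unsigned int a, unsigned int b)
{
    while (words[a] != '\0' && words[a] == words[b]) { ++a; ++b; }
    return (int)(unsigned char)words[a] - (int)(unsigned char)words[b];
}

// Compare-and-swap on the offset array: only two unsigned ints move,
// the characters in `words` are never written.
__host__ __device__ void orderPair(const char *words, unsigned int *ptr,
                                   unsigned int i, unsigned int j)
{
    if (compareWords(words, ptr[i], ptr[j]) > 0) {
        unsigned int tmp = ptr[i];
        ptr[i] = ptr[j];
        ptr[j] = tmp;
    }
}

// One phase of odd-even transposition sort (an assumption, see above).
// phase == 0 handles pairs (0,1), (2,3), ...; phase == 1 handles (1,2), (3,4), ...
// Pairs within a phase are disjoint, so threads do not race on ptr.
__global__ void oddEvenPhase(const char *words, unsigned int *ptr,
                             unsigned int n, unsigned int phase)
{
    unsigned int i = 2 * (blockIdx.x * blockDim.x + threadIdx.x) + phase;
    if (i + 1 < n) orderPair(words, ptr, i, i + 1);
}
```

Launching the kernel n times with phase alternating 0, 1, 0, 1, ... leaves ptr in sorted order. Every comparison still reads Words from scattered global addresses, which is the access pattern discussed in the rest of this post.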

I had considered utilizing the texture memory.

The only advantage I’d get from texture memory is fewer scattered global reads and writes on the ptr array.

But since texture memory only holds read-only data, I can’t really use it for the operations on ptr.

That still leaves the bigger issue of the global reads from the Words array, which happen once the offsets have been read from ptr; the comparison only ever reads from Words. Hence, I’d like to place Words in texture memory.

But Words is a little too demanding in memory. It will easily take around 10^6 bytes, or about 1 MB.

Hence, the dilemma :|

I’ve developed my code on Microsoft Visual Studio 2005.

So I haven’t written any makefiles or the like.

Would it still be ok?

Is it easy to compile, with only a few files? I have no access to Windows.

Umm, how about this: I’ll strip my whole application down to just the sorting function.

It’ll be a .cpp file that takes a (considerably sized) text file as input and calls into two .cu files.

One of the .cu files will have the sorting kernel definition, and the other will have a kernel definition for comparing two given strings, which is called from the first .cu file.

How does that sound? It’s straightforward, I hope.

Yeah, sounds good. Feel free to PM me a link to where to download it, and I’ll give it a try on both the GTX 470 and the GTX 295 here.

Sure thing. Shall do.

Hi, I am very interested in this code. I would like to collaborate as well. I’m specifically interested in constructing a suffix array. I have access to Fermi-architecture hardware. Can I help?

Sure :)

The thing is, I’m avoiding suffix arrays on purpose, due to reasons specific to my research.

Hence, I’m looking to perform the sort on a set of strings provided as a 2D matrix, where each row represents one word and the columns contain the characters of that word.
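As a rough illustration of that matrix layout (a sketch under assumptions, not the code that was PM-ed): the matrix is taken to be row-major with a fixed `width`, and cells past the end of a word are assumed to hold '\0'. `compareRows` and `rankRows` are hypothetical names. Ranking by counting is used here only because it fits in a few lines; it does O(n^2) comparisons and is not meant as a fast sort.

```cuda
#include <cstddef>

// Hypothetical comparator for two rows of a row-major character matrix.
// Each row holds one word; unused trailing cells are assumed to be '\0'.
// Returns <0, 0 or >0, like strcmp.
__host__ __device__ int compareRows(const char *matrix, unsigned int width,
                                    unsigned int rowA, unsigned int rowB)
{
    const char *a = matrix + (size_t)rowA * width;
    const char *b = matrix + (size_t)rowB * width;
    for (unsigned int c = 0; c < width; ++c) {
        int diff = (int)(unsigned char)a[c] - (int)(unsigned char)b[c];
        if (diff != 0) return diff;   // first differing column decides
        if (a[c] == '\0') return 0;   // both words ended at the same column
    }
    return 0;                         // rows equal over the full width
}

// One thread per row: count the rows that sort before row i.
// Ties are broken by row index, so the ranks form a permutation of 0..n-1.
__global__ void rankRows(const char *matrix, unsigned int width,
                         unsigned int n, unsigned int *rank)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    unsigned int r = 0;
    for (unsigned int j = 0; j < n; ++j) {
        int c = compareRows(matrix, width, j, i);
        if (c < 0 || (c == 0 && j < i)) ++r;
    }
    rank[i] = r;   // row i belongs at position r of the sorted output
}
```

The rows themselves never move; only the rank array is written, which keeps the writes to one unsigned int per word.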

If the above functionality fits your work, I would be more than happy to receive input from you :)

I’ve PM-ed you a link to my code.

Have PM-ed the link. Please let me know the results it gives on your device.

I’ve provided an email address in the README file, for correspondence.

Thanks for your time.

Shashank

PS:

I’ve simplified the file further. You’d just have to compile the lone CUDA file and pass the two input files on the command line. No .cpp and linking to worry about.

Thanks for the link. I’m looking into it.