Algorithm for computing Cosine Similarity in K-dim

Don’t know if this is the right forum for this question, but here it goes anyway.
And sorry for the lengthy post upfront:

Some background info:
I would like to use vertex/pixel shaders (via CUDA or other means, I'm not sure yet) to efficiently calculate the similarity of items (for instance products) in a k-dimensional feature space. Some more info first:

Each item has a feature-value for each feature, and two feature-values have a defined (fixed) similarity score for that particular feature. The overall similarity of 2 items is then the (weighted) average over all feature similarities.
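To write that out: if w_f is the weight of feature f and S_f(x, y) is the fixed similarity score of two values x and y of feature f, the overall similarity I mean is:

sim(A, B) = ( Σ_f w_f · S_f(A_f, B_f) ) / ( Σ_f w_f )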

Based on this similarity measure it would then be possible to, for instance, get the top-10 most similar items for a featured item (product).

For a top-10 to be returned, the similarity scores for all items need to be calculated first. Since a) there are a lot of items and b) each similarity can be calculated in isolation (no information is needed other than the feature-values of the 2 items), it seems to me that this is an algorithm that can be efficiently programmed on the GPU.
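To show the structure I have in mind, here is a minimal CUDA sketch (all names and the data layout are just assumptions on my part: features[i * F + f] is the f-th feature value of item i, and simTable holds one flattened V×V similarity table per feature, assuming for simplicity that every feature has V possible values):

```cuda
// One thread per item: compare item i against the featured item.
// Assumed layout: features[i * F + f]           = value of feature f of item i,
//                 simTable[(f * V + a) * V + b] = similarity of values a, b for feature f.
__global__ void similarityKernel(const int *features, const float *simTable,
                                 const float *weights, const int *featured,
                                 float *scores, int numItems, int F, int V)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numItems) return;

    float sum = 0.0f, wsum = 0.0f;
    for (int f = 0; f < F; ++f) {
        int a = featured[f];              // feature value of the featured item
        int b = features[i * F + f];      // feature value of item i
        sum  += weights[f] * simTable[(f * V + a) * V + b];
        wsum += weights[f];
    }
    scores[i] = sum / wsum;               // weighted average of feature similarities
}
```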

My Questions
First of all, am I correct in thinking that this computation can indeed be done efficiently?

How can this be done? The main trouble I see is that GPUs are optimized for calculations in 3 (and 2) dimensions. If items had only 3 dimensions, then the similarity between 2 items (represented as vertices v1 and v2) would simply be given by computing cos(v1, v2).
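As an aside: if CUDA turns out to be an option, the 3-dimension limit may not apply at all, since a kernel is essentially plain C, and cosine similarity in k dimensions is just a loop. A sketch of what I mean (my own assumption, not tested):

```cuda
// Cosine similarity of two k-dimensional vectors: dot product divided by the
// product of the two vector norms.
__device__ float cosineSim(const float *v1, const float *v2, int k)
{
    float dot = 0.0f, n1 = 0.0f, n2 = 0.0f;
    for (int d = 0; d < k; ++d) {
        dot += v1[d] * v2[d];
        n1  += v1[d] * v1[d];
        n2  += v2[d] * v2[d];
    }
    return dot / (sqrtf(n1) * sqrtf(n2));
}
```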

My table-based similarity asks for a totally different method, though, and I have done some thinking, which I try to formulate below. What I would like to know: am I on the right track? Performance-wise? Is there a better alternative that maps much better to the problem I'm describing?

So here is what I'm thinking about:

Represent each feature as a 2D texture (a CUDA version of this lookup is sketched after the list):
– each texel in the texture represents the similarity ([0…1]) of 2 feature-values
– the first feature-value (x-coord) comes from the featured item with which all items need to be compared
– the second feature-value (y-coord) comes from one of the items that are compared against the featured item.
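For the CUDA route, one such per-feature table could be set up as a texture object roughly like this (a sketch under my own assumptions; hostTable is a row-major V×V array of similarities in [0…1]):

```cuda
#include <cuda_runtime.h>

// Upload one feature's V x V similarity table into a 2D texture, so a kernel
// can fetch sim(a, b) with a tex2D lookup.
cudaTextureObject_t makeSimTexture(const float *hostTable, int V)
{
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t arr;
    cudaMallocArray(&arr, &desc, V, V);
    cudaMemcpy2DToArray(arr, 0, 0, hostTable, V * sizeof(float),
                        V * sizeof(float), V, cudaMemcpyHostToDevice);

    cudaResourceDesc resDesc = {};
    resDesc.resType = cudaResourceTypeArray;
    resDesc.res.array.array = arr;

    cudaTextureDesc texDesc = {};
    texDesc.readMode   = cudaReadModeElementType;
    texDesc.filterMode = cudaFilterModePoint;   // exact table lookup, no interpolation

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &resDesc, &texDesc, nullptr);
    return tex;
}

// In a kernel, the lookup for the value pair (a, b) would then be:
//   float s = tex2D<float>(tex, a + 0.5f, b + 0.5f);
```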

An item is then represented as a vertex with the following information:
– the ID of the item (which can be stored in the position info, for instance)
– one feature-value for each feature, passed as texture coordinates as mentioned above.

The similarity of the 2 items is then simply the (weighted) average of all texture lookups (texel values).

This score can then be output as a color value for each vertex (item).

If we calculate this in screen space (and somehow make sure that 1 vertex is rendered as 1 pixel), then the output (rendered to a texture or something) represents the items and their scores (as pixels and their colors).

This means that the algorithm will basically live in the vertex shader, which does all the texture lookups, averages them, and outputs the color information.

The vertex shader then needs to be supplied with the information of the featured item. The featured item is switched every frame, so that the algorithm computes a top-N (or a complete sort based on similarity) each frame.
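On the CUDA side, I imagine the top-N step after each "frame" could be done by sorting the scores on the GPU, e.g. with Thrust (again just a sketch, building on the similarityKernel idea above):

```cuda
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/sequence.h>
#include <thrust/sort.h>

// After the scores array has been filled (one entry per item), sort item IDs
// by descending score; the first N entries of 'ids' are then the top-N items.
void topN(thrust::device_vector<float> &scores, thrust::device_vector<int> &ids)
{
    thrust::sequence(ids.begin(), ids.end());        // ids = 0, 1, 2, ...
    thrust::sort_by_key(scores.begin(), scores.end(),
                        ids.begin(), thrust::greater<float>());
}
```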

I hope this explanation made some sense… :)

Am I looking anywhere in the right direction? Are there other methods that would perform much faster? Any help is highly appreciated.

Thanks in advance,
Joost