Mind that OptiX is fully programmable in all this!

There are many ways to define a camera coordinate system.

ray_direction = normalized(pixel position - eye) ?<<

Whatever you define the pixel position to be, yes, that is one way to define a camera.

in OptiX, one thread treats one pixel?<<

If you programmed it to do that, yes. Normally you’d let one launch_index handle one primary ray when generating images.

(OptiX is a generic ray casting SDK, it doesn’t necessarily need to synthesize images, check the collision example)

in UVW, W determines ‘center’ of a ‘rectangle’ (not plane)?<<

UVW spans an arbitrary parallelogram and W points from its local coordinate system origin to the center of that parallelogram. (That also defines a plane if you do not limit the region with the extents of UV.)

A rectangle is a special case of that. Mind that W doesn’t need to be perpendicular to that parallelogram, so UVW can also define a sheared view frustum. It’s a simple but mighty construct.

Your example uses orthonormal vectors, so you get a square of 2x2 units at a distance of 1 unit along the negative z-axis, similar to what OpenGL does: right-handed world coordinates, looking down the negative z-axis. (The UVW camera coordinate system itself (the projection) is actually left-handed.)

U and V determine the range of the rectangle?<<

Yes, the size of the upper right quadrant of the parallelogram.

the rectangle covers the area from corner (min(d.x)*u,min(d.y)*v) to (max(d.x)*u,max(d.y)*v) ?<<

Right.

If all the above is right, d.x*U + d.y*V + W is one pixel. ray_direction should be normalized(d.x*U+d.y*V+W-eye). Where is my mistake?<<

You’re mixing positions and vectors. d.x*U + d.y*V + W is already a direction vector inside the camera frame; the pixel position would be eye + d.x*U + d.y*V + W, so subtracting eye again is wrong.

That -eye is implicit: you use the eye position(!) as the ray origin when defining the ray.

The ray starts at the eye and points into the ray direction, which is a vector(!).

Vectors do not define positions in space; they just point in some direction.

(If you know what homogeneous coordinates are, vectors have w == 0.0, positions have w != 0.0.)

Let’s make it more figurative: you define a camera coordinate system (projection) with those UVW coordinates. It’s like holding an image frame and looking through it. The UVW vectors define how(!) you hold it, not where. Only once you place it relative to your eye position, which becomes the root point (origin) of that local camera coordinate system, does it become a fixed view into the world.

If that is not clear, maybe grab a standard computer graphics book first.