CUDA Ray Tracing 2x Faster Than RTX: My CUDA Ray Tracing Journey

See comments as to why this is misleading: https://www.reddit.com/r/GraphicsProgramming/comments/1ljmf0...

I think the title of this should be changed; at the moment it's clickbait. It should be something like:

CUDA Ray Tracing 2x Faster Than RTX when Rendering Spheres

As far as I can see, this renderer can't do anything else except spheres (and maybe planes).

It's no bad achievement to beat a general purpose production renderer at one specific thing, but a renderer that can only do spheres is just a hyper-optimized toy, and here it's being presented as far more than that.

I’m guessing it’s because they’re using all the computing power the GPU has to offer in CUDA mode, as opposed to sharing the GPU with other functions (when in RTX).

More likely it's because the scene they're using is completely unrepresentative of what people are interested in: almost no triangles, primarily procedural nodes (for spheres), and in general a fairly simple scene.
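To make the "procedural spheres" point concrete: a ray-sphere test is just a quadratic solve, a few dot products and one square root. Here's a minimal CPU-side C++ sketch, not the article's code. `Vec3`, `hit_sphere`, and the "return -1 on a miss" convention are names I made up for illustration, and it assumes the ray direction is unit length.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Hypothetical helper: nearest hit distance t >= 0 along origin + t * dir,
// or -1.0f on a miss. Assumes dir is normalized.
inline float hit_sphere(Vec3 origin, Vec3 dir, Vec3 center, float radius) {
    assert(std::fabs(dot(dir, dir) - 1.0f) < 1e-4f);  // unit-length direction
    Vec3 oc = origin - center;
    // Solve t^2 + 2*b*t + c = 0 for the ray parameter t.
    float b = dot(oc, dir);
    float c = dot(oc, oc) - radius * radius;
    float disc = b * b - c;
    if (disc < 0.0f) return -1.0f;        // ray line never touches the sphere
    float s = std::sqrt(disc);
    float t = -b - s;                     // near root
    if (t < 0.0f) t = -b + s;             // origin inside the sphere: far root
    return t >= 0.0f ? t : -1.0f;         // sphere entirely behind the ray
}
```

There's no mesh and no per-triangle work here, which is part of why a sphere-heavy scene says little about triangle-heavy workloads.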

Yup, this is an "assume spherical cow" situation where it's not dishonest, but you can't draw any real-world conclusions from the experiment unless you happen to be working in a very restricted space.

In a real-world scenario, wouldn't you need to make the CUDA cores aware of the game geometry, adding more work on the CPU?

Ideally you don't make the CUDA cores aware but rather the ray-tracing circuitry. RT cores are designed to perform ray-triangle intersections in a BVH. You get the teraflops and memory bandwidth (or more of it) if you fit the RT-core computing model.
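For anyone wondering what a ray-triangle intersection actually involves, here's a software sketch of the well-known Möller-Trumbore test in plain C++. This is only an illustration of the kind of work involved, not a claim about how RT cores implement it internally. `Vec3`, `hit_triangle`, the epsilon value, and the "-1 on a miss" convention are my own assumptions.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

inline Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Hypothetical helper: hit distance t along orig + t * dir for triangle
// (v0, v1, v2), or -1.0f on a miss. Two-sided (no backface culling).
inline float hit_triangle(Vec3 orig, Vec3 dir, Vec3 v0, Vec3 v1, Vec3 v2) {
    const float eps = 1e-7f;              // assumed tolerance, tune per scene scale
    Vec3 e1 = sub(v1, v0);
    Vec3 e2 = sub(v2, v0);
    Vec3 p = cross(dir, e2);
    float det = dot(e1, p);
    if (std::fabs(det) < eps) return -1.0f;       // ray parallel to the triangle plane
    float inv = 1.0f / det;
    Vec3 s = sub(orig, v0);
    float u = dot(s, p) * inv;                    // first barycentric coordinate
    if (u < 0.0f || u > 1.0f) return -1.0f;
    Vec3 q = cross(s, e1);
    float v = dot(dir, q) * inv;                  // second barycentric coordinate
    if (v < 0.0f || u + v > 1.0f) return -1.0f;
    float t = dot(e2, q) * inv;
    return t > eps ? t : -1.0f;                   // reject hits behind the origin
}
```

A real renderer runs a test like this only against the triangles in the BVH leaves a ray reaches, which is the traversal-plus-intersection pattern the hardware is built around.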

And in most cases it's OK to spend time on one CPU function (creating and loading the BVH) against the hundreds of thousands of frames you'll be drawing on the GPU.

A whole lot of stuff is going on during gaming and graphics rendering with trick upon trick to squeeze out every last bit of performance. Unless you're an expert in a graphics rendering stack or a game engine it's hard to have these conversations in a meaningful way.

wow, bypassing a rendering backend makes things go faster, what a surprise!

This only runs on Nvidia; Vulkan is designed to be cross-compatible not only across GPUs but across operating systems as well. Vulkan is pretty direct compared to something like DX11 though, so I guess it is interesting to see a performance improvement nonetheless.

> FMA performance here is a non-issue, I'm not just flexing—I'm showing off my CUDA prowess. But hey, got to demonstrate I know my hardware!

This article is pretty embarrassing, and as others have noted, very misleading due to the RTX units hardly being used.

> __restrict__ Pointers

Ahh, my favorite nitpick, C++ not having sane default aliasing rules, spills over into CUDA-land.

It's hard to have them when one of the original goals was being mostly copy-paste compatible with C89.

Yes, though C has restrict in the language now, while C++ does not.

Because no one has ever bothered to create a WG21 paper proposal to include it.
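For readers who haven't run into the `__restrict__` issue discussed above: in C++ it only exists as a compiler extension (GCC and Clang accept `__restrict__`, as does nvcc; MSVC spells it `__restrict`). A minimal host-side sketch of what the annotation promises follows. The `saxpy` function is my own example, not something from the article.

```cpp
#include <cassert>
#include <cstddef>

// __restrict__ is a non-standard extension. It promises the compiler that
// x, y, and out never overlap, so loads from x and y need not be redone
// after each store to out. Breaking that promise is undefined behavior.
inline void saxpy(std::size_t n, float a,
                  const float* __restrict__ x,
                  const float* __restrict__ y,
                  float* __restrict__ out) {
    // Cheap sanity check only; it does not detect partial overlap.
    assert(n == 0 || (x != out && y != out));
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a * x[i] + y[i];
    }
}
```

Without the qualifier the compiler generally has to assume `out` might alias `x` or `y`, which can block vectorization or force extra loads. Whether it matters in a given kernel is something to measure rather than assume.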