Gaussian Point Splatting

It will be interesting to see the first AAA game that uses these methods instead of rendering a 3D world. Even if made from CGI worlds, it would be a very interesting approach with somewhat predictable performance.

Reminds me of Ecstatica [1], a 1994 game that had intense visuals with a very odd/different rendering engine made of 3D ellipsoids; in a way, really crude splats with Gouraud shading.

[1] https://ecstatica.fandom.com/wiki/Ecstatica

Note that the first published work of rendering Gaussian Volumes was in this 1991 paper (https://articles.tomasparks.name/publications/Westover1991.p...) - so 3DGS is really a rehash of an old method from the 90s!

The contributions of 3DGS lie in how fast you can render them on modern GPU hardware (tiling + sorting with threads), and in how to make the pipeline differentiable so that you can fit the Gaussian splats to photogrammetry data. Similar to the history of deep learning, it became technically feasible once the GPU hardware was powerful enough.
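To make the "sorting" part concrete, here is a minimal sketch of front-to-back alpha compositing for a single pixel. It is a simplification under stated assumptions: the Gaussians are isotropic and already projected to 2D, the dictionary keys are made up for illustration, and the sort runs per pixel on the CPU rather than per tile on the GPU.

```python
import math

def gaussian_alpha(px, py, cx, cy, sigma, opacity):
    """Opacity contributed at pixel (px, py) by an isotropic 2D Gaussian."""
    d2 = (px - cx) ** 2 + (py - cy) ** 2
    return opacity * math.exp(-0.5 * d2 / (sigma * sigma))

def composite_pixel(px, py, splats):
    """Blend splats front to back: C = sum_i c_i * a_i * prod_{j<i} (1 - a_j)."""
    color = 0.0
    transmittance = 1.0
    for s in sorted(splats, key=lambda s: s["depth"]):  # nearest first
        a = gaussian_alpha(px, py, s["x"], s["y"], s["sigma"], s["opacity"])
        color += transmittance * a * s["color"]
        transmittance *= 1.0 - a
        if transmittance < 1e-4:  # early exit once the pixel is effectively opaque
            break
    return color, transmittance

# Hypothetical scene: a dark splat in front of a bright one, both centred on the pixel.
splats = [
    {"x": 0.0, "y": 0.0, "sigma": 1.0, "opacity": 0.5, "color": 1.0, "depth": 2.0},
    {"x": 0.0, "y": 0.0, "sigma": 1.0, "opacity": 0.5, "color": 0.0, "depth": 1.0},
]
color, transmittance = composite_pixel(0.0, 0.0, splats)
```

Every step is a product and sum of smooth functions of the splat parameters, which is what lets gradients flow back to positions, sizes, opacities and colors during fitting.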

There was this FPS demo recently https://playcanv.as/p/qxGSuzYq/

People have also converted some small sections of Unreal 5 demos into splats https://superspl.at/scene/692c4f91

Or perhaps use a real world scan - it was suggested this one would make an ideal setting for zombies https://superspl.at/scene/6359774f

Bladerunner: Revelations used a similar technique to bake down large CGI worlds with expensive lighting into something that ran on a Pixel 1 at VR specs.

It's honestly really hard to work with this stuff, because you ultimately need to be able to place meshes inside these scenes (these triangle seas), and you need to do it in a way that plausibly fits in the world. You can't have unlit characters walking around a baked lit scene and have them fit in. That's just from a visual design perspective.

You also always want to have bounce light from your dynamic things onto the baked scene and depending on the tech, you might not even be able to spatially place a dynamic thing and have it properly occlude what splats it needs to occlude.

As is, it's a niche technology for games. That might change one day.

https://github.com/googlevr/seurat https://www.youtube.com/watch?v=Pf5Q3bvXj8E

Many years ago there was a game called Casebook[1], a small detective game where you investigated rooms for clues. But unlike similar FMV games where you jumped from point to point, it had photorealistic environments that you could smoothly walk around in, much like later lightfield or Gaussian splatting experiments.

[1] https://www.youtube.com/watch?v=o-VAaC5BgVE

Any idea on how they achieved this?

I can't say I know how they actually did it, but taking a look at the trailer I can point out that it looks like the spaces are confined and your character is on rails. I'm mainly going off of the instant direction changes that don't appear to be 45 degrees off from the camera direction. Once it's constrained down to a single line/path you could do some wild things like cube mapping a video, where the position in the video is tied to the character's position. I can't say I know how they would take that video though; my best guess is that the scenes were constructed in 3D software and were just too expensive for real-time rendering.
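A minimal sketch of the "position in the video is tied to the character's position" idea, assuming a single straight rail and a pre-recorded clip. The function and parameter names are hypothetical, and this is a guess at the general approach, not how Casebook actually works.

```python
def frame_for_position(distance, path_length, frame_count):
    """Map a distance travelled along the rail to a frame index in the clip."""
    if frame_count < 1 or path_length <= 0:
        raise ValueError("need at least one frame and a positive path length")
    t = min(max(distance / path_length, 0.0), 1.0)  # clamp to the ends of the rail
    return round(t * (frame_count - 1))

# Hypothetical 10 m rail covered by a 301-frame clip.
halfway = frame_for_position(5.0, 10.0, 301)
```

Walking forward or backward then just scrubs the clip, and each frame would be shown as a 360° image so the player can still look around freely.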

Cube mapping a video sounds plausible, this is commonly known as 360° video. Putting the camera on rails (though I don't really notice rails in this case) and tying the video playback speed to the speed of the rail movement has also been done in the past in some pre-rendered PlayStation games, though without cube mapping. But I think it's not pre-rendered in this case. It looks far too realistic for a game that is at least 17 years old. My best guess: they captured the 360 degree videos with a real camera (stabilized in some way) and edited the equipment out frame by frame.

This is "rendering a 3D world". It's basically the exact same techniques that traditional rendering uses, just with a different primitive that's not triangles. Everything else pretty much carries over.

If you mean the technique of splatting specifically, Dreams for PS4 [1] is prior art.

If you mean pre-rendering, there's Myst and games like the original FF7 for PS1.

[1] https://en.wikipedia.org/wiki/Dreams_(video_game)

Dreams for PS4 used point splatting and has a very unique look as a result. The splats were created from distance fields instead of being scanned, so they don't look like modern gaussian splats. They have a painterly look instead. https://youtu.be/2ltgkcoQzow

People are rendering huge splat scenes on mobile devices using LOD. This (currently) requires CUDA and an NVidia GPU to work. I would have been much more impressed to see a demo where it was running on low end mobile hardware faster than current splat renderers can.

I'm probably being a bit of a grinch about it but the abstract doesn't address performance or hardware constraints either so I guess I'm going to have to read the damn paper.

I really want to get into splatting and I have the tools: good camera, v comfy in blender, comfy with graphics programming ideas, 4080. But I haven't found a good 'all in one intro' to it yet. Possibly because I'm foss-biased and have dismissed proprietary options. But does anyone know of a good 'vertical tutorial' on this stuff?

Maybe not exactly the kind of tutorial you're looking for but very enjoyable none the less: https://youtu.be/eekCQQYwlgA

I recently got into splatting. I looked for some good all-in-one tutorials, but didn't find any, and mostly muddled through with trial and error and LLM assistance. I present this workflow as a straight-line pipeline, though in practice it took a lot of iteration and backtracking and rework to get the final result. Here's what worked for me:

I captured a video on a smartphone camera, using the OpenCamera app. Specifically, this video was captured with exposure locked, framerate locked, focus locked, fairly high framerate and resolution. I walked slowly and carefully around an outdoor scene, trying to get fairly good coverage from multiple angles. I took roughly 20 minutes of video, weighing 19GB.

This video was sampled into individual image frames at about 5fps using ffmpeg. There's room for experimentation and improvement here; an adaptive, coverage-aware sampling strategy would be better. But fixed 5fps was Good Enough (tm). This resulted in roughly 8,000 images at 4k. This was a pretty hefty dataset for my limited 1080, but I made it work.
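For reference, a fixed-rate sampling step like this can be sketched as follows. The file names and the helper are assumptions for illustration (the exact command used above isn't given); `-vf fps=5` is ffmpeg's standard frame-rate filter.

```python
from pathlib import Path

def ffmpeg_sample_command(video, out_dir, fps=5):
    """Build an ffmpeg argument list that writes one numbered PNG per sampled frame."""
    pattern = str(Path(out_dir) / "frame_%06d.png")
    return ["ffmpeg", "-i", str(video), "-vf", f"fps={fps}", pattern]

# Hypothetical input and output names.
cmd = ffmpeg_sample_command("capture.mp4", "frames", fps=5)
# To run it: create the output directory first, then subprocess.run(cmd, check=True)
```

Building the argument list separately makes it easy to swap in a different rate, or a smarter sampling strategy later, without touching the rest of the pipeline.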

I then generated masks for these images, to ignore transient objects during the splat training (i.e. to cut out people who transiently walked through the scene). For this I used Cutie (https://github.com/hkchengrex/Cutie). For outdoor scenes, it can also make sense to mask out low-parallax areas like faraway mountains or especially the sky, as these are difficult to train correctly. If masks are generated for some images, you'll need at least placeholder masks for all of them. In the end I've got about 8,000 PNGs that are monochrome black/white masks.
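The placeholder-mask step could look roughly like this, using only the standard library. Several things here are assumptions rather than facts about any particular trainer: that frames are PNGs, that a mask shares its image's file name, and that white means "keep this pixel". Check the convention your training tool expects before relying on it.

```python
import struct
import zlib
from pathlib import Path

def _chunk(tag, data):
    """Wrap data in a PNG chunk: length, tag, payload, CRC over tag + payload."""
    body = tag + data
    return struct.pack(">I", len(data)) + body + struct.pack(">I", zlib.crc32(body) & 0xFFFFFFFF)

def solid_mask_png(width, height, value=255):
    """Encode an 8-bit grayscale PNG filled with a single value."""
    header = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    row = b"\x00" + bytes([value]) * width  # filter type 0, then one byte per pixel
    return (b"\x89PNG\r\n\x1a\n"
            + _chunk(b"IHDR", header)
            + _chunk(b"IDAT", zlib.compress(row * height))
            + _chunk(b"IEND", b""))

def fill_missing_masks(image_dir, mask_dir, width, height):
    """Write an all-white mask for every image that has no mask yet."""
    mask_dir = Path(mask_dir)
    mask_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for image in sorted(Path(image_dir).glob("*.png")):
        target = mask_dir / image.name
        if not target.exists():
            target.write_bytes(solid_mask_png(width, height))
            created.append(target.name)
    return created
```

Calling `fill_missing_masks("frames", "masks", 3840, 2160)` would leave the real masks untouched and return the names of the placeholders it wrote.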

Then the images are handed to COLMAP (https://github.com/colmap/colmap), using the 'global mapper' option. This registers the camera positions in 3D space, and generates a crude point cloud that's good for sanity-checking. This step required a fair bit of iteration to get right. The full reconstructed output from COLMAP is not necessary, only the pose-estimate .bin files. The output directory here was about 500MB for this step for me.

With COLMAP registration done, the next step is the actual training. I found two useful pieces of software for this, with different tradeoffs.

Brush (https://github.com/ArthurBrussee/brush) was very straightforward to install and use, requiring very little in external dependencies and setup. It was also pretty speedy on training, and gave good results. Minor modifications to the training process were possible by editing source, though I didn't get too wild here. Brush takes the *.bin files from COLMAP, plus the original images directory, and the masks directory if it exists. Run on its own, this could produce gaussian splat .ply files, 500-800MB in size, containing 1-10M splats. More than that and my poor little 8GB of VRAM OOM'd.

nerfstudio (https://github.com/nerfstudio-project/nerfstudio) was also useful, as many research papers get implemented in its framework. In particular, for this outdoor scene, I used wild-gaussians (https://github.com/jkulhanek/wild-gaussians/) to generate just a sky sphere (to help seed low-parallax areas in my particular dataset), stopped training, and used this as an init.ply to pass to Brush.

I then set up a very simple viewer website, using SuperSplat (https://github.com/playcanvas/supersplat). I used supersplat's editor to align the splat's coordinate system with the rotation and scaling that I wanted, and then exported an optimized .sog file, roughly 1/10th the size. .sog is nominally open-standards, though I'm not aware of any other projects using the format. This gave fairly good framerates and adequate controls across a variety of platforms.

As a little bit extra, supersplat's splat-transform CLI tool was used to generate a crude collision mesh for the scene, enabling a walking mode that respected object boundaries.

If there's interest I can post my results, I got a bit sidetracked with other projects and other splats, and this particular one I got fiddling with some more cleanup. I can get it up with a few more hours work. But hopefully that's a good start, all of these are fully FOSS, and resulted in a good-looking splat.

Thank you for sharing!

When looking at their linked interactive viewer it looks like they need 128 spp for the image quality to equal 3dgs. Maybe you can reduce that with some temporal tricks and noise reduction filtering, but that's still a lot of samples.

Did not read the paper (sorry) but I wonder how this compares to mesh splatting (https://meshsplatting.github.io/). I feel like mesh splatting can produce higher quality results because triangles are very good at representing sharp features, and gaussians aren't.

But only in the same sense that triangles are bad at representing curves, right? It seems that’s a wash.

Can someone point to a resource/tutorial for learning point splatting (the 90s rendering technique)? Gaussian Splatting has completely overtaken the search results, and the original technique is now nearly impossible to find.

Westover’s thesis https://www.cs.unc.edu/techreports/91-029.pdf

It's going to be even more impossible to find now because the present paper introduces "Gaussian point splatting".

I love this site design. It uses the entire width of the monitor rather than a slender column of pixels down the middle with large blocks of unused space on either side, with a font for my old man eyes.

> It uses the entire width of the monitor rather than a slender column of pixels down the middle with large blocks of unused space on either side

Umm on my machine it has 560px margin on both sides with the content being only 474px sliver in the middle?

Imo they need to pad it just a bit. My scrollbar overlaps.

Maybe use Tampermonkey?

Could this be a new direction for Google Streetview perhaps?

It seems like there are fairly regular posts on HN about splatting, and most appear to be fairly technical or proof-of-concept level. While the outputs look nice, I’m not sure that I could distinguish them from a nice ray-traced scene. What I think I’m missing is the “why?” of splatting. What are the material benefits of this area of research?

At the moment, combining your statement "I’m not sure that I could distinguish them from a nice ray-traced scene" and adding "your graphics card can move through them in real time so cheaply that it can easily be used as a component in other tech even at high frame rates" covers it pretty nicely. There's some research into how to make them move or do other things they don't do very well, but the fact that you can swoop through them in real time on cell-phone level of power means they fit a lot of niches. Plus the fact you can "record" them from a real-world physical environment without ever having to "model" it opens up a lot of utility too.

Personally I suspect they are getting a bit more attention than they "deserve"; people aren't talking about their weaknesses very much and I think that's resulting in some overexcitement. Some of the "we can replace everything with splats!" reminds me of the people who still don't understand why "if GPUs are thousands of times faster than CPUs why don't we run everything on GPUs?" is basically not even a sensible question. I don't see them as ever being the foundation of a graphics stack, but they definitely have a place as part of a well-rounded menu of techniques that can be brought to bear on a wide range of problems.

> Plus the fact you can "record" them from a real-world physical environment without ever having to "model" it opens up a lot of utility too.

This is the big thing imho. Sure, you can do traditional photogrammetry to capture meshes and textures but getting the shaders exactly right is afaik non-trivial etc, and if you want real-time rendering then you likely need some further post-processing of the assets. With 3dgs you can pretty much bypass all that complexity and the whole pipeline from photos to rendered frame is much more straightforward.

Really nice idea for 3DGS rendering - though the main problem is the noise (an unfortunate issue for all Monte-Carlo based methods).

I think future papers would probably continue improving on this method and focus on how to sample the points more efficiently while being unbiased (similar to how ray-tracing solved their performance issues). Or maybe... we can just add a deep-learning based denoiser and call it a day!

My dumb idea... do outdoor scans, and then convert the contents into 1m^2 blocks... And then, just dumbly stitch them together.

Kind of like Minecraft... but with user-generated gaussian-splat blocks.

This feels like Monte Carlo rendering applied to rasterization. I'm wondering if it's a brand-new or a well established methodology

It's not new - that was sort of my point with my other comment.

At least if it's progressive (so refines and resolves over time), this has been done with pointclouds in the VFX industry in GPU shaders for years in terms of stochastically drawing different points so eventually the whole point set gets rasterised to a fidelity threshold.

Okay, thanks for the clarification! So, the interesting part here seems to be the 3DGS-specific opacity correction and GPU workload mapping. Am I wrong?

Possibly yeah.

Or the per-pixel coord atomic I guess?

Right, that part seems to be based on Schütz et al. 2021 https://arxiv.org/abs/2104.07526

Monte Carlo in 3dgs is established enough that Spark [1] has been doing it for a while in the browser.

[1] https://github.com/sparkjsdev/spark

Cannot find anything related to Monte Carlo methods in the source code. I thought Spark implemented a conventional 3DGS pipeline with LoD optimizations (and it seems they do the sorting on the CPU using Rust/WebAssembly because of WebGL limitations).

that goes all the way back to the Kajiya rendering equation https://en.wikipedia.org/wiki/Rendering_equation

Sorting the gaussians is the compute-heavy part in gaussian splatting. So, I'm guessing this will give only marginal improvement in terms of rendering speed.

I'm not sure it does a sort. Each group of threads only handles a select number of gaussians

Yea, I think avoiding sorting is kinda the whole point here

Their point splatting method is orthogonal to level-of-detail rendering (they reference a few papers which try to do this), so both point splatting and LoD could be combined in the future for an even greater performance gain during rendering. They already implement occlusion and frustum culling.

Point splatting does introduce a lot of noise though, and their denoiser introduces ghosting, but they say a more sophisticated denoiser would give considerably better quality.

> millions of threads

Really?! What OSs can handle that many native threads?

Also, this seems quite similar to stochastic progressive drawing of pointclouds for realtime that has been done for > 15 years in the VFX industry with GPU shaders in a tiled/bucketed fashion, unless this isn't progressive maybe? (The fact it's been accepted for Siggraph likely indicates it's slightly different).

I believe they mean GPU threads. Plenty of cuda files in their repository.

Fair enough, but that's then only absolutely max 1024 threads per SM, which wouldn't get anywhere near 1 million, given 5090 only has 192 SMs...

Future proofing I guess...

You can launch many more logical threads than the available physical threads. The GPU scheduler will automatically dispatch the work to the SMs.

Just like 2 threads can execute on the same core at the "same" time, i.e. no synchronization, the same is true for GPU threads/thread groups.
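Back-of-the-envelope arithmetic for that, taking the figures quoted above (192 SMs, 1024 threads per SM) as given. They are the commenter's numbers and may not match the actual hardware; the function names are just for illustration.

```python
import math

def resident_threads(sm_count, threads_per_sm):
    """Upper bound on threads that can be resident on the GPU at once."""
    return sm_count * threads_per_sm

def waves_needed(logical_threads, sm_count, threads_per_sm):
    """How many full-occupancy batches it takes to cover the whole launch."""
    return math.ceil(logical_threads / resident_threads(sm_count, threads_per_sm))

resident = resident_threads(192, 1024)      # 196,608 threads resident at once
waves = waves_needed(1_000_000, 192, 1024)  # 6 batches to cover a million threads
```

So a million-thread launch is only a handful of batches deep on those numbers, and the hardware scheduler handles the batching without the program having to do anything.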

I guess they never say that they execute at the same time technically haha

Video overview of the technology: https://www.youtube.com/watch?v=X8yRlA7jqEQ

Ordinarily I don't prefer video, but the visuals are helpful here.

Also, an online interactive, but it seems to only work in Chrome: https://superspl.at/scene/ff1d0393