If you haven't tried AI modeling pipelines in the last year you'll be surprised.
The star of the show here is https://platform.worldlabs.ai/ (the author works there; I don't), which is really good. There's also Meshy.ai (which this repo doesn't seem to use?) for non-scene stuff that's right up there in quality. There's texturing, auto-rigging, etc.
The latest VLM (vision-language) models have true pixel-level image grounding, which means you can ask your model for the pixel coordinates of things in an image. That gives you 3D perception for edits and anything else you need.
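To make the "pixel coordinates → 3D perception" step concrete, here's a minimal sketch of the geometry involved. Assumptions (none of this comes from any particular API): the VLM has returned a pixel coordinate for an object, you have a depth estimate for that pixel (e.g. from a monocular depth model), and you know the camera intrinsics. This is just the standard pinhole unprojection:

```python
# Unproject a pixel coordinate (as returned by a VLM grounding query)
# into a 3D point in camera space, via the pinhole camera model.
# (u, v): pixel coordinate; depth: distance along the optical axis;
# fx, fy: focal lengths in pixels; (cx, cy): principal point.

def unproject(u, v, depth, fx, fy, cx, cy):
    """Pixel (u, v) + depth -> 3D point (x, y, z) in camera coordinates."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

# Example: 640x480 frame, principal point at the image center,
# focal length 500 px, and the VLM says the object is at pixel (320, 240):
point = unproject(320, 240, 2.0, 500.0, 500.0, 320.0, 240.0)
# -> (0.0, 0.0, 2.0): directly ahead of the camera, 2 units away
```

Once you have 3D points for the things the model named, edits like "move that chair" stop being 2D guesses and become scene-space operations.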
I'm actually surprised I don't see this stuff being used more; I think it's because most pipelines are hard-baked with the assumption that your 3D assets are files you get from an artist, not something you can imagine up in minutes in a script. The technology is moving faster than the industry can keep up with.
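The "assets from a script" idea can be as simple as a cache-or-generate helper. The sketch below is hypothetical: `generate` stands in for whatever generation service you actually call (World Labs, Meshy, etc. — no real API is assumed here), and the `.glb` extension is just an illustrative choice:

```python
# Hypothetical sketch: treat 3D assets as something a script requests on
# demand, rather than files handed over by an artist. The generation call
# itself is injected, since it depends on whichever service you use.

import hashlib
import os

def get_asset(prompt, cache_dir="assets", generate=None):
    """Return a mesh file path for `prompt`, generating and caching it if absent."""
    key = hashlib.sha256(prompt.encode()).hexdigest()[:16]
    path = os.path.join(cache_dir, f"{key}.glb")
    if not os.path.exists(path):
        os.makedirs(cache_dir, exist_ok=True)
        data = generate(prompt)  # call out to your generation API of choice
        with open(path, "wb") as f:
            f.write(data)
    return path
```

The point of the content-addressed cache is that rerunning the script is cheap: the expensive generation call only happens for prompts the pipeline hasn't seen before.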
> I'm actually surprised I don't see this stuff being used more;
There's very little incentive to publicly admit you're using this tech. In fact there are a lot of reasons not to.
What's the best option for converting house blueprints or 3D rendered images back to models?
This is cool as hell.
I remember like seventeen years ago, Microsoft had "PhotoSynth", which would make 3D environments based on a bunch of images, and seventeen-year-old-tombert thought it was one of the most amazing things to ever be done on a computer.
Doing this with just one image makes this at least an order of magnitude cooler. I will be playing with this over the weekend.
Photosynth was awesome, I really miss it, but it was more of a panorama tool than a 3d environment.
My Pixel 6 has a photo sphere mode on the camera, which is the same thing.
You could actually make it have a rough 3D environment as well. Their demo had a model of Piazza San Marco with dots to estimate the actual buildings and the like.
Oh yes, I remember that now!
I see it uses World Labs; I've tested it quite a bit and the results were not really usable. It hallucinated so many parts outside of the walls that made no sense. Hallucination would be fine if the result made sense, but if it doesn't, I'm not sure what the point of inputting a single image is. I've actually had better luck using gpt image 2 instead.
So Blade Runner's Esper photo analysis went from ruining the suspension of disbelief to reality quicker than most magic.
Well, in Blade Runner he looks around a corner and zooms in to microscopic detail on something not visible in the photo.
But the Esper interface is all voice-activated and doesn't talk back, which I think is very prescient and more likely the way things will go. I'd much rather voice assistants just did the thing I want them to do rather than talk back to me.
I've never forgotten this SIGGRAPH demo from almost twenty years ago, in which the authors effectively swap the camera and light source computationally (in a static scene) [1].
Ever since then, I have viewed scenes such as the "lingerie store scene" from Enemy of the State [2] with a little bit less eye rolling...
[1] - https://www.youtube.com/watch?v=p5_tpq5ejFQ
[2] - https://youtu.be/3EwZQddc3kY?t=6
I went to high school with the sales clerk, Ivana Miličević.
It's always weird to see her in stuff.
I’m ready to make a game with this, or something similar. Open to suggestions on tooling and asset pipelines that utilize AI, if anyone has any suggestions or guides.