Nvidia’s Magic3D creates 3D models of written descriptions, thanks to AI

    A poison dart frog rendered as a 3D model by Magic3D.

    Nvidia

    On Friday, researchers at Nvidia announced Magic3D, an AI model that can generate 3D models from text descriptions. After entering a prompt such as “A blue poison dart frog sitting on a water lily”, Magic3D generates a 3D mesh model complete with colored texture in about 40 minutes. With modifications, the resulting model can be used in video games or CGI art scenes.

    In its academic paper, Nvidia describes Magic3D as a response to DreamFusion, a text-to-3D model that Google researchers announced in September. Similar to how DreamFusion uses a text-to-image model to generate a 2D image that is then optimized into Neural Radiance Field (NeRF) volumetric data, Magic3D uses a two-stage process: it first generates a coarse model at low resolution, then optimizes it at a higher resolution. According to the paper's authors, the resulting Magic3D method can generate 3D objects twice as fast as DreamFusion.
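    The coarse-to-fine idea described above can be illustrated with a toy sketch. This is not Nvidia's code: all function names are invented, and a 1-D list of values stands in for a 3D scene representation. The point is only the structure of the two stages, in which a cheap low-resolution result is optimized first, then upsampled and refined at higher resolution.

    ```python
    # Conceptual sketch of a two-stage coarse-to-fine pipeline (illustrative
    # only; the real Magic3D optimizes 3D scene representations against a
    # diffusion model, not simple numeric targets).

    def optimize(scene, target, steps):
        # Toy stand-in for an optimization loop: each step moves every
        # value halfway toward its target.
        for _ in range(steps):
            scene = [s + 0.5 * (t - s) for s, t in zip(scene, target)]
        return scene

    def upsample(scene):
        # Double the resolution by inserting linearly interpolated values
        # between neighboring samples.
        out = []
        for a, b in zip(scene, scene[1:]):
            out += [a, (a + b) / 2]
        out.append(scene[-1])
        return out

    def coarse_to_fine(coarse_target, fine_target):
        # Stage 1: optimize a cheap low-resolution representation.
        coarse = optimize([0.0] * len(coarse_target), coarse_target, steps=8)
        # Stage 2: upsample the coarse result and refine it at the
        # higher resolution.
        fine = upsample(coarse)
        return optimize(fine, fine_target, steps=8)
    ```

    The payoff of this structure is that most optimization steps happen on the small, cheap representation, and the expensive high-resolution stage starts from a good initialization instead of from scratch, which is how the paper accounts for the speedup over DreamFusion's single-resolution approach.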

    Magic3D can also perform prompt-based editing of 3D meshes. Given a low-resolution 3D model and a base prompt, a user can alter the text of the prompt to change the resulting model. The Magic3D authors also demonstrate preserving the same subject across multiple generations (a concept often referred to as coherence) and applying the style of a 2D image (such as a Cubist painting) to a 3D model.

    Nvidia has not released any Magic3D code along with its academic paper.

    The ability to generate 3D from text feels like a natural evolution in today’s diffusion models, which use neural networks to synthesize new content after intense training on a batch of data. In 2022 alone, we’ve seen the emergence of capable text-to-image models like DALL-E and Stable Diffusion and rudimentary text-to-video generators from Google and Meta. Google also debuted the aforementioned DreamFusion text-to-3D model two months ago, and people have since adapted similar techniques into an open source implementation based on Stable Diffusion.

    As for Magic3D, the researchers hope it will allow anyone to create 3D models without special training. Once refined, the resulting technology could accelerate video game (and VR) development, perhaps eventually finding applications in special effects for film and TV. Towards the end of their paper, they write, “We hope that with Magic3D we can democratize 3D synthesis and open up everyone’s creativity in creating 3D content.”