REST3D research resource · physically stable 3D scene reconstruction

REST3D: Reconstructing Physically Stable 3D Scenes from a Single Image

REST3D turns a single casual RGB image into a visually consistent and physically stable interactive 3D scene. REST3D focuses on simulation-ready digital assets, scene-tree reasoning, physics-constrained refinement, Isaac Gym simulation, and VR human-object interaction.

Authors: Xiaoxuan Ma · Jiashun Wang · Nicolás Ugrinovic · Yehonathan Litman · Kris Kitani
Carnegie Mellon University
arXiv:2605.30338 · 2026
Code status: coming soon

1
single RGB image

REST3D starts from one casual image, not a dense scan or multi-view capture.

3
pipeline stages

Scene-tree construction, scene initialization/canonicalization, and physics-constrained optimization.

93%+
reported stable rates

REST3D reports stable rates of 95.8% on Replica, 93.6% on ScanNet++, and 95.5% on the Custom set.

VR
real interaction demo

REST3D demonstrates hand-based human-object interaction with Meta Quest Pro and Isaac Gym.

01 · What is REST3D?

REST3D is single-image 3D reconstruction built for physical stability, not just visual plausibility.

REST3D means REconstructing physically STable 3D scenes. The central idea is brutal and simple: a 3D scene that looks correct is not enough if objects float, intersect, explode under gravity, or collapse in a simulator. REST3D uses physical scene understanding and physics-constrained optimization so the reconstructed scene can behave like a usable digital asset.

02 · REST3D audience demand

REST3D searchers want proof, demos, code status, metrics, and a fast explanation of why physical stability matters.

Researchers

They look for the REST3D paper, method diagram, baselines, datasets, metrics, ablations, limitations, BibTeX, and reproducibility notes.

3D / VR / game creators

They care about whether REST3D can convert a casual image into simulation-ready assets, stable layouts, VR demos, and interactive scenes.

Robotics and embodied AI builders

They care about gravity, support relations, collision rate, stable rate, real-to-sim, Isaac Gym, object contact, and reliable manipulation scenes.

03 · REST3D abstract

REST3D abstract: Reconstructing Physically Stable 3D Scenes from a Single Image

Reconstructing physically stable 3D scenes from a single RGB image enables casual images to be converted into simulation-ready digital assets for applications such as immersive interaction and content creation. However, existing single-image reconstruction methods fall short in capturing the physical structure of a scene. As a result, they often produce geometrically plausible but physically inconsistent results, including object floating and penetration, which lead to unstable behavior in physics simulations. Image-conditioned scene generation methods improve physical plausibility but often rely on strong scene priors, yielding plausible yet inaccurate object arrangements that fail to match the input image. We propose REST3D, a single-image reconstruction framework that can REconstruct physically STable 3D scenes by integrating physical scene understanding with physics-constrained refinement. We first introduce an agentic physical scene understanding technique that constructs a scene-tree representation capturing object physical states and inter-object relationships from a gravity-support perspective, providing a structural prior for reconstruction. Leveraging this structure, we initialize the scene using image-to-3D models, followed by scene-tree-guided alignment and physics-constrained optimization to resolve physical violations while preserving visual consistency with the input image. Experiments show that our method significantly reduces physical errors and improves simulation stability on both synthetic and real-world datasets while maintaining strong reconstruction quality. We further demonstrate the reconstructed scenes in VR-based human-object interaction, showing their potential for immersive applications.

04 · REST3D TL;DR

From a single casual image to a visually consistent and physically stable interactive 3D scene.

This is the REST3D promise in one sentence: the scene should not merely look plausible; it should settle, support objects, avoid severe interpenetration, and survive physics simulation.

05 · Why REST3D exists

Single-image 3D reconstruction often fails when gravity asks the obvious question: should this object stand, fall, or explode?

Object floating

Visually plausible reconstructions can place objects above their support surfaces. REST3D targets support consistency.

Object penetration

Objects may overlap in 3D. Under physics, that collision can cause explosive separation and unstable behavior.

Plausible but inaccurate generation

Image-conditioned scene generation can produce a physically plausible scene that does not match the input image. REST3D is designed to preserve visual consistency.

06 · REST3D pipeline overview

REST3D combines scene-tree construction, scene-tree-guided alignment, and physics-constrained optimization.

Scene-Tree Construction

REST3D infers a hierarchical scene tree that captures objects, physical states, and inter-object spatial support relationships from a gravity-support perspective.

Scene Initialization and Canonicalization

REST3D initializes object meshes using image-to-3D models, then uses the scene tree to correct global orientation and enforce coarse support constraints.

Physics-Constrained Optimization

REST3D refines object poses through simulation-based optimization to reduce floating, penetration, drift, and instability while preserving the input image layout.

07 · Scene-tree construction keyword cluster

REST3D scene-tree construction models gravity-support relationships using four support types: ground, wall, ceiling, and ground-wall.

A REST3D scene tree is not a decorative hierarchy. It is the structural prior that says which object supports which object: table on ground, plant on table, poster attached to wall, radiator supported by ground-wall. This is the hidden skeleton that lets REST3D keep visual reconstruction and physical behavior aligned.

REST3D scene tree · support relation · on · hanging · attached to · ground · wall · ceiling · ground-wall
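A minimal Python sketch of what such a scene tree could look like, assuming one node per object. The class name `SceneNode`, its fields, and the relation strings are illustrative assumptions, not the authors' data format; only the four support types and the example objects (table, plant, poster, radiator) come from the text above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The four support types named in the text; the spellings are an assumption.
SUPPORT_TYPES = ["ground", "wall", "ceiling", "ground-wall"]


@dataclass
class SceneNode:
    """One entry in a hypothetical scene tree."""
    name: str
    relation: str = "on"  # e.g. "on", "hanging", "attached to"
    children: List["SceneNode"] = field(default_factory=list)
    # Excluded from repr/compare to avoid parent <-> child recursion.
    parent: Optional["SceneNode"] = field(default=None, repr=False, compare=False)

    def add(self, child: "SceneNode") -> "SceneNode":
        """Attach `child` as something this node supports."""
        child.parent = self
        self.children.append(child)
        return child

    def support_chain(self) -> List[str]:
        """Names from this object up to its root support surface."""
        chain, node = [], self
        while node is not None:
            chain.append(node.name)
            node = node.parent
        return chain


# Root surfaces act as structural supports rather than reconstructed objects.
roots = {t: SceneNode(t, relation="root") for t in SUPPORT_TYPES}

table = roots["ground"].add(SceneNode("table", relation="on"))
plant = table.add(SceneNode("plant", relation="on"))
poster = roots["wall"].add(SceneNode("poster", relation="attached to"))
radiator = roots["ground-wall"].add(SceneNode("radiator", relation="on"))

print(plant.support_chain())     # ['plant', 'table', 'ground']
print(radiator.support_chain())  # ['radiator', 'ground-wall']
```

Storing the parent link on each node keeps the question "what is holding this object up?" a simple walk toward the root.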
08 · Agentic physical scene understanding

REST3D uses agentic physical scene understanding to identify objects, segment instances, and reason about spatial support.

Open-vocabulary object list analysis

REST3D asks a vision-language model to identify distinct objects with descriptive attributes, not just coarse labels.

Agentic instance segmentation

REST3D uses a segmentation agent and verifier loop to refine prompts and masks for each object instance.

Spatial relationship reasoning

REST3D infers support parents and support types from a gravity-aware perspective.
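The segmentation agent and verifier loop described above can be sketched as a bounded retry loop. This is a sketch under stated assumptions: `segment` and `verify` are hypothetical callables standing in for a segmentation model and a VLM-based checker, and the `(accepted, revised_prompt)` return shape and the `max_rounds` limit are invented for illustration.

```python
from typing import Callable, List, Optional, Tuple

Mask = List[int]  # placeholder mask type for this sketch


def segment_with_verifier(
    prompt: str,
    segment: Callable[[str], Mask],
    verify: Callable[[str, Mask], Tuple[bool, str]],
    max_rounds: int = 3,
) -> Optional[Mask]:
    """Refine a text prompt until the verifier accepts the mask.

    Returns the accepted mask, or None if no round was accepted.
    """
    for _ in range(max_rounds):
        mask = segment(prompt)
        accepted, revised_prompt = verify(prompt, mask)
        if accepted:
            return mask
        prompt = revised_prompt  # try again with the verifier's suggestion
    return None


# Toy stand-ins so the loop runs without any model.
def fake_segment(prompt: str) -> Mask:
    return [1, 1, 0] if "wooden" in prompt else []


def fake_verify(prompt: str, mask: Mask) -> Tuple[bool, str]:
    return (len(mask) > 0, prompt + " wooden")


mask = segment_with_verifier("chair", fake_segment, fake_verify)
print(mask)  # [1, 1, 0]
```

Capping the number of rounds keeps a stubborn instance from stalling the pipeline; how an unverified object should be handled is left to the caller.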

09 · Scene initialization and canonicalization

REST3D initializes the 3D scene with image-to-3D models, then canonicalizes the layout so physics has a fighting chance.

REST3D starts with raw image-to-3D output, then uses the scene tree to correct coarse orientation, enforce support, and produce a structured initial scene. Canonicalization alone is not enough; it improves stability but still needs the full REST3D physics-constrained optimization stage.
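To illustrate what "enforce support" can mean at the coarsest level, here is a small Python sketch that snaps an object vertically onto its support. It assumes axis-aligned bounding boxes with z as the up axis; the `Box` type, `snap_to_support`, and the numbers are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Vertical extent of an object's bounding box, with z up (assumed)."""
    min_z: float
    max_z: float


def snap_to_support(child: Box, support_top_z: float) -> Box:
    """Translate the child vertically so its bottom rests on the support."""
    dz = support_top_z - child.min_z
    return Box(child.min_z + dz, child.max_z + dz)


GROUND_Z = 0.0

# Process parents before children so each child sees a settled support.
table = snap_to_support(Box(0.125, 0.875), GROUND_Z)  # was floating above ground
plant = snap_to_support(Box(0.5, 1.0), table.max_z)   # was sunk into the table

print(table)  # Box(min_z=0.0, max_z=0.75)
print(plant)  # Box(min_z=0.75, max_z=1.25)
```

A snap like this removes gross floating and sinking but ignores mesh shape, friction, and balance, which is consistent with the point above that canonicalization alone is not enough.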

10 · Physics-constrained optimization

REST3D physics-constrained optimization resolves physical violations while preserving visual consistency.

Local group optimization

REST3D decomposes complex scenes according to the scene tree and optimizes smaller support groups so crowded scenes can converge more reliably.

Global group optimization

REST3D then refines the whole scene to reduce collision, drift, velocity, and instability under simulated gravity.
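One way to picture the local-then-global schedule is to walk the scene tree and emit one support group per parent. The sketch below assumes a plain dictionary from parent to children and a breadth-first, supports-first ordering; the function name `support_groups`, the ordering, and the example objects are illustrative assumptions rather than the paper's procedure.

```python
from collections import deque
from typing import Dict, List


def support_groups(children: Dict[str, List[str]], root: str) -> List[List[str]]:
    """Return [parent, child, ...] groups, visiting supports before what they hold."""
    groups: List[List[str]] = []
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        kids = children.get(parent, [])
        if kids:
            groups.append([parent] + kids)
            queue.extend(kids)
    return groups


# Hypothetical scene tree: parent -> objects it directly supports.
scene = {
    "ground": ["table", "sofa"],
    "table": ["plant", "lamp"],
    "sofa": ["cushion"],
}

local_groups = support_groups(scene, "ground")
for group in local_groups:
    print(group)
# ['ground', 'table', 'sofa']
# ['table', 'plant', 'lamp']
# ['sofa', 'cushion']

# The global pass would then consider every object at once.
global_group = sorted({name for group in local_groups for name in group})
```

Each local group is small enough to optimize on its own, and the final global group is where cross-group collisions would be resolved.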

11 · Single image to simulation-ready digital assets

REST3D targets simulation-ready digital assets for immersive interaction, content creation, gaming, and embodied AI.

The high-value promise of REST3D is practical conversion: one casual image becomes a 3D scene with object meshes and world-frame layout that can be imported into physics simulation. For users searching REST3D, the phrase simulation-ready digital assets should appear early and repeatedly because that is the real difference from ordinary image-to-3D reconstruction.

12 · Interactive 3D Physics Simulation

REST3D Interactive 3D Physics Simulation in Isaac Gym

Explore the physics simulation of reconstructed scenes in Isaac Gym. Users can rotate by dragging, zoom by scrolling, inspect simulation, press Play, Reset, adjust Speed, and compare synchronized methods.

Controls included

▶ Play · ↻ Reset · Speed · Run Simulation (click or press Space)

Methods included

Input Image · Ours · DigitalCousins · Gen3DSR · SceneGen · SAM3D.

Painting · Simpson Room · ScanNet++ 1 · ScanNet++ 2 · Room 1 · Room 4 · Room 5 · WorldLab · Replica

13 · Baseline failure mode

REST3D highlights why baseline methods can explosively separate when gravity is applied.

Due to object interpenetration in baseline methods, applying gravity in a physics simulator can cause objects to explosively separate and become unstable. REST3D is built around the opposite expectation: reconstructed scenes should quickly settle into stable states.
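A rough way to see why interpenetration is dangerous is to measure how deeply two objects overlap before the simulator starts. The Python sketch below uses axis-aligned bounding boxes as a stand-in for real collision geometry; the `penetration_depth` helper and the mug and table dimensions are assumptions made for illustration, and physics engines work on meshes or convex shapes rather than boxes like these.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]  # (min_xyz, max_xyz)


def penetration_depth(a: Box, b: Box) -> float:
    """Smallest per-axis overlap of two boxes, or 0.0 if they do not overlap."""
    overlaps = []
    for axis in range(3):
        overlap = min(a[1][axis], b[1][axis]) - max(a[0][axis], b[0][axis])
        if overlap <= 0.0:
            return 0.0  # separated (or only touching) along this axis
        overlaps.append(overlap)
    return min(overlaps)


table = ((-1.0, -1.0, 0.0), (1.0, 1.0, 0.625))
sunk_mug = ((0.0, 0.0, 0.5), (0.25, 0.25, 0.75))       # bottom is inside the table
resting_mug = ((0.0, 0.0, 0.625), (0.25, 0.25, 0.875))  # bottom touches the table top

print(penetration_depth(sunk_mug, table))     # 0.125
print(penetration_depth(resting_mug, table))  # 0.0
```

A contact solver has to remove any nonzero depth in the first few steps, and the larger the overlap, the larger the corrective impulse, which is the explosive separation described above.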

14 · REST3D Our Results

REST3D results show high-resolution physics simulation of reconstructed scenes in Isaac Gym.

Objects are placed sequentially for clarity and then simulated jointly. REST3D reconstructed scenes are simulation-ready and quickly settle into stable states.

Simpson Room · Room 0 · Room 1 · Room 2 · Replica · ScanNet++

15 · REST3D Real-world VR Interaction

REST3D reconstructs an immersive and physically grounded 3D scene for VR hand-based interaction.

REST3D includes an interactive VR system that reconstructs an immersive, physically grounded 3D scene from a single image, enabling users to naturally interact with stable virtual objects through hand-based interactions. The demo was recorded with Meta Quest Pro and played back at 3× speed.

In the paper, hand motions are tracked and mapped to a dexterous robotic hand in Isaac Gym, with the simulation rendered back to a VR headset.

16 · REST3D Comparison with SOTA Methods

REST3D compares against DigitalCousins, Gen3DSR, SceneGen, and SAM3D.

The REST3D comparison focuses on physics simulation of reconstructed scenes in Isaac Gym. Existing methods struggle to balance reconstruction fidelity and physical stability, while REST3D produces stable, simulation-ready scenes that settle with only minor adjustments.

Painting · Simpson Room · Room 1 · Room 2 · Room 3 · Room 4 · Room 5 · Room 6 · WorldLab 1 · WorldLab 2 · Replica · ScanNet++ 1 · ScanNet++ 2 · ScanNet++ 3

17 · REST3D physical metrics

REST3D metric snapshot: low collision, high stability, low drift.

Dataset | Method | Failure Rate | Collision Rate | Stable Rate | Position Drift | Linear Velocity | Angular Velocity
Replica | REST3D / Ours | 0.0% | 0.0% | 95.8% | 0.094 m | 0.152 m/s | 0.557 rad/s
ScanNet++ | REST3D / Ours | 0.0% | 5.9% | 93.6% | 0.080 m | 0.159 m/s | 1.039 rad/s
Custom | REST3D / Ours | 0.0% | 1.2% | 95.5% | 0.017 m | 0.140 m/s | 0.468 rad/s
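For readers who want a feel for how numbers like these could be derived from a simulation rollout, here is a hedged Python sketch. The definitions are assumptions and may differ from the paper's: drift is taken as the distance between an object's first and last position, peak linear velocity as the largest per-step speed, and an object counts as stable when its drift stays under a hypothetical `drift_threshold`.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Trajectory = Sequence[Vec3]  # object positions sampled every `dt` seconds


def position_drift(trajectory: Trajectory) -> float:
    """Distance between the first and last sampled position, in meters."""
    return math.dist(trajectory[0], trajectory[-1])


def peak_linear_speed(trajectory: Trajectory, dt: float) -> float:
    """Largest finite-difference speed along the trajectory, in m/s."""
    return max(math.dist(p, q) / dt for p, q in zip(trajectory, trajectory[1:]))


def stable_rate(trajectories: Sequence[Trajectory], drift_threshold: float = 0.05) -> float:
    """Fraction of objects whose drift stays within the (assumed) threshold."""
    stable = sum(position_drift(t) <= drift_threshold for t in trajectories)
    return stable / len(trajectories)


# Toy rollouts: a mug that stays put and a book that falls one meter.
mug = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
book = [(0.0, 0.0, 1.0), (0.0, 0.0, 0.5), (0.0, 0.0, 0.0)]

print(position_drift(book))             # 1.0
print(peak_linear_speed(book, dt=0.5))  # 1.0
print(stable_rate([mug, book]))         # 0.5
```

Whatever the exact definitions, the pattern is the same: simulate under gravity, record per-object motion, and aggregate into rates and averages like those in the table.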
18 · REST3D datasets

REST3D is evaluated on synthetic Replica, real-world ScanNet++, and a challenging Custom set.

Replica

A synthetic dataset with ground-truth scene meshes, used for physical metrics and geometric metrics.

ScanNet++

A real-world dataset covering scenes such as meeting rooms, classrooms, and offices.

Custom casual images

A harder set including bedrooms, living rooms, and cartoon-style scenes to test REST3D robustness.

19 · REST3D evaluation metrics

REST3D reports physical plausibility and geometric reconstruction quality.

Physical metrics

Failure rate, collision rate, stable rate, position drift, peak linear velocity, and peak angular velocity.

Geometric metrics

Chamfer Distance, F-Score at a fixed distance threshold, and B-IoU are used when ground-truth meshes exist.

Alignment

Replica and ScanNet++ reconstructions are aligned to ground truth with ICP before geometric evaluation.
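As a concrete reference for the geometric side, the sketch below computes a symmetric Chamfer Distance between two small point sets in plain Python. It assumes the unsquared, mean-per-direction variant and brute-force nearest neighbors; papers differ on squaring and normalization, so this is one common convention rather than the exact formula used by REST3D.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def chamfer_distance(a: Sequence[Vec3], b: Sequence[Vec3]) -> float:
    """Symmetric Chamfer Distance: mean nearest-neighbor distance in both directions."""

    def one_way(src: Sequence[Vec3], dst: Sequence[Vec3]) -> float:
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


# Points sampled from a reconstruction and from ground truth (toy values),
# assumed to be already aligned, for example by ICP as described above.
reconstruction = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ground_truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.5)]

print(chamfer_distance(reconstruction, ground_truth))    # 0.5
print(chamfer_distance(reconstruction, reconstruction))  # 0.0
```

The brute-force search is quadratic in the number of points, so real evaluations on dense mesh samples would normally use a spatial index such as a k-d tree.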

20 · REST3D vs DigitalCousins

REST3D differs from DigitalCousins by emphasizing input-faithful reconstruction plus physics stability.

DigitalCousins-style approaches can improve physical plausibility by retrieving and assembling 3D assets, but retrieval can be constrained by the asset database and may yield mismatched objects. REST3D instead uses image-to-3D priors and physics-constrained refinement to preserve visual consistency while reducing physical errors.

21 · REST3D vs Gen3DSR

REST3D targets physical stability beyond divide-and-conquer scene reconstruction.

Gen3DSR is a strong single-view 3D scene reconstruction baseline. REST3D compares to Gen3DSR and focuses on the failure mode that matters in simulation: a scene can be reconstructed but still physically unstable under gravity.

22 · REST3D vs SceneGen

REST3D prioritizes physical consistency with the observed image, while scene generation can trade accuracy for plausibility.

SceneGen-style methods synthesize multiple 3D assets and positions from a single scene image. REST3D argues that generation priors can be physically plausible yet inaccurate relative to the input. REST3D is framed as reconstruction: match the image and obey physics.

23 · REST3D vs SAM3D

REST3D pushes beyond object-level reconstruction toward scene-level physical validity.

SAM3D can recover high-fidelity individual objects, but scene-level reconstruction also needs global orientation, wall attachment, support, collision handling, and stable contacts. REST3D explicitly focuses on those scene-level physical constraints.

24 · REST3D use cases

REST3D use cases cluster around interactive 3D, VR, game content, robotics, and real-to-sim.

Content creation and gaming

REST3D can become a reference point for converting casual images into stable, editable, simulation-ready scenes for immersive production.

25 · REST3D tools users expect

REST3D.org should make every high-intent tool one click away.

Paper tools

arXiv, abstract, citation, BibTeX, author links, project page, publication status, and release notes.

Demo tools

Interactive 3D, Play, Reset, Speed, synchronized method comparison, high-resolution videos, and VR demos.

Reproducibility tools

GitHub repository, code status, datasets, baselines, metrics, implementation details, limitations, and future work.

26 · REST3D SEO keyword map

REST3D keyword clusters for headings, internal anchors, meta tags, and long-tail search coverage.

Core keyword

REST3D, REST3D.org, REST3D paper, REST3D arXiv, REST3D code, REST3D GitHub, REST3D demo, REST3D citation.

Long-tail keywords

REST3D reconstructing physically stable 3D scenes from a single image; REST3D single RGB image to simulation-ready 3D assets; REST3D physics-constrained optimization; REST3D scene-tree construction.

Surrounding keywords

single image 3D reconstruction, physically plausible 3D scene, Isaac Gym 3D simulation, VR human-object interaction, DigitalCousins, Gen3DSR, SceneGen, SAM3D, object penetration, object floating.

27 · REST3D visual system

REST3D.org uses a dark lab-grade palette with cyan, violet, and green accents for AI, 3D, simulation, and VR audiences.

The REST3D audience expects a technical research interface, not a lifestyle landing page. This design uses deep navy for spatial depth, cyan for reconstruction and links, violet for generative AI cues, and green for stable physics. The page also includes a light theme toggle and high-contrast text for readability.

Deep navy

#07111f background for 3D depth.

Cyan

#43e6ff for REST3D links and technical highlights.

Violet

#a98bff for AI and scene generation accents.

Green

#83f7bd for stable, settled, simulation-ready cues.

28 · REST3D limitations and future directions

REST3D is strong, but not magic: VLM robustness and deformable objects remain future-work territory.

REST3D relies on the robustness of vision-language models for physical scene understanding and may fail in challenging cases. The current REST3D paper focuses on rigid objects and does not explicitly model deformable or non-rigid objects, leaving those cases for future work.

29 · REST3D code and reproducibility status

The REST3D code repository exists, but its public README currently says "Code coming soon."

Do not oversell the implementation. Link to GitHub, invite users to star/watch the repository, and clearly say that the code release should be checked there.

30 · REST3D Citation

Cite REST3D: Reconstructing Physically Stable 3D Scenes from a Single Image

@article{ma2026rest3d,
  title     = {REST3D: Reconstructing Physically Stable 3D Scenes from a Single Image},
  author    = {Ma, Xiaoxuan and Wang, Jiashun and Ugrinovic, Nicol\'{a}s and Litman, Yehonathan and Kitani, Kris},
  journal   = {arXiv preprint arXiv:2605.30338},
  year      = {2026}
}

31 · REST3D acknowledgement

REST3D acknowledgement

The authors would like to thank Yuxuan Kuang, Yufei Wang, and Maxwell Jones for their insightful discussions.

32 · REST3D FAQ

REST3D FAQ for searchers, researchers, creators, and builders.

What is REST3D?

REST3D is a single-image reconstruction framework that reconstructs physically stable 3D scenes by integrating physical scene understanding with physics-constrained refinement.

What does REST3D stand for?

REST3D expands as REconstructing physically STable 3D scenes.

What is the main REST3D difference from ordinary image-to-3D?

REST3D focuses on scene-level physical plausibility: support relations, collision reduction, stability under gravity, and simulation-ready behavior.

Is REST3D code available?

The REST3D GitHub repository exists, but its current public README says "Code coming soon." This site links to the repository without claiming a released implementation.

What are the REST3D baselines?

REST3D compares against DigitalCousins, Gen3DSR, SceneGen, and SAM3D.

What are the REST3D datasets?

REST3D evaluates on Replica, ScanNet++, and a Custom set with casual images including bedrooms, living rooms, and cartoon-style scenes.

What are the REST3D applications?

REST3D is relevant to immersive interaction, content creation, gaming, simulation-ready assets, robotics, embodied AI, real-to-sim, and VR human-object interaction.

33 · REST3D glossary

REST3D glossary: terms users search after they understand the headline.

Scene tree

A support-relation structure that represents which objects are on, hanging from, or attached to other objects or surfaces.

Physics-constrained optimization

Simulation-based refinement that moves object poses toward stable, low-collision configurations while preserving the input layout.

Stable rate

A physical plausibility metric indicating whether reconstructed scenes settle into stable states under simulation.

34 · REST3D primary sources

REST3D primary links