Wild3D

3D Modeling, Reconstruction, and Generation in the Wild

in conjunction with ICCV 2025, Honolulu, Hawaii, United States.

Time: October 20th, 2025. Location: Room 312.



Overview

The goal of this workshop is to bring together researchers and practitioners interested in modeling, reconstructing, or generating (dynamic) 3D objects/scenes in challenging, in-the-wild settings. With recent advances in 3D learning, the widespread availability of 2D and 3D visual data, and the prevalence of image/video generative models, we believe now is a pivotal moment to tackle these challenges and make 3D vision more robust, accessible, and cost-effective. By fostering communication and highlighting important work in these areas, we hope to inspire new research topics and breakthroughs. Given recent advances in video generative models and dynamics modeling, we strongly encourage contributions not only on standard 3D topics but also on broader 4D-related directions.

Invited Speakers

Anpei Chen

Westlake University

Anpei Chen is an Assistant Professor and the head of the Inception3D Lab at Westlake University. His research lies at the intersection of computer graphics and computer vision, focusing on visual representation, content generation, and spatial intelligence, with the goal of efficiently understanding 3D from 2D observations.

Angela Dai

Technical University of Munich

Angela Dai is an Associate Professor at the Technical University of Munich where she leads the 3D AI Lab. Her research focuses on understanding how real-world 3D scenes around us can be modeled and semantically understood. Her research has been recognized through an ECVA Young Researcher Award, ERC Starting Grant, Eurographics Young Researcher Award, German Pattern Recognition Award, Google Research Scholar Award, and an ACM SIGGRAPH Outstanding Doctoral Dissertation Honorable Mention.

Andrea Vedaldi

University of Oxford

Andrea Vedaldi is a Professor of Computer Vision and Machine Learning and a co-lead of the VGG group at the Engineering Science department of the University of Oxford. His research focuses on developing computer vision and machine learning methods to understand the content of images and videos automatically, with little to no manual supervision, in terms of semantics and 3D geometry.

Georgia Gkioxari

Caltech

Georgia Gkioxari is an Assistant Professor at Caltech and a Hurt Scholar. The goal of her work is to design advanced visual perception models that extend the boundaries of current visual capabilities. Her research explores new spatial tasks and visual representations that transform images into 2D and 3D outputs.

Jun Gao

University of Michigan, NVIDIA

Jun Gao is an Assistant Professor at the University of Michigan and a senior research scientist at NVIDIA. His research lies at the intersection of 3D computer vision, computer graphics, and generative models. He is interested in developing controllable generative AI models to create photorealistic, diverse, and interactive virtual environments.

Noah Snavely

Cornell Tech, Google DeepMind

Noah Snavely is a Professor of Computer Science at Cornell Tech and a member of the Cornell Graphics and Vision Group; he also works at Google DeepMind in New York City. His research focuses on computer vision and computer graphics, particularly on the 3D understanding and depiction of scenes from images.

Qianqian Wang

Harvard University

Qianqian Wang is an incoming Assistant Professor at Harvard University and the Kempner Institute. Her research focuses on understanding and modeling the dynamic 3D world from everyday images and videos. Her long-term goal is to build intelligent systems that can perceive, understand and continually learn from the ever-changing physical world.

Schedule

09:00 - 09:10 Opening Remarks
09:10 - 09:45 Invited Talk 1 Qianqian Wang (Harvard University)
09:45 - 10:20 Invited Talk 2 Anpei Chen (Westlake University)
10:20 - 11:00 Coffee Break + Poster Session
11:00 - 11:15 Spotlight Presentation 1 Hanwen Jiang (Adobe Research)
11:15 - 11:50 Invited Talk 3 Noah Snavely (Cornell Tech & Google DeepMind)
11:50 - 13:30 Lunch Break
13:30 - 14:05 Invited Talk 4 Andrea Vedaldi (University of Oxford)
14:05 - 14:40 Invited Talk 5 Angela Dai (Technical University of Munich)
14:40 - 14:55 Spotlight Presentation 2 Chih-Hao Lin (UIUC)
14:55 - 15:15 Coffee Break
15:15 - 15:50 Invited Talk 6 Jun Gao (University of Michigan & NVIDIA)
15:50 - 16:25 Invited Talk 7 Georgia Gkioxari (Caltech)
16:25 - 16:35 Closing Remarks

Call for Papers

We accept either 4-page extended abstracts or 8-page full paper submissions, excluding references. Workshop papers are non-archival, and we welcome submissions that have already been submitted to or accepted at other venues, including the ICCV main conference. All submissions should follow the ICCV 2025 author guidelines.
  • Submission Portal: OpenReview
  • Paper Submission Deadline: September 1, 2025, 23:59:59 PST
  • Notification to Authors: September 12, 2025
  • Camera-ready submission: September 19, 2025
Accepted papers will be invited for a poster or oral presentation and will be listed on the workshop website.


Topics of Interest

  • Data and Modality: What type of data provides the most useful information for (dynamic) 3D modeling? Do we need explicit 3D data or is video data sufficient? What is currently lacking in this area? What datasets and benchmarks are crucial to validate the effectiveness of 3D/4D algorithms in the wild?
  • Alignment: How can we align observations that exhibit significant variations in appearance, motion (articulation), lighting, contents, and viewpoints? How can we register images or videos with little or no overlap?
  • Modeling: How can we construct accurate 3D models from sparse, noisy, incomplete, or dynamic observations?
  • Representation: What are the most suitable representations for 3D modeling and reasoning? Do we truly need explicit 3D representations, or could view synthesis and video generative models be sufficient?
  • Knowledge and Reasoning: How can we represent, learn, and encode commonsense knowledge of 3D objects and scenes -- such as part structures, articulations, physical stability, and affordances -- and leverage it for various 3D tasks, including reasoning, dynamic modeling, reconstruction, and generation?
  • 4D (Dynamic 3D): What is the best way to represent and model the dynamic 3D world? What priors are critical for its success? How can we improve 3D understanding via 4D modeling?
  • Risks and ethical considerations: How can we mitigate the risks of these robust 3D modeling and reasoning techniques? How do we address relevant ethical questions, such as invasion of privacy and the spread of misinformation?
  • Applications: What new applications can be unlocked by developing more robust 3D algorithms, and what modifications are needed? For example, how can we adapt existing (dynamic) 3D modeling techniques to better support robots operating in challenging environments? How can we leverage 3D priors learned from images to enable photorealistic content creation? How can we build on video foundation models to enhance our 3D understanding of the world? Are there other exciting applications for in-the-wild 3D modeling for domains such as construction, agriculture, and remote sensing?

Accepted Papers / Extended Abstracts