Wild3D

3D Modeling, Reconstruction, and Generation in the Wild

in conjunction with ICCV 2025, Honolulu, Hawaii, United States.

Time: October 20th, 2025. Location: Room 312.



Overview

The goal of this workshop is to bring together researchers and practitioners interested in modeling, reconstructing, or generating (dynamic) 3D objects/scenes in challenging, in-the-wild settings. With recent advances in 3D learning, the widespread availability of 2D and 3D visual data, and the prevalence of image/video generative models, we believe now is a pivotal moment to tackle these challenges and make 3D vision more robust, accessible, and cost-effective. By fostering communication and highlighting important work in these areas, we hope to inspire new research topics and breakthroughs. Given recent advances in video generative models and dynamics modeling, we strongly encourage contributions not only on standard 3D topics but also on broader 4D-related directions.

Invited Speakers

Anpei Chen

Westlake University

Anpei Chen is an Assistant Professor and the head of the Inception3D Lab at Westlake University. His research lies at the intersection of computer graphics and computer vision, focusing on visual representation, content generation, and spatial intelligence, with the goal of efficiently understanding 3D from 2D observations.

Angela Dai

Technical University of Munich

Angela Dai is an Associate Professor at the Technical University of Munich where she leads the 3D AI Lab. Her research focuses on understanding how real-world 3D scenes around us can be modeled and semantically understood. Her research has been recognized through an ECVA Young Researcher Award, ERC Starting Grant, Eurographics Young Researcher Award, German Pattern Recognition Award, Google Research Scholar Award, and an ACM SIGGRAPH Outstanding Doctoral Dissertation Honorable Mention.

Andrea Vedaldi

University of Oxford

Andrea Vedaldi is a Professor of Computer Vision and Machine Learning and a co-lead of the VGG group at the Engineering Science department of the University of Oxford. His research focuses on developing computer vision and machine learning methods to understand the content of images and videos automatically, with little to no manual supervision, in terms of semantics and 3D geometry.

Georgia Gkioxari

Caltech

Georgia Gkioxari is an Assistant Professor at Caltech and a Hurt Scholar. The goal of her work is to design advanced visual perception models that extend the boundaries of current visual capabilities. Her research explores new spatial tasks and visual representations that transform images into 2D and 3D outputs.

Jun Gao

University of Michigan, NVIDIA

Jun Gao is an Assistant Professor at the University of Michigan and a senior research scientist at NVIDIA. His research lies at the intersection of 3D computer vision, computer graphics, and generative models. He is interested in developing controllable generative AI models to create photorealistic, diverse, and interactive virtual environments.

Noah Snavely

Cornell Tech, Google DeepMind

Noah Snavely is a Professor of Computer Science at Cornell Tech and a member of the Cornell Graphics and Vision Group; he also works at Google DeepMind in New York City. His research focuses on computer vision and computer graphics, particularly on the 3D understanding and depiction of scenes from images.

Qianqian Wang

Harvard University

Qianqian Wang is an incoming Assistant Professor at Harvard University and the Kempner Institute. Her research focuses on understanding and modeling the dynamic 3D world from everyday images and videos. Her long-term goal is to build intelligent systems that can perceive, understand and continually learn from the ever-changing physical world.

Schedule

09:00 - 09:10 Opening Remarks
09:10 - 09:45 Invited Talk 1 Qianqian Wang (Harvard University)
09:45 - 10:20 Invited Talk 2 Anpei Chen (Westlake University)
10:20 - 11:00 Coffee Break + Poster Session
11:00 - 11:15 Spotlight Presentation 1 Hanwen Jiang (Adobe Research)
11:15 - 11:50 Invited Talk 3 Noah Snavely (Cornell Tech & Google DeepMind)
11:50 - 13:30 Lunch Break
13:30 - 14:05 Invited Talk 4 Andrea Vedaldi (University of Oxford)
14:05 - 14:40 Invited Talk 5 Angela Dai (Technical University of Munich)
14:40 - 14:55 Spotlight Presentation 2 Chih-Hao Lin (UIUC)
14:55 - 15:15 Coffee Break
15:15 - 15:50 Invited Talk 6 Jun Gao (University of Michigan & NVIDIA)
15:50 - 16:25 Invited Talk 7 Georgia Gkioxari (Caltech)
16:25 - 16:35 Closing Remarks

Call for Papers

We accept either 4-page extended abstracts or 8-page full paper submissions, excluding references. Workshop papers are non-archival, and we welcome submissions that have already been submitted to or accepted at other venues, including the ICCV main conference. All submissions should follow the ICCV 2025 author guidelines.
  • Submission Portal: OpenReview
  • Paper Submission Deadline: September 1, 2025, 23:59:59 PST
  • Notification to Authors: September 12, 2025
  • Camera-ready submission: September 19, 2025
Accepted papers will be invited for a poster or oral presentation and will be listed on the workshop website.


Topics of Interest

  • Data and Modality: What type of data provides the most useful information for (dynamic) 3D modeling? Do we need explicit 3D data or is video data sufficient? What is currently lacking in this area? What datasets and benchmarks are crucial to validate the effectiveness of 3D/4D algorithms in the wild?
  • Alignment: How can we align observations that exhibit significant variations in appearance, motion (articulation), lighting, contents, and viewpoints? How can we register images or videos with little or no overlap?
  • Modeling: How can we construct accurate 3D models from sparse, noisy, incomplete, or dynamic observations?
  • Representation: What are the most suitable representations for 3D modeling and reasoning? Do we truly need explicit 3D representations, or could view synthesis and video generative models be sufficient?
  • Knowledge and Reasoning: How can we represent, learn, and encode commonsense knowledge of 3D objects and scenes -- such as part structures, articulations, physical stability, and affordances -- and leverage it for various 3D tasks, including reasoning, dynamic modeling, reconstruction, and generation?
  • 4D (Dynamic 3D): What is the best way to represent and model the dynamic 3D world? What priors are critical for its success? How can we improve 3D understanding via 4D modeling?
  • Risks and ethical considerations: How can we mitigate the risks of these robust 3D modeling and reasoning techniques? How do we address relevant ethical questions, such as invasion of privacy and the spread of misinformation?
  • Applications: What new applications can be unlocked by developing more robust 3D algorithms, and what modifications are needed? For example, how can we adapt existing (dynamic) 3D modeling techniques to better support robots operating in challenging environments? How can we leverage 3D priors learned from images to enable photorealistic content creation? How can we build on video foundation models to enhance our 3D understanding of the world? Are there other exciting applications for in-the-wild 3D modeling for domains such as construction, agriculture, and remote sensing?

Accepted Papers / Extended Abstracts