3D-Consistent Multi-View Editing by Diffusion Guidance

1Chalmers University of Technology   2Korea University  

Abstract

Recent advancements in diffusion models have greatly improved text-based image editing, yet methods that edit images independently often produce geometrically and photometrically inconsistent results across different views of the same scene. Such inconsistencies are particularly problematic when editing 3D representations such as NeRFs or Gaussian Splat models.

We propose a training-free diffusion framework that enforces multi-view consistency during the image editing process. The key assumption is that corresponding points in the unedited images should undergo similar transformations after editing. To achieve this, we introduce a consistency loss that guides the diffusion sampling toward coherent edits. The framework is flexible and can be combined with a wide range of image editing methods, supporting both dense and sparse multi-view editing setups. Experimental results show that our approach significantly improves 3D consistency compared to existing multi-view editing methods. We also show that this increased consistency enables high-quality Gaussian Splat editing with sharp details and strong fidelity to user-specified text prompts.

Multi-view Consistent Editing

Image editing methods applied independently to multi-view images often produce inconsistent edits across views. Our method addresses this by guiding the diffusion process toward edits that are coherent across views.

We show that our method produces edited images with improved multi-view consistency compared to other state-of-the-art 3D editing methods (EditSplat and DGE).

Method

Our method guides the diffusion editing process to improve multi-view consistency.

Given a set of input images, each view is edited sequentially by guiding the diffusion process based on the previously edited images. The guidance is based on the assumption that matching points in the unedited images should be edited similarly. During the diffusion process, the noise estimate $\epsilon(z_t,t)$ is modified according to a consistency loss $\mathcal{L}_c$, resulting in multi-view consistent edits.
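As a rough illustration, this guidance step can be viewed as a classifier-guidance-style update, where the gradient of the consistency loss with respect to the noisy latent shifts the noise estimate. The snippet below is a minimal sketch under that assumption, not the authors' implementation; the names `unet`, `consistency_loss`, and the guidance scale are placeholders.

import torch

def guided_noise_estimate(unet, z_t, t, consistency_loss, scale=1.0):
    """Sketch of a consistency-guided noise prediction epsilon(z_t, t).

    `consistency_loss` is assumed to compare the edit implied by (z_t, eps)
    against the edits already made at matched points in previous views.
    """
    z_t = z_t.detach().requires_grad_(True)
    eps = unet(z_t, t)                        # standard noise prediction
    loss = consistency_loss(z_t, eps, t)      # L_c over matched points
    grad = torch.autograd.grad(loss, z_t)[0]  # direction that reduces L_c
    return eps.detach() + scale * grad        # guidance-style correction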

Sparse Editing

Our method can also be utilized for sparse-view editing.

Our method can easily be combined with different 2D editing methods.

We here show additional results combining our method with pix2pix-turbo, an image editing method based on one-step diffusion.
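For a one-step editor there is only a single denoising step to guide. The sketch below is a hypothetical illustration of applying a consistency-loss gradient directly to that single edited output; `one_step_edit` is a placeholder and does not refer to the actual pix2pix-turbo API.

import torch

def guided_one_step_edit(one_step_edit, image, prompt, consistency_loss, scale=1.0):
    """Hypothetical: nudge a single-step edit toward multi-view consistency."""
    edited = one_step_edit(image, prompt).detach().requires_grad_(True)
    loss = consistency_loss(edited)              # L_c against previously edited views
    grad = torch.autograd.grad(loss, edited)[0]  # direction that reduces L_c
    return (edited - scale * grad).detach()      # corrected edit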

BibTeX

If you use this work or find it helpful, please consider citing:

@misc{bengtson20253dconsistentmultivieweditingdiffusion,
  title={3D-Consistent Multi-View Editing by Diffusion Guidance},
  author={Josef Bengtson and David Nilsson and Dong In Lee and Fredrik Kahl},
  year={2025},
  eprint={2511.22228},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2511.22228},
}