3D-aware Blending with Generative NeRFs

ICCV 2023


BlendNeRF automatically aligns and composes two images,
even for images with different camera poses and object shapes.


Image blending aims to combine multiple images seamlessly. It remains challenging for existing 2D-based methods, especially when input images are misaligned due to differences in 3D camera poses and object shapes.

To tackle these issues, we propose a 3D-aware blending method using generative Neural Radiance Fields (NeRFs), which comprises two key components: 3D-aware alignment and 3D-aware blending. For 3D-aware alignment, we first estimate the camera pose of the reference image with respect to the generative NeRF and then perform 3D local alignment for each part. To further leverage the 3D information of the generative NeRF, we propose 3D-aware blending that blends images directly in the NeRF's latent representation space rather than in raw pixel space.

Collectively, our method outperforms existing 2D baselines, as validated by extensive quantitative and qualitative evaluations with FFHQ and AFHQ-Cat.

Comparison with Baselines

Comparison with existing blending methods. Red lines denote the target blending parts. (a) 2D blending. 2D blending methods compose two images without any 3D-aware alignment. (b) 2D blending with 3D-aware alignment. To address misalignment, we apply our 3D-aware alignment method to existing 2D blending methods. (c) Proposed method. We apply our 3D-aware blending after our 3D-aware alignment. Note that none of the methods use 3D labels or 3D morphable models.

3D-aware Alignment


Global alignment is an essential part of our blending method, as even a slight rotational misalignment between two images can severely degrade blending quality. In contrast to 2D GANs, a generative NeRF G can resolve this issue through novel-view synthesis.
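The idea of global alignment can be sketched as follows: once a camera pose has been estimated for the reference image, both latent codes are rendered from that same pose so the two images share one global rotation. This is only a minimal illustration. The yaw/pitch convention, the function names, and the stand-in `toy_generator` are assumptions made for this sketch and are not taken from the method's implementation.

```python
import math

def camera_position(yaw, pitch, radius=1.0):
    """Camera center on a sphere around the origin.

    Assumed convention (hypothetical): yaw rotates about the y-axis,
    pitch lifts the camera toward +y, both angles in radians.
    """
    x = radius * math.cos(pitch) * math.sin(yaw)
    y = radius * math.sin(pitch)
    z = radius * math.cos(pitch) * math.cos(yaw)
    return (x, y, z)

def render_aligned(generator, w_ori, w_ref, ref_pose):
    """Render both latent codes from the reference pose.

    `generator` is a hypothetical callable (latent, pose) -> image.
    """
    return generator(w_ori, ref_pose), generator(w_ref, ref_pose)

def toy_generator(w, pose):
    # Stand-in that only records its inputs; a real generative NeRF
    # would return rendered pixels here.
    return {"latent": w, "pose": pose}

# Arbitrary example pose standing in for the estimated reference pose.
ref_pose = camera_position(yaw=0.3, pitch=-0.1, radius=2.0)
img_ori, img_ref = render_aligned(toy_generator, "w_ori", "w_ref", ref_pose)
```

Because both renders use `ref_pose`, any remaining mismatch between the two images comes from scale and translation rather than rotation.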


Local alignment is a fine-grained alignment between the target regions of two images. Even after the two images are matched through global rotation, the scale and translation of the target regions (e.g., face, eyes, ears) need to be aligned further, as the location and size of each object part differ across the two object instances.
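To make the role of scale and translation concrete, here is a simplified sketch that aligns two 3D point sets by matching their centroids and RMS spreads. This moment-matching estimate is a stand-in chosen for brevity and should not be read as the alignment procedure used by the method. The function names and the sample points are hypothetical.

```python
import math

def centroid(points):
    """Mean of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def rms_spread(points, center):
    """Root-mean-square distance of the points from `center`."""
    sq = sum(sum((p[i] - center[i]) ** 2 for i in range(3)) for p in points)
    return math.sqrt(sq / len(points))

def estimate_scale_translation(source, target):
    """Uniform scale s and translation t such that s * source + t
    matches the centroid and spread of target."""
    c_src, c_tgt = centroid(source), centroid(target)
    scale = rms_spread(target, c_tgt) / rms_spread(source, c_src)
    translation = tuple(c_tgt[i] - scale * c_src[i] for i in range(3))
    return scale, translation

def apply_alignment(points, scale, translation):
    """Apply p -> scale * p + translation to every point."""
    return [tuple(scale * p[i] + translation[i] for i in range(3)) for p in points]

# Hypothetical target-region points from the reference and the original.
ref_part = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
ori_part = apply_alignment(ref_part, 2.0, (1.0, -1.0, 0.5))

scale, translation = estimate_scale_translation(ref_part, ori_part)
aligned_ref_part = apply_alignment(ref_part, scale, translation)
```

In this toy case the reference part is recovered at twice its size and shifted onto the original part. Real regions have no exact point correspondence, so an estimate of this kind would only be approximate.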

3D-aware Blending

We aim to find the best latent code w_edit that synthesizes a seamless and natural output. To achieve this goal, we exploit both 2D pixel constraints (RGB values) and 3D geometric constraints (volume density). We optimize w_edit with the proposed image-blending and density-blending losses.
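The structure of such an objective can be illustrated with a toy example: one masked term on RGB values and one on volume densities, each pulling the edited output toward the reference inside the target region and toward the original outside it. This is a sketch under stated assumptions. The absolute-error distance, the flat-list inputs, the names, and the weight `lambda_density` are all hypothetical, and the actual losses may be defined differently.

```python
def masked_blend_loss(edit, ori, ref, mask):
    """Mean absolute error that pulls `edit` toward `ref` where mask is 1
    and toward `ori` where mask is 0. Inputs are equal-length flat lists."""
    total = 0.0
    for e, o, r, m in zip(edit, ori, ref, mask):
        total += m * abs(e - r) + (1.0 - m) * abs(e - o)
    return total / len(edit)

def blending_objective(rgb, sigma, lambda_density=1.0):
    """Image term on RGB values plus a weighted density term."""
    image_loss = masked_blend_loss(rgb["edit"], rgb["ori"], rgb["ref"], rgb["mask"])
    density_loss = masked_blend_loss(sigma["edit"], sigma["ori"], sigma["ref"], sigma["mask"])
    return image_loss + lambda_density * density_loss

# Two toy samples: the first lies outside the target region, the second inside.
rgb = {"ori": [0.2, 0.1], "ref": [0.9, 0.8], "mask": [0.0, 1.0], "edit": [0.2, 0.8]}
sigma = {"ori": [1.0, 3.0], "ref": [4.0, 2.0], "mask": [0.0, 1.0], "edit": [1.0, 2.0]}

perfect = blending_objective(rgb, sigma)

# Leaving the output equal to the original is penalized inside the mask.
unchanged = blending_objective(dict(rgb, edit=rgb["ori"]), dict(sigma, edit=sigma["ori"]))
```

In a full pipeline the `edit` values would come from rendering the generator at w_edit, and the objective would be minimized with respect to w_edit by gradient-based optimization.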

Multi-view Blending Results



EG3D ShapeNet-Car



      title={3D-aware Blending with Generative NeRFs}, 
      author={Hyunsu Kim and Gayoung Lee and Yunjey Choi and Jin-Hwa Kim and Jun-Yan Zhu},