BlendNeRF

3D-aware Blending with Generative NeRFs

ICCV 2023

Hyunsu Kim¹, Gayoung Lee¹, Yunjey Choi¹, Jin-Hwa Kim¹˒², Jun-Yan Zhu³

¹NAVER AI Lab, ²SNU AIIS, ³CMU

BlendNeRF automatically aligns and composes two images,
even for images with different camera poses and object shapes.

Abstract

Image blending aims to combine multiple images seamlessly. It remains challenging for existing 2D-based methods, especially when input images are misaligned due to differences in 3D camera poses and object shapes.

To tackle these issues, we propose a 3D-aware blending method using generative Neural Radiance Fields (NeRF), including two key components: 3D-aware alignment and 3D-aware blending. For 3D-aware alignment, we first estimate the camera pose of the reference image with respect to generative NeRFs and then perform 3D local alignment for each part. To further leverage 3D information of the generative NeRF, we propose 3D-aware blending that directly blends images on the NeRF's latent representation space, rather than raw pixel space.

Collectively, our method outperforms existing 2D baselines, as validated by extensive quantitative and qualitative evaluations with FFHQ and AFHQ-Cat.

Comparison with Baselines

Comparison with existing blending methods. Red lines denote the target blending parts. (a) 2D blending. 2D blending methods compose two images without any 3D-aware alignment. (b) 2D blending with 3D-aware alignment. To address misalignment, we apply our 3D-aware alignment method to existing 2D blending methods. (c) Proposed method. We propose 3D-aware blending after applying our 3D-aware alignment. Note that none of the methods uses 3D labels or 3D morphable models.

3D-aware Alignment

Global

Global alignment is an essential part of our blending method, because even a slight rotational misalignment between the two images can severely degrade blending quality. In contrast to 2D GANs, a generative NeRF G can resolve this issue through novel-view synthesis.
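
As a minimal sketch of what rendering from a shared viewpoint involves, the Python snippet below builds a camera pose from a yaw and a pitch angle, so that the same pose could be passed to a generator for both images. The function name `camera_to_world`, the spherical parameterization, the OpenCV-style axis convention (x right, y down, z forward) and the default radius are illustrative assumptions, not details taken from BlendNeRF or from any particular generator.

```python
import numpy as np

def camera_to_world(yaw, pitch, radius=1.0):
    """4x4 pose of a camera on a sphere around the origin, looking at the origin.

    Hypothetical helper: angles are in radians, world y is up, and the
    camera axes follow an assumed x-right, y-down, z-forward convention.
    Degenerate when the camera sits exactly above or below the origin.
    """
    # Camera centre in world coordinates.
    position = radius * np.array([
        np.cos(pitch) * np.sin(yaw),
        np.sin(pitch),
        np.cos(pitch) * np.cos(yaw),
    ])
    forward = -position / np.linalg.norm(position)  # viewing direction
    right = np.cross(forward, np.array([0.0, 1.0, 0.0]))
    right /= np.linalg.norm(right)
    up = np.cross(right, forward)

    pose = np.eye(4)
    pose[:3, 0] = right
    pose[:3, 1] = -up        # y axis points down in this convention
    pose[:3, 2] = forward
    pose[:3, 3] = position
    return pose

# Rendering the original and the reference with one shared pose removes
# the rotational mismatch between them.
shared_pose = camera_to_world(yaw=0.3, pitch=-0.1)
```

Any other pose parameterization would serve equally well; the point is only that both images are synthesized from one common camera.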

Local

Local alignment is a fine-grained alignment between the target regions of the two images. Even after the two images have been matched through global rotation, the scale and translation of the target regions (e.g., face, eyes, ears) need to be aligned further, because the location and size of each object part differ across the two object instances.
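
One simple way to picture the scale and translation step is a closed-form fit between two 3D point sets sampled from the target regions. The sketch below is a simplified stand-in, not the procedure used in the paper: the function name `estimate_scale_translation` and the choice of matching centroids and RMS spreads are assumptions made for illustration.

```python
import numpy as np

def estimate_scale_translation(ref_points, orig_points):
    """Return (s, t) so that s * ref_points + t roughly covers orig_points.

    Hypothetical helper: both inputs are (N, 3) and (M, 3) arrays of 3D
    points from the target region. No point correspondences are needed,
    because only the centroid and the RMS spread of each set are used.
    Assumes the reference points are not all identical.
    """
    ref_points = np.asarray(ref_points, dtype=float)
    orig_points = np.asarray(orig_points, dtype=float)

    ref_center = ref_points.mean(axis=0)
    orig_center = orig_points.mean(axis=0)

    # RMS distance of the points from their centroid measures region size.
    ref_spread = np.sqrt(((ref_points - ref_center) ** 2).sum(axis=1).mean())
    orig_spread = np.sqrt(((orig_points - orig_center) ** 2).sum(axis=1).mean())

    s = orig_spread / ref_spread
    t = orig_center - s * ref_center
    return s, t

# Usage: align the reference region to the original region.
rng = np.random.default_rng(0)
ref = rng.normal(size=(50, 3))
orig = 1.5 * ref + np.array([0.1, -0.2, 0.3])
s, t = estimate_scale_translation(ref, orig)
aligned_ref = s * ref + t
```

This fit ignores rotation on purpose, since rotation is assumed to have been handled by the global alignment.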

3D-aware Blending

We aim to find the best latent code w_edit to synthesize a seamless and natural output. To achieve this goal, we exploit both 2D pixel constraints (RGB value) and 3D geometric constraints (volume density). With the proposed image-blending and density-blending losses, we optimize the latent code w_edit.
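
The objective can be sketched as follows, assuming (hypothetically) that the edited result should match the reference inside the target mask and the original outside it, for both rendered RGB values and sampled volume densities. The L1 distance, the masks `mask_2d` and `mask_3d`, the weight `lambda_density`, and all function names are illustrative assumptions; the text above states only that both kinds of constraint are used.

```python
import numpy as np

def masked_l1(pred, target, mask):
    """Mean absolute difference over the entries selected by a 0/1 mask."""
    weights = np.broadcast_to(mask, pred.shape).astype(float)
    total = weights.sum()
    if total == 0:
        return 0.0
    return float((np.abs(pred - target) * weights).sum() / total)

def blending_loss(rgb_edit, rgb_orig, rgb_ref,
                  sigma_edit, sigma_orig, sigma_ref,
                  mask_2d, mask_3d, lambda_density=1.0):
    """Combined image-blending and density-blending loss (assumed form)."""
    # 2D pixel constraint: reference inside the mask, original outside.
    image_loss = (masked_l1(rgb_edit, rgb_ref, mask_2d)
                  + masked_l1(rgb_edit, rgb_orig, 1 - mask_2d))
    # 3D geometric constraint: the same split applied to volume density.
    density_loss = (masked_l1(sigma_edit, sigma_ref, mask_3d)
                    + masked_l1(sigma_edit, sigma_orig, 1 - mask_3d))
    return image_loss + lambda_density * density_loss

# Toy inputs: a 4x4 image whose top half is the target region, and
# 10 density samples whose first 5 lie inside the target region.
rgb_orig, rgb_ref = np.zeros((4, 4, 3)), np.ones((4, 4, 3))
mask_2d = np.zeros((4, 4, 1))
mask_2d[:2] = 1.0
sigma_orig, sigma_ref = np.zeros(10), np.full(10, 2.0)
mask_3d = np.zeros(10)
mask_3d[:5] = 1.0

ideal_rgb = mask_2d * rgb_ref + (1 - mask_2d) * rgb_orig
ideal_sigma = mask_3d * sigma_ref + (1 - mask_3d) * sigma_orig
loss = blending_loss(ideal_rgb, rgb_orig, rgb_ref,
                     ideal_sigma, sigma_orig, sigma_ref, mask_2d, mask_3d)
```

In an actual optimization, the RGB and density arrays would be produced by the generator from w_edit inside an automatic-differentiation framework, and w_edit would be updated by gradient descent; NumPy is used here only to evaluate the loss.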

Multi-view Blending Results

EG3D FFHQ

Each example shows Original (recon.), Reference (recon.), and the blended result.

Blended parts: Face, Nose, Eyes, Hair, Lip, Face, Nose, Eyes, Hair

EG3D AFHQv2-Cat

Each example shows Original (recon.), Reference (recon.), and the blended result.

Blended parts: Face, Ears, Eyes, Ears, Face, Face, Ears, Eyes

EG3D ShapeNet-Car

Each example shows Original, Reference, and the blended result.

Blended parts: Front, Wheels, Front, Top, Front, Front

StyleSDF FFHQ

Each example shows Original, Reference, and the blended result.

Blended parts: Eyes, Hair, Lip, Face, Eyebrows

BibTeX

@inproceedings{kim20233daware,
  title={3D-aware Blending with Generative NeRFs},
  author={Hyunsu Kim and Gayoung Lee and Yunjey Choi and Jin-Hwa Kim and Jun-Yan Zhu},
  booktitle={ICCV},
  year={2023}
}
