Given a set of images from different views and their corresponding camera poses, PAPR learns a point-based surface representation of the scene and a rendering pipeline from scratch. Additionally, PAPR enables practical applications such as geometry editing, object manipulation, texture transfer, and exposure control.
We show the RGB rendering of the scene in the first row and the corresponding learnt point cloud in the second row.
Our method deforms the initial point cloud to correctly represent the target geometry. In contrast, the baselines either fail to recover the geometry, produce noisy results, or lack structural details in the learnt geometry.
We can edit the scene's geometry simply by manipulating its point cloud, without any additional supervision. Here we demonstrate rigid bending motions applied to the ficus branch and the Lego bulldozer's arm in the first column, rotation of the statue's head and the ship in the second column, and non-volume-preserving stretching transformations applied to the tip of the microphone and the back of the chair in the third column.
Gaussian Splatting produces significant noise after the non-volume-preserving stretching transformation. In contrast, our method avoids creating holes and preserves the texture details after the transformation.
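For intuition, here is a minimal sketch of what one such edit can look like in code, assuming the learnt point cloud is stored as an (N, 3) array; the function name and arguments are illustrative, not part of our released code:

```python
import numpy as np

def rotate_points(points, mask, axis, angle, pivot):
    """Rigidly rotate a selected subset of points about a pivot.

    points: (N, 3) learnt point positions
    mask:   (N,) boolean selection of the points to edit
    axis:   (3,) rotation axis
    angle:  rotation angle in radians
    pivot:  (3,) centre of rotation
    """
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    # Rodrigues' formula: build the 3x3 rotation matrix from axis and angle.
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    R = np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)
    edited = points.copy()
    # Only the selected points move; the rest of the cloud is untouched.
    edited[mask] = (edited[mask] - pivot) @ R.T + pivot
    return edited
```

The edited cloud is then rendered with the same trained pipeline, which is what makes these edits supervision-free.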
We can edit the scene by adding, removing or duplicating points in the point cloud. Here we demonstrate the addition of an extra hotdog to the plate (left), and the removal of certain material balls while duplicating others (right).
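These point-level edits follow the same pattern. A sketch under the same assumptions (per-point positions and feature vectors stored as arrays, with both masks indexing the original points):

```python
import numpy as np

def edit_points(points, features, remove_mask, duplicate_mask, offset):
    """Remove some points and duplicate others, keeping features paired.

    points:         (N, 3) point positions
    features:       (N, D) per-point feature vectors
    remove_mask:    (N,) boolean mask of points to delete
    duplicate_mask: (N,) boolean mask of points to copy
    offset:         (3,) translation applied to the duplicated copies
    """
    # Gather the duplicates (and their features) before any removal,
    # so both masks refer to the original point indices.
    dup_points = points[duplicate_mask] + offset
    dup_features = features[duplicate_mask]  # copies inherit their texture
    keep = ~remove_mask
    new_points = np.concatenate([points[keep], dup_points], axis=0)
    new_features = np.concatenate([features[keep], dup_features], axis=0)
    return new_points, new_features
```

The extra hotdog, for instance, amounts to duplicating the hotdog's points with a translation offset.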
We can transfer the texture from one part of the scene to another by transferring the associated feature vectors of the corresponding points. Here we transfer the texture of the mustard to the ketchup by transferring the features of the points that correspond to the mustard (highlighted in yellow) to a subset of points that correspond to the ketchup (highlighted in red).
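In code, the transfer itself is just a feature copy. A minimal sketch, assuming per-point features in an (N, D) array; how source points are matched to target points is a free choice, and we use a random assignment purely for illustration:

```python
import numpy as np

def transfer_texture(features, source_idx, target_idx, seed=0):
    """Copy feature vectors from source points onto target points.

    features:   (N, D) learnt per-point feature vectors
    source_idx: indices of the points supplying the texture (mustard)
    target_idx: indices of the points receiving it (ketchup)
    """
    rng = np.random.default_rng(seed)
    # Assign each target point the features of some source point; a
    # nearest-neighbour matching would give more structured results.
    chosen = rng.choice(source_idx, size=len(target_idx))
    out = features.copy()
    out[target_idx] = features[chosen]
    return out
```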
We introduce an additional latent code as input to our model and train it with a technique called conditional Implicit Maximum Likelihood Estimation (cIMLE). At test time, we can control the exposure of the rendered image by changing this latent code.
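A minimal sketch of one cIMLE training step in PyTorch, under assumed interfaces (a `model(rays, z)` renderer and an L2 reconstruction loss; the actual objective and batching details may differ):

```python
import torch

def cimle_step(model, rays, target_rgb, n_samples=8, z_dim=32):
    """One conditional-IMLE step: sample several latent codes, keep the
    one whose rendering is closest to the ground-truth image, and take
    a gradient step only through that best sample."""
    with torch.no_grad():
        # Sample candidate latent codes and score each rendering.
        zs = torch.randn(n_samples, z_dim, device=target_rgb.device)
        losses = torch.stack(
            [((model(rays, z) - target_rgb) ** 2).mean() for z in zs])
        best_z = zs[losses.argmin()]
    # Re-render with the selected code so gradients flow to the model.
    loss = ((model(rays, best_z) - target_rgb) ** 2).mean()
    loss.backward()
    return loss
```

At test time, sweeping the latent code then moves the rendering across the modes the model has learnt, which here correspond to different exposures.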
@inproceedings{zhang2023papr,
  title={PAPR: Proximity Attention Point Rendering},
  author={Yanshu Zhang and Shichong Peng and Seyed Alireza Moazenipourasil and Ke Li},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
  year={2023}
}