X-Ray: A Sequential 3D Representation for Generation.

Introduction

We introduce X-Ray, a novel 3D sequential representation inspired by the penetrability of X-ray scans. X-Ray transforms a 3D object into a series of surface frames at different layers, making it suitable for generating 3D models from images. Our method casts rays from the camera center to capture geometric and textural details, including depth, normal, and color, across all intersected surfaces. This process efficiently condenses the whole 3D object into a multi-frame video format, motivating the use of a network architecture similar to those in video diffusion models. Because it stores only surface information, the representation stays efficient. We demonstrate the practicality and adaptability of the X-Ray representation by synthesizing the complete visible and hidden surfaces of a 3D object from a single input image, which paves the way for new 3D representation research and practical applications.
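To make the idea of recording every intersected surface along a ray more concrete, here is a toy sketch. It is not the repository's implementation (that lives in `preprocess/get_xray` and works on meshes); it assumes a scene made of analytic spheres, a pinhole camera on the z axis, and stores depth only. The helper name `cast_layers` and all of its parameters are hypothetical.

```python
import numpy as np


def cast_layers(centers, radii, res=33, num_layers=4, cam_z=-3.0, focal=1.5):
    """Toy layered ray casting against spheres (hypothetical helper).

    Returns an array of shape (num_layers, res, res) holding the distance
    along each pixel ray to the k-th surface crossing; 0 means no surface.
    """
    # One ray per pixel, from a pinhole camera at (0, 0, cam_z) looking along +z.
    xs = np.linspace(-1.0, 1.0, res)
    u, v = np.meshgrid(xs, xs)
    dirs = np.stack([u, v, np.full_like(u, focal)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    origin = np.array([0.0, 0.0, cam_z])

    crossings = []
    for center, radius in zip(centers, radii):
        # Solve |origin + t * dir - center|^2 = radius^2 for t (unit-length dir).
        oc = origin - np.asarray(center, dtype=float)
        b = dirs @ oc
        disc = b**2 - (oc @ oc - radius**2)
        root = np.sqrt(np.where(disc > 0, disc, np.nan))  # NaN where the ray misses
        crossings.append(-b - root)  # ray enters the sphere
        crossings.append(-b + root)  # ray leaves the sphere

    # Keep crossings in front of the camera and order them front to back;
    # np.sort places NaN (no hit) at the end of each per-pixel list.
    t = np.stack(crossings, axis=0)
    t = np.where(t > 0, t, np.nan)
    t = np.sort(t, axis=0)

    # Keep the first num_layers crossings, padding with "no surface".
    layers = np.full((num_layers, res, res), np.nan)
    k = min(num_layers, t.shape[0])
    layers[:k] = t[:k]
    return np.nan_to_num(layers, nan=0.0)


# Two spheres, one hidden behind the other as seen from the camera.
layers = cast_layers(centers=[(0, 0, 0), (0, 0, 1.5)], radii=[0.5, 0.5])
print(layers.shape)
print(layers[:, 16, 16])
```

In this sketch the central ray passes through both spheres, so all four layers at the center pixel are nonzero, including the surfaces of the occluded sphere, while rays toward the image corners miss everything and stay at zero. A fuller version would store normal and color alongside depth for each layer.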


The example of X-Ray.


The overview of 3D synthesis via X-Ray.

Getting Started

Installation

$ conda create -n xray python=3.10
$ conda activate xray
$ pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118
$ pip install -U xformers==v0.0.23.post1 --index-url https://download.pytorch.org/whl/cu118
$ pip install -r requirements.txt

Download the dataset from Hugging Face.

$ cat 0*.zip > Objaverse_XRay.zip
$ unzip Objaverse_XRay.zip
$ ln -s /path/to/Objaverse_XRay Data/Objaverse_XRay

Preprocess: render images and obtain X-Rays for your own dataset.

  • Render the mesh to obtain the image and camera parameters.
$ cd preprocess/get_image
$ bash custom/render_mesh.sh
  • Obtain the X-Ray representation.
$ cd preprocess/get_xray
$ python get_xray.py
  • Load an X-Ray from a .npz file.
from scipy.sparse import csr_matrix
import numpy as np

def load_xray(xray_path):
    # Each X-Ray is stored as the components of a SciPy CSR sparse matrix.
    loaded_data = np.load(xray_path)
    loaded_sparse_matrix = csr_matrix((loaded_data['data'], loaded_data['indices'], loaded_data['indptr']), shape=loaded_data['shape'])
    # Dense layout: 16 layers x (1 + 3 + 3) channels x 256 x 256 pixels.
    # The 1 + 3 + 3 split presumably matches depth, normal, and color.
    original_shape = (16, 1+3+3, 256, 256)
    restored_array = loaded_sparse_matrix.toarray().reshape(original_shape)
    return restored_array

xray = load_xray('example/dataset/xrays/0a0bc2921e5246a28732bf5584c251d1/000.npz')
  • A minimal dataset is located in ./example/dataset
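If you generate X-Rays with your own tooling, the loader above implies a storage format: the `data`, `indices`, `indptr`, and `shape` arrays of a CSR matrix in one `.npz` file. The sketch below shows one way to write such a file and read it back. The `save_xray` helper is hypothetical and not part of the repository, and the 2D flattening it uses, `(16 * 7, 256 * 256)`, is an assumption; the loader only requires that the dense array reshapes back to `(16, 7, 256, 256)`.

```python
import os
import tempfile

import numpy as np
from scipy.sparse import csr_matrix

XRAY_SHAPE = (16, 1 + 3 + 3, 256, 256)  # same shape as in load_xray above


def save_xray(xray, xray_path):
    """Hypothetical counterpart to load_xray: store the array as CSR parts."""
    # CSR matrices are 2D, so flatten to (layers * channels, height * width).
    flat = csr_matrix(xray.reshape(xray.shape[0] * xray.shape[1], -1))
    np.savez_compressed(
        xray_path,
        data=flat.data,
        indices=flat.indices,
        indptr=flat.indptr,
        shape=flat.shape,
    )


def load_xray(xray_path):
    """Rebuild the dense X-Ray array from the stored CSR parts."""
    with np.load(xray_path) as loaded:
        sparse = csr_matrix(
            (loaded["data"], loaded["indices"], loaded["indptr"]),
            shape=tuple(loaded["shape"]),
        )
    return sparse.toarray().reshape(XRAY_SHAPE)


# Synthetic, mostly empty X-Ray: a single square patch in the first layer.
xray = np.zeros(XRAY_SHAPE, dtype=np.float32)
xray[0, 0, 100:120, 100:120] = 1.5

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "000.npz")
    save_xray(xray, path)
    restored = load_xray(path)

print(restored.shape, np.array_equal(restored, xray))
```

Sparse storage suits this data because most pixels in the deeper layers hit no surface and are zero, so only the nonzero entries need to be written to disk.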

Training

Train Diffusion Model

$ bash scripts/train_diffusion.sh

Train Upsampler

$ bash scripts/train_upsampler.sh

Evaluation

$ python evaluate_diffusion.py --exp_diffusion Objaverse_XRay --date_root Data/Objaverse_XRay
$ python evaluate_upsampler.py --exp_diffusion Objaverse_XRay --exp_upsampler Objaverse_XRay_upsampler

TODO list

  • Release paper details.
  • Release the dataset.
  • Release the training and testing source code.
  • Release the preprocessing code.
  • Release the pre-trained model.
  • Release the gradio demo.

Authors

Tao Hu et al.

Acknowledgement

  • The model is related to Diffusers and Stability AI.
  • The source code is mainly based on SVD Xtend, which can train Stable Video Diffusion from scratch.

Citation

If you find this work useful for your research, please cite our paper:

@inproceedings{
hu2024xray,
title={X-Ray: A Sequential 3D Representation For Generation},
author={Tao Hu and Wenhang Ge and Yuyang Zhao and Gim Hee Lee},
booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
year={2024},
url={https://openreview.net/forum?id=36tMV15dPO}
}