ArtSplat

Feed-Forward Articulated 3D Gaussian Splatting from Sparse Multi-State Uncalibrated Views

Inseo Lee¹, Yoonji Kim², Eugene Sohn¹, Jiwoong Lee¹, Jungmin You¹, Joonseok Lee¹,†, Jin-Hwa Kim¹,³,†
¹Seoul National University    ²Sogang University    ³NAVER AI Lab
† Co-corresponding authors

TL;DR

ArtSplat reconstructs both geometry and joint parameters of an articulated object from sparse, uncalibrated RGB views across multiple articulation states — in a single feed-forward pass, over 400× faster than per-object optimization baselines.

Method

Figure: ArtSplat overall architecture.
Overall Architecture

ArtSplat predicts camera, depth, 3D Gaussians, and articulation in a single forward pass over two articulation states. Articulation is represented as a dense per-pixel joint map (11 channels: type, axis, pivot, angle, displacement), so the joint at each pixel directly drives the Gaussian primitive built on that pixel — making articulation end-to-end differentiable with geometry. A learnable state token per state, combined with a single Cross-State Attention block, lets each state's token attend to the patch tokens of the other state, capturing discrete inter-state motion. A dual-branch DPT head then decodes the joint map, splitting it into a 9-channel invariant group (type, axis, pivot) and a 2-channel variant group (angle, displacement).
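
To make the per-pixel joint map concrete, here is a minimal NumPy sketch of how an 11-channel map could be split into its invariant and variant groups and then used to move the Gaussian center built on each pixel. It is an illustration under stated assumptions, not the authors' implementation: the channel order (3 type logits, 3 axis, 3 pivot, 1 angle, 1 displacement), the joint-type ids, and the function names `split_joint_map` and `move_gaussian_centers` are all hypothetical. The text above only fixes the 9 + 2 channel grouping.

```python
import numpy as np

# Assumed channel layout (not confirmed by the source): 3 type logits,
# 3 axis, 3 pivot, then 1 angle and 1 displacement.
TYPE, AXIS, PIVOT = slice(0, 3), slice(3, 6), slice(6, 9)
ANGLE, DISP = 9, 10
STATIC, REVOLUTE, PRISMATIC = 0, 1, 2  # hypothetical joint-type ids


def split_joint_map(joint_map):
    """Split an (H, W, 11) joint map into invariant (9) and variant (2) groups."""
    assert joint_map.shape[-1] == 11
    return joint_map[..., :9], joint_map[..., 9:]


def move_gaussian_centers(centers, joint_map):
    """Move per-pixel Gaussian centers (H, W, 3) using each pixel's own joint."""
    joint_type = joint_map[..., TYPE].argmax(axis=-1)          # (H, W)
    axis = joint_map[..., AXIS]
    axis = axis / (np.linalg.norm(axis, axis=-1, keepdims=True) + 1e-8)
    pivot = joint_map[..., PIVOT]
    angle = joint_map[..., ANGLE][..., None]                   # (H, W, 1)
    disp = joint_map[..., DISP][..., None]                     # (H, W, 1)

    # Revolute joint: Rodrigues rotation of (center - pivot) about the axis.
    v = centers - pivot
    along = np.sum(axis * v, axis=-1, keepdims=True)
    rotated = (v * np.cos(angle)
               + np.cross(axis, v) * np.sin(angle)
               + axis * along * (1.0 - np.cos(angle)))
    revolute = rotated + pivot

    # Prismatic joint: translation along the axis by the displacement.
    prismatic = centers + axis * disp

    # Pixels classified as static keep their original centers.
    moved = np.where((joint_type == REVOLUTE)[..., None], revolute, centers)
    moved = np.where((joint_type == PRISMATIC)[..., None], prismatic, moved)
    return moved
```

Every operation here is a smooth function of the joint channels except the hard `argmax` over type logits, which a trainable version would presumably replace with a soft weighting. Under these assumptions, a revolute pixel whose axis is z through the origin with angle π/2 moves a center at (1, 0, 0) to (0, 1, 0).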

Results

Figure: ArtSplat qualitative and quantitative results.
Qualitative Comparison

BibTeX

@article{lee2026artsplat,
  title   = {ArtSplat: Feed-Forward Articulated 3D Gaussian Splatting from Sparse Multi-State Uncalibrated Views},
  author  = {Lee, Inseo and Kim, Yoonji and Sohn, Eugene and Lee, Jiwoong and You, Jungmin and Lee, Joonseok and Kim, Jin-Hwa},
  journal = {arXiv preprint arXiv:2605.24304},
  year    = {2026}
}