ArtSplat: Feed-Forward Articulated 3D Gaussian Splatting from Sparse Multi-State Uncalibrated Views

Lee, Inseo; Kim, Yoonji; Sohn, Eugene; Lee, Jiwoong; You, Jungmin; Lee, Joonseok; Kim, Jin-Hwa

Inseo Lee¹, Yoonji Kim², Eugene Sohn¹, Jiwoong Lee¹, Jungmin You¹, Joonseok Lee¹,†, Jin-Hwa Kim¹,³,†

¹Seoul National University ²Sogang University ³NAVER AI Lab
†Co-corresponding authors

TL;DR

ArtSplat reconstructs both the geometry and the joint parameters of an articulated object from sparse, uncalibrated RGB views captured across multiple articulation states. It does so in a single feed-forward pass, running over 400× faster than per-object optimization baselines.

Method

Figure: ArtSplat overall architecture.

ArtSplat predicts camera parameters, depth, 3D Gaussians, and articulation in a single forward pass over images of two articulation states. Articulation is represented as a dense per-pixel joint map with 11 channels encoding joint type, axis, pivot, angle, and displacement. Because the joint at each pixel directly drives the Gaussian primitive built on that pixel, articulation is end-to-end differentiable together with geometry. Each state has a learnable state token, and a single Cross-State Attention block lets each state's token attend to the patch tokens of the other state, capturing the discrete motion between the two states. A dual-branch DPT head then decodes the joint map, splitting it into a 9-channel state-invariant group (type, axis, pivot) and a 2-channel state-variant group (angle, displacement).
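How a per-pixel joint can drive the Gaussian built on that pixel is easiest to see in a small sketch. This is a minimal, hypothetical illustration in plain Python, not the authors' implementation. It assumes the 11 channels are ordered as 3 joint-type scores (fixed, revolute, prismatic), a 3D axis, a 3D pivot, an angle in radians, and a displacement; this is one layout consistent with the 9 + 2 channel split described above, but the actual ordering and type vocabulary are not stated. It also assumes one Gaussian center per pixel and blends the three candidate motions with softmax weights over the type scores, a choice made here so that every channel, including the type scores, stays differentiable. The names `split_joint_vector`, `articulate_center`, and `articulate_map` are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

# Hypothetical channel layout of one pixel's 11-channel joint vector (assumption):
#   [0:3] joint-type scores, [3:6] axis, [6:9] pivot, [9] angle (rad), [10] displacement
JOINT_TYPES = ("fixed", "revolute", "prismatic")  # assumed type vocabulary


def split_joint_vector(j: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split into the 9-channel invariant group and the 2-channel variant group."""
    if len(j) != 11:
        raise ValueError(f"expected 11 channels, got {len(j)}")
    return list(j[:9]), list(j[9:])


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def normalize(v: Sequence[float]) -> Vec3:
    # Guard against a zero axis (e.g. on pixels of a fixed part).
    n = max(math.sqrt(sum(c * c for c in v)), 1e-12)
    return (v[0] / n, v[1] / n, v[2] / n)


def cross(a: Sequence[float], b: Sequence[float]) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def rotate_about_axis(p: Sequence[float], axis: Sequence[float],
                      pivot: Sequence[float], angle: float) -> Vec3:
    """Rodrigues' rotation of point p by `angle` about the line through `pivot`."""
    k = normalize(axis)
    v = [p[i] - pivot[i] for i in range(3)]
    c, s = math.cos(angle), math.sin(angle)
    kxv = cross(k, v)
    kdv = sum(k[i] * v[i] for i in range(3))
    return tuple(pivot[i] + v[i] * c + kxv[i] * s + k[i] * kdv * (1.0 - c)
                 for i in range(3))


def articulate_center(center: Sequence[float], j: Sequence[float]) -> Vec3:
    """Move one Gaussian center to the other state using its pixel's joint."""
    invariant, variant = split_joint_vector(j)
    type_scores, axis, pivot = invariant[0:3], invariant[3:6], invariant[6:9]
    angle, displacement = variant
    k = normalize(axis)
    candidates = {
        "fixed": tuple(center),
        "revolute": rotate_about_axis(center, axis, pivot, angle),
        "prismatic": tuple(center[i] + displacement * k[i] for i in range(3)),
    }
    # Soft blend over joint types (assumption) instead of a hard argmax.
    weights = softmax(type_scores)
    return tuple(sum(w * candidates[name][i] for w, name in zip(weights, JOINT_TYPES))
                 for i in range(3))


def articulate_map(centers: Sequence[Sequence[float]],
                   joint_map: Sequence[Sequence[float]]) -> List[Vec3]:
    """Apply each pixel's joint to the Gaussian center built on that pixel."""
    if len(centers) != len(joint_map):
        raise ValueError("need exactly one joint vector per pixel / Gaussian")
    return [articulate_center(c, j) for c, j in zip(centers, joint_map)]
```

For example, a pixel whose type scores strongly favor revolute, with axis (0, 0, 1), pivot at the origin, and angle π/2, moves a center at (1, 0, 0) to approximately (0, 1, 0). In a trainable model the same arithmetic would run on batched tensors inside an autodiff framework, so gradients from the rendered Gaussians reach every joint channel.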
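The cross-state exchange can be sketched as single-head scaled dot-product attention in which each state's token queries only the other state's patch tokens. This is a simplified, hypothetical sketch rather than the authors' block: learned query/key/value projections, multiple heads, normalization, and the MLP of a full transformer block are omitted (identity projections are assumed), and the residual update is an assumption. The function names `attend` and `cross_state_attention` are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Sequence[float], keys: Sequence[Sequence[float]],
           values: Sequence[Sequence[float]]) -> Vector:
    """Single-head scaled dot-product attention for one query token."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[c] for w, v in zip(weights, values)) for c in range(dim)]


def cross_state_attention(
    state_tokens: Tuple[Sequence[float], Sequence[float]],
    patch_tokens: Tuple[Sequence[Sequence[float]], Sequence[Sequence[float]]],
) -> Tuple[Vector, Vector]:
    """Each state's token attends to the *other* state's patch tokens.

    Simplifications (assumptions): identity Q/K/V projections, one head,
    and a residual update; a real block would learn these projections.
    """
    tok_a, tok_b = state_tokens
    patches_a, patches_b = patch_tokens
    msg_a = attend(tok_a, patches_b, patches_b)  # state A reads state B's patches
    msg_b = attend(tok_b, patches_a, patches_a)  # state B reads state A's patches
    new_a = [t + m for t, m in zip(tok_a, msg_a)]
    new_b = [t + m for t, m in zip(tok_b, msg_b)]
    return new_a, new_b
```

Swapping the key/value sets is the essential step in this sketch: state A's token only receives evidence from state B's patches and vice versa, so the updated tokens can encode what changed between the two articulation states rather than what each state looks like on its own.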

Results

BibTeX

@article{lee2026artsplat,
  title   = {ArtSplat: Feed-Forward Articulated 3D Gaussian Splatting from Sparse Multi-State Uncalibrated Views},
  author  = {Lee, Inseo and Kim, Yoonji and Sohn, Eugene and Lee, Jiwoong and You, Jungmin and Lee, Joonseok and Kim, Jin-Hwa},
  journal = {arXiv preprint arXiv:2605.24304},
  year    = {2026}
}