3D style transfer has achieved promising results, driven by advances in neural representations such as Neural Radiance Fields and 3D Gaussian Splatting. However, existing approaches struggle to capture high-level style semantics from reference images and often produce results with limited structural clarity and poor instance separation. To address these issues, we propose a novel two-stage pipeline that leverages prior knowledge from 2D diffusion models. In the first stage, stylized key views are generated using a diffusion-based process, augmented with a Cross-View Style Alignment module that introduces cross-view attention to ensure style fidelity and instance-level consistency across viewpoints. In the second stage, an Instance-level Style Transfer mechanism propagates these coherent stylistic characteristics to the underlying 3D representation. Our pipeline yields well-structured, semantically coherent stylization and outperforms state-of-the-art approaches across diverse scenes, as validated by extensive experiments.
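To make the cross-view attention idea concrete, the sketch below shows a minimal Cross-View Style Alignment-style layer in which tokens from each key view attend to tokens pooled from all key views. The module name, tensor shapes, and residual design here are simplified assumptions for illustration, not our released implementation.

```python
import torch
import torch.nn as nn

class CrossViewAttention(nn.Module):
    """Let tokens of each key view attend to tokens of all key views,
    so style statistics stay consistent across viewpoints (illustrative sketch)."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, view_tokens: torch.Tensor) -> torch.Tensor:
        # view_tokens: (V, N, C) -- V key views, N tokens per view, C channels
        v, n, c = view_tokens.shape
        # Keys/values concatenate tokens from all views, so each query token can
        # borrow style cues from every other viewpoint, not only its own view.
        context = view_tokens.reshape(1, v * n, c).expand(v, -1, -1).contiguous()
        out, _ = self.attn(view_tokens, context, context)
        return out + view_tokens  # residual connection preserves per-view content

if __name__ == "__main__":
    tokens = torch.randn(4, 256, 64)            # 4 key views, 256 tokens each, 64 channels
    aligned = CrossViewAttention(dim=64)(tokens)
    print(aligned.shape)                        # torch.Size([4, 256, 64])
```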
Two-Stage Stylization. We decompose the 3D style transfer task into two sequential stages: stylizing a set of key views, and stylizing the 3D Gaussian Splatting (3DGS) representation from those stylized key views. In Stage 1, given a style reference image together with RGB and depth images rendered from the 3DGS, we design a diffusion model that transfers style semantics to the selected key viewpoints. In Stage 2, leveraging group matching between the key viewpoints and training views, we introduce an Instance-level Style Transfer approach that hierarchically transfers the style semantics onto the 3DGS representation.
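The sketch below illustrates the spirit of Stage 2 in a deliberately simplified form: each training view is matched to its nearest key view, and per-instance color statistics are then transferred AdaIN-style. The nearest-camera matching rule and the mean/std transfer are assumptions made for illustration only; they are not the exact group-matching and propagation procedure used in our pipeline.

```python
import torch

def nearest_key_view(train_cam: torch.Tensor, key_cams: torch.Tensor) -> int:
    """Match a training camera to the closest key camera by position (simplifying assumption)."""
    return int(torch.cdist(train_cam[None], key_cams).argmin())

def transfer_instance_stats(content: torch.Tensor, styled: torch.Tensor,
                            content_masks: torch.Tensor, styled_masks: torch.Tensor) -> torch.Tensor:
    """Shift each instance's color mean/std in `content` toward the matching instance in `styled`.
    content, styled: (3, H, W) images; *_masks: (K, H, W) boolean instance masks."""
    out = content.clone()
    for k in range(content_masks.shape[0]):
        cm, sm = content_masks[k], styled_masks[k]
        if cm.sum() == 0 or sm.sum() == 0:
            continue  # instance not visible in one of the views
        c_pix = content[:, cm]             # (3, Nc) pixels of instance k in the training view
        s_pix = styled[:, sm]              # (3, Ns) pixels of instance k in the stylized key view
        c_mu, c_std = c_pix.mean(1, keepdim=True), c_pix.std(1, keepdim=True) + 1e-6
        s_mu, s_std = s_pix.mean(1, keepdim=True), s_pix.std(1, keepdim=True) + 1e-6
        out[:, cm] = (c_pix - c_mu) / c_std * s_std + s_mu
    return out

if __name__ == "__main__":
    train_cam = torch.tensor([0.1, 0.0, 2.0])
    key_cams = torch.tensor([[0.0, 0.0, 2.0], [1.0, 0.0, 2.0]])
    print(nearest_key_view(train_cam, key_cams))    # 0

    content, styled = torch.rand(3, 64, 64), torch.rand(3, 64, 64)
    masks = torch.zeros(2, 64, 64, dtype=torch.bool)
    masks[0, :32], masks[1, 32:] = True, True       # two toy instance masks
    result = transfer_instance_stats(content, styled, masks, masks)
    print(result.shape)                             # torch.Size([3, 64, 64])
```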
This webpage builds on components from several other project pages, including StyleRF, RefNeRF, RegNeRF, DreamFusion, and Richard Zhang's template. We sincerely thank the authors for their excellent work and for sharing their websites.