Recent advances in neural representations, such as Neural Radiance Fields and 3D Gaussian Splatting, have sparked growing interest in applying style transfer to 3D scenes. While existing methods can transfer style patterns onto 3D-consistent neural representations, they struggle to extract and transfer high-level style semantics from the reference style image. Moreover, the stylized results often lack structural clarity and separation, making it difficult to distinguish between different instances or objects within the 3D scene. To address these limitations, we propose SSGaussian, a 3D style transfer pipeline that effectively integrates prior knowledge from pretrained 2D diffusion models. Our pipeline consists of two key stages: first, we leverage diffusion priors to generate stylized renderings of key viewpoints; then, we transfer the stylized key views onto the 3D representation. This process incorporates two novel designs. The first is Cross-View Style Alignment, which inserts cross-view attention into the last upsampling block of the UNet, allowing features to interact across multiple key views. This ensures that the diffusion model generates stylized key views with both style fidelity and instance-level consistency. The second is Instance-level Style Transfer, which leverages the instance-level consistency across the stylized key views to transfer style onto the 3D representation, yielding a more structured, visually coherent, and artistically enriched stylization. Extensive qualitative and quantitative experiments demonstrate that our pipeline significantly outperforms state-of-the-art methods across a wide range of scenes, from forward-facing to challenging 360-degree environments.
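To make the Cross-View Style Alignment idea more concrete, the sketch below shows one plausible way to let tokens from multiple key views attend to each other inside a UNet upsampling block. This is a minimal PyTorch sketch under our own assumptions, not the actual implementation: the module name, the tensor layout, and the use of a standard multi-head attention layer are all illustrative choices.

```python
# Minimal sketch of cross-view attention (assumed design, not the authors' code):
# token sequences from K key views are concatenated along the sequence axis so a
# single self-attention layer can mix features across views.
import torch
import torch.nn as nn


class CrossViewAttention(nn.Module):
    """Hypothetical module: joint self-attention over the tokens of all key views."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor, num_views: int) -> torch.Tensor:
        # x: (B * num_views, N, dim) per-view token sequences taken from a
        # UNet upsampling block; B is the batch of scenes, N the tokens per view.
        bv, n, d = x.shape
        b = bv // num_views
        # Concatenate the views so attention spans every key view at once.
        tokens = x.reshape(b, num_views * n, d)
        normed = self.norm(tokens)
        out, _ = self.attn(normed, normed, normed)
        tokens = tokens + out  # residual connection
        return tokens.reshape(bv, n, d)
```

In this reading, denoising any single key view can borrow style features from the other key views, which is what would encourage the instance-level consistency across viewpoints described above.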
Two-Stage Stylization. We decompose the 3D style transfer task into two sequential stages: stylizing key views, and then stylizing the 3D Gaussian Splatting (3DGS) representation based on those stylized key views. In Stage 1, given a style reference image together with RGB and depth images rendered from the 3DGS, we design a diffusion model that transfers the style semantics onto the selected key viewpoints. In Stage 2, leveraging group matching between the key viewpoints and the training views, we introduce an Instance-level Style Transfer approach that hierarchically transfers the style semantics onto the 3DGS representation.
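As a rough illustration of what an instance-level objective in Stage 2 could look like, the sketch below pairs a training-view rendering with its matched stylized key view and accumulates a loss per matched instance mask. The function name, the mean-color statistic, and the matching inputs are placeholders we introduce for illustration, not the paper's exact formulation; a real pipeline could instead match deep features per instance.

```python
# Hedged sketch of an instance-level style loss for Stage 2 (assumed formulation).
import torch
import torch.nn.functional as F


def instance_style_loss(render, stylized_key, masks_render, masks_key, matches):
    """
    render        : (3, H, W) current 3DGS rendering of a training view
    stylized_key  : (3, H, W) stylized key view matched to this training view
    masks_render  : (M, H, W) binary instance masks in the training view
    masks_key     : (N, H, W) binary instance masks in the key view
    matches       : list of (i, j) pairs matching render instance i to key instance j
    """
    loss = render.new_zeros(())
    for i, j in matches:
        m_r = masks_render[i].unsqueeze(0)  # (1, H, W), broadcast over channels
        m_k = masks_key[j].unsqueeze(0)
        # Mean color inside each instance stands in for a full style objective.
        mu_r = (render * m_r).sum((1, 2)) / m_r.sum().clamp(min=1)
        mu_k = (stylized_key * m_k).sum((1, 2)) / m_k.sum().clamp(min=1)
        loss = loss + F.mse_loss(mu_r, mu_k)
    return loss / max(len(matches), 1)
```

Under this sketch, each object in the scene is pulled toward the appearance of its counterpart in the stylized key view, which is one way the per-instance structure of the key views could be propagated onto the 3DGS representation.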
This webpage integrates components from many websites, including StyleRF, RefNeRF, RegNeRF, DreamFusion, and Richard Zhang's template. We sincerely thank the authors for their great work and websites.