Pinterest Labs

Visual understanding

The Pinterest visual embedding system is trained with a large multimodal pre-training contrastive task, and fine-tuned with dozens of visual-specific and retrieval-specific datasets, enabling state-of-the-art visual search.

The Pinterest visual embedding system is at the center of the visual understanding that underpins Pinterest. The current iteration of this technology is trained with a large multimodal pre-training contrastive task to align representations from the Pinterest graph-sampled image-text pairs. A series of fine-tuning stems with dozens of visual-specific and retrieval-specific datasets, result in a state-of-the-art representation system for retrieval.

These embeddings are then optimized for compatibility with multimodal query understanding, diffusion model conditioning, LLM image projection, and more, and are deployed throughout the Pinterest product across every major ML surface.

Join us and create the career you love

Work in a collaborative, open-ended, publish-friendly environment, and build AI technology on top of the rich visual graph structure inherent to Pinterest, and ship products to 550M+ users.

Explore jobs
people together