Snapchat’s Data Revolution: NVIDIA GPUs Power Faster Innovation for 940 Million Users
The relentless pace of feature development on platforms like Snapchat demands equally rapid advancements in the underlying infrastructure. To meet this challenge, Snap has strategically adopted open data processing libraries from NVIDIA, leveraging the power of Google Cloud services to dramatically accelerate its development cycles. This shift isn’t merely about speed; it’s about enabling a more innovative and responsive user experience for Snapchat’s massive global audience.
Every new feature, from subtle interface tweaks to groundbreaking augmented reality experiences, undergoes rigorous A/B testing before reaching Snapchat’s over 940 million monthly active users. This process involves analyzing nearly 6,000 distinct metrics – encompassing user engagement, application performance, and monetization effectiveness – across carefully segmented user groups. The sheer volume of data generated by these experiments is staggering.
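To illustrate what analyzing one metric across two user groups can look like, here is a minimal sketch using only the Python standard library. It is not Snap's method: the article does not say which statistics Snap computes, so the relative lift, the Welch t-statistic, the function name compare_metric, and the sample values are all assumptions chosen for illustration.

```python
from math import sqrt
from statistics import mean, variance


def compare_metric(control, treatment):
    """Return (relative lift, Welch t-statistic) for one metric in two groups."""
    m_c, m_t = mean(control), mean(treatment)
    # Relative lift of the treatment group over the control group.
    lift = (m_t - m_c) / m_c
    # Standard error of the difference in means, using sample variances.
    se = sqrt(variance(control) / len(control) + variance(treatment) / len(treatment))
    return lift, (m_t - m_c) / se


# Hypothetical per-user values for a single engagement metric.
control = [4.0, 5.0, 6.0, 5.0, 4.0, 6.0]
treatment = [5.0, 6.0, 7.0, 6.0, 5.0, 7.0]

lift, t_stat = compare_metric(control, treatment)
print(f"lift={lift:.1%}, t={t_stat:.2f}")
```

A production system would repeat a comparison like this for each metric and each experiment, which is why the data volume grows so quickly.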
Snap processes over 10 petabytes of data daily, completing this intensive analysis within a critical three-hour window each morning. Traditionally reliant on Apache Spark running on CPUs, the company now runs Apache Spark accelerated by NVIDIA cuDF, achieving a fourfold runtime speedup without increasing hardware resources. This is a significant gain in efficiency and a cost-effective path to scaling.
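A rough back-of-the-envelope calculation puts those figures in perspective. It assumes decimal petabytes and treats "10 petabytes" as a lower bound; the article specifies neither. The last figure is hypothetical: it only shows what a fourfold speedup would mean for a job that previously filled the whole window.

```python
PETABYTE = 10**15  # bytes; decimal petabyte (assumption, the article does not say PB vs PiB)

daily_bytes = 10 * PETABYTE   # "over 10 petabytes", taken as a lower bound
window_seconds = 3 * 60 * 60  # the three-hour morning window

# Sustained throughput needed to finish within the window.
throughput_gb_per_s = daily_bytes / window_seconds / 10**9
print(f"sustained throughput of at least {throughput_gb_per_s:.0f} GB/s")

# Hypothetical: a job that used the full window, after a 4x runtime speedup.
accelerated_minutes = window_seconds / 4 / 60
print(f"a full-window job would take about {accelerated_minutes:.0f} minutes")
```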
The integration of NVIDIA’s GPU-optimized software, including the comprehensive CUDA-X libraries, with Google’s robust infrastructure management tools like Google Kubernetes Engine, has created a full-stack platform optimized for large-scale data processing. This synergy allows Snap to push the boundaries of experimentation and deliver a more dynamic and engaging platform.
“Experimentation is the lifeblood of our company,” explains Prudhvi Vatala, Senior Engineering Manager at Snap. “Transitioning our data infrastructure from CPUs to GPUs allows us to scale our experimentation efforts – encompassing more features, a wider range of metrics, and a larger user base – with unprecedented efficiency. The more experiments we can run, the more innovative experiences we can deliver to Snapchat users.”
Scaling Innovation Sustainably
Snapchat users are accustomed to a constant stream of new features, from arrival notifications to AI-powered creative tools. However, those visible innovations rest on a continuous cycle of behind-the-scenes improvements: performance optimizations, compatibility updates for evolving operating systems, and ongoing refinements to the core user experience. All of this relies on robust and efficient A/B testing.
Now, all A/B testing is powered by Apache Spark accelerated by cuDF, which lets developers run existing Apache Spark applications on NVIDIA GPUs without any code modifications. This open accelerator builds on the NVIDIA cuDF GPU DataFrame library, extending its capabilities to the Apache Spark distributed computing framework.
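In practice, "no code modifications" means the change lives in the launch configuration rather than in the application. The sketch below assumes the accelerator in question is the RAPIDS Accelerator for Apache Spark, whose plugin class and configuration keys are used here. The article does not describe Snap's actual settings, so the jar path, the script name ab_metrics.py, and the GPU amounts are hypothetical.

```python
def spark_submit_args(app_script, use_gpu, plugin_jar="rapids-4-spark.jar"):
    """Build a spark-submit command line; the application script is identical either way."""
    args = ["spark-submit"]
    if use_gpu:
        args += [
            "--jars", plugin_jar,  # hypothetical path to the accelerator jar
            "--conf", "spark.plugins=com.nvidia.spark.SQLPlugin",
            "--conf", "spark.rapids.sql.enabled=true",
            # Illustrative resource settings: one GPU per executor, shared by four tasks.
            "--conf", "spark.executor.resource.gpu.amount=1",
            "--conf", "spark.task.resource.gpu.amount=0.25",
        ]
    args.append(app_script)
    return args


cpu_cmd = spark_submit_args("ab_metrics.py", use_gpu=False)
gpu_cmd = spark_submit_args("ab_metrics.py", use_gpu=True)
print(" ".join(gpu_cmd))
```

Both command lines end with the same unmodified script; only the plugin and resource settings differ.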
Internal Snap data, collected between January 1st and February 28th, shows a 76% reduction in daily costs when using NVIDIA GPUs on Google Kubernetes Engine compared to traditional CPU-only workflows. Put differently, the GPU pipelines run at roughly a quarter of the daily cost of their CPU-only equivalents.
“We were facing a potential roadblock in our scaling plans,” Vatala admits. “Our projected computing costs were unsustainable based on our existing infrastructure. Switching to GPU-accelerated pipelines with cuDF provided a solution, effectively flattening the scaling curve, and the results have exceeded our expectations.”
To facilitate this migration, Snap leveraged cuDF’s suite of microservices, automating the qualification, testing, configuration, and optimization of Spark workloads for GPU acceleration at scale. Collaboration with NVIDIA experts further refined the pipelines, optimizing them for Google Cloud’s G2 virtual machines powered by NVIDIA L4 GPUs.
This optimization sharply reduced GPU requirements. Based on data collected between January 1st and March 13th, just 2,100 GPUs run concurrently, compared to an initial projection of approximately 5,500. This represents a significant reduction in both cost and energy consumption.
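The reported figures can be restated as simple ratios. The short calculation below uses only numbers quoted in this article (2,100 versus 5,500 GPUs, and the 76% daily cost reduction); the variable names are illustrative.

```python
projected_gpus = 5_500  # initial projection quoted in the article
actual_gpus = 2_100     # GPUs running concurrently after optimization

# Fraction of the projected GPU fleet that was not needed.
gpu_reduction = 1 - actual_gpus / projected_gpus

# A 76% cost reduction means GPU pipelines cost 24% of the CPU-only baseline.
cost_reduction = 0.76
cpu_to_gpu_cost_ratio = 1 / (1 - cost_reduction)

print(f"{gpu_reduction:.1%} fewer GPUs than projected")
print(f"CPU-only daily cost is about {cpu_to_gpu_cost_ratio:.1f}x the GPU daily cost")
```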
“The initial results were astonishing,” says Joshua Sambasivam, a Backend Engineer on the A/B testing team. “We saw far greater cost savings than anticipated. The Spark accelerator is a perfect fit for our workloads.”
Snap plans to expand the use of the Spark accelerator beyond the A/B testing team, integrating it into a wider range of production workloads. “We’ve only scratched the surface of what’s possible,” Vatala concludes. “We’ve migrated our two largest pipelines so far, but the potential for further optimization is immense.”
Learn more by watching Vatala’s session at NVIDIA GTC, taking place Tuesday, March 17th at 1 p.m. PT.
Read more about NVIDIA cuDF and get started with GPU acceleration for Apache Spark.
Image courtesy of Snap, depicting an A/B test of its Maps feature.
How will this increased experimentation speed impact the future of Snapchat features? And what other social media platforms might adopt similar strategies to stay ahead of the curve?
Frequently Asked Questions About Snapchat and NVIDIA GPU Acceleration
What is the primary benefit of Snapchat using NVIDIA GPUs for data processing?
The primary benefit is a significant increase in processing speed – a fourfold improvement in runtime – allowing Snapchat to run more experiments, analyze more data, and deliver innovative features to its users faster and more cost-effectively.
How does NVIDIA cuDF contribute to Snapchat’s data processing efficiency?
NVIDIA cuDF allows Snapchat developers to run existing Apache Spark applications on NVIDIA GPUs without requiring any code changes, simplifying the migration process and accelerating data analysis.
What cost savings has Snapchat realized by switching to GPU-accelerated pipelines?
Snapchat has achieved a remarkable 76% reduction in daily costs by utilizing NVIDIA GPUs on Google Kubernetes Engine compared to CPU-only workflows, based on data collected between January 1st and February 28th.
What role does Google Cloud play in Snapchat’s data infrastructure?
Snapchat leverages Google Cloud’s infrastructure management services, such as Google Kubernetes Engine, in conjunction with NVIDIA’s GPU-optimized software to create a full-stack platform for large-scale data processing.
Is Snapchat planning to expand its use of GPU acceleration beyond A/B testing?
Yes, Snap plans to integrate the Spark accelerator into a broader range of production workloads, recognizing the significant potential for optimization across its entire platform.
What is the significance of the NVIDIA L4 GPU in Snapchat’s infrastructure?
NVIDIA L4 GPUs power the Google Cloud G2 virtual machines that run Snap’s pipelines. Working with NVIDIA experts, Snap optimized its pipelines for these machines, reducing the number of GPUs running concurrently from a projected 5,500 to just 2,100.