JetStream 3.0: A Comprehensive Overhaul of Web Performance Benchmarks

Browser performance benchmarks are essential tools for developers striving to deliver faster, more responsive web experiences. The release of JetStream 3.0, a collaborative effort by Apple, Google, and Mozilla, marks a significant evolution in how we measure and optimize modern web applications. This article covers the key changes in JetStream 3, focusing on the WebKit team's engineering work and the shift from legacy metrics to real-world relevance.

Why a New Benchmark Suite Was Necessary

The web is not static. As new best practices emerge and application complexity grows, older benchmarks inevitably lose their relevance. JetStream 2, while groundbreaking in its time, began to show its age, especially in how it measured WebAssembly workloads. The suite rewarded optimizations that no longer reflected real-world usage patterns, and in some cases its scoring formula produced infinite scores. JetStream 3 addresses these shortcomings by recalibrating the test set and replacing the split startup/runtime scoring with a single score per subtest.


Rethinking WebAssembly Benchmarking

One of the most transformative updates in JetStream 3 is how it handles WebAssembly (Wasm) workloads. To appreciate the change, we must revisit the origins of Wasm benchmarking.

The Old Approach: Startup vs. Runtime

When JetStream 2 launched, WebAssembly was still nascent. Early adopters were large C/C++ codebases (e.g., video games) that previously compiled to asm.js. These applications tolerated a high one-time startup cost in exchange for sustained runtime performance. Consequently, JetStream 2 partitioned Wasm performance into two distinct phases: Startup (module instantiation) and Runtime (execution throughput). This separation made sense at the time, but it quickly became inadequate as engines optimized instantiation to near-zero times.

The Infinity Problem

Browser engines, including WebKit's JavaScriptCore, aggressively optimized Wasm startup. For smaller modules, instantiation time dropped below 1 millisecond, which is finer than the resolution of Date.now(), a timer that reports whole milliseconds. In JetStream 2, each subtest's score was computed as 5000 / time, so a measured time of 0 ms produced a score of infinity. A temporary fix clamped the maximum score to 5000, but the fundamental issue remained: once startup fell below the timer's resolution, the benchmark could no longer distinguish between engines.
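
The arithmetic behind the infinity problem can be sketched in a few lines of TypeScript. This is an illustration only, not JetStream's actual harness code: the function names are hypothetical, and measuredMs is a simplified model of a whole-millisecond timer. Only the 5000 / time formula and the clamp to 5000 come from the description above.

```typescript
// Illustrative sketch; all names here are hypothetical, not JetStream source code.
const SCORE_NUMERATOR = 5000; // from the 5000 / time formula described above
const MAX_SCORE = 5000;       // the stopgap clamp described above

/** Raw subtest score: 5000 divided by the elapsed time in milliseconds. */
function rawScore(elapsedMs: number): number {
  return SCORE_NUMERATOR / elapsedMs;
}

/** Stopgap: cap the score so that a 0 ms reading cannot yield Infinity. */
function clampedScore(elapsedMs: number): number {
  return Math.min(rawScore(elapsedMs), MAX_SCORE);
}

/** Simplified model of a whole-millisecond timer: the fractional part is dropped. */
function measuredMs(trueDurationMs: number): number {
  return Math.floor(trueDurationMs);
}

const engineA = measuredMs(0.2);  // a sub-millisecond startup is recorded as 0
const engineB = measuredMs(0.05); // a 4x faster startup is also recorded as 0

console.log(rawScore(engineA));                             // Infinity
console.log(clampedScore(engineA), clampedScore(engineB));  // 5000 5000
```

Both hypothetical engines end up with the same clamped score even though one starts four times faster, which is the loss of resolution the clamp could not fix.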

More importantly, a perfect startup time in a microbenchmark no longer reflected real-world usage, where Wasm now appears in image decoders, UI frameworks, and critical-path libraries. JetStream 3 resolves this by integrating Wasm performance into a unified scoring model that treats startup and runtime as interdependent factors, not separate silos.

Scaling to Modern Application Demands

JetStream 3 introduces larger, more realistic workloads across both JavaScript and WebAssembly. The new suite includes complex data processing tasks, animation-heavy interactions, and multi-threaded scenarios that mirror today's web applications. This ensures that optimizations driven by the benchmark translate to genuine user benefits rather than narrow mathematical victories.

Expanded test suite: Over 60 subtests covering everything from DOM manipulation to Wasm-compiled image filtering.
Real-world data sets: Benchmarks now use actual web content sizes and interaction patterns.
Unified scoring: No more separate startup/runtime scores; each subtest yields a single score based on total execution time.
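
To see why a single score based on total time avoids the division-by-zero case, consider the minimal sketch below. It assumes the 5000 numerator carries over from the JetStream 2 formula, which this article does not confirm for JetStream 3, and the PhaseTimings shape, function name, and timing values are all hypothetical.

```typescript
// Illustrative sketch; the numerator and all names are assumptions, not JetStream 3 source.
const NUMERATOR = 5000;

/** Hypothetical record of how long each phase of one subtest took. */
interface PhaseTimings {
  startupMs: number; // module instantiation
  runtimeMs: number; // execution of the workload
}

/** One score per subtest, derived from the total time across both phases. */
function unifiedScore(timings: PhaseTimings): number {
  const totalMs = timings.startupMs + timings.runtimeMs;
  return NUMERATOR / totalMs;
}

// Made-up timings: startup is negligible for both, so runtime dominates the total.
const engineA: PhaseTimings = { startupMs: 0.2, runtimeMs: 40 };
const engineB: PhaseTimings = { startupMs: 0.05, runtimeMs: 50 };

console.log(unifiedScore(engineA) > unifiedScore(engineB)); // true

// Even a startup reading of exactly 0 ms gives a finite score.
console.log(unifiedScore({ startupMs: 0, runtimeMs: 40 })); // 125
```

Because the denominator includes the runtime phase, a near-zero startup no longer produces an unbounded score; it simply contributes very little to the total.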

WebKit's Engineering Improvements

The WebKit team contributed several key optimizations that are reflected in JetStream 3 scores:

Zero-cost Wasm instantiation: JavaScriptCore now shares compiled module instances across multiple invocations, reducing per-call overhead to near zero.
Incremental streaming compilation: The engine can begin executing code as soon as the first function is compiled, rather than waiting for the entire module to be ready.
Improved memory allocation: Faster malloc for Wasm linear memory reduces startup jitter and improves runtime throughput.
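
These optimizations live inside the engine, so they cannot be reproduced in page code. The closest page-level analogue to reusing compiled code is to compile a module once and instantiate it as often as needed, which the standard WebAssembly JavaScript API supports. The sketch below assumes a hand-assembled module whose only export, answer, returns 42; the helper name is hypothetical, and the measured time will vary by engine and machine.

```typescript
// Illustrative sketch of compile-once, instantiate-many; not JavaScriptCore internals.
// Hand-assembled minimal module exporting answer(): i32, which returns 42.
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // "\0asm" magic, version 1
  0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7f,       // type section: () -> i32
  0x03, 0x02, 0x01, 0x00,                         // function section: one function of type 0
  0x07, 0x0a, 0x01, 0x06,                         // export section: one export, name length 6
  0x61, 0x6e, 0x73, 0x77, 0x65, 0x72, 0x00, 0x00, // "answer", kind = function, index 0
  0x0a, 0x06, 0x01, 0x04, 0x00, 0x41, 0x2a, 0x0b, // code section: i32.const 42; end
]);

// Compile once. The resulting WebAssembly.Module can back any number of instances.
const compiledModule = new WebAssembly.Module(wasmBytes);

/** Hypothetical helper: create a fresh instance from the shared module and call its export. */
function instantiateAndCall(): number {
  const instance = new WebAssembly.Instance(compiledModule);
  const answer = instance.exports.answer as () => number;
  return answer();
}

const start = performance.now();
const results = [instantiateAndCall(), instantiateAndCall(), instantiateAndCall()];
const elapsedMs = performance.now() - start;

console.log(results); // [ 42, 42, 42 ]
console.log(`three instantiations took ${elapsedMs} ms`);
```

The timing here uses performance.now(), which reports fractional milliseconds, so sub-millisecond instantiation remains measurable.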

The Future of Performance Measurement

JetStream 3 is more than a version bump: it changes how browser engines are tested. By retiring outdated scoring models and embracing the complexity of real-world web applications, the suite sets a higher bar for performance engineering. For developers, it offers a trustworthy yardstick to gauge the impact of their optimizations, and it keeps the benchmark aligned with how the web is actually built today.
