The Dart VM uses adaptive optimizing compilation driven by runtime execution profiles to generate high-performance code.
Compilation Pipeline Overview
The VM has two compilers:
- Unoptimizing Compiler - Fast compilation, collects type feedback
- Optimizing Compiler - Slower compilation, applies speculative optimizations
Unoptimizing Compiler
When a function is first called, it’s compiled by the unoptimizing compiler:
Pipeline
- Parse Kernel AST - Walk serialized function body
- Build CFG - Generate control flow graph with basic blocks
- Generate IL - Stack-based intermediate language instructions
- Emit code - Direct one-to-many lowering to machine code
Goals
- Compile as quickly as possible
- No optimizations applied
- Collect execution profile:
  - Inline caches - Track receiver types at call sites
  - Execution counters - Track hot functions and basic blocks
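The profile-collection side of the unoptimizing tier can be modeled with a simple sketch (Python used purely as a modeling language; the class, the threshold value, and the "flip a flag" stand-in for background compilation are all illustrative, not VM code):

```python
# Conceptual model: the unoptimizing tier attaches an execution counter
# to each function and hands hot functions to the optimizing compiler
# once the counter crosses a threshold.
OPTIMIZATION_COUNTER_THRESHOLD = 3  # illustrative; the real default is much higher

class ProfiledFunction:
    def __init__(self, body):
        self.body = body          # the "unoptimized" implementation
        self.usage_count = 0
        self.optimized = False

    def __call__(self, *args):
        if not self.optimized:
            self.usage_count += 1
            if self.usage_count >= OPTIMIZATION_COUNTER_THRESHOLD:
                self.optimized = True  # stand-in for background optimizing compilation
        return self.body(*args)

square = ProfiledFunction(lambda x: x * x)
for i in range(5):
    square(i)
print(square.optimized)  # True once the counter crossed the threshold
```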
Lazy Compilation
All functions initially point to LazyCompileStub, which compiles the function on its first invocation and patches the entry point to the newly generated code.
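A minimal sketch of the lazy-compile pattern (Python as a modeling language; the entry-point table and "compile via eval" are illustrative stand-ins, not how the VM is implemented):

```python
# Conceptual model: every function's entry point initially refers to a
# lazy-compile stub; the first call compiles the function and patches
# the entry so later calls go straight to compiled code.
compiled_log = []

def make_lazy_entry(name, source):
    def lazy_compile_stub(*args):
        compiled_log.append(name)      # "compilation" happens here, once
        code = eval(source)            # stand-in for code generation
        entry_points[name] = code      # patch the entry point
        return code(*args)             # resume the original call
    return lazy_compile_stub

entry_points = {"double": make_lazy_entry("double", "lambda x: 2 * x")}

print(entry_points["double"](21))  # 42 -- triggers compilation
print(entry_points["double"](5))   # 10 -- runs compiled code directly
print(compiled_log)                # ['double'] -- compiled exactly once
```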
Inline Caching
Dynamic calls use inline caching for fast method resolution:
Structure
- ICData object - Maps receiver class → method + frequency counter
- Lookup stub - Searches cache, increments counter, tail-calls method
- Runtime miss handler - Resolves method, updates cache
Example
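As a stand-in illustration, here is a small Python model of a call-site cache with the ICData shape described above (the class names, the polymorphic limit, and the state classification cutoffs are all illustrative assumptions, not VM constants):

```python
# Conceptual model of an inline cache at one call site: it maps the
# receiver's class to the resolved method plus a hit counter, and its
# state degrades monomorphic -> polymorphic -> megamorphic as more
# receiver classes are observed.
POLYMORPHIC_LIMIT = 4  # illustrative cutoff

class InlineCache:
    def __init__(self, selector):
        self.selector = selector
        self.entries = {}  # receiver class -> [method, call count]

    def invoke(self, receiver, *args):
        cls = type(receiver)
        entry = self.entries.get(cls)
        if entry is None:  # cache miss: runtime resolves and updates cache
            entry = [getattr(cls, self.selector), 0]
            self.entries[cls] = entry
        entry[1] += 1      # frequency counter
        return entry[0](receiver, *args)

    @property
    def state(self):
        n = len(self.entries)
        if n <= 1:
            return "monomorphic"
        return "polymorphic" if n <= POLYMORPHIC_LIMIT else "megamorphic"

class Circle:
    def area(self): return 3.14159
class Square:
    def area(self): return 4.0

cache = InlineCache("area")
cache.invoke(Circle())
print(cache.state)   # monomorphic
cache.invoke(Square())
print(cache.state)   # polymorphic
```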
Cache States
- Monomorphic - One class observed (fastest)
- Polymorphic - Few classes observed (fast)
- Megamorphic - Many classes observed (slower, switches to different dispatch)
Optimizing Compiler
When a function’s execution counter reaches the optimization threshold (optimization_counter_threshold), it’s submitted to the background optimizing compiler.
Pipeline
- Build unoptimized IL - Same as unoptimizing compiler
- Convert to SSA - Static single assignment form
- Apply optimizations - Multiple passes using type feedback
- Lower to machine code - Linear scan register allocation + lowering
Optimization Passes
Major optimizations include:
Inlining
- Replace function calls with function body
- Reduces call overhead
- Enables further optimizations
- Controlled by heuristics (size, depth, hotness)
Type Propagation
- Propagate type information through IL graph
- Uses type feedback from inline caches
- Enables devirtualization and specialization
Range Analysis
- Infer integer value ranges
- Eliminate bounds checks on array access
- Eliminate overflow checks
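The bounds-check elimination step can be sketched as follows (a simplified model: real range analysis runs on the IL graph, but the core reasoning is an interval comparison like this):

```python
# Conceptual sketch: if range analysis proves a loop index stays within
# [0, length - 1], the bounds check on each array access can be removed.
def index_range(start, stop):
    """Range of i over `for i in range(start, stop)`, assuming the loop runs."""
    return (start, stop - 1)

def bounds_check_needed(idx_range, length):
    lo, hi = idx_range
    # The check is redundant exactly when the whole range is in bounds.
    return not (0 <= lo and hi <= length - 1)

length = 10
print(bounds_check_needed(index_range(0, length), length))      # False: eliminated
print(bounds_check_needed(index_range(0, length + 1), length))  # True: must keep it
```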
Representation Selection
- Choose optimal representation (boxed vs unboxed)
- Unbox integers and doubles where possible
- Reduces allocation and improves performance
Common Subexpression Elimination (CSE)
- Eliminate redundant computations
- Reuse previously computed values
Loop-Invariant Code Motion (LICM)
- Move computations out of loops
- Reduces work in hot loops
Load/Store Forwarding
- Forward stored values to subsequent loads
- Eliminate redundant memory accesses
Global Value Numbering (GVN)
- Identify equivalent computations globally
- Eliminate duplicates
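The numbering idea behind GVN (and CSE) can be shown in a few lines (a deliberately naive sketch: the expression-key encoding is illustrative, and it ignores commutativity and side effects that a real implementation must handle):

```python
# Conceptual sketch of value numbering: structurally identical pure
# computations receive the same value number, so a later duplicate can
# be replaced by the earlier result instead of being recomputed.
value_numbers = {}  # expression key -> value number

def number(expr):
    # expr is ('var', name) or ('op', operand value numbers...)
    if expr not in value_numbers:
        value_numbers[expr] = len(value_numbers)  # fresh value number
    return value_numbers[expr]

a = number(("var", "a"))
b = number(("var", "b"))
t1 = number(("add", a, b))  # first  a + b
t2 = number(("add", a, b))  # second a + b -- same number, redundant
print(t1 == t2)             # True: the duplicate add is eliminated
print(number(("add", b, a)) == t1)  # False: this naive sketch misses commutativity
```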
Allocation Sinking
- Delay or eliminate temporary object allocations
- Move allocations to where actually needed
Speculative Optimizations
Optimizations based on runtime feedback:
- Call specialization - Convert dynamic calls to direct calls based on observed types
- Class hierarchy analysis (CHA) - Use class hierarchy assumptions
- Unboxing - Assume Smi or double based on feedback
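The shape of a speculative specialization is "guard on the expected class, then call the known target directly; bail out otherwise". A sketch (Python as a modeling language; the classes and the exception standing in for a deopt exit are illustrative):

```python
# Conceptual sketch of call specialization: type feedback saw only one
# receiver class, so the dynamic call is rewritten as a cheap class
# check plus a direct (devirtualized) call, with deoptimization as the
# fallback when the speculation fails.
class DeoptimizeError(Exception):
    pass

class Cat:
    def speak(self): return "meow"
class Dog:
    def speak(self): return "woof"

def specialize_call(expected_cls, target):
    def specialized(receiver):
        if type(receiver) is not expected_cls:  # speculation guard
            raise DeoptimizeError("unexpected receiver class")
        return target(receiver)                 # direct call, no dynamic lookup
    return specialized

fast_speak = specialize_call(Cat, Cat.speak)  # feedback observed only Cat
print(fast_speak(Cat()))   # meow, on the fast path
try:
    fast_speak(Dog())      # guard fails -> fall back to unoptimized code
except DeoptimizeError:
    print("deoptimized")
```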
Deoptimization
When optimized code encounters a case it can’t handle, it deoptimizes to unoptimized code.
Types of Deoptimization
Eager Deoptimization
Inline checks fail at the use site: the optimized code branches straight into deoptimization when a guard (such as a class check or Smi check) does not hold.
Lazy Deoptimization
Global guards trigger when runtime state changes:
- Class finalization adds subclass (violates CHA assumptions)
- Dynamic code loading invalidates assumptions
- Runtime finds invalid optimized code on stack
- Frames marked for deoptimization, applied on return
Deoptimization Process
- Match deopt ID - Maps optimized code position → unoptimized code position
- Reconstruct state - Build unoptimized frame(s) from optimized state
- Transfer execution - Continue in unoptimized code
- Discard optimized code - Usually discarded, will reoptimize later with updated feedback
Deopt Instructions
Deoptimization uses a mini-interpreter executing deopt instructions:
- Generated during compilation at each potential deopt location
- Describe how to reconstruct unoptimized state from optimized state
- Handle multiple unoptimized frames from single optimized frame (inlining)
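The mini-interpreter idea can be sketched as follows (a toy model: the instruction names, slot layout, and frame representation are invented for illustration and do not match the VM's actual deopt instruction set):

```python
# Conceptual sketch: deopt "instructions" describe how to rebuild one or
# more unoptimized frames from an optimized frame's state. An inlined
# callee yields an extra reconstructed frame from the same optimized frame.
def run_deopt_instructions(instructions, optimized_slots):
    frames, current = [], {}
    for op, arg in instructions:
        if op == "copy_slot":        # value lives in the optimized frame
            name, slot = arg
            current[name] = optimized_slots[slot]
        elif op == "constant":       # value was constant-folded away
            name, value = arg
            current[name] = value
        elif op == "end_frame":      # frame complete; arg names the function
            frames.append((arg, current))
            current = {}
    return frames

# One optimized frame covering caller() with callee() inlined into it.
instructions = [
    ("copy_slot", ("x", 0)),
    ("constant", ("y", 7)),
    ("end_frame", "callee"),
    ("copy_slot", ("acc", 1)),
    ("end_frame", "caller"),
]
frames = run_deopt_instructions(instructions, optimized_slots=[3, 40])
print(frames)  # [('callee', {'x': 3, 'y': 7}), ('caller', {'acc': 40})]
```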
On-Stack Replacement (OSR)
For long-running loops, the VM switches from unoptimized to optimized code while the function is running:
- Loop executes in unoptimized code
- Loop back-edge counter reaches threshold
- Background compile optimized version with OSR entry point
- On next iteration, jump to optimized code
- Stack frame transparently replaced
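The steps above can be sketched in one loop (a toy model: the back-edge counter, threshold, and tier switch are illustrative, and the real VM swaps code objects rather than a flag):

```python
# Conceptual sketch of on-stack replacement: the loop starts in the
# unoptimized tier, the back-edge counter crosses the threshold, and
# the remaining iterations continue in the "optimized" body with the
# live state (i, total) carried over unchanged.
OSR_THRESHOLD = 3  # illustrative

def sum_to(n):
    total, i = 0, 0
    back_edges = 0
    tier = "unoptimized"
    while i < n:
        if tier == "unoptimized":
            back_edges += 1
            if back_edges >= OSR_THRESHOLD:
                tier = "optimized"  # OSR: same i/total, faster code from here
        total += i                  # loop body behaves identically in both tiers
        i += 1
    return total, tier

print(sum_to(10))  # (45, 'optimized') -- switched mid-loop, result unchanged
```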
Optimization Control Flags
Compilation Control
Debugging Output
Feature Flags
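The individual flag tables did not survive extraction. As a hedged illustration of the three categories above, VM flags are passed directly to the dart command; the specific flag names below are recalled from the VM's flag list and may differ between SDK versions, so verify them against your SDK build before relying on them:

```shell
# Compilation control: raise the hotness threshold before optimizing.
dart --optimization_counter_threshold=1000 main.dart

# Debugging output: dump the optimized flow graph during compilation.
dart --print_flow_graph_optimized main.dart

# Feature flag: disable on-stack replacement (boolean flags negate with no_).
dart --no_use_osr main.dart
```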
Optimization Levels
The --optimization_level flag controls which optimizations are applied:
Level 1 (Os - Optimize for Size)
- Skip O2 optimizations that increase code size
- Introduce optimizations favoring code size over speed
- Example: Less aggressive inlining
Level 2 (O2 - Default)
- Balanced compile-time, code speed, and code size
- All standard optimizations with proper heuristics
- Default for production
Level 3 (O3 - Optimize for Speed)
- More detailed analysis for speed improvements
- Accept longer compile-time and larger code size
- More aggressive optimization heuristics
Example: Optimization in Action
Consider a small arithmetic function whose inline caches report both operands (a and b) observed as Smi. The optimizing compiler specializes the dynamic + accordingly: the optimized IL replaces the generic call with Smi type guards, a raw machine add, and an overflow check, deoptimizing if a non-Smi operand ever appears.
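The specialized code's behavior can be modeled like this (Python as a modeling language; the Smi bound, the guards, and the exception standing in for a deopt exit are illustrative assumptions, not the VM's actual IL):

```python
# Conceptual model of the specialized add: the dynamic + becomes a raw
# integer add guarded by Smi type checks and an overflow check, with a
# deoptimization exit taken whenever speculation fails.
SMI_MAX = 2**62 - 1  # illustrative Smi bound on a 64-bit target

class DeoptimizeError(Exception):
    pass

def optimized_add(a, b):
    if type(a) is not int or type(b) is not int:  # CheckSmi-style guards
        raise DeoptimizeError("non-Smi operand")
    result = a + b                                # unboxed machine add
    if result > SMI_MAX:                          # overflow check
        raise DeoptimizeError("Smi overflow")
    return result

print(optimized_add(2, 3))  # 5, entirely on the fast path
try:
    optimized_add(2, 3.0)   # type feedback violated -> deoptimize
except DeoptimizeError:
    print("deoptimized")
```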
Key Source Files
- runtime/vm/compiler/compiler_pass.cc - Optimization pass pipeline
- runtime/vm/compiler/jit/compiler.cc - JIT compiler entry points
- runtime/vm/compiler/jit/jit_call_specializer.cc - Type feedback specialization
- runtime/vm/compiler/backend/il.h - IL instruction definitions
- runtime/vm/compiler/backend/inliner.cc - Inlining logic
- runtime/vm/compiler/backend/range_analysis.cc - Range analysis
- runtime/vm/compiler/backend/type_propagator.cc - Type propagation
- runtime/vm/deopt_instructions.cc - Deoptimization machinery
- runtime/docs/compiler/optimization_levels.md - Optimization level design