
Code generation is the final phase of compilation where Intermediate Language (IL) instructions are lowered to native machine code. This process is implemented in runtime/vm/compiler/backend/flow_graph_compiler.cc and architecture-specific files.

Code Generation Architecture

The code generation process consists of several stages:
  1. IL Finalization: Prepare IL for code generation
  2. Register Allocation: Assign registers to values
  3. Instruction Lowering: Convert IL to machine instructions
  4. Code Emission: Generate native code bytes
  5. Metadata Generation: Create debugging and deoptimization info

FlowGraphCompiler

The FlowGraphCompiler class (flow_graph_compiler.cc:135) orchestrates code generation:
FlowGraphCompiler::FlowGraphCompiler(
    compiler::Assembler* assembler,
    FlowGraph* flow_graph,
    const ParsedFunction& parsed_function,
    bool is_optimizing,
    ZoneGrowableArray<const ICData*>* deopt_id_to_ic_data,
    CodeStatistics* stats)

Key Responsibilities

Block ordering: determines the order blocks are emitted in native code.
block_order_(*flow_graph->CodegenBlockOrder())
The order is optimized for:
  • Cache locality
  • Branch prediction
  • Fall-through optimization
Exception handlers: generates exception handler tables, mapping try-catch blocks to code locations.
exception_handlers_list_ =
    new ExceptionHandlerList(parsed_function().function());
Deopt information: creates deoptimization metadata for optimized code, enabling fallback to unoptimized code when assumptions fail.
deopt_infos_()  // Stores deopt information
Static call targets: maintains a table of static call targets, used for patching and reoptimization.
static_calls_target_table_()

Register Allocation

Register allocation (runtime/vm/compiler/backend/linearscan.cc) assigns registers to values using a linear-scan algorithm:
FlowGraphAllocator allocator(*flow_graph);
allocator.AllocateRegisters();
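The VM's FlowGraphAllocator is considerably more sophisticated than the classic algorithm (it works on SSA form, honors register hints, and splits intervals), but the core linear-scan idea can be sketched in a few dozen lines. This is a hypothetical standalone illustration, not VM code: walk live intervals in order of start position, expire intervals that have ended, and spill the interval with the furthest end point when registers run out.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct Interval {
  std::string vreg;  // virtual register name
  int start, end;    // live range [start, end], inclusive
};

// Returns vreg -> assigned physical register index, or -1 if spilled.
std::map<std::string, int> LinearScan(std::vector<Interval> intervals,
                                      int num_regs) {
  std::sort(intervals.begin(), intervals.end(),
            [](const Interval& a, const Interval& b) {
              return a.start < b.start;
            });
  std::map<std::string, int> assignment;
  std::vector<Interval> active;  // intervals currently holding a register
  std::vector<int> free_regs;
  for (int r = num_regs - 1; r >= 0; --r) free_regs.push_back(r);

  for (const Interval& cur : intervals) {
    // Expire active intervals that ended before cur starts.
    for (auto it = active.begin(); it != active.end();) {
      if (it->end < cur.start) {
        free_regs.push_back(assignment[it->vreg]);
        it = active.erase(it);
      } else {
        ++it;
      }
    }
    if (!free_regs.empty()) {
      assignment[cur.vreg] = free_regs.back();
      free_regs.pop_back();
      active.push_back(cur);
    } else {
      // No register free: spill whichever interval ends furthest away.
      auto victim = std::max_element(
          active.begin(), active.end(),
          [](const Interval& a, const Interval& b) { return a.end < b.end; });
      if (victim->end > cur.end) {
        assignment[cur.vreg] = assignment[victim->vreg];
        assignment[victim->vreg] = -1;  // victim spilled to the stack
        *victim = cur;
      } else {
        assignment[cur.vreg] = -1;  // cur itself is spilled
      }
    }
  }
  return assignment;
}
```

With one register and intervals a=[0,10], b=[1,3], c=[4,6], the long-lived `a` is spilled so the short intervals `b` and `c` can share the register.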

Location Summary

Each instruction defines its location requirements:
virtual LocationSummary* MakeLocationSummary(Zone* zone, 
                                             bool optimizing) const = 0;
Location types (runtime/vm/compiler/backend/locations.h):
  • Location::RequiresRegister(): Needs a CPU register
  • Location::RequiresFpuRegister(): Needs FPU register
  • Location::RegisterLocation(reg): Specific register
  • Location::StackSlot(index): Stack location
  • Location::Constant(value): Constant value
Example (il.h instruction):
LocationSummary* BinarySmiOp::MakeLocationSummary(Zone* zone, 
                                                   bool opt) const {
  const intptr_t kNumInputs = 2;
  const intptr_t kNumTemps = 0;
  LocationSummary* summary = new(zone) LocationSummary(
      zone, kNumInputs, kNumTemps, LocationSummary::kNoCall);
  summary->set_in(0, Location::RequiresRegister());
  summary->set_in(1, Location::RequiresRegister());
  summary->set_out(0, Location::RequiresRegister());
  return summary;
}

Instruction Emission

Each IL instruction implements EmitNativeCode:
virtual void EmitNativeCode(FlowGraphCompiler* compiler);

Architecture-Specific Code Generation

Code generation is split across architecture files:
  • il_x64.cc: x64 (Intel/AMD 64-bit)
  • il_arm64.cc: ARM64 (Apple Silicon, etc.)
  • il_arm.cc: ARM32
  • il_ia32.cc: x86 (32-bit Intel)
  • il_riscv.cc: RISC-V

Example: Binary Smi Operation on x64

void BinarySmiOp::EmitNativeCode(FlowGraphCompiler* compiler) {
  const Register left = locs()->in(0).reg();
  const Register right = locs()->in(1).reg();
  const Register result = locs()->out(0).reg();
  ASSERT(result == left);  // Output is constrained to alias the first input.

  switch (op_kind()) {
    case Token::kADD:
      __ addq(result, right);  // x64 add instruction
      __ j(OVERFLOW, slow_path->entry_label());  // slow_path set up earlier
      break;
    case Token::kSUB:
      __ subq(result, right);
      __ j(OVERFLOW, slow_path->entry_label());
      break;
    // ... other operations
  }
}

Lowering Stages

IL instructions are lowered in multiple stages:

Stage 1: High-Level Lowering

Before register allocation:
Representation selection: choose value representations (boxed vs unboxed).
flow_graph->SelectRepresentations();
Decisions:
  • Unbox doubles for arithmetic (avoid heap allocation)
  • Keep Smis unboxed when possible
  • Box only when necessary for calls/stores
Example:
double x = 1.0;
double y = x + 2.0;  // Unboxed double ops
double z = x * y;    // Stay unboxed
obj.field = z;       // Box only here
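Why representation selection matters can be made concrete with a hypothetical C++ illustration (this is not VM code; `BoxedDouble` stands in for a heap-allocated Double object): evaluating the expression fully boxed costs an allocation per intermediate value, while the unboxed form allocates only at the final store.

```cpp
#include <cassert>
#include <memory>

static int g_boxes_allocated = 0;

struct BoxedDouble {  // stand-in for a heap-allocated Double object
  double value;
  explicit BoxedDouble(double v) : value(v) { ++g_boxes_allocated; }
};

// Naive evaluation: every intermediate result is boxed (3 allocations).
std::unique_ptr<BoxedDouble> ComputeAllBoxed(double x) {
  auto y = std::make_unique<BoxedDouble>(x + 2.0);
  auto z = std::make_unique<BoxedDouble>(x * y->value);
  return std::make_unique<BoxedDouble>(z->value);
}

// After representation selection: stay unboxed, box once at the store.
std::unique_ptr<BoxedDouble> ComputeUnboxed(double x) {
  double y = x + 2.0;                       // unboxed double op
  double z = x * y;                         // stays unboxed
  return std::make_unique<BoxedDouble>(z);  // box only here
}
```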
Argument moves: insert move instructions for call arguments, representing argument passing explicitly in the IL.
flow_graph->InsertMoveArguments();

Stage 2: Post-Optimization Lowering

After all optimizations:
Typed data lowering: lower typed data access patterns, separating the base pointer from offset calculations.
flow_graph->ExtractNonInternalTypedDataPayloads();
Sanitizer instrumentation: add runtime checks when sanitizers are enabled, to help catch bugs during development.
flow_graph->AddAsanMsanInstrumentation();  // Address/Memory sanitizer
flow_graph->AddTsanInstrumentation();      // Thread sanitizer

Memory Model and Calling Conventions

Stack Frame Layout

Typical stack frame structure:
+------------------+
| Return Address   |
+------------------+
| Saved FP         |
+------------------+ <- FP (Frame Pointer)
| Spill Slots      |
+------------------+
| Local Variables  |
+------------------+
| Outgoing Args    |
+------------------+ <- SP (Stack Pointer)
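Slots in this frame are addressed at fixed offsets from FP. The sketch below is a hypothetical offset calculator matching the diagram, with a simplified layout (saved FP at [FP + 0], return address above it, spill and local slots growing downward); the slot numbering is an assumption for illustration, not the VM's actual frame-layout constants.

```cpp
#include <cassert>
#include <cstdint>

constexpr intptr_t kWordSize = 8;  // 64-bit target

constexpr intptr_t SavedFpOffset() { return 0; }
constexpr intptr_t ReturnAddressOffset() { return kWordSize; }

// Spill slot i (0-based) lives just below the saved FP.
constexpr intptr_t SpillSlotOffset(intptr_t i) {
  return -(i + 1) * kWordSize;
}

// Local variable slots follow the spill area.
constexpr intptr_t LocalSlotOffset(intptr_t num_spill_slots, intptr_t i) {
  return SpillSlotOffset(num_spill_slots + i);
}
```

With two spill slots, local 0 ends up at [FP - 24]: one word for each spill slot plus one for itself.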

Calling Convention

Dart uses platform-specific calling conventions, defined in dart_calling_conventions.cc.

x64:
  • Arguments: RDI, RSI, RDX, RCX, R8, R9, [stack]
  • Return: RAX (integers), XMM0 (doubles)
  • Preserved: RBX, R12-R15
ARM64:
  • Arguments: R0-R7, [stack]
  • Return: R0 (integers), V0 (doubles)
  • Preserved: R19-R28

Code Emission Examples

Example 1: Loading a Field

void LoadField::EmitNativeCode(FlowGraphCompiler* compiler) {
  const Register instance = locs()->in(0).reg();
  const Register result = locs()->out(0).reg();
  
  // Load field at offset
  __ LoadFieldFromOffset(result, instance, offset());
  
  // Run the field initializer if needed (e.g. uninitialized late fields)
  if (calls_initializer()) {
    compiler->GenerateCallWithDeopt(
        source(), deopt_id(),
        *StubCode::InitInstanceField_entry());
  }
}

Example 2: Array Element Access

void LoadIndexed::EmitNativeCode(FlowGraphCompiler* compiler) {
  const Register array = locs()->in(0).reg();
  const Register index = locs()->in(1).reg();
  const Register result = locs()->out(0).reg();
  
  // Calculate element address
  const intptr_t element_size = Instance::ElementSizeFor(class_id());
  __ LoadElementAddressForRegIndex(
      result, array, index, element_size,
      data_offset());
  
  // Load element
  __ LoadFromOffset(result, result, 0);
}

Example 3: Static Call

void StaticCall::EmitNativeCode(FlowGraphCompiler* compiler) {
  // Setup arguments (already in correct locations)
  
  // Generate call
  compiler->GenerateStaticCall(
      deopt_id(),
      source(),
      function(),
      ArgumentCount(),
      locs());
  
  // Result already in RAX/R0 per calling convention
}

Deoptimization Metadata

Optimized code includes deoptimization points:
class CompilerDeoptInfo {
  Environment* env_;           // Deopt environment
  intptr_t deopt_id_;          // Unique deopt ID
  DeoptReasonId reason_;       // Why deopt occurred
};

Deoptimization Environment

Captures program state for deoptimization:
class Environment {
  GrowableArray<Value*> values_;  // Live values
  Environment* outer_;            // Outer scope
  intptr_t fixed_parameter_count_;
};
When deopt occurs:
  1. Collect values from registers/stack per environment
  2. Reconstruct unoptimized frame
  3. Continue execution in unoptimized code
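The first two steps above can be sketched as a small C++ routine. This is a hypothetical illustration, not VM code: the location encoding is invented, whereas the real deoptimizer decodes DeoptInstr commands from the compressed deopt info. The idea is the same, though: read each environment value out of the machine state and write it into the unoptimized frame in order.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

struct DeoptLocation {
  enum Kind { kRegister, kStackSlot, kConstant } kind;
  std::string reg;   // for kRegister
  int slot;          // for kStackSlot
  int64_t constant;  // for kConstant
};

// Materialize the unoptimized frame from the optimized machine state.
std::vector<int64_t> ReconstructFrame(
    const std::vector<DeoptLocation>& env,
    const std::map<std::string, int64_t>& registers,
    const std::vector<int64_t>& optimized_stack) {
  std::vector<int64_t> unoptimized_frame;
  for (const DeoptLocation& loc : env) {
    switch (loc.kind) {
      case DeoptLocation::kRegister:
        unoptimized_frame.push_back(registers.at(loc.reg));
        break;
      case DeoptLocation::kStackSlot:
        unoptimized_frame.push_back(optimized_stack.at(loc.slot));
        break;
      case DeoptLocation::kConstant:  // value was optimized away entirely
        unoptimized_frame.push_back(loc.constant);
        break;
    }
  }
  return unoptimized_frame;
}
```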

PC Descriptors

Map machine code addresses to source positions:
pc_descriptors_list_ = new DescriptorList(
    zone(), 
    &code_source_map_builder_->inline_id_to_function());
Descriptor types:
  • kDeopt: Deoptimization point
  • kIcCall: Instance call site
  • kUnoptStaticCall: Unoptimized static call
  • kReturn: Return instruction
  • kOther: Other significant points
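A PC descriptor table is typically kept sorted by code offset so lookups can binary-search for the entry covering a given address. The sketch below is a hypothetical illustration with invented field names, not the VM's PcDescriptors layout.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct PcDescriptor {
  uint32_t pc_offset;  // offset into the code object
  int kind;            // kDeopt, kIcCall, ...
  int32_t deopt_id;
  int32_t token_pos;   // source position
};

// Returns the descriptor whose pc_offset matches exactly, or nullptr.
// `table` must be sorted by pc_offset.
const PcDescriptor* FindDescriptor(const std::vector<PcDescriptor>& table,
                                   uint32_t pc_offset) {
  auto it = std::lower_bound(
      table.begin(), table.end(), pc_offset,
      [](const PcDescriptor& d, uint32_t pc) { return d.pc_offset < pc; });
  if (it != table.end() && it->pc_offset == pc_offset) return &*it;
  return nullptr;
}
```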

Optimization Examples

Example 1: Smi Fast Path

// Dart code:
int add(int a, int b) => a + b;

// Generated code (x64):
// Fast path - assume Smis:
movq rax, rdi        // Load a
addq rax, rsi        // Add b
jo slow_path         // Jump if overflow
ret

slow_path:
  // Call runtime for boxed arithmetic
  call _add_runtime
  ret
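The fast path above relies on Smi tagging: on 64-bit targets a Smi is stored shifted left by one with a zero tag bit, so two tagged Smis can be added directly, and the overflow flag (the `jo` above) signals when the result no longer fits and boxed arithmetic must take over. A portable C++ sketch of the same idea, using the GCC/Clang `__builtin_add_overflow` builtin in place of the overflow flag:

```cpp
#include <cassert>
#include <cstdint>

constexpr int64_t SmiTag(int64_t value) { return value << 1; }
constexpr int64_t SmiUntag(int64_t tagged) { return tagged >> 1; }

// Returns true on the fast path; false means "take the slow path".
bool SmiTryAdd(int64_t left_tagged, int64_t right_tagged, int64_t* result) {
  // Adding two tagged values yields the correctly tagged sum, because
  // (a << 1) + (b << 1) == (a + b) << 1 when no overflow occurs.
  return !__builtin_add_overflow(left_tagged, right_tagged, result);
}
```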

Example 2: Bounds Check Elimination

// Dart code:
for (var i = 0; i < arr.length; i++) {
  sum += arr[i];
}

// With range analysis, bounds check eliminated:
for (var i = 0; i < arr.length; i++) {
  // No check - range analysis proved 0 <= i < length
  sum += arr[i];  
}
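The core of the range-analysis argument can be stated as an interval check: if the induction variable's interval is [0, length - 1], then `0 <= i < length` is provably true and the check can be dropped. The VM's real analysis (range_analysis.cc) works on symbolic ranges; this hypothetical sketch handles only the concrete-interval case.

```cpp
#include <cassert>
#include <cstdint>

struct Range {
  int64_t min, max;  // inclusive interval a value is proven to lie in
};

// A bounds check on a[i] is redundant when i's proven range fits inside
// [0, len - 1] for every possible length len.
bool BoundsCheckRedundant(const Range& index, const Range& length) {
  return index.min >= 0 && index.max <= length.min - 1;
}
```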

Example 3: Inlined Field Access

class Point {
  final double x;
  final double y;
  Point(this.x, this.y);
}

double distance(Point p) => p.x * p.x + p.y * p.y;

// Generated code (no call overhead):
// movsd xmm0, [rdi + offset_x]  // Load x directly
// mulsd xmm0, xmm0               // x * x
// movsd xmm1, [rdi + offset_y]  // Load y directly  
// mulsd xmm1, xmm1               // y * y
// addsd xmm0, xmm1               // sum
// ret

Architecture-Specific Optimizations

SIMD Support

Vector operations for performance:
void SimdOp::EmitNativeCode(FlowGraphCompiler* compiler) {
  switch (kind()) {
    case SimdOpKind::kFloat32x4Add:
      __ addps(result, right);  // x64 SSE add; result aliases the left operand
      break;
    // ... other SIMD ops
  }
}

Branch Prediction Hints

Optimize layout for common paths. The near/far distinction selects the jump encoding: slow paths are emitted out of line, so jumps to them need the longer far encoding, while the common path uses compact near jumps and falls through:
// Likely path - nearby target, short (near) jump encoding:
__ j(CONDITION, &target, compiler::Assembler::kNearJump);

// Unlikely path (e.g., out-of-line error handling):
__ j(CONDITION, &target, compiler::Assembler::kFarJump);

Loop Alignment

Align hot loops for better performance:
if (FLAG_align_all_loops) {
  __ Align(32);  // 32-byte alignment for loop header
}
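What `Align(32)` does can be sketched as a padding computation (a hypothetical illustration, not the assembler's implementation): emit filler bytes, nops on x64, until the current code offset is a multiple of the alignment, so the loop header starts at the beginning of its own 32-byte fetch block.

```cpp
#include <cassert>
#include <cstdint>

// Number of filler bytes needed to align `offset` to `alignment`,
// which must be a power of two.
uint32_t PaddingFor(uint32_t offset, uint32_t alignment) {
  return (alignment - (offset & (alignment - 1))) & (alignment - 1);
}
```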

Code Statistics

Track generated code metrics:
CodeStatistics* stats = new CodeStatistics(
    assembler,
    flow_graph->function());
Collects:
  • Instruction counts per type
  • Code size breakdown
  • Optimization effectiveness

Debugging Generated Code

IL Printing

Print IL at various stages:
dart --print-flow-graph file.dart

Disassembly

View generated machine code:
dart --disassemble-optimized file.dart

Tracing

Trace compilation:
dart --trace-compiler file.dart

Performance Considerations

Instruction Selection

  • Use platform-specific instructions when available
  • Prefer register operations over memory
  • Minimize moves between register classes

Memory Access Patterns

  • Keep hot data in cache lines
  • Align frequently accessed data
  • Minimize pointer chasing

Call Overhead

  • Inline small functions aggressively
  • Use direct calls over indirect when possible
  • Specialize polymorphic calls

Further Reading

  • Register allocation: runtime/vm/compiler/backend/linearscan.cc
  • Architecture-specific IL: runtime/vm/compiler/backend/il_<arch>.cc
  • Assembler: runtime/vm/compiler/assembler/assembler_<arch>.cc