Intermediate Representation (IR)

Lowering

The process of transforming high-level source code to its equivalent intermediate representation is called lowering.

Lowering takes place over multiple translation phases, each of which resulting in IR code at a different stage:

High-level source code is translated to raw IR.
Raw IR is translated to legal IR.
Legal IR is translated to canonical IR.

A program in raw IR may contain incomplete deinitialization and flow-sensitive type errors. A program in legal or canonical IR shall not violate any of the static semantic rules of the Hylo programming language.

Modules

IR code is organized into modules. A module consists of an application programming interface (API), an application binary interface (ABI), a collection of lowered type definitions, a collection of lowered functions, and a collection of resources.

An API is a collection of type, function, subscript, and property signatures which document how to interact with Hylo code present in the module at source or IR level.

An ABI is a collection of type and function signatures which document how to interact with compiled Hylo code present in the module.

A resource is an arbitrary file that is part of a module and not processed as a source file.

A module is represented using a target-independent binary format called the hylomodule format. Module files should use .hylomoduleextension and application/octet-stream media type.

TBD: Describe the valmodule format.

Lowered functions

A lowered function is a set of IR instructions packaged as a single unit of computation. A lowered function has a name that must be unique across all linked modules.

A lowered function has a signature describing the type and passing conventions of its inputs and outputs.

A lowered function may have a body defined as a set of one or more basic blocks, one of which is designated as the function entry. A lowered function that does not define a body is called a stub and denotes the API of a function defined in another module or the ABI of a function linked from a binary.

Lowered function signatures

TBD

Basic blocks

A basic block is a non-empty sequence of instructions. The first and last instructions of a basic block are called its entry and exit points, respectively. Control flow shall always enter at entry points and exit at exit points. The last instruction of a basic block shall be a terminator instruction.

A basic block may accept arguments.

Given two basic blocks bb1 and bb2, bb1 is successor of bb2 (or, equivalently, bb2 is predecessor of bb1) if the exit point of bb2 may cause control flow to jump to bb1.

A basic block is reachable if it is an entry point or a successor of a reachable block. All basic blocks of a function shall be reachable in legal or canonical IR.

Given two basic blocks bb1 and bb2:

bb1 dominates bb2 if control flow must go through bb1 before it can reach bb2 or if bb1 is bb2;
bb1 strictly dominates bb2 if it dominates bb2 and it is not bb2;
bb1 immediately dominates bb2 if it strictly dominates bb2 and does not strictly dominate any other basic block that strictly dominates bb2.

Operands

Instructions in Hylo IR may accept operands embodying values to be operated upon. There exists 5 types of operands:

Type references: A type reference denotes a lowered type definition.
Basic block references: A basic block reference denotes a basic block.
Constants: A constant is a value that is computed at compile time and immutable at runtime.
Basic block arguments: A basic block argument denotes the value passed to a basic block when control flow entered to its entry point.
Instruction results: An instruction result denotes the one of the values produced by the evaluation of an instruction.

Each basic block argument and instruction result is assigned to a unique local register in the function. The value of a register

Uses

An instruction i is said to be a user of an operand o if o is argument of i. Each occurrence of an operand in one of its users is called a use.

Given two uses u1 and u2, u2 is sequenced after u1 (or, equivalently, u1 is sequenced before u2) if and only if:

the user of u2 is sequenced after the user of u1, or
u1 and u2 have the same user and u1 appears before u2.

Abstract machine

The semantics of Hylo IR is defined in terms of an abstract machine. A language implementation shall interpret or compile native programs that follow the same behavior.

The Hylo abstract machine consists of a collection of threads and a global memory space. A thread consists of a register map that associates local register names to objects or locations in the machine’s global memory, a control stack that contains instructions, and a call stack that contains register maps.

When a basic block is loaded into a thread, its instructions are pushed onto the stack in reverse order (i.e., the last instruction is pushed first).

Program execution

Program execution starts by adding a thread to the abstract machine, which gets designated as the main thread. The register map and call stack of the main thread are initialized empty. The control stack is initialized by loading the entry block of the main function of the program’s entry module.

Instruction reference

An instruction is a basic unit of computation in Hylo IR that modifies the state of the executing machine. An instruction may accept zero or more arguments and return zero or one value. An instruction has an associated type corresponding to the instruction’s result.

There are multiple categories of instructions:

terminator instructions
memory access and addressing instructions
value state instructions
control flow instructions
instructions that are desugared

Terminator instructions

An instruction is called a terminator instruction if it causes control flow to jump to another basic block or exit the current function.

`return` instruction

Syntax

return

Overview

Returns control flow to the caller.

This instruction is only about control flow. Return values are stored in the return registers before control flow leaves the function.

Well-formedness

There must be no other instructions after a return instruction in a basic block.
Every path through a function must terminate with a return or unreachable instruction.

Semantics

When the return instruction is executed, the control flow returns back to the caller.

TODO

branch
cond_branch
switch
union_switch
unreachable

Memory access and addressing instructions

`alloca` instruction

Syntax

<result> = alloca <storage-type>, #align(of: <alignment-source>)

Overview

Allocates memory on the stack frame of the currently executing function; this memory is automatically released when the function returns to the caller.

Arguments

storage-type is a Hylo type and represents the storage type that needs to be allocated on the stack. It may be any type with a statically known size (including {}).

t is a type for which we want to copy the alignment from.

Result

An address that can store storage-type values.

Well-formedness

All allocas need to be emitted in the entry block of the function.
The alignment of t must be compatible with the minimal alignment requirements for type storage-type.
The result of the alloca instruction must be in an uninitialized state when the current function returns to the caller.

Semantics

Memory is allocated and an address to it is returned. The allocated memory is uninitialized. If there is not enough space to grow the stack, the operation is undefined.

If the size to be allocated is zero bytes, the returned address may not be unique. Otherwise, the address is unique across all other non-zero allocations in the function.

The memory is automatically released when the function returns to the caller.

`load` instruction

Syntax

<result> = load <source>

Overview

Loads a value from memory to register.

Arguments

source represents the memory address from which to load.

Result

The dereferenced value of source.

Well-formedness

source needs to be an access instruction.
The operation requires exclusive access to source.
source must be initialized
If source is not a machine type, after this instruction, source becomes uninitialized.
If source is a machine type, after this instruction, source remains initialized.
The size of the value being loaded must be known at compile-time.

Semantics

The location of memory pointed to is loaded, and the resulting value is returned.

`store` instruction

Syntax

store <value> to <target>

Overview

Writes a value to memory.

Arguments

value represents the value that needs to be stored into memory. target represents the memory address in which we need to store the value.

Well-formedness

target needs to be an access instruction.
The operation requires exclusive access to target.
If source is not a machine type, target must be uninitialized.
After this instruction, target becomes initialized.
value cannot be used after the operation.
The type of target is an address of the type of value

Semantics

The memory pointed to by target is updated to contain value.

`subfield` instruction

Syntax

<result> = subfield <base> at <path> as <type>

Examples:

%i11: &ptr = subfield %i3 at [0, 0] as ptr
%i12: &Int = subfield %i3 at [0, 1] as Int
%i13: &ptr = subfield %b0#1 at [1] as ptr

Overview

Computes the address of storage for a field or sub-field of a record. The address calculation is done without accessing memory.

Arguments

base is an address of storage for the record from which we want to extract the field or sub-field.

path is a non-empty list of compile-time indices. If base is a pointer and we are performing a pointer dereference, the first index is always 0; if we are performing an array indexing, the first index can be greater than zero. Subsequent indices select fields of record types only (no pointer dereference).

The type argument is used to avoid calculating the resulting type, when the instruction is referenced. It must match with the type of the field or subfield obtained from base by applying the indices in path, but without being an address.

Result

The address of the subfield of base indicated by path.

Well-formedness

base is an address.
path indicates a well-formed subfield within base.
type matches the type of the subfield obtained through path from base, but without being an address.

Semantics

Given the pointer to base, this instruction computes the address needed to access the subfield of base obtained through path.

Value state instructions

access — Creates an access on a storage.
end access — .
AssumeState — Unsafely assumes the initialization state of memory storage.

Control flow instructions

Apply — Invokes an IR function
ApplyBuiltin — Invokes a function of the Builtin module.

Instructions that are desugared

Move — Relocates a value to different storages.
Getter — Returns the getter of a property in an opaque record.
SyntheticConformance — Returns the synthesized the conformance of a type to a core trait.

Intermediate Representation (IR)

Lowering

Modules

Lowered functions

Lowered function signatures

Basic blocks

Operands

Uses

Abstract machine

Program execution

Instruction reference

Terminator instructions

return instruction

Syntax

Overview

Well-formedness

Semantics

See also

TODO

Memory access and addressing instructions

alloca instruction

Syntax

Overview

Arguments

Result

Well-formedness

Semantics

See also

load instruction

Syntax

Overview

Arguments

Result

Well-formedness

Semantics

See also

store instruction

Syntax

Overview

Arguments

Well-formedness

Semantics

See also

subfield instruction

Syntax

Overview

Arguments

Result

Well-formedness

Semantics

See also

Value state instructions

Control flow instructions

Instructions that are desugared

`return` instruction

`alloca` instruction

`load` instruction

`store` instruction

`subfield` instruction