Skip to content

Intermediate Representation (IR)

The process of transforming high-level source code to its equivalent intermediate representation is called lowering.

Lowering takes place over multiple translation phases, each of which resulting in IR code at a different stage:

  1. High-level source code is translated to raw IR.
  2. Raw IR is translated to legal IR.
  3. Legal IR is translated to canonical IR.

A program in raw IR may contain incomplete deinitialization and flow-sensitive type errors. A program in legal or canonical IR shall not violate any of the static semantic rules of the Hylo programming language.

IR code is organized into modules. A module consists of an application programming interface (API), an application binary interface (ABI), a collection of lowered type definitions, a collection of lowered functions, and a collection of resources.

An API is a collection of type, function, subscript, and property signatures which document how to interact with Hylo code present in the module at source or IR level.

An ABI is a collection of type and function signatures which document how to interact with compiled Hylo code present in the module.

A resource is an arbitrary file that is part of a module and not processed as a source file.

A module is represented using a target-independent binary format called the hylomodule format. Module files should use .hylomoduleextension and application/octet-stream media type.

TBD: Describe the valmodule format.

A lowered function is a set of IR instructions packaged as a single unit of computation. A lowered function has a name that must be unique across all linked modules.

A lowered function has a signature describing the type and passing conventions of its inputs and outputs.

A lowered function may have a body defined as a set of one or more basic blocks, one of which is designated as the function entry. A lowered function that does not define a body is called a stub and denotes the API of a function defined in another module or the ABI of a function linked from a binary.

TBD

A basic block is a non-empty sequence of instructions. The first and last instructions of a basic block are called its entry and exit points, respectively. Control flow shall always enter at entry points and exit at exit points. The last instruction of a basic block shall be a terminator instruction.

A basic block may accept arguments.

Given two basic blocks bb1 and bb2, bb1 is successor of bb2 (or, equivalently, bb2 is predecessor of bb1) if the exit point of bb2 may cause control flow to jump to bb1.

A basic block is reachable if it is an entry point or a successor of a reachable block. All basic blocks of a function shall be reachable in legal or canonical IR.

Given two basic blocks bb1 and bb2:

  • bb1 dominates bb2 if control flow must go through bb1 before it can reach bb2 or if bb1 is bb2;
  • bb1 strictly dominates bb2 if it dominates bb2 and it is not bb2;
  • bb1 immediately dominates bb2 if it strictly dominates bb2 and does not strictly dominate any other basic block that strictly dominates bb2.

Instructions in Hylo IR may accept operands embodying values to be operated upon. There exists 5 types of operands:

  • Type references: A type reference denotes a lowered type definition.
  • Basic block references: A basic block reference denotes a basic block.
  • Constants: A constant is a value that is computed at compile time and immutable at runtime.
  • Basic block arguments: A basic block argument denotes the value passed to a basic block when control flow entered to its entry point.
  • Instruction results: An instruction result denotes the one of the values produced by the evaluation of an instruction.

Each basic block argument and instruction result is assigned to a unique local register in the function. The value of a register

An instruction i is said to be a user of an operand o if o is argument of i. Each occurrence of an operand in one of its users is called a use.

Given two uses u1 and u2, u2 is sequenced after u1 (or, equivalently, u1 is sequenced before u2) if and only if:

  • the user of u2 is sequenced after the user of u1, or
  • u1 and u2 have the same user and u1 appears before u2.

The semantics of Hylo IR is defined in terms of an abstract machine. A language implementation shall interpret or compile native programs that follow the same behavior.

The Hylo abstract machine consists of a collection of threads and a global memory space. A thread consists of a register map that associates local register names to objects or locations in the machine’s global memory, a control stack that contains instructions, and a call stack that contains register maps.

When a basic block is loaded into a thread, its instructions are pushed onto the stack in reverse order (i.e., the last instruction is pushed first).

Program execution starts by adding a thread to the abstract machine, which gets designated as the main thread. The register map and call stack of the main thread are initialized empty. The control stack is initialized by loading the entry block of the main function of the program’s entry module.

An instruction is a basic unit of computation in Hylo IR that modifies the state of the executing machine. An instruction may accept zero or more arguments and return zero or one value. An instruction has an associated type corresponding to the instruction’s result.

There are multiple categories of instructions:

An instruction is called a terminator instruction if it causes control flow to jump to another basic block or exit the current function.

return

Returns control flow to the caller.

This instruction is only about control flow. Return values are stored in the return registers before control flow leaves the function.

  • There must be no other instructions after a return instruction in a basic block.
  • Every path through a function must terminate with a return or unreachable instruction.

When the return instruction is executed, the control flow returns back to the caller.

This instruction is similar to LLVM’s ret, but it does not deal with the returned value. In Hylo IR, handling the return value happens prior to return instruction.

  • branch
  • cond_branch
  • switch
  • union_switch
  • unreachable
<result> = alloca <storage-type>, #align(of: <alignment-source>)

Allocates memory on the stack frame of the currently executing function; this memory is automatically released when the function returns to the caller.

storage-type is a Hylo type and represents the storage type that needs to be allocated on the stack. It may be any type with a statically known size (including {}).

t is a type for which we want to copy the alignment from.

An address that can store storage-type values.

  • All allocas need to be emitted in the entry block of the function.
  • The alignment of t must be compatible with the minimal alignment requirements for type storage-type.
  • The result of the alloca instruction must be in an uninitialized state when the current function returns to the caller.

Memory is allocated and an address to it is returned. The allocated memory is uninitialized. If there is not enough space to grow the stack, the operation is undefined.

If the size to be allocated is zero bytes, the returned address may not be unique. Otherwise, the address is unique across all other non-zero allocations in the function.

The memory is automatically released when the function returns to the caller.

This instruction is similar to LLVM’s alloca, but it does not provide all the options.

<result> = load <source>

Loads a value from memory to register.

source represents the memory address from which to load.

The dereferenced value of source.

  • source needs to be an access instruction.
  • The operation requires exclusive access to source.
  • source must be initialized
  • If source is not a machine type, after this instruction, source becomes uninitialized.
  • If source is a machine type, after this instruction, source remains initialized.
  • The size of the value being loaded must be known at compile-time.

The location of memory pointed to is loaded, and the resulting value is returned.

This instruction is similar to LLVM’s load, but it does not provide all the options.

store <value> to <target>

Writes a value to memory.

value represents the value that needs to be stored into memory. target represents the memory address in which we need to store the value.

  • target needs to be an access instruction.
  • The operation requires exclusive access to target.
  • If source is not a machine type, target must be uninitialized.
  • After this instruction, target becomes initialized.
  • value cannot be used after the operation.
  • The type of target is an address of the type of value

The memory pointed to by target is updated to contain value.

This instruction is similar to LLVM’s store, but it does not provide all the options.

<result> = subfield <base> at <path> as <type>

Examples:

%i11: &ptr = subfield %i3 at [0, 0] as ptr
%i12: &Int = subfield %i3 at [0, 1] as Int
%i13: &ptr = subfield %b0#1 at [1] as ptr

Computes the address of storage for a field or sub-field of a record. The address calculation is done without accessing memory.

base is an address of storage for the record from which we want to extract the field or sub-field.

path is a non-empty list of compile-time indices. If base is a pointer and we are performing a pointer dereference, the first index is always 0; if we are performing an array indexing, the first index can be greater than zero. Subsequent indices select fields of record types only (no pointer dereference).

The type argument is used to avoid calculating the resulting type, when the instruction is referenced. It must match with the type of the field or subfield obtained from base by applying the indices in path, but without being an address.

The address of the subfield of base indicated by path.

  • base is an address.
  • path indicates a well-formed subfield within base.
  • type matches the type of the subfield obtained through path from base, but without being an address.

Given the pointer to base, this instruction computes the address needed to access the subfield of base obtained through path.

This instruction is similar to LLVM’s getelementptr, but it does not provide all the options.

  • access — Creates an access on a storage.
  • end access — .
  • AssumeState — Unsafely assumes the initialization state of memory storage.
  • Apply — Invokes an IR function
  • ApplyBuiltin — Invokes a function of the Builtin module.
  • Move — Relocates a value to different storages.
  • Getter — Returns the getter of a property in an opaque record.
  • SyntheticConformance — Returns the synthesized the conformance of a type to a core trait.