Intermediate Representation (IR)
Lowering
Section titled “Lowering”The process of transforming high-level source code to its equivalent intermediate representation is called lowering.
Lowering takes place over multiple translation phases, each of which resulting in IR code at a different stage:
- High-level source code is translated to raw IR.
- Raw IR is translated to legal IR.
- Legal IR is translated to canonical IR.
A program in raw IR may contain incomplete deinitialization and flow-sensitive type errors. A program in legal or canonical IR shall not violate any of the static semantic rules of the Hylo programming language.
Modules
Section titled “Modules”IR code is organized into modules. A module consists of an application programming interface (API), an application binary interface (ABI), a collection of lowered type definitions, a collection of lowered functions, and a collection of resources.
An API is a collection of type, function, subscript, and property signatures which document how to interact with Hylo code present in the module at source or IR level.
An ABI is a collection of type and function signatures which document how to interact with compiled Hylo code present in the module.
A resource is an arbitrary file that is part of a module and not processed as a source file.
A module is represented using a target-independent binary format called the hylomodule format. Module files should use .hylomoduleextension and application/octet-stream media type.
Lowered functions
Section titled “Lowered functions”A lowered function is a set of IR instructions packaged as a single unit of computation. A lowered function has a name that must be unique across all linked modules.
A lowered function has a signature describing the type and passing conventions of its inputs and outputs.
A lowered function may have a body defined as a set of one or more basic blocks, one of which is designated as the function entry. A lowered function that does not define a body is called a stub and denotes the API of a function defined in another module or the ABI of a function linked from a binary.
Lowered function signatures
Section titled “Lowered function signatures”Basic blocks
Section titled “Basic blocks”A basic block is a non-empty sequence of instructions. The first and last instructions of a basic block are called its entry and exit points, respectively. Control flow shall always enter at entry points and exit at exit points. The last instruction of a basic block shall be a terminator instruction.
A basic block may accept arguments.
Given two basic blocks bb1 and bb2, bb1 is successor of bb2 (or, equivalently, bb2 is predecessor of bb1) if the exit point of bb2 may cause control flow to jump to bb1.
A basic block is reachable if it is an entry point or a successor of a reachable block. All basic blocks of a function shall be reachable in legal or canonical IR.
Given two basic blocks bb1 and bb2:
bb1dominatesbb2if control flow must go throughbb1before it can reachbb2or ifbb1isbb2;bb1strictly dominatesbb2if it dominatesbb2and it is notbb2;bb1immediately dominatesbb2if it strictly dominatesbb2and does not strictly dominate any other basic block that strictly dominatesbb2.
Operands
Section titled “Operands”Instructions in Hylo IR may accept operands embodying values to be operated upon. There exists 5 types of operands:
- Type references: A type reference denotes a lowered type definition.
- Basic block references: A basic block reference denotes a basic block.
- Constants: A constant is a value that is computed at compile time and immutable at runtime.
- Basic block arguments: A basic block argument denotes the value passed to a basic block when control flow entered to its entry point.
- Instruction results: An instruction result denotes the one of the values produced by the evaluation of an instruction.
Each basic block argument and instruction result is assigned to a unique local register in the function. The value of a register
An instruction i is said to be a user of an operand o if o is argument of i. Each occurrence of an operand in one of its users is called a use.
Given two uses u1 and u2, u2 is sequenced after u1 (or, equivalently, u1 is sequenced before u2) if and only if:
- the user of
u2is sequenced after the user ofu1, or u1andu2have the same user andu1appears beforeu2.
Abstract machine
Section titled “Abstract machine”The semantics of Hylo IR is defined in terms of an abstract machine. A language implementation shall interpret or compile native programs that follow the same behavior.
The Hylo abstract machine consists of a collection of threads and a global memory space. A thread consists of a register map that associates local register names to objects or locations in the machine’s global memory, a control stack that contains instructions, and a call stack that contains register maps.
When a basic block is loaded into a thread, its instructions are pushed onto the stack in reverse order (i.e., the last instruction is pushed first).
Program execution
Section titled “Program execution”Program execution starts by adding a thread to the abstract machine, which gets designated as the main thread. The register map and call stack of the main thread are initialized empty. The control stack is initialized by loading the entry block of the main function of the program’s entry module.
Instruction reference
Section titled “Instruction reference”An instruction is a basic unit of computation in Hylo IR that modifies the state of the executing machine. An instruction may accept zero or more arguments and return zero or one value. An instruction has an associated type corresponding to the instruction’s result.
There are multiple categories of instructions:
- terminator instructions
- memory access and addressing instructions
- value state instructions
- control flow instructions
- instructions that are desugared
Terminator instructions
Section titled “Terminator instructions”An instruction is called a terminator instruction if it causes control flow to jump to another basic block or exit the current function.
return instruction
Section titled “return instruction”Syntax
Section titled “Syntax”returnOverview
Section titled “Overview”Returns control flow to the caller.
This instruction is only about control flow. Return values are stored in the return registers before control flow leaves the function.
Well-formedness
Section titled “Well-formedness”- There must be no other instructions after a
returninstruction in a basic block. - Every path through a function must terminate with a
returnorunreachableinstruction.
Semantics
Section titled “Semantics”When the return instruction is executed, the control flow returns back to the caller.
See also
Section titled “See also”This instruction is similar to LLVM’s ret, but it does not deal with the returned value. In Hylo IR, handling the return value happens prior to return instruction.
branchcond_branchswitchunion_switchunreachable
Memory access and addressing instructions
Section titled “Memory access and addressing instructions”alloca instruction
Section titled “alloca instruction”Syntax
Section titled “Syntax”<result> = alloca <storage-type>, #align(of: <alignment-source>)Overview
Section titled “Overview”Allocates memory on the stack frame of the currently executing function; this memory is automatically released when the function returns to the caller.
Arguments
Section titled “Arguments”storage-type is a Hylo type and represents the storage type that needs to be allocated on the stack. It may be any type with a statically known size (including {}).
t is a type for which we want to copy the alignment from.
Result
Section titled “Result”An address that can store storage-type values.
Well-formedness
Section titled “Well-formedness”- All
allocas need to be emitted in the entry block of the function. - The alignment of
tmust be compatible with the minimal alignment requirements for typestorage-type. - The result of the
allocainstruction must be in an uninitialized state when the current function returns to the caller.
Semantics
Section titled “Semantics”Memory is allocated and an address to it is returned. The allocated memory is uninitialized. If there is not enough space to grow the stack, the operation is undefined.
If the size to be allocated is zero bytes, the returned address may not be unique. Otherwise, the address is unique across all other non-zero allocations in the function.
The memory is automatically released when the function returns to the caller.
See also
Section titled “See also”This instruction is similar to LLVM’s alloca, but it does not provide all the options.
load instruction
Section titled “load instruction”Syntax
Section titled “Syntax”<result> = load <source>Overview
Section titled “Overview”Loads a value from memory to register.
Arguments
Section titled “Arguments”source represents the memory address from which to load.
Result
Section titled “Result”The dereferenced value of source.
Well-formedness
Section titled “Well-formedness”sourceneeds to be anaccessinstruction.- The operation requires exclusive access to
source. sourcemust be initialized- If
sourceis not a machine type, after this instruction,sourcebecomes uninitialized. - If
sourceis a machine type, after this instruction,sourceremains initialized. - The size of the value being loaded must be known at compile-time.
Semantics
Section titled “Semantics”The location of memory pointed to is loaded, and the resulting value is returned.
See also
Section titled “See also”This instruction is similar to LLVM’s load, but it does not provide all the options.
store instruction
Section titled “store instruction”Syntax
Section titled “Syntax”store <value> to <target>Overview
Section titled “Overview”Writes a value to memory.
Arguments
Section titled “Arguments”value represents the value that needs to be stored into memory. target represents the memory address in which we need to store the value.
Well-formedness
Section titled “Well-formedness”targetneeds to be anaccessinstruction.- The operation requires exclusive access to
target. - If
sourceis not a machine type,targetmust be uninitialized. - After this instruction,
targetbecomes initialized. valuecannot be used after the operation.- The type of
targetis an address of the type ofvalue
Semantics
Section titled “Semantics”The memory pointed to by target is updated to contain value.
See also
Section titled “See also”This instruction is similar to LLVM’s store, but it does not provide all the options.
subfield instruction
Section titled “subfield instruction”Syntax
Section titled “Syntax”<result> = subfield <base> at <path> as <type>Examples:
%i11: &ptr = subfield %i3 at [0, 0] as ptr%i12: &Int = subfield %i3 at [0, 1] as Int%i13: &ptr = subfield %b0#1 at [1] as ptrOverview
Section titled “Overview”Computes the address of storage for a field or sub-field of a record. The address calculation is done without accessing memory.
Arguments
Section titled “Arguments”base is an address of storage for the record from which we want to extract the field or sub-field.
path is a non-empty list of compile-time indices. If base is a pointer and we are performing a pointer dereference, the first index is always 0; if we are performing an array indexing, the first index can be greater than zero. Subsequent indices select fields of record types only (no pointer dereference).
The type argument is used to avoid calculating the resulting type, when the instruction is referenced. It must match with the type of the field or subfield obtained from base by applying the indices in path, but without being an address.
Result
Section titled “Result”The address of the subfield of base indicated by path.
Well-formedness
Section titled “Well-formedness”baseis an address.pathindicates a well-formed subfield withinbase.typematches the type of the subfield obtained throughpathfrombase, but without being an address.
Semantics
Section titled “Semantics”Given the pointer to base, this instruction computes the address needed to access the subfield of base obtained through path.
See also
Section titled “See also”This instruction is similar to LLVM’s getelementptr, but it does not provide all the options.
Value state instructions
Section titled “Value state instructions”access— Creates an access on a storage.end access— .AssumeState— Unsafely assumes the initialization state of memory storage.
Control flow instructions
Section titled “Control flow instructions”Apply— Invokes an IR functionApplyBuiltin— Invokes a function of theBuiltinmodule.
Instructions that are desugared
Section titled “Instructions that are desugared”Move— Relocates a value to different storages.Getter— Returns the getter of a property in an opaque record.SyntheticConformance— Returns the synthesized the conformance of a type to a core trait.