Initial view of the language

Variables

  • Variables define the scope, length and name of a symbol in code
  • Variables only live inside the bounds of the code block they are defined in
  • Compiler-enforced naming conventions; the compiler warns when they are not followed. Snake case maybe?
  • Primitive types u8...u64, f32, f64, bool
  • Bitfields!
  • usize, isize, architecture specific
  • char (these are Unicode scalar values, 32 bits); you can use u8 as an old-school C char
  • Strings. I would really want to have UTF-8 supported from the start.
  • If you have explicitly specified an address for a variable, it is handled as volatile, since it is likely memory-mapped IO. Every read is an actual load and every write is an actual store; none are optimized away.
  • Global variables are visible only inside their source file, unless they are exported and then imported in another source file.
  • I guess we want generic types?
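
A rough sketch of the volatile rule above, written in Rust since this language does not exist yet. The names MmioU32, at, read, write and volatile_demo are made up for illustration, and an ordinary local variable stands in for a real device register so the sketch can run anywhere.

```rust
use core::ptr::{read_volatile, write_volatile};

/// Hypothetical wrapper for a variable pinned to an explicit address.
struct MmioU32 {
    address: *mut u32,
}

impl MmioU32 {
    /// Safety: `address` must be valid, aligned and alive for every access.
    unsafe fn at(address: *mut u32) -> MmioU32 {
        MmioU32 { address }
    }

    /// Always performs a real load; the compiler may not cache the value.
    fn read(&self) -> u32 {
        unsafe { read_volatile(self.address) }
    }

    /// Always performs a real store; the compiler may not drop or merge it.
    fn write(&self, value: u32) {
        unsafe { write_volatile(self.address, value) }
    }
}

/// Uses a local variable as a stand-in for a memory-mapped register.
fn volatile_demo() -> u32 {
    let mut fake_register: u32 = 0;
    let reg = unsafe { MmioU32::at(&mut fake_register) };
    reg.write(0b0001);
    reg.write(reg.read() | 0b0100); // read-modify-write, both accesses kept
    reg.read()
}
```

In the proposed language the wrapper would be implicit: giving the variable an address would be enough to get this behaviour.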

Structs

  • Structs always have a known size
  • You can put any type into them!
  • used to construct complex types
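
A small Rust sketch of the "known size" point, assuming a C-like layout (that layout choice is an assumption, the notes do not fix one). Point and Sprite are hypothetical example types.

```rust
use std::mem::size_of;

/// Hypothetical struct built from primitives.
#[repr(C)]
struct Point {
    x: f32,
    y: f32,
}

/// A struct can contain any type, including another struct.
#[repr(C)]
struct Sprite {
    position: Point, // 8 bytes
    id: u16,         // 2 bytes
    visible: bool,   // 1 byte, followed by 1 byte of padding
}

/// The size is a compile-time constant.
fn sprite_size() -> usize {
    size_of::<Sprite>()
}

/// Reads every field so the example types are fully exercised.
fn describe(s: &Sprite) -> (f32, f32, u16, bool) {
    (s.position.x, s.position.y, s.id, s.visible)
}
```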

Unions

  • Unions are structs whose size is the size of their biggest member. All members point to the same address.
  • You can also explicitly give a size for the union! This way you can allocate some block from global memory or stack.
  • in memory, it's like a struct that has a length and data
  • The value of each member is just whatever the data pointer points to
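
To make the first bullet concrete, here is a sketch using a Rust union. It only shows the "size of the biggest member" and "same address" properties; the explicit-size and length-plus-data ideas are not modelled. Word and the helper functions are hypothetical names.

```rust
use std::mem::size_of;

/// Hypothetical union: all members overlap in the same bytes.
#[repr(C)]
union Word {
    byte: u8,
    half: u16,
    full: u32,
}

/// The union is as big as its biggest member (u32, so 4 bytes).
fn union_size() -> usize {
    size_of::<Word>()
}

/// Every member starts at the address of the union itself.
fn members_share_address() -> bool {
    let w = Word { full: 0 };
    let base = &w as *const Word as usize;
    // Touching union fields needs `unsafe` in Rust.
    let (a, b, c) = unsafe {
        (
            &w.byte as *const u8 as usize,
            &w.half as *const u16 as usize,
            &w.full as *const u32 as usize,
        )
    };
    a == base && b == base && c == base
}

/// Writing one member and reading another reinterprets the same bytes.
/// Which byte comes back depends on the machine's endianness.
fn first_byte(value: u32) -> u8 {
    let w = Word { full: value };
    unsafe { w.byte }
}
```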

Enums

  • Special user-defined types with a range known at compile time
  • useful for pattern matching
  • two different kinds of enums, typed or typeless
    • a typeless enum is just an appropriately sized unsigned integer, where each name matches an integer value (called the discriminant).
    • a typed enum is, in memory, an array of elements of the given enum type. Each name matches some appropriately sized integer value.
    • so typeless enums are just the same, but the array length is zero
  • can be used as a type, not just as a value.
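
One possible reading of the two enum kinds, sketched in Rust. The interpretation of "typed" here is an assumption: each name is a small integer that indexes into an array of values of the enum's type. Direction, Gain and GAIN_VALUES are hypothetical names.

```rust
/// Typeless enum: just a small unsigned integer with named values.
#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum Direction {
    North = 0,
    East = 1,
    South = 2,
    West = 3,
}

/// The discriminant is the integer behind the name.
fn discriminant(d: Direction) -> u8 {
    d as u8
}

/// Typed enum (assumed meaning): the names index into an array of f32 values.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Gain {
    Low = 0,
    Mid = 1,
    High = 2,
}

/// Backing array of the typed enum, one element per name.
const GAIN_VALUES: [f32; 3] = [0.5, 1.0, 2.0];

impl Gain {
    /// Looks up the typed value that belongs to this name.
    fn value(self) -> f32 {
        GAIN_VALUES[self as usize]
    }
}
```

With this reading, a typeless enum is the same thing with no backing array, which matches the "array length is zero" remark.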

Pointers

  • Pointers are special: they are more than just an address.
  • A pointer is a struct that has the following members
    • type (VariableType enum)
    • address, which has members
      • valid (boolean)
      • address (usize)
  • Pointers have type!
    • This lets the compiler warn if you assign the pointer a reference to a different type than it was previously assigned. No implicit casting
    • Makes the code easier to follow, no void* business like in C
    • It tells the type, and thus the data length, at runtime, so you can do ptr.size
  • Arrays are just structs with a pointer to the start of the data and a length in bytes
    • This is why we cannot index arrays. No array indexing, since they are just structs.
    • But it's using a pointer, so we get the same result with something like array.get_element(nth=5). We can do that, since we know the variable type, so we can get the length.
  • Address of any variable can be extracted with & (reads "address of ...")
  • As usual, addresses are always usize (so 64 bits on 64-bit machines, etc.)
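
The pointer and array layout above could look roughly like this in Rust. This is a sketch under assumptions: VariableType only has three made-up variants, get_element takes a positional argument (Rust has no named arguments), and read_u32 and sixth_element are hypothetical helpers.

```rust
/// Assumed subset of the VariableType enum from the notes.
#[derive(Clone, Copy, Debug, PartialEq)]
enum VariableType {
    U8,
    U32,
    U64,
}

impl VariableType {
    /// Data length in bytes for each type.
    fn size(self) -> usize {
        match self {
            VariableType::U8 => 1,
            VariableType::U32 => 4,
            VariableType::U64 => 8,
        }
    }
}

#[derive(Clone, Copy, Debug)]
struct Address {
    valid: bool,
    address: usize,
}

/// A pointer carries its type next to the address.
#[derive(Clone, Copy, Debug)]
struct Pointer {
    ty: VariableType,
    address: Address,
}

impl Pointer {
    /// ptr.size from the notes: known at runtime because the type travels along.
    fn size(&self) -> usize {
        self.ty.size()
    }
}

/// An array is a struct: pointer to the start plus length in bytes.
struct Array {
    start: Pointer,
    length_bytes: usize,
}

impl Array {
    fn from_u32_slice(data: &[u32]) -> Array {
        let address = Address { valid: true, address: data.as_ptr() as usize };
        Array {
            start: Pointer { ty: VariableType::U32, address },
            length_bytes: data.len() * std::mem::size_of::<u32>(),
        }
    }

    /// Replaces indexing: returns a typed pointer to the nth element, if in bounds.
    fn get_element(&self, nth: usize) -> Option<Pointer> {
        let size = self.start.size();
        let offset = nth.checked_mul(size)?;
        let end = offset.checked_add(size)?;
        if !self.start.address.valid || end > self.length_bytes {
            return None;
        }
        let address = Address { valid: true, address: self.start.address.address + offset };
        Some(Pointer { ty: self.start.ty, address })
    }
}

/// Reads through a pointer only when it is valid and has the expected type.
fn read_u32(p: Pointer) -> Option<u32> {
    if !p.address.valid || p.ty != VariableType::U32 {
        return None;
    }
    // Assumption of this sketch: the backing data is still alive here.
    Some(unsafe { std::ptr::read(p.address.address as *const u32) })
}

/// Equivalent of array.get_element(nth=5) followed by a read.
fn sixth_element(data: &[u32]) -> Option<u32> {
    Array::from_u32_slice(data).get_element(5).and_then(read_u32)
}
```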

Constants

  • constants are read-only variables that can only be defined at global scope, meaning the scope of the source file, unless exported and imported in another source file.

Registers

  • user needs to be able to clear registers on CPU startup.
  • registers are a special kind of variable that allows things like setting a stack pointer.
  • User can use registers as they wish, but for the compiler to know not to use the same registers, they need to be reserved and abandoned.
    • during reserve, the initial value will be stored somewhere.
    • during abandon, the stored value will be read back into the register.
  • these also provide a way to do constant-time operations.
  • these can only be used with assembly instructions, so arithmetic, load and store, branching/jumping and environment calls.
  • first thought about syntax:
    let i1:i32 = 1;
    let i2:i32 = 2;
    let output:i32;
    let x1 = reserve(r10);
    let x2 = reserve(r11);
    let x3 = reserve(r12); // each reservation needs its own register
    x1.load(i1);
    x2.load(i2);
    x3 = x1 + x2; // basic instructions would map to basic operators.
    x3.store(output);
    x1.abandon();
    x2.abandon();
    x3.abandon();
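
The reserve/abandon bookkeeping can be simulated in Rust to check the idea. This is only a model: the register file is a plain array, the check happens at run time instead of in the compiler, and RegisterFile and add_with_registers are hypothetical names.

```rust
/// Simulated register file; a real compiler would track reservations itself.
struct RegisterFile {
    values: [u64; 32],
    saved: [Option<u64>; 32],
}

impl RegisterFile {
    fn new() -> RegisterFile {
        RegisterFile { values: [0; 32], saved: [None; 32] }
    }

    /// Reserve: store the register's initial value and mark it as taken.
    fn reserve(&mut self, index: usize) -> Result<usize, String> {
        if index >= self.values.len() {
            return Err(format!("r{index} does not exist"));
        }
        if self.saved[index].is_some() {
            return Err(format!("r{index} is already reserved"));
        }
        self.saved[index] = Some(self.values[index]);
        Ok(index)
    }

    /// Abandon: read the stored value back into the register.
    fn abandon(&mut self, index: usize) {
        if let Some(old) = self.saved[index].take() {
            self.values[index] = old;
        }
    }
}

/// Mirrors the syntax idea: load two inputs, add them, store the result.
/// Returns the output and the value r10 holds after being abandoned.
fn add_with_registers(i1: u64, i2: u64) -> Result<(u64, u64), String> {
    let mut regs = RegisterFile::new();
    regs.values[10] = 99; // pretend surrounding code keeps something in r10
    let x1 = regs.reserve(10)?;
    let x2 = regs.reserve(11)?;
    let x3 = regs.reserve(12)?;
    regs.values[x1] = i1; // x1.load(i1)
    regs.values[x2] = i2; // x2.load(i2)
    regs.values[x3] = regs.values[x1].wrapping_add(regs.values[x2]); // x3 = x1 + x2
    let output = regs.values[x3]; // x3.store(output)
    regs.abandon(x1);
    regs.abandon(x2);
    regs.abandon(x3);
    Ok((output, regs.values[10]))
}
```

Reserving the same register twice is rejected in this model, which is the kind of mistake the compiler should catch.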

Sections

  • Any symbol, meaning a global variable, constant or function, can be explicitly put into any section.
  • By default these implicitly go to the .bss, .data or .text sections.
  • I guess we follow the Executable and Linkable Format (ELF)

Arithmetics

  • debug builds should have run-time assertion checking for integer overflow and division by zero. These would just be warnings anyway.
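
A sketch of "warn, do not abort" checking, using Rust's overflowing_add and checked_div. Collecting the warnings in a Vec is an assumption made so the behaviour can be inspected; add_with_warning and div_with_warning are hypothetical names.

```rust
/// Adds two integers; on overflow it records a warning and keeps the wrapped result.
fn add_with_warning(a: i32, b: i32, warnings: &mut Vec<String>) -> i32 {
    let (result, overflowed) = a.overflowing_add(b);
    if overflowed {
        warnings.push(format!("warning: {a} + {b} overflowed i32"));
    }
    result
}

/// Divides two integers; division by zero (or i32::MIN / -1) records a warning.
fn div_with_warning(a: i32, b: i32, warnings: &mut Vec<String>) -> Option<i32> {
    match a.checked_div(b) {
        Some(quotient) => Some(quotient),
        None => {
            warnings.push(format!("warning: {a} / {b} has no valid result"));
            None
        }
    }
}
```

A release build would presumably compile these checks away and keep only the plain instructions.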

Code blocks

  • it is necessary to be able to split code into blocks to define the scope of variables.
  • proposed syntax: { let x; { let y; } // y no longer lives here } // x no longer lives here

Functions

  • if you have not called a function, it's optimized out of the build.
  • if you have explicitly given an address to a function, it will not be optimized out. It's likely an interrupt function.
  • functions with &self as the first argument can be chained like variable.function1().function2(). Any variable with the same type as the self argument can use these functions.
  • functions should use as many registers for their inputs as the hardware allows. This should all be implicit.
  • nameless functions. These are just regular functions that the compiler has to give a name to. They are nice to pass as function arguments.
  • closures. These are a bit weird but people use them quite a lot.
  • functions always go to global memory, no matter if they are named, nameless or closures.
  • calling functions with argument names should be possible, like this: array.get_element(nth=5, size=sizeof(u32)).
  • Not checking a return value should result in a warning
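
Three of the points above (chaining through &self, nameless functions as arguments, and a warning for ignored return values) can be sketched with existing Rust features. Counter, apply and chaining_demo are hypothetical names; named arguments are left out because Rust does not have them.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct Counter {
    value: u32,
}

impl Counter {
    /// `#[must_use]` makes the compiler warn when the result is ignored.
    #[must_use]
    fn add(&self, amount: u32) -> Counter {
        Counter { value: self.value + amount }
    }

    #[must_use]
    fn double(&self) -> Counter {
        Counter { value: self.value * 2 }
    }
}

/// Takes a nameless function as an argument and calls it once.
fn apply(start: Counter, f: impl Fn(&Counter) -> Counter) -> Counter {
    f(&start)
}

fn chaining_demo() -> Counter {
    let start = Counter { value: 1 };
    let chained = start.add(2).double(); // (1 + 2) * 2 = 6
    apply(chained, |c| c.add(4)) // 6 + 4 = 10
}
```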

if's and else's

  • these are just conditional jumps in disguise.

Loops

  • no strong preferences here.

Iterators and ranges

  • Maybe

Pattern matching

  • Any primitive or enum type can be matched. Structs, and therefore also arrays and unions, cannot be matched.
  • Matching values need to be defined at compile time
  • The whole range of the variable needs to be covered
  • a default can be set if preferred.
  • a match just creates a jump table (array of function pointers) with one entry per possible value of the type, initializes every entry to the default function pointer, and sets the specified cases at the right indexes.
    • I would not suggest trying to match u64's, since you would run out of memory: you end up with an array of 2^64 function pointers, each the size of a usize.
  • enums preferred for these. Makes the code also more readable.
  • enums can only be matched by variant, not by value.
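
The jump-table lowering described above, sketched by hand in Rust for a u8 match with two explicit cases and a default. The handler names and the Handler alias are hypothetical; a compiler would generate the table itself.

```rust
/// One entry of the jump table.
type Handler = fn() -> &'static str;

fn on_default() -> &'static str {
    "default"
}

fn on_zero() -> &'static str {
    "zero"
}

fn on_max() -> &'static str {
    "max"
}

/// Builds a table covering the whole u8 range: 256 entries.
fn build_table() -> [Handler; 256] {
    // Every entry starts out pointing at the default handler.
    let mut table: [Handler; 256] = [on_default as Handler; 256];
    // The specified cases overwrite their own indexes.
    table[0] = on_zero;
    table[255] = on_max;
    table
}

/// Matching is a single indexed call, no comparisons.
fn dispatch(table: &[Handler; 256], value: u8) -> &'static str {
    table[value as usize]()
}
```

For u8 the table costs 256 pointers. The same scheme for u16 already needs 65536 entries, which is why small types and enums are the sensible targets.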

Error handling

Errors should be handled as values. No try/catch here.
I like Rust-like option / error types
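
For reference, this is what errors-as-values looks like with Rust's own Result and Option types. ParseError, parse_u8 and first_even are hypothetical examples, not a proposal for the standard library.

```rust
/// The error is an ordinary value the caller has to look at.
#[derive(Debug, PartialEq)]
enum ParseError {
    Empty,
    NotADigit(char),
    Overflow,
}

/// Parses a decimal string into a u8 without panicking or throwing.
fn parse_u8(text: &str) -> Result<u8, ParseError> {
    if text.is_empty() {
        return Err(ParseError::Empty);
    }
    let mut value: u8 = 0;
    for c in text.chars() {
        // `?` returns the error to the caller, no try/catch involved.
        let digit = c.to_digit(10).ok_or(ParseError::NotADigit(c))? as u8;
        value = value
            .checked_mul(10)
            .and_then(|v| v.checked_add(digit))
            .ok_or(ParseError::Overflow)?;
    }
    Ok(value)
}

/// Option covers "there may be no value" without a null pointer.
fn first_even(values: &[u8]) -> Option<u8> {
    values.iter().copied().find(|v| v % 2 == 0)
}
```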

Memory management

  • This is not exactly a language problem. We can do whatever we want.
  • If someone wants a garbage collector, they can implement an allocator with garbage collection.
  • We mostly want stack allocators for small things and arena allocators for big things. We should implement what most people would be using.
  • C++ Smart pointers are great! We should have some mechanism for calling destructors when leaving scope.
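
A minimal sketch of the last two bullets in Rust: a bump-style arena that hands out offsets into one buffer, and a scope guard whose destructor runs when its block ends. Arena, Guard and arena_demo are hypothetical names, and returning offsets instead of pointers is a simplification.

```rust
use std::cell::Cell;

/// Bump arena: allocation just moves a cursor forward in one buffer.
struct Arena {
    buffer: Vec<u8>,
    used: usize,
}

impl Arena {
    fn with_capacity(bytes: usize) -> Arena {
        Arena { buffer: vec![0; bytes], used: 0 }
    }

    /// Returns the offset of a fresh block, or None when the arena is full.
    /// `align` must be a power of two; alignment is relative to the buffer start.
    fn alloc(&mut self, size: usize, align: usize) -> Option<usize> {
        let start = self.used.checked_add(align - 1)? & !(align - 1);
        let end = start.checked_add(size)?;
        if end > self.buffer.len() {
            return None;
        }
        self.used = end;
        Some(start)
    }

    /// Frees everything at once, which is the whole point of an arena.
    fn reset(&mut self) {
        self.used = 0;
    }
}

/// Scope guard: counts how many times a destructor has run.
struct Guard<'a> {
    drops: &'a Cell<u32>,
}

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

fn arena_demo() -> (Option<usize>, Option<usize>, Option<usize>, Option<usize>, u32) {
    let mut arena = Arena::with_capacity(16);
    let a = arena.alloc(1, 1); // offset 0
    let b = arena.alloc(4, 4); // rounded up to offset 4
    let c = arena.alloc(16, 1); // does not fit any more
    arena.reset();
    let d = arena.alloc(16, 1); // fits again after the reset
    let drops = Cell::new(0);
    {
        let _guard = Guard { drops: &drops };
    } // destructor runs here, when the block ends
    (a, b, c, d, drops.get())
}
```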