Dada Formal Model

Documents the formal model. Aspires to be a tour of the codebase, a kind of "literate programming"-style document. But for now, mostly a place for me to add notes.

Formality core

The model is implemented in formality-core, the system defined for a-mir-formality. Snippets of the grammar are typically taken from src/grammar.rs.

Dada's design tenets

These are a living set of design tenets, extended as we resolve tradeoffs.

We believe that...

  • Dada's users will write high-scale, efficient programs, thus we ensure that Dada is expressive and performant. Dada is not for small scripts where performance doesn't matter. It's meant for efficient, real-world programs. We are willing to introduce implicit O(1) performance costs but nothing that scales with the size of a data structure.
  • Dada's users care most about their domain and less about machine details, and thus we ensure Dada is high-level. It should feel as much as possible like Java, JavaScript, Kotlin, or other high-level languages that largely hide machine details from their users. We assume a minal
  • Dada's users want to interoperate with other languages, and thus we ensure Dada has minimal runtime requirements. We are willing to require libc and a memory allocator, but we cannot assume that Dada owns the entire stack top-to-bottom. Dada must use a standard stack frame format and there may be frames on the stack from other languages that our runtime does not control.

Why this ordering / examples of tension?

  • Expressive and performant over high-level -- we distinguish boxed values from regular values, which is a case of showing machine details that we'd rather not show. But it's important because avoiding the need to allocate on every value is critical for performanmce. Also

Dada types and permissions

A Dada type has the following grammar

#![allow(unused)]
fn main() {
// Types:


// Permissions:

}

Notes:

Ty = Struct
   | Class
   | Var
   | Perm Ty

Perm =

canonical form

TyC = Perm (Struct | Class | Var)

PermC = shared(places)? leased(places)* (my | Var)

Classes

Declared as

#![allow(unused)]
fn main() {
#[term(class $name $binder)]
pub struct ClassDecl {
    pub name: ValueId,
    pub binder: Binder<ClassDeclBoundData>,
}

#[term($:where $,predicates { $*fields $*methods })]
pub struct ClassDeclBoundData {
    pub predicates: Vec<Predicate>,
    pub fields: Vec<FieldDecl>,
    pub methods: Vec<MethodDecl>,
}
}

Class members

Fields

A field Field...

#![allow(unused)]
fn main() {
#[term($?atomic $name : $ty ;)]
pub struct FieldDecl {
    pub atomic: Atomic,
    pub name: FieldId,
    pub ty: Ty,
}
}

...has three properties:

Structs and enums

Permissions

Atomicity

Declared as

#![allow(unused)]
fn main() {
#[term]
#[derive(Default)]
pub enum Atomic {
    #[default]
    #[grammar(nonatomic)]
    No,

    #[grammar(atomic)]
    Yes,
}
}

Types checking

Subtyping

Notes:

  • operate on canonical types
  • if subtype is shared, supertype must be shared
  • can drop leased from subtype, not supertype
  • must end with my/my or var/var

Operational semantics

For now, just some notes on how we expect Dada to be implemented.

  • Dada objects may be heap or stack allocated
    • Eventually we will require box class to force a layer of indirection.
  • All Dada objects will have a header with a reference count. Typically 1.
  • Leased values are a pointer.
  • Shared values are a copy of the object.

Notes

Misc notes and observations we have to place somewhere better eventually.

Objects versus values, pondering on identity

I have been debating for some time whether to divide classes from value types. I've come to the conclusion that we should.

When to use which one:

  • Classes are for values that have individual identity:
    • In some cases, like Vec, this identity arises from tracking resources (memory) that must be disposed of.
    • But it can also arise from domain reasons -- e.g., you might have a class Shape representing shapes in a drawing that were created by the user and which are hence distinct objects to them.

Classes...

  • Have permissions
  • Support mutation of individual fields
  • Permit atomic fields (if boxed)

Values...

  • Are always created atomically
  • Do not have their own permissions
  • Do not support mutation of individual fields or atomic fields

Boxing

Classes and values are typically represented "inline". This implies that...

  • They cannot be cyclic
  • They cannot have atomic fields
  • Data shared/leased from their fields is invalidated when they are moved

...but they can also be declared as boxed. If so, they are represented by a pointer, and we allocate them on the heap. In that case...

  • They can be cyclic
  • They can have atomic fields
  • Data shared/leased from their fields is preserved when they are moved

...definitely tempting to say "classes are always boxed, values are only boxed if they participate in a cycle" or something like that.

Memory layout

Both classes and boxes can be declared as boxed. A boxed value is represented by a pointer. Otherwise,

  • Owned: always a copy of the struct or (if boxed) pointer
  • Shared: as owned
  • Leased: always a pointer to some other location

One trick is the subtyping below. We do need to track whether this is a my/our value or a shared copy whose ref count is maintained elsewhere.

To track that:

  • For boxed values: set the
  • For inline values: overwrite the ref count field with

Owned as a subtype of shared, the need to drop shared

I am pondering the given() <: shared(_) relation. It's very powerful. It does imply that dropping (and copying!) a shared thing is never free, you have to at least check for the need to bump a reference count.

We want to allow shared() <: shared(a) too, so that implies that shared(a) has to check for reference count drops anyway.