This thread will describe the development of the hg scripting language and interpreter for the CE.

Following on from here, I have now gotten most of the basic tasks of the interepreter working, with properly functioning functions coming up fast.



The memory is structured as follows:

The C API includes a function hg_Init(void *arena, size_t arena_size) that allocates the arena for the entire hg environment. This memory is then internally managed, and will not increase or decrease in size. So this is fed in with


Code:
void *arena = malloc(ARENA_SIZE);
hg_Init(arena, ARENA_SIZE);


and then at the end, this memory can be freed. Internally, it is managed in two structures: the heap, which is formatted as a singly-linked list starting at the start of the arena, and the stack, which grows downwards from the top of the arena. All objects (of type hg_expr_t) are contained in the heap, with the exception of the singletons hg_err and hg_nil, which are statically allocated (because otherwise they might be garbage collected, which would be a disaster). The stack contains bindings that are each simply two pointers: one to the name of the variable, and the other to its value.

The hg_expr_t is defined like so:


Code:
typedef struct hg_expr_t {
    union {
        uint24_t tag;
        struct hg_expr_t *ptr;
    } car;
    union {
        void *ptr;
        struct hg_expr_t *(*fn)(struct hg_expr_t *, struct hg_expr_t *);
        int24_t num;
        char chars[3];
    } cdr;
} hg_expr_t;


Tags are defined by bits 0-3. This is because on the CE, every writeable address that could really be part of the hg environment has a pointer that starts with $d0-$d6, and so therefore, if the pointer begins with something other than 1101 = $d, then we can simply restore it to a pointer by changing those first few bits; they store no necessary information. So a cons cell is any hg_expr_t where the car is a raw pointer; then we can just check if the tag starts with $d. Strings' tags start with $c, and their cars are also pointers: thus, string pointer derefences and storages are preceeded by flipping bit 3. But strings are stored in the heap as a series of string cells, each of which contain 3 chars in the cdr, and if none of those is a string-terminating null, the car will point to the next cell in the string.

Both of these solutions are inspired by the fe interpreter, but this platform-specific assumption in storing strings allows for a 50% increase in memory efficiency (and fe's assumption that all cells are aligned to four bytes, which it uses to unite the tags and pointers in the cars, is not really viable with six-byte-width CE cells, is not really viable here). So far, the binary is under 4 kB.

Next up: conditionals!

Git repo: https://tangled.org/@euphory.gay/hg
Okay, actually, conditionals are a problem because they are trinomial, so first, I restructured the list evaluation.

The evaluator is quite simple to describe: a string evaluates to its bound value (if any); a list checks what the first element is and, if it is a function, returns the result of running it on all the other elements, otherwise returning itself; and everything else evaluates to itself. The trick is the evaluation; previously, I had assumed that all functions were binomial (and more operands could be handled by just calling the binomial function multiple times), but this doesn't always work. While we can make monomial operations like CAR and CDR just by ignoring any further arguments, we can't really make trinomial if-condition-then-else statements.

The solution: more function pointers!

Because the code for iteratively applying a binomial operation is so handy, it has been factored out. Then defining a new binomial function is as simple as:


Code:
int24_t fb_add(int24_t a, int24_t b)
{
    return a + b;
}

hg_expr_t *f_add(hg_expr_t *tail)
{
    return fb_binom(tail, fb_add);
}


Currently, these are built on the assumption that iterative binomials like this always take numerical arguments; if this later ends up being false, it is trivial to change. But critically, monomial operations are now trivial to define:


Code:
hg_expr_t *f_car(hg_expr_t *tail)
{
    const hg_expr_t *arg = tail->car.ptr;
    if (TRIM_TAG(arg->car.tag) != HG_CONS) return &hg_err;

    return arg->car.ptr;
}


Now we can have everything work no matter how many arguments it has!

Design revision! There is no more stack.

It turns out that having a stack and a heap grow into each other is kind of annoying. Previously, stack frames were a six-byte header pointing to the header of the encompassing frame and the bottom of the current frame, and then a list of pairs (name, value) of bindings in that frame. However, as this frame grew downwards, it would replace freelist cells. Therefore, there is a freelist cell somewhere pointing to a cell which no longer exists and is now part of the stack. This is a pretty huge problem, because finding that out would require iterating through the entire freelist and checking if its pointer has ever been greater than the stack pointer. Plus, it might not be part of the freelist, but instead an allocated part of the heap storing bound data! This could be resolved by making the freelist doubly-linked, but that would take more memory, and still be far from ideal.

The solution is to put the stack on the heap. After all, a scope is really just a pair (previous scope, list of bindings), a binding can just be a cons pair (name, value), and I've already had to deal with ways of managing lists as recursive cons pairs! So a scope now looks like this:



At any given point in time, we store a pointer to the maximal cons pair in current_scope, and to save time and searching, the minimal cons pair (with &hg_nil in its cdr) in stack_top. Now searching for a binding is as easy as:


Code:
hg_expr_t *get_binding(hg_expr_t *name)
{
    hg_expr_t *branch = current_frame;

    while (TRIM_TAG(branch->car.tag) == HG_CONS)
    {
        hg_expr_t *leaf = next_in_tail(&branch);
        if (str_cell_cmp(leaf->car.ptr, name)) continue;
        return leaf;
    }

    return &hg_err;
}


From which point the getter and setter are simple:


Code:
hg_expr_t *get_value(hg_expr_t *name)
{
    hg_expr_t *binding = get_binding(name);
    if (binding == &hg_err) return binding;
    return binding->cdr.ptr;
}

hg_expr_t *set_value(hg_expr_t *name, hg_expr_t *value)
{
    hg_expr_t *binding = get_binding(name);
    if (binding == &hg_err) return binding;
    return binding->cdr.ptr = value;
}


Next up: functions! Probably. Current binary size: 5.9 kB
I have finally gotten lexically-scoped functions basically working, I think. Some strangeness seems to occur with different numbers of arguments, which I will work out soon.



↑ This screen fills me with so much joy.
Okay, so now it's all worked out. My issues were symptomatic of scope handling, but they're solved now. I've added the primitives DEFUN and LAMBDA (the former is really just a shortening, as "(DEFUN [name] [closure] [content])" is the same as "(LET [name] (LAMBDA [closure] [content]))"). But both work exactly as expected. MACRO is yet to be added, mostly because I just cannot wrap my head around how a macro actually works. (It has its own scope, but only for closure bindings? I'm not quite sure.)

Searching a scope for a binding can now optionally return bindings in an ancestor scope:


Code:
hg_expr_t *get_frame_binding(
        hg_expr_t *name,
        hg_expr_t *frame,
        uint8_t allow_encompassing
) {
    if (frame == NULL || TRIM_TAG(frame->car.tag) != HG_CONS) return &hg_err;

    hg_expr_t *branch = frame;
    while (TRIM_TAG(branch->car.tag) == HG_CONS)
    {
        hg_expr_t *leaf = next_in_tail(&branch);
        if (str_cell_cmp(leaf->car.ptr, name)) continue;
        return leaf;
    }

    return allow_encompassing ? get_frame_binding(name, frame->car.ptr, 1) : &hg_err;
}

#define get_local_binding(name) get_frame_binding(name, current_frame, 0)
#define get_binding(name) get_frame_binding(name, current_frame, 1)


So when adding bindings, only local bindings can block that, but getting values from existing bindings usually can go all the way back to the global scope.

Here are some demonstrations, starting with compositionality, which turned out to be much fiddlier than anticipated:



Next is a real mess of binding names, but everything works out as expected:



And last, the Fibonnaci function as a demonstration of recursion:



This last one is actually really pushing the limits of the 2 kB arena I've been allocating; I need to get the garbage collector functional ASAP. So next up is going to be sweep-and-check garbage collection, which hopefully, I can use CAR bit 4 for.
Alright, and there's garbage collection basically working. For testing purposes, this is a 2 KiB heap with a display of the number of free cells remaining in each REPL prompt.



As you can see, GC runs at least once mid-execution of "(POW 8 4)", and the definition of POW survives the GC cycle.

GC is mark-and-sweep with a stack for values that are in use. Basically, we have a global hg_expr_t *temp_stack that points to the top of the temp stack. We define ways of pushing to and popping from it:


Code:
#define gc_push_temp(item) temp_stack = cons_cell(item, temp_stack)
#define gc_pop_temp() temp_stack = temp_stack->cdr.ptr


Then we invoke these before and after evaluating any expression:


Code:
hg_expr_t *hg_Eval(hg_expr_t *expr)
{
    gc_push_temp(expr);

    const uint24_t tag_trimmed = TRIM_TAG(expr->car.tag);

    hg_expr_t *return_val;
    switch (tag_trimmed)
    {
        case HG_CONS:
            return_val = eval_list(expr);
            break;
        case HG_STR:
            return_val = get_value(expr);
            break;
        default:
            // everything else just returns itself
            return_val = expr;
            break;
    }

    gc_pop_temp();
    return return_val;
}


This may lead to duplicate values on the temp stack, but there are already lots of duplicate values on the heap, so... they say the enemy of progress is premature optimization.

Anyway, then we just recursively mark every item in the current frame (which includes every ancestor frame) and the temp stack, both of which end at hg_nil, which is exempt from both marking and sweeping. Then we sweep the heap.


Code:
void gc_sweep()
{
    for (hg_expr_t *ptr = heap; ptr < heap_end; ptr++)
    {
        if (MARKED(ptr->car.tag))
        {
            UNMARK(ptr->car.tag);
            continue;
        }

        if (ptr->car.tag == HG_FREE) continue;

        free_cell(ptr);
    }
}


The result seems to work pretty well. Marking is, as I had hoped, handled by CAR bit 4. I will keep testing this to make sure it doesn't cause any huge problems. Of course, with a 2 KiB heap, the real threat is running out of space for the callstack, so super-recursive tests like asking for the 28th Fibonnaci number or the like overflow the heap on that alone, which currently is unsafe. Oh, well. I guess fixing that and adding a few more prims are the next things to do.
Alright, that's basically all working! The current binary size is about 7 kB. I am now going to start incorporating it back into the project it was originally intended for, and any further development will be informed by that.
hell yeah, lisp my beloved 🔥🔥🔥🔥🔥🔥
  
Register to Join the Conversation
Have your own thoughts to add to this or any other topic? Want to ask a question, offer a suggestion, share your own programs and projects, upload a file to the file archives, get help with calculator and computer programming, or simply chat with like-minded coders and tech and calculator enthusiasts via the site-wide AJAX SAX widget? Registration for a free Cemetech account only takes a minute.

» Go to Registration page
Page 1 of 1
» All times are UTC - 5 Hours
 
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum

 

Advertisement