streaming byte-string Stack-based Assembly draft #6
Comments
Just so you know what you are getting into, this is the pseudocode for calculating pi in this language:
This would sanely need to be 'compiled' by some sort of macro handler, just to deal with the skips forward and backward. That could actually lead to (solvable) problems, as a skip forward may need to skip past a skip backward, and the number of bytes in a skip integer varies with how far you need to skip.
More practically, a DAG-based data block would look like:
After some review: we should replace cmp with jmp, and rather than use relative motion, we should use the absolute byte location in the file. This way "jmp val" is essentially file.seek(val). This also escapes an issue where solving for the distance between a jmp command and a given tag could deadlock, because the lengths of two jumps can depend on each other. In implementation I'd also store a reference to a literal value on the stack, rather than copying the literal onto the stack. That would optimize for reading raw data files.
@BrendanBenshoof this is great. 👍 👍 👍 👍 👍
So, because I want a clear argument for the existence of this monstrosity for later generations: it still needs a name. "StackStream Assembly" sounds cool. We have wandered into the abyss by creating a byte-code assembly without a fixed word size, intended to operate on arbitrary-length byte strings. I can't find anybody even talking about trying it. It might not be worth it. The goals are:
Things that can be tweaked:
would be the shortest route to outputting a raw value.
I am going to make a repo, write all of this out better, fix typos and make how the opcodes use arguments more clear.
streaming byte-string Stack-based Assembly
The goal of this language is to design a minimal stack assembly that is meant to work with byte strings.
Interestingly, using context-based addressing for "modules" means that recursion is not possible using call.
The atomic value in this language is a byte string.
A byte string is defined as:
[length][value]
The length is a bit fiddly. If the length can be encoded in 1 byte, then it is.
Otherwise that byte is set to 255 (max) and the next 2 bytes describe the length.
If this is still insufficient, those bytes are set to their max value and the next 2n bytes describe the length.
This means lengths are encoded in O(log(n)) bytes (the same as any other number), but potentially twice as many bytes as a normal number in computer science.
A length of 0 indicates all remaining bytes in a block/object.
Better schemes for encoding arbitrary lengths are welcome.
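A rough sketch of the length scheme above, in Python. Several details are assumptions, not part of the draft: the field widths double (1, 2, 4, ... bytes), fields are big-endian, an all-ones field at any width means "keep going at the next width", and the function names are made up for illustration.

```python
from typing import Tuple


def encode_length(n: int) -> bytes:
    """Encode a positive length using escalating field widths of 1, 2, 4, ... bytes."""
    if n <= 0:
        # 0 is reserved by the draft for "all remaining bytes in a block/object".
        raise ValueError("length must be positive")
    prefix = b""
    width = 1  # assumption: widths double at each escalation
    while True:
        escape = (1 << (8 * width)) - 1  # all-ones value, e.g. 255 for one byte
        if n < escape:
            return prefix + n.to_bytes(width, "big")
        # n does not fit below the escape value: emit the escape and widen.
        prefix += escape.to_bytes(width, "big")
        width *= 2


def decode_length(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Return (length, offset of the first value byte). A length of 0 means 'rest of block'."""
    width = 1
    while True:
        field = data[pos:pos + width]
        if len(field) < width:
            raise ValueError("truncated length field")
        pos += width
        value = int.from_bytes(field, "big")
        if value != (1 << (8 * width)) - 1:
            return value, pos
        width *= 2


def encode_byte_string(value: bytes) -> bytes:
    """[length][value]; an empty value cannot be expressed under this reading."""
    return encode_length(len(value)) + value
```

Under these assumptions a length of 300 costs three bytes (the 255 escape plus a 2-byte field), which matches the "potentially twice as many bytes" remark.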
memory management:
"Contexts" are key-value memories stored in a stack.
By default, values are stored in the top context of the stack.
Reading a key searches down the context stack until a match is found.
In general, methods receive arguments and return results via the stack.
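The lookup rule can be sketched as a list of dicts. The class and method names here are hypothetical, and starting with one root context and raising on a missing key are both assumptions; the draft does not say what happens when no context holds the key.

```python
class ContextStack:
    """A stack of key-value memories; the last list element is the top context."""

    def __init__(self):
        self.contexts = [{}]  # assumption: execution starts with one root context

    def push(self):
        """Start a new context on top of the stack."""
        self.contexts.append({})

    def pop(self):
        """Discard the top context."""
        if len(self.contexts) == 1:
            raise IndexError("cannot discard the root context")
        self.contexts.pop()

    def store(self, key: bytes, value: bytes):
        """Writes always land in the top context."""
        self.contexts[-1][key] = value

    def get(self, key: bytes) -> bytes:
        """Reads search from the top context downward until a match is found."""
        for context in reversed(self.contexts):
            if key in context:
                return context[key]
        raise KeyError(key)  # assumption: a missing key is an error


memory = ContextStack()
memory.store(b"x", b"outer")
memory.push()
seen_from_inner = memory.get(b"x")   # found in the context below
memory.store(b"x", b"inner")         # shadows the outer value
shadowed = memory.get(b"x")
memory.pop()
restored = memory.get(b"x")          # outer value is visible again
```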
basic statement:
[command code byte-string] [args]
basic command codes are:
command | code | args | results
output | 0 | byte-string | outputs byte-string
put | 1 | byte-string | puts byte string on top of stack
pop | 2 | none | discards the current top of stack
store | 4 | key | stores top of stack in memory[key]
get | 5 | key | puts value at key on top of stack
context | 6 | none | starts a new memory context stack
untext | 7 | none | discards the current top context
cmp | 8 | signed-integer | if stack value is not 0, skips the indicated number of bytes. negative skips allow loops
call | 9 | hash location | fetches and runs indicated block, prepending it to the "todo" buffer.
cmp should only work in the context of the current block of code. You cannot skip back before this block began and can skip past the end.
This way only the source of the current block and your location in it is required to "rewind"
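To make the statement format concrete, here is a toy interpreter for the non-branching commands in the table (output, put, pop, store, get, context, untext). It leaves out cmp and call entirely. Assumptions not taken from the draft: only one-byte lengths (1 to 254) are handled, store leaves the value on the stack, a missing key raises, and the helper names `stmt` and `run` are invented for this sketch.

```python
OUTPUT, PUT, POP, STORE, GET, CONTEXT, UNTEXT = 0, 1, 2, 4, 5, 6, 7


def read_byte_string(code: bytes, pos: int):
    """Read one [length][value] pair; this sketch supports 1-byte lengths only."""
    length = code[pos]
    if length in (0, 255):
        raise NotImplementedError("sketch handles lengths 1..254 only")
    end = pos + 1 + length
    if end > len(code):
        raise ValueError("truncated byte string")
    return code[pos + 1:end], end


def stmt(op: int, arg: bytes = None) -> bytes:
    """Assemble one statement: [command code byte-string] [args]."""
    encoded = bytes([1, op])  # the command code is itself a 1-byte byte string
    if arg is not None:
        encoded += bytes([len(arg)]) + arg
    return encoded


def run(code: bytes):
    """Execute a block and return (output bytes, final stack)."""
    stack, contexts, out, pos = [], [{}], bytearray(), 0
    while pos < len(code):
        raw_op, pos = read_byte_string(code, pos)
        op = int.from_bytes(raw_op, "big")
        if op in (OUTPUT, PUT, STORE, GET):
            arg, pos = read_byte_string(code, pos)
        if op == OUTPUT:
            out += arg
        elif op == PUT:
            stack.append(arg)
        elif op == POP:
            stack.pop()
        elif op == STORE:
            contexts[-1][arg] = stack[-1]  # assumption: store does not pop
        elif op == GET:
            for context in reversed(contexts):
                if arg in context:
                    stack.append(context[arg])
                    break
            else:
                raise KeyError(arg)
        elif op == CONTEXT:
            contexts.append({})
        elif op == UNTEXT:
            contexts.pop()
        else:
            raise NotImplementedError(f"opcode {op} is not part of this sketch")
    return bytes(out), stack


program = (stmt(PUT, b"hi") + stmt(STORE, b"k") + stmt(POP) + stmt(CONTEXT)
           + stmt(GET, b"k") + stmt(OUTPUT, b"ok"))
output, final_stack = run(program)
```

Here the value stored under `k` in the root context is still readable after a new context is pushed, so the program ends with `hi` back on the stack.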
This still NEEDS byte-string operations (concat, lshift, rshift) and MATH (add, subtract). Everything else can be implemented in-language.
This method of encoding opcodes allows for infinitely many of them; however, it is suggested that call be used to implement basic functions. The performance hit of retrieving them goes away very quickly, as they should be cached. It might be worth chucking the variable length for opcodes entirely, but I like the future-proofing it offers.
A "raw data" block would be:
output DATA
or
put DATA
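At the byte level, such a block might look like the sketch below. It assumes the command code is written as a one-byte byte string and, by default, uses the length-0 marker ("all remaining bytes in a block/object") for DATA, so the header stays three bytes no matter how large the data is. The function name and the `use_rest_marker` flag are hypothetical.

```python
OUTPUT, PUT = 0, 1


def raw_block(op: int, data: bytes, use_rest_marker: bool = True) -> bytes:
    """Build '[op] DATA' as bytes for op in (OUTPUT, PUT)."""
    header = bytes([1, op])  # command code as a byte string: length 1, then the code
    if use_rest_marker:
        # Length 0 means "everything up to the end of the block".
        return header + b"\x00" + data
    if not 1 <= len(data) <= 254:
        raise ValueError("explicit form in this sketch supports 1..254 bytes")
    return header + bytes([len(data)]) + data


output_block = raw_block(OUTPUT, b"hello")
put_block = raw_block(PUT, b"hello", use_rest_marker=False)
```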