streaming byte-string Stack-based Assembly draft #6

BrendanBenshoof · 2015-05-15T23:06:40Z

streaming byte-string Stack-based Assembly

The goal of this language is to design a minimal stack assembly that is meant to work with byte string.
interestingly, using context based addressing for "modules" means that recursion is not possible using call

the atomic value in this language is a byte-string.
byte string as defined as:
[length][value]
length is a bit fiddly. If the length can be encoded in 1 byte, then it is
otherwise that byte is set to 255 (max) and the next 2 bytes describe the length
if this is insufficient length, those bits are set to max value and the next 2n bytes describe the length.
this means lengths are encoded in O(log(n)) bytes (the same as another other number), but potentially twice as many bytes as a normal number in computer science.

A length of 0 indicates all remaining bytes in a block/object

Better schemes for encoding arbitrary lengths are welcome.

memory management:

"contexts" are key-value memories stored in a stack
by default values are stored on the top context of the stack
keys that are read seek down the stack until a match is found

In general methods receive arguments and return results via the stack

basic statement:
[command code byte-string] [args]

basic command codes are:

cmp should only work in context of the current block of code. You cannot skip back before this block began and can skip past the end.
This way only the source of the current block and your location in it is required to "rewind"

NEEDS byte-string operations (concat, lshift, rshift) and MATH (add, subtract). Everything else can be implemented in-language

this method of encoding opcodes allows for infinitely many, however it is suggested call be used to implement basic functions. The performance hit of retrieving them goes away very quickly as they should be cached. it might be worth chucking the variable length for opcodes entirely, but I like the future proofing it offers.

a "raw data" block would be:

output DATA
or
put DATA

BrendanBenshoof · 2015-05-15T23:22:51Z

just so you know what you are getting into, this is the psudocode for calculating pi in this language

#the following code returns pi after k iterations of a convergent sum: args integer k
context #start a new context so we don't break other code
store 'k' #store our argument to a variable
put 1.0
store 'result' #intialize the result

put 1
store 'i' #initalize our counter
get 'i'
get 'k'
#loopback reference point, like a label in assembly
lessthan #put zero on stack 0 if i > k
cmp '#END' #skip to end if done
put 1.0
put -2.0
get 'i'
put 2
mod #top of stack is now i%2
mul #top is now -2.0*(i%2)
add #top of stack is now -1 if i is odd, 1 otherwise
get 'i'
put 2.0
mul #top of stack is now 2i
put 1.0
sub #top of stack is now 2i+1
div #top of stack is now (-1)^i / (2i+1)
get 'result'
add #top of stack is now itermidate value for pi
store 'result'
get 'i'
put 1
add
store 'i'
put 0
cmp '#loopback' #go back to top of loop
get 'result'
put 4.0
mul
#stack now holds approximation for pi
untext #close our memory context so we don't pollute
#the code is done, and pi is on the top of the stack

BrendanBenshoof · 2015-05-15T23:31:28Z

This would sanely need to be 'compiled' by some sort of macro handler, just to deal with the skips forward and backward. (that could actually lead to (solvable) problems as skips forward may need to skip past a skip backward and the number of bytes in a skip integer would vary based on how far you need to skip.)

BrendanBenshoof · 2015-05-15T23:36:30Z

more practically, a DAG-based data block would look like:


put BLOCKID0
call
put BLOCKID1
call
put BLOCKID2
call
put RAWDATA
output

BrendanBenshoof · 2015-05-16T01:32:05Z

After some review. We should replace the cmp with jmp. and rather than use relative motion, we should use absolute byte location in the file.

his way "jmp val" is essentially file.seek(val)

this also escapes a bit of an issue where solving for the distance between a jmp command and a given tag could be deadlocked by having the length of 2 jumps depend on each other.

In implementation I'd also store a reference to a literal value on the stack, rather than copying the literal on the stack. It would optimize for reading raw data files.

jbenet · 2015-05-16T05:42:05Z

@BrendanBenshoof this is great. 👍 👍 👍 👍 👍

BrendanBenshoof · 2015-05-16T12:06:55Z

So because I want a clear argument for the existence of this monstrosity for later generations:

It still needs a name "StackStream Assembly" sounds cool.

We have wandered into the abyss in creating a byte-code assembly without fixed word size or otherwise intended to operate an arbitrary length bytestrings. I can't find anybody even talking about trying it. It might not be worth it.

The goals are:

Be turing complete
Do the original job of chopping a file into a DAG with minimal overhead
Be easy to manage in terms of memory usage

Things that can be tweaked:

current design allows for a infinite amount of opcodes, this is likely unnecessary
atomic types beyond byte-strings have not been discussed. There should be at least integers. Maybe a few more types.
For a lot of commands it is currently unclear if the argument should be provided or be on top of the stack. Essentially the only command that should take an in-source argument in code is the "put" command so initial values can be put on the stack.

put val
output

would be the shortest route to outputting a raw value.

it might be worth making a opcode that outputs it's in-source argument in code (rather than off the stack)
jmp is a kludge and is essentially a conditional goto (exactly like "[" and "]" in brainfuck). I'd prefer a more elegant solution but I also want to minimize the overhead of parsing.
In implementation I would have a "todo" stack that holds source code. normally when I made a function call I would simply push the source to the stack, execute it first, then resume normal code (all done by just consuming the stack). Because we are looking at using jmp (goto) then we need to keep a lot more context including "what file am I currently in" and "what line number am I reading".
arguments for jmp may be complicated in practice. They need to be long enough to describe locations in the file and they make the file longer as their byte-length increases. Start by using the smallest byte-length possible, then count the new number of bytes in the file, update the number of bytes needed and fix the errors. Iterate this until the pointers are correct.
Keep an eye on the above problem, it will seem initially moot but might make a compile-time and file-size performance hit as "blocks of bytecode" get bigger.

I am going to make a repo, write all of this out better, fix typos and make how the opcodes use arguments more clear.

BrendanBenshoof · 2015-05-16T13:16:34Z

https://github.com/BrendanBenshoof/StackStream is a thing

jbenet mentioned this issue Apr 18, 2016

Abstractions in IPLD: high level and low level data representations ipfs/specs#91

Open

jbenet added the Candidate Open Problem label Jul 9, 2018

daviddias mentioned this issue Sep 10, 2019

Open Problem: Improved layouts to represent data in hash-linked graphs (using IPLD) ipfs-inactive/research#8

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

streaming byte-string Stack-based Assembly draft #6

streaming byte-string Stack-based Assembly draft #6

BrendanBenshoof commented May 15, 2015

BrendanBenshoof commented May 15, 2015

BrendanBenshoof commented May 15, 2015

BrendanBenshoof commented May 15, 2015

BrendanBenshoof commented May 16, 2015

jbenet commented May 16, 2015

BrendanBenshoof commented May 16, 2015

BrendanBenshoof commented May 16, 2015

streaming byte-string Stack-based Assembly draft #6

streaming byte-string Stack-based Assembly draft #6

Comments

BrendanBenshoof commented May 15, 2015

BrendanBenshoof commented May 15, 2015

BrendanBenshoof commented May 15, 2015

BrendanBenshoof commented May 15, 2015

BrendanBenshoof commented May 16, 2015

jbenet commented May 16, 2015

BrendanBenshoof commented May 16, 2015

BrendanBenshoof commented May 16, 2015