Add custom iterator function support which enables implementing a REPL in jq #67

wader · 2021-02-27T11:21:29Z

Hi! This is more of a feature request than a PR, but I chose to use a PR to show some proof-of-concept code.

Background is that I'm working on a tool based on gojq that has an interactive CLI REPL interface. The REPL used to be implemented in Go with quite a lot of messy code to make it "feel" like jq. Some days ago I realized that much of this could probably be solved if the REPL itself was written in jq. After some thinking I realized that adding support for custom iterator functions would enable implementing eval as a custom function.

With read and print as custom functions you can implement a REPL like this:

```jq
def repl:
	def _wrap: if (. | type) != "array" then [.] end;
	def _repl:
		try read("> ") as $e |
		(try (.[] | eval($e)) catch . | print),
		_repl;
	_wrap | _repl;
```

This is a bit more complicated than `def repl: read | eval(.) | print, repl; repl`, in order to make it more user friendly.

Example usage showing basic expressions, nested REPL and errors:

```
$ go run cmd/repl/main.go
> 2+2
4
> 1 | repl
> .+2
3
> ^D
> [1,2,3] | repl
> .+10
11
12
13
> ^D
> 123
123
> undefined
function not defined: undefined/0
> [1,2,3] | repl
> undefined
function not defined: undefined/0
> ^D
> ^D
$
```

Some implementation notes:

The repl takes an array as input and will array-wrap non-array values. This feels natural, but maybe there are better solutions?
How to handle 1, 2, 3 | repl. The current behavior of giving multiple REPLs feels ok, I think.
I'm unsure how to handle env.paths in *env.Next(). I guess it's related to assignment and paths? I haven't looked into how this could affect it.
The code to implement the iterator can probably be much clearer and nicer.

What do you think? Could it be useful?

wader · 2021-02-27T11:28:37Z

I noticed now that tests are failing; I'll have a look at it later. Looks like one env.pop() too many somehow?

itchyny · 2021-02-28T04:03:11Z

execute.go

@@ -149,37 +149,90 @@ loop:
 				goto loop
 			}
 		case opcall:
-			if backtrack {
-				break loop
-			}


You cannot remove these 3 lines.

wader · 2021-02-28

Ok, thanks. I think the reason I moved it so it is only done for the non-iterator opcall case was to try to mimic how the opeach case does things, but I probably got it wrong. I will have to think more about it. I also don't like that there is some duplication in the way I did this now.

itchyny · 2021-02-28

Oops, you're right. I missed that you moved it to L181.

wader · 2021-02-28

But I'm guessing the unit test failure might be because I do one pop too many in some error case; I haven't looked closer at it yet.

wader · 2021-02-28T12:39:23Z

Do you want me to try to make this proof-of-concept PR into something that could be mergeable? For now I just commented out some things I don't yet understand and did some ugly code duplication just to see if I could make it work. I wanted to get your feedback on whether I'm on the right track and whether this feature is something you want to have.

Maybe you want to work on implementing this yourself? I'm happy to help either way.

itchyny · 2021-02-28T15:27:43Z

First of all, thank you for the various feedback; you are the first one to dive deeply into the gojq interpreter. I haven't decided yet, but this kind of feature extension is likely to be declined. Pushing an iterator to the stack breaks various assumptions in the implementation. I am also wondering how useful this is for usages other than eval, and whether it is worth accepting and maintaining this feature. This is open source, so feel free to use your fork, and I can advise on the implementation.

wader · 2021-02-28T17:20:12Z

Yes, I very much understand that, and thanks for offering advice; I will most likely need it.

Do you have any ideas what an implementation would look like that doesn't keep an iterator on the stack? Or maybe it would be ok, but some code that assumes things would have to be changed?

I was thinking today about what uses other than eval function iterators could have:

In my CLI tool I have a display function that is meant to display a value in a human-readable format (a format not meant for further transformation). It can be a binary hexdump, tree structures (not really JSON), summaries and previews, and those return an empty iterator to terminate. It feels very natural when used in a REPL, e.g. to see a hexdump of the first 10 packets you can do .packets[0:10] | hexdump (display is implicitly called in the REPL).
Maybe replace opeach and some builtin functions, if it makes the interpreter code simpler or if it makes sense for CPU/memory performance reasons to implement them in Go.

For example, could opeach in the compiler be compiled to a function call implemented something like this:

```go
// sliceIter yields the elements of a slice one at a time.
type sliceIter struct{ c []interface{} }

func (si *sliceIter) Next() (interface{}, bool) {
	if len(si.c) == 0 {
		return nil, false
	}
	e := si.c[0]
	// An error element ends the iteration.
	if err, ok := e.(error); ok {
		return err, false
	}
	si.c = si.c[1:]
	return e, true
}

// each returns an iterator over the elements of an array, or over the
// values of an object in sorted key order (needs the "sort" package).
func each(c interface{}, a []interface{}) interface{} {
	switch c := c.(type) {
	case []interface{}:
		return &sliceIter{c}
	case map[string]interface{}:
		// Collect key/value pairs so they can be sorted by key.
		xs := make([][2]interface{}, len(c))
		var i int
		for k, v := range c {
			xs[i] = [2]interface{}{k, v}
			i++
		}
		sort.Slice(xs, func(i, j int) bool {
			return xs[i][0].(string) < xs[j][0].(string)
		})
		// Keep only the values, now in key order.
		ss := make([]interface{}, len(c))
		for i, v := range xs {
			ss[i] = v[1]
		}
		return &sliceIter{ss}
	default:
		panic("unreachable") // TODO: ?
	}
}
```

Now I realize that to make this work the function iterator interface probably can't use gojq.Iter, as it will also have to provide some kind of path information? Maybe it should actually look something like interface { Next() [2]interface{} }?

Again, thanks for taking your time; I really enjoy working with the gojq code.

wader · 2021-03-02T17:34:13Z

Hi again, I tried your suggestion to use a jq function to wrap an internal function to implement a generator. It seems to work fine, fascinatingly simple :)

I made these changes:

Renamed the internal eval to _eval and modified these functions:

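```go
// LoadInitModules defines eval and repl in jq on top of the internal _eval.
func (preludeLoader) LoadInitModules() ([]*gojq.Query, error) {
	// eval is a generator because it iterates the array returned by _eval.
	replSrc := `
def eval($e): _eval($e)[];
def repl:
	def _wrap: if (. | type) != "array" then [.] end;
	def _repl:
		try read("> ") as $e |
		(try (.[] | eval($e)) catch . | print),
		_repl;
	_wrap | _repl;
`
	gq, err := gojq.Parse(replSrc)
	if err != nil {
		return nil, err
	}

	return []*gojq.Query{gq}, nil
}

// eval (registered as _eval) runs src against c and returns all results as
// one array instead of returning an iterator.
func eval(c interface{}, a []interface{}) interface{} {
	src, ok := a[0].(string)
	if !ok {
		return fmt.Errorf("%v: src is not a string", a[0])
	}
	iter, err := replRun(c, src)
	if err != nil {
		return err
	}

	// Drain the iterator eagerly; an error is kept as the last value.
	var vs []interface{}
	for {
		v, ok := iter.Next()
		if !ok {
			break
		}
		vs = append(vs, v)
		if _, ok := v.(error); ok {
			break
		}
	}

	return vs
}
```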

I'm trying to come up with ways it might behave differently from returning an iterator. As long as the eval expression is "pure", I think it should not matter that we empty the iterator and then return all values instead of producing one at a time.

Is it correct to handle an error just like any value, but stop iterating?

wader · 2021-03-03T10:17:01Z

One difference, because the generator is not "lazy" when wrapped, is that you can't use an internal function to generate an unknown number of values, for example an infinite range or reading inputs. Maybe a bit of an edge case, but it could be worth documenting?
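To make the laziness point concrete, here is a small standalone sketch. The Iter interface below only mirrors the Next() (interface{}, bool) shape used in this thread; counter, take and collect are hypothetical names, and the limit in collect exists only so the demo terminates.

```go
package main

import "fmt"

// Iter mirrors the iterator shape discussed in this thread.
type Iter interface {
	Next() (interface{}, bool)
}

// counter is an endless generator: 0, 1, 2, ...
type counter struct{ n int }

func (c *counter) Next() (interface{}, bool) {
	v := c.n
	c.n++
	return v, true
}

// take pulls at most n values, so it terminates even on an endless
// iterator. This is what a lazy consumer such as first or limit can do.
func take(it Iter, n int) []interface{} {
	vs := make([]interface{}, 0, n)
	for len(vs) < n {
		v, ok := it.Next()
		if !ok {
			break
		}
		vs = append(vs, v)
	}
	return vs
}

// collect drains the iterator into a slice, like the wrapped _eval does.
// It reports false if it had to give up after limit values.
func collect(it Iter, limit int) ([]interface{}, bool) {
	var vs []interface{}
	for {
		v, ok := it.Next()
		if !ok {
			return vs, true
		}
		if len(vs) == limit {
			return vs, false
		}
		vs = append(vs, v)
	}
}

func main() {
	fmt.Println(take(&counter{}, 3))
	_, done := collect(&counter{}, 1000)
	fmt.Println("eager collection finished:", done)
}
```

Without the artificial limit, collect would never return for counter, which is the behaviour worth documenting for the wrapped variant.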

itchyny · 2021-03-03T12:32:22Z

The problem with handling Iter in the opcall instruction is, as the CI failure suggests, that we cannot distinguish backtracking after an error from backtracking to an iterator. Maybe we need to distinguish iterator functions from others and generate a different opcode like opcalliter. But the backtracking issue may still happen in some cases? Also, we need to make sure the debugger (which runs with make build-debug) can handle a stack containing an iterator.
The problem with collecting the values is, as you have pointed out, that it is impossible to generate infinite values. It may also be less performant. But in general this is a useful way of implementing a jq function that accepts another filter as its argument, like min_by and _min_by.
From another point of view, the input(s) function is a specific form of iterator function. Currently we have WithInputIter to inject the iterator. The input(s) function is special in that multiple calls share the same iterator, but custom iterator functions may not; first(fibonacci), first(fibonacci) will both print the first value.
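The difference between a shared iterator and a fresh one per call can be sketched in a few lines. This is only an illustration under assumptions: Iter mirrors the shape used in the thread, and fib, newFib and first are hypothetical helpers, not gojq code.

```go
package main

import "fmt"

// Iter mirrors the iterator shape discussed in this thread.
type Iter interface {
	Next() (interface{}, bool)
}

// fib yields the Fibonacci numbers 0, 1, 1, 2, ...
type fib struct{ a, b int }

func (f *fib) Next() (interface{}, bool) {
	v := f.a
	f.a, f.b = f.b, f.a+f.b
	return v, true
}

// newFib plays the role of a custom iterator function: every call
// builds a fresh iterator with its own state.
func newFib() Iter { return &fib{a: 0, b: 1} }

// first returns the first value of an iterator, or nil if it is empty.
func first(it Iter) interface{} {
	v, _ := it.Next()
	return v
}

func main() {
	// Fresh iterator per call: both calls see the first value.
	fmt.Println(first(newFib()), first(newFib()))

	// One shared iterator, as with input(s): the second call continues
	// where the first one stopped.
	shared := newFib()
	fmt.Println(first(shared), first(shared))
}
```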

wader · 2021-03-03T12:52:09Z

For my use case of eval, your solution to wrap could be good enough, I think. Someone could try to evaluate something that produces infinite or lots of values and run out of memory etc., but that could be documented behaviour.

But I do feel the idea of supporting internal iterator functions is quite compelling and might be useful, and as I mentioned, maybe it can be used to implement some builtin things like inputs, each etc. if it makes things easier to understand or for performance reasons.

Should I continue to look into adding opcalliter etc. as you mentioned, or do you feel eager to do it? I guess there are lots of details that might be hard to account for, like the debugger as you mentioned. Either way, let me know how I can help; very interesting things to work on!

wader · 2021-03-09T16:43:16Z

Just noticed that debug behaves a bit confusingly with the non-iterator version of eval.

wader · 2021-03-09T16:58:30Z

I noticed in #39 that you mentioned capturing variables. By that, did you mean the ability to somehow get the environment from an evaluated expression? I've thought about somehow supporting something like this:

```
> 123 as $var | repl
> $var
123
```

Not sure how to access the environment in a nice way, but I think providing an environment could be a second argument to eval.

wader · 2021-03-09T16:58:39Z

BTW I added jq support to GitHub Linguist (github-linguist/linguist#5233), so soon jq files should be properly highlighted (and also not detected as JSONiq), which should also add support for jq code blocks in markdown like this:

```jq
def f: 123;
```

wader · 2021-03-11T13:42:10Z

Hi again, I got the idea to extend opeach to support Iter, to separate custom functions from custom iterators, and then during compilation, if it is an iterator, to add an opeach after the opcall. It turned out quite well, I think. What do you think?

Some questions and TODOs:

Is it not a good idea to separate function/iterator? We could wrap all functions as one-value iterators, but that might impact performance?

Should the WithIterator callback use return type Iter? It can share code with WithFunction, but maybe it's nice to be type safe.

Keep track of path? Currently iterators give 0 as path for each value. I feel I don't really understand that part of the code yet.

Remove REPL or add to CLI?

Have a NewSliceIter helper for less confusion about reference or not

Useful to provide IterFn, EmptyIter and SliceIter?

Test for ValueError

Update debugger?

wader · 2021-03-11T14:01:57Z

execute.go

@@ -217,10 +217,17 @@ loop:
 				break loop
 			}
 			backtrack = false
+
+			var xc [2]interface{}
+			var xv interface{}


The idea was to refactor so that the push/fork below can be shared. xc is the current and xn is the next value for opeach.

fugkco · 2021-03-23T21:11:14Z

Can I just voice my support of a repl feature! Great idea!

wader · 2021-04-20T15:01:57Z

Noticed that debug needed some special treatment. I got it working, but the code is messy. It needs a refactor and tests.

TODO:

NewSliceIter helper? Less confusion about reference or not

Make execute.go debug and Iter code nicer

Not a good idea to separate function/iterator? Could wrap all functions as one-value iterators but might impact performance?

Keep track of path? Currently iterators give 0 as path

Remove repl or add to CLI?

wader · 2021-04-27T11:35:25Z

Added TestWithIteratorDebug test

itchyny · 2021-05-16T02:06:58Z

I'm sorry for keeping this pull request open for months. I wasn't sure about the correctness of the patch, especially in execute.go, or whether to accept the iterator utilities and repl code.

I have included it in the main branch. I really appreciate your work and committed it with you added as a co-author.

The idea of appending the opeach instruction is nice. I also adopted the idea of adding a new compiler option for iterator functions, naming it WithIterFunction. They are special functions returning iterators.
I changed the type of the option argument to func(interface{}, []interface{}) gojq.Iter to force the user to return an iterator in every case. This is necessary because the appended instruction would crash on an iterator function returning a non-iterator value. Also, an iterator can emit an error.
I added NewIter to create an iterator from values. It can be used instead of SliceIter and EmptyIter. It is also useful for converting an error to an iterator. I liked IterFn, but it is not very useful in real use cases.
I simplified the implementation in execute.go and also fixed some error cases: rejecting definitions of iterator and non-iterator functions with the same name, and emitting a path error when used at expression depth 0 (tracking the path is not necessary).
I didn't include the repl CLI, but confirmed the behavior with my implementation. Including repl code in this repository would likely add new dependencies for a better prompt.
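As a rough picture of why one "iterator from values" helper can cover the slice, empty and error cases, here is a standalone sketch. Iter and iterFunc only mirror the shapes named above; newIter is a local stand-in and not the library's NewIter implementation, and repeatInput is a hypothetical iterator function invented for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// Iter mirrors the iterator shape discussed in this thread.
type Iter interface {
	Next() (interface{}, bool)
}

type sliceIter struct{ vs []interface{} }

func (s *sliceIter) Next() (interface{}, bool) {
	if len(s.vs) == 0 {
		return nil, false
	}
	v := s.vs[0]
	s.vs = s.vs[1:]
	return v, true
}

// newIter is a local stand-in for a NewIter-style helper: no values gives
// an empty iterator, and a single error value gives an iterator that
// emits just that error.
func newIter(vs ...interface{}) Iter { return &sliceIter{vs} }

// iterFunc is the callback shape that always returns an iterator.
type iterFunc func(c interface{}, a []interface{}) Iter

// repeatInput (hypothetical) emits its input a[0] times.
func repeatInput(c interface{}, a []interface{}) Iter {
	n, ok := a[0].(int)
	if !ok || n < 0 {
		return newIter(errors.New("repeat: count must be a non-negative integer"))
	}
	vs := make([]interface{}, n)
	for i := range vs {
		vs[i] = c
	}
	return newIter(vs...)
}

// Compile-time check that repeatInput has the iterator function shape.
var _ iterFunc = repeatInput

// drain collects every value of a finite iterator.
func drain(it Iter) []interface{} {
	vs := []interface{}{}
	for {
		v, ok := it.Next()
		if !ok {
			return vs
		}
		vs = append(vs, v)
	}
}

func main() {
	fmt.Println(drain(repeatInput("x", []interface{}{2})))
	fmt.Println(drain(repeatInput("x", []interface{}{"two"})))
	fmt.Println(drain(newIter()))
}
```

Because the return type is Iter, the error path has to be wrapped as well, which matches the point that an iterator can emit an error.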

wader · 2021-05-16T09:07:47Z

> I'm sorry for keeping this pull request open for months. I wasn't sure about the correctness of the patch, especially in execute.go, or whether to accept the iterator utilities and repl code.

No worries.

> I changed the type of the option argument to func(interface{}, []interface{}) gojq.Iter to force the user to return an iterator in every case. This is necessary because the appended instruction would crash on an iterator function returning a non-iterator value. Also, an iterator can emit an error.

> I added NewIter to create an iterator from values. It can be used instead of SliceIter and EmptyIter. It is also useful for converting an error to an iterator. I liked IterFn, but it is not very useful in real use cases.

Both good ideas I think, the API feels cleaner.

> I didn't include the repl CLI, but confirmed the behavior with my implementation. Including repl code in this repository would likely add new dependencies for a better prompt.

Understood, I mostly included it to show my use case.

I've been thinking about how to handle debug with iterator functions. In my case this happens when using debug inside a (possibly multiply nested) evaluated expression. Should the debug value be passed through to the outermost iterator, or should it be up to the iterators in the middle to filter them out so they don't end up being used as values?

In my version I went for the pass-through, but now I'm having second thoughts; maybe it's better to do something like this:

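```go
// eval runs src against c and returns a lazy iterator over the results.
func eval(c interface{}, a []interface{}) gojq.Iter {
	src, ok := a[0].(string)
	if !ok {
		return gojq.NewIter(fmt.Errorf("%v: src is not a string", a[0]))
	}
	iter, err := replRun(c, src)
	if err != nil {
		return gojq.NewIter(err)
	}

	return IterFn(func() (interface{}, bool) {
		for {
			v, ok := iter.Next()
			// Debug messages arrive as [2]interface{}; handle them here
			// so they are never emitted as ordinary values.
			if _, isDebug := v.([2]interface{}); isDebug {
				// custom debug handling here
				continue
			}
			return v, ok
		}
	})
}
```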
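IterFn is used above without being shown. One plausible shape, assuming it is a function type that adapts a closure to the iterator interface (the same idea as http.HandlerFunc), is sketched below together with the filtering idea. Iter mirrors the shape used in the thread and skipDebug is a hypothetical helper.

```go
package main

import "fmt"

// Iter mirrors the iterator shape discussed in this thread.
type Iter interface {
	Next() (interface{}, bool)
}

// IterFn adapts a plain function to the Iter interface.
type IterFn func() (interface{}, bool)

// Next calls the wrapped function.
func (f IterFn) Next() (interface{}, bool) { return f() }

// skipDebug (hypothetical) wraps an iterator, hands every [2]interface{}
// value to onDebug and emits everything else unchanged.
func skipDebug(it Iter, onDebug func([2]interface{})) Iter {
	return IterFn(func() (interface{}, bool) {
		for {
			v, ok := it.Next()
			if !ok {
				return nil, false
			}
			if d, isDebug := v.([2]interface{}); isDebug {
				onDebug(d)
				continue
			}
			return v, true
		}
	})
}

func main() {
	vs := []interface{}{1, [2]interface{}{"DEBUG:", "hello"}, 2}
	i := 0
	src := IterFn(func() (interface{}, bool) {
		if i == len(vs) {
			return nil, false
		}
		v := vs[i]
		i++
		return v, true
	})

	out := skipDebug(src, func(d [2]interface{}) {
		fmt.Println("debug:", d[1])
	})
	for v, ok := out.Next(); ok; v, ok = out.Next() {
		fmt.Println(v)
	}
}
```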

Again, thanks for taking the time to review and work on this

itchyny · 2021-05-16T19:23:10Z

Hmm, it seems it was a mistake to implement the debug and stderr functions in the library and to have the result iterator return [2]interface{}. They were implemented in the early days of gojq, so I added the special instruction opdebug and used [2]interface{} to distinguish them from normal values. But currently we can use WithFunction to implement them. I'm considering removing them from the library completely in the next release.

wader · 2021-05-17T11:18:08Z

Aha, ok, you mean use WithFunction, and it just prints to stderr etc. and passes the value through? So there is no need for the gojq library to know about it at all? That seems nicer. Would we lose something by doing it? Will the order of debug messages and Next() values be different?

itchyny · 2021-05-17T13:33:14Z

Right, library users can implement them with the option and choose how they print the value. The call order does not change.

itchyny reviewed Feb 28, 2021

itchyny force-pushed the main branch from 027f9d5 to 99c362a Compare March 1, 2021 12:58

wader force-pushed the custom-iter branch 2 times, most recently from a505d05 to 7d92d9a Compare March 11, 2021 13:32

wader force-pushed the custom-iter branch from 7d92d9a to 086226e Compare March 11, 2021 13:56

wader commented Mar 11, 2021

wader force-pushed the custom-iter branch from 086226e to 89165b0 Compare March 11, 2021 22:36

itchyny force-pushed the main branch from e426d45 to 269e11d Compare March 13, 2021 13:05

fugkco mentioned this pull request Mar 23, 2021

Feature request: create a cui #74

Closed

wader force-pushed the custom-iter branch from 89165b0 to df2f5e4 Compare April 20, 2021 14:54

wader force-pushed the custom-iter branch from df2f5e4 to 9d5c3f4 Compare April 27, 2021 11:34

itchyny closed this in d2eb6bd May 16, 2021

wader deleted the custom-iter branch August 7, 2021 10:44
