Add custom iterator function support which enables implementing a REPL in jq #67

Closed
wants to merge 1 commit into from

wader
Contributor

@wader wader commented Feb 27, 2021

Hi! This is more of a feature request than a PR, but I chose to open a PR to show some proof-of-concept code.

The background is that I'm working on a tool based on gojq that has an interactive CLI REPL. The REPL used to be implemented in Go, with quite a lot of messy code to make it "feel" like jq. Some days ago I realized that much of this could probably be solved if the REPL itself was written in jq. After some thinking I realized that adding support for custom iterator functions would make it possible to implement eval as a custom function.

With read and print as custom functions you can implement a REPL like this:

def repl:
	def _wrap: if (. | type) != "array" then [.] end;
	def _repl:
		try read("> ") as $e |
		(try (.[] | eval($e)) catch . | print),
		_repl;
	_wrap | _repl;

This is a bit more complicated than def repl: read | eval(.) | print, repl; repl, in order to make it more user friendly.
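As a rough illustration of what read and print could look like on the Go side, here is a minimal sketch using only the standard library. The `customFunc` type copies the callback shape `func(interface{}, []interface{}) interface{}` that gojq's WithFunction option takes; `newRead`, `newPrint` and `runOnce` are hypothetical names, and the choice to let print pass its input through is an assumption, since the thread does not show what print returns.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// customFunc has the same shape as the callback passed to gojq's
// WithFunction: input value, evaluated arguments, one result.
type customFunc func(input interface{}, args []interface{}) interface{}

// newRead builds a hypothetical "read" function: it writes the prompt
// to w and returns the next line from r, or an error value at EOF.
func newRead(r io.Reader, w io.Writer) customFunc {
	sc := bufio.NewScanner(r)
	return func(_ interface{}, args []interface{}) interface{} {
		if len(args) != 1 {
			return fmt.Errorf("read: expected 1 argument, got %d", len(args))
		}
		prompt, ok := args[0].(string)
		if !ok {
			return fmt.Errorf("read: prompt is not a string: %v", args[0])
		}
		fmt.Fprint(w, prompt)
		if !sc.Scan() {
			if err := sc.Err(); err != nil {
				return err
			}
			return io.EOF
		}
		return sc.Text()
	}
}

// newPrint builds a hypothetical "print" function: it writes the input
// to w and returns it unchanged (an assumption made for this sketch).
func newPrint(w io.Writer) customFunc {
	return func(input interface{}, _ []interface{}) interface{} {
		fmt.Fprintln(w, input)
		return input
	}
}

// runOnce performs one read/print round trip against in-memory I/O and
// returns everything written plus the value that read produced.
func runOnce(input string) (string, interface{}) {
	var out strings.Builder
	read := newRead(strings.NewReader(input), &out)
	printFn := newPrint(&out)
	v := read(nil, []interface{}{"> "})
	if _, isErr := v.(error); !isErr {
		printFn(v, nil)
	}
	return out.String(), v
}

func main() {
	out, _ := runOnce("2+2\n")
	fmt.Print(out)
}
```

Returning an error value at end of input is what lets the `try read("> ")` in the jq definition terminate the loop on ^D.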

Example usage showing basic expressions, nested REPL and errors:

$ go run cmd/repl/main.go
> 2+2
4
> 1 | repl
> .+2
3
> ^D
> [1,2,3] | repl
> .+10
11
12
13
> ^D
> 123
123
> undefined
function not defined: undefined/0
> [1,2,3] | repl
> undefined
function not defined: undefined/0
> ^D
> ^D
$

Some implementation notes:

  • The repl takes an array as input and will array-wrap non-array values. This feels natural, but maybe there are better solutions?
  • How to handle 1, 2, 3 | repl? The current behavior, which gives multiple REPLs, feels OK I think.
  • I'm unsure how to handle env.paths in *env.Next(). I guess it's related to assignment and paths? I haven't looked into how this could affect it.
  • The code that implements the iterator can probably be made much clearer and nicer.

What do you think? Could it be useful?

@wader
Contributor Author

wader commented Feb 27, 2021

Noticed now that the tests are failing, I'll have a look at it later. Looks like one env.pop() too many somehow?

execute.go Outdated
@@ -149,37 +149,90 @@ loop:
goto loop
}
case opcall:
if backtrack {
break loop
}
Owner

You cannot remove these 3 lines.

Contributor Author

OK, thanks. I think the reason I moved it so that it is only done in the non-iterator opcall case was to try to mimic how the opeach case does things, but I probably got it wrong. Will have to think more about it. I also don't like that there is some duplication in the way I did this.

Owner

@itchyny itchyny Feb 28, 2021

Oops, you're right. I missed you moved it to L181.

Contributor Author

But I'm guessing the unit test failure might be because I do one pop too many in some error case; I haven't looked closer at it yet.

@wader
Contributor Author

wader commented Feb 28, 2021

Do you want me to try to make this proof-of-concept PR into something that could be mergeable? For now I just commented out some things I don't yet understand and did some ugly code duplication just to see if I could make it work. I wanted to get your feedback on whether I'm on the right track and whether this feature is something you want to have.

Maybe you want to work on implementing this yourself? I'm happy to help either way.

@itchyny
Owner

itchyny commented Feb 28, 2021

First of all, thank you for all your feedback; you are the first person to dive deeply into the gojq interpreter. I haven't decided yet, but this kind of feature extension is likely to be declined. Pushing an iterator onto the stack breaks various assumptions in the implementation. I am also wondering how useful this is for uses other than eval, and whether it is worth accepting and maintaining this feature. This is open source, so feel free to use your fork, and I can advise on the implementation.

@wader
Contributor Author

wader commented Feb 28, 2021

Yes, I very much understand that, and thanks for offering advice, I will most likely need it.

Do you have any ideas about what an implementation that doesn't keep an iterator on the stack would look like? Or maybe it would be OK, but some code that makes assumptions would have to be changed?

I was thinking today about what uses other than eval function iterators could have:

  • In my CLI tool I have a display function that is meant to display a value in a human-readable format (a format not meant for further transformation). It can be a binary hexdump, tree structures (not really JSON), summaries and previews, and those return an empty iterator to terminate. It feels very natural when used in a REPL, e.g. to see a hexdump of the first 10 packets you can do .packets[0:10] | hexdump (display is implicitly called in the REPL).
  • Maybe replace opeach and some builtin functions, if it makes the interpreter code simpler or if it makes sense for CPU/memory performance reasons to implement them in Go.

For example, could opeach in the compiler be compiled to a function call implemented something like this:

type sliceIter struct{ c []interface{} }

func (si *sliceIter) Next() (interface{}, bool) {
	if len(si.c) == 0 {
		return nil, false
	}
	e := si.c[0]
	if err, ok := e.(error); ok {
		return err, false
	}
	si.c = si.c[1:]
	return e, true
}

func each(c interface{}, a []interface{}) interface{} {
	switch c := c.(type) {
	case []interface{}:
		return &sliceIter{c}
	case map[string]interface{}:
		xs := make([][2]interface{}, len(c))
		var i int
		for k, v := range c {
			xs[i] = [2]interface{}{k, v}
			i++
		}
		sort.Slice(xs, func(i, j int) bool {
			return xs[i][0].(string) < xs[j][0].(string)
		})
		ss := make([]interface{}, len(c))
		for i, v := range xs {
			ss[i] = v[1]
		}
		return &sliceIter{ss}
	default:
		panic("unreachable") // TODO: ?
	}
}

Now I realize that to make this work the function iterator interface probably can't use gojq.Iter, as it will also have to provide some kind of path information? Maybe it should actually look something like interface { Next() [2]interface{}}?

Again thanks for taking your time and I really enjoy working with the gojq code.

@wader
Contributor Author

wader commented Mar 2, 2021

Hi again, i tried your suggestion to use a jq function to wrap an internal function to implement a generator. Seems to work fine, fascinatingly simple :)

I did these changes:

Renamed internal eval to _eval and modified these functions:

func (preludeLoader) LoadInitModules() ([]*gojq.Query, error) {
	replSrc := `
def eval($e): _eval($e)[];
def repl:
	def _wrap: if (. | type) != "array" then [.] end;
	def _repl:
		try read("> ") as $e |
		(try (.[] | eval($e)) catch . | print),
		_repl;
	_wrap | _repl;
`
	gq, err := gojq.Parse(replSrc)
	if err != nil {
		return nil, err
	}

	return []*gojq.Query{gq}, nil
}
func eval(c interface{}, a []interface{}) interface{} {
	src, ok := a[0].(string)
	if !ok {
		return fmt.Errorf("%v: src is not a string", a[0])
	}
	iter, err := replRun(c, src)
	if err != nil {
		return err
	}

	var vs []interface{}
	for {
		v, ok := iter.Next()
		if !ok {
			break
		}
		vs = append(vs, v)
		if _, ok := v.(error); ok {
			break
		}
	}

	return vs
}

I'm trying to come up with ways it might behave differently from returning an iterator. As long as the eval expression is "pure", I think it should not matter that we drain the iterator and then return all values instead of producing them one at a time.

Is it correct to handle an error just like any other value, but stop iterating?

@wader
Contributor Author

wader commented Mar 3, 2021

One difference, because the generator is not "lazy" when wrapped, is that you can't use an internal function to generate an unknown number of values, for example an infinite range, or to read inputs. Maybe a bit of an edge case, but it could be worth documenting?
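To make the lazy versus eager difference concrete, here is a small standalone sketch. The local `Iter` interface only copies the shape of gojq.Iter (a `Next` method returning a value and an ok flag); `counter`, `take` and `collect` are hypothetical helpers, and `collect` has an artificial limit so the example terminates.

```go
package main

import "fmt"

// Iter copies the shape of gojq.Iter for this standalone sketch.
type Iter interface {
	Next() (interface{}, bool)
}

// counter is a hypothetical iterator that never terminates: 0, 1, 2, ...
type counter struct{ n int }

func (c *counter) Next() (interface{}, bool) {
	v := c.n
	c.n++
	return v, true
}

// take pulls at most n values on demand, which is safe even when the
// underlying iterator is infinite.
func take(it Iter, n int) []interface{} {
	var vs []interface{}
	for i := 0; i < n; i++ {
		v, ok := it.Next()
		if !ok {
			break
		}
		vs = append(vs, v)
	}
	return vs
}

// collect drains the iterator into a slice, the way the wrapped _eval
// does. The limit exists only so this sketch cannot loop forever; the
// second result is false when collect gave up at the limit.
func collect(it Iter, limit int) ([]interface{}, bool) {
	var vs []interface{}
	for len(vs) < limit {
		v, ok := it.Next()
		if !ok {
			return vs, true
		}
		vs = append(vs, v)
	}
	return vs, false
}

func main() {
	fmt.Println(take(&counter{}, 3))
	_, done := collect(&counter{}, 1000)
	fmt.Println("collect finished:", done)
}
```

Without the limit, `collect` on `counter` would never return, which is the memory and termination problem described above.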

@itchyny
Owner

itchyny commented Mar 3, 2021

The problem with handling Iter in the opcall instruction is, as the CI failure suggests, that we cannot distinguish backtracking after an error from backtracking to an iterator. Maybe we need to distinguish iterator functions from others and generate a different opcode, like opcalliter. But the backtracking issue can still happen in some cases? Also, we need to make sure the debugger (which runs with make build-debug) can handle a stack containing an iterator.
The problem with collecting the values is, as you have pointed out, that it is impossible to generate infinitely many values. It may also be less performant. But in general this is a useful way to implement a jq function that accepts another filter as its argument, like min_by and _min_by.
From another viewpoint, the input(s) function is a specific form of iterator function. Currently we have WithInputIter to inject the iterator. The input(s) function is special in that multiple calls share the same iterator, but custom iterator functions may not; first(fibonacci), first(fibonacci) will both print the first value.
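The shared versus fresh iterator distinction can be sketched in plain Go. Everything here is illustrative: `Iter` copies the shape of gojq.Iter, and `fib`, `newFib` and `first` are hypothetical stand-ins for a fibonacci iterator function and jq's first.

```go
package main

import "fmt"

// Iter copies the shape of gojq.Iter for this standalone sketch.
type Iter interface {
	Next() (interface{}, bool)
}

// fib is a hypothetical Fibonacci iterator: 0, 1, 1, 2, 3, ...
type fib struct{ a, b int }

func newFib() Iter { return &fib{a: 0, b: 1} }

func (f *fib) Next() (interface{}, bool) {
	v := f.a
	f.a, f.b = f.b, f.a+f.b
	return v, true
}

// first takes a single value from an iterator, like jq's first(f).
func first(it Iter) interface{} {
	v, _ := it.Next()
	return v
}

func main() {
	// A custom iterator function is called once per use, so each call
	// gets a fresh iterator and both results are the first value.
	fmt.Println(first(newFib()), first(newFib()))

	// A shared iterator, as with input(s), advances across calls.
	shared := newFib()
	fmt.Println(first(shared), first(shared))
}
```

The first line of output repeats the initial value, while the second line shows the shared iterator moving on between calls.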

@wader
Contributor Author

wader commented Mar 3, 2021

For my eval use case, your wrapping solution could be good enough, I think. Someone could try to evaluate something that produces infinitely many or a very large number of values and run out of memory, but that could be documented behaviour.

But I do feel the idea of supporting internal iterator functions is quite compelling and might be useful and, as I mentioned, maybe it can be used to implement some builtins like inputs, each etc. if it makes things easier to understand or for performance reasons.

Should I continue to look into adding opcalliter etc. as you mentioned, or are you eager to do it yourself? I guess there are lots of details that might be hard to account for, like the debugger as you mentioned. Either way, let me know how I can help, these are very interesting things to work on!

@wader
Contributor Author

wader commented Mar 9, 2021

Just noticed that debug behaves a bit confusingly with the non-iterator version of eval.

@wader
Contributor Author

wader commented Mar 9, 2021

I noticed in #39 that you mentioned capturing variables. By that, did you mean the ability to somehow get the environment from an evaluated expression? I've thought about somehow supporting something like this:

> 123 as $var | repl
> $var
123

Not sure how to access the environment in a nice way, but I think providing an environment could be a second argument to eval.

@wader
Contributor Author

wader commented Mar 9, 2021

BTW, I added jq support to GitHub Linguist (github-linguist/linguist#5233), so soon jq files should be properly highlighted (and no longer detected as JSONiq). It should also add support for jq code blocks in Markdown like this:

```jq
def f: 123;
```

@wader wader force-pushed the custom-iter branch 2 times, most recently from a505d05 to 7d92d9a Compare March 11, 2021 13:32
@wader
Contributor Author

wader commented Mar 11, 2021

Hi again, I got the idea to extend opeach to support Iter, to separate custom functions from custom iterators, and then during compilation to add an opeach after the opcall if the callee is an iterator. It turned out quite well, I think. What do you think?

Some questions and TODOs:

Not a good idea to separate function/iterator? Could wrap all functions as one-value iterators, but that might impact performance?

Should the WithIterator callback use return type Iter? It can share its core with WithFunction, but maybe it's nice to be type safe.

Keep track of path? Currently iterators give 0 as the path for each value. I feel I don't really understand that part of the code yet.

Remove REPL or add to CLI?

Have a NewSliceIter helper for less confusion about reference or not

Useful to provide IterFn, EmptyIter and SliceIter?

Test for ValueError

Update debugger?

@@ -217,10 +217,17 @@ loop:
break loop
}
backtrack = false

var xc [2]interface{}
var xv interface{}
Contributor Author

@wader wader Mar 11, 2021

The idea was to refactor so that the push/fork below can be shared. xc is the current and xn is the next value for opeach.

@fugkco

fugkco commented Mar 23, 2021

Can I just voice my support of a repl feature! Great idea!

@wader
Contributor Author

wader commented Apr 20, 2021

Noticed that debug needed some special treatment. Got it working, but the code is messy. Needs a refactor and tests.

TODO:
NewSliceIter helper? Less confusion about reference or not

Make execute.go debug and Iter code nicer

Not a good idea to separate function/iterator? Could wrap all functions as one-value iterators, but that might impact performance?

Keep track of path? Currently iterators give 0 as path

Remove repl or add to CLI?
@wader
Contributor Author

wader commented Apr 27, 2021

Added TestWithIteratorDebug test

@itchyny itchyny closed this in d2eb6bd May 16, 2021
@itchyny
Owner

itchyny commented May 16, 2021

I'm sorry for keeping this pull request open for months. I wasn't sure about the correctness of the patch, especially in execute.go, or whether to accept the iterator utilities and REPL code.

I have included it in the main branch. I really appreciate your work and have committed it with you added as a co-author.

  • The idea of appending the opeach instruction is nice. I also adopted the idea of adding a new compiler option for iterator functions, naming it WithIterFunction. These are special functions that return iterators.
  • I changed the type of the option argument to func(interface{}, []interface{}) gojq.Iter to force the user to return an iterator in every case. This is necessary because the appended instruction will crash if an iterator function returns a non-iterator value. Also, an iterator can emit an error.
  • I added NewIter to create an iterator from values. It can be used instead of SliceIter and EmptyIter, and is also useful for converting an error to an iterator. I liked IterFn, but it is not very useful in real use cases.
  • I simplified the implementation in execute.go and fixed some error cases. It now rejects defining iterator and non-iterator functions with the same name, and emits a path error when used at expression depth 0 (tracking the path is not necessary).
  • I didn't include the REPL CLI, but I confirmed the behavior with my implementation. Including REPL code in this repository would likely add new dependencies for a better prompt.
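The points above about always returning an iterator, and about turning an error into an iterator, can be sketched without importing gojq. In this standalone example `Iter` copies the shape of gojq.Iter, `newIter` is a local stand-in written to behave like the NewIter helper described above, and `repeat` is a hypothetical iterator function with the callback shape `func(interface{}, []interface{}) Iter`.

```go
package main

import "fmt"

// Iter copies the shape of gojq.Iter for this standalone sketch.
type Iter interface {
	Next() (interface{}, bool)
}

// sliceIter emits the given values in order.
type sliceIter struct{ vs []interface{} }

func (it *sliceIter) Next() (interface{}, bool) {
	if len(it.vs) == 0 {
		return nil, false
	}
	v := it.vs[0]
	it.vs = it.vs[1:]
	return v, true
}

// newIter is a local stand-in for a NewIter-style helper: zero values
// give an empty iterator, one error value gives an error iterator.
func newIter(values ...interface{}) Iter {
	return &sliceIter{vs: values}
}

// repeat is a hypothetical iterator function: it emits its input n
// times. Every path returns an Iter, including the error paths.
func repeat(c interface{}, a []interface{}) Iter {
	if len(a) != 1 {
		return newIter(fmt.Errorf("repeat: expected 1 argument, got %d", len(a)))
	}
	n, ok := a[0].(int)
	if !ok || n < 0 {
		return newIter(fmt.Errorf("repeat: count must be a non-negative integer, got %v", a[0]))
	}
	vs := make([]interface{}, n)
	for i := range vs {
		vs[i] = c
	}
	return newIter(vs...)
}

func main() {
	for _, it := range []Iter{
		repeat("x", []interface{}{2}),
		repeat("x", []interface{}{"2"}),
	} {
		for {
			v, ok := it.Next()
			if !ok {
				break
			}
			// Errors travel through the iterator as ordinary values.
			if err, isErr := v.(error); isErr {
				fmt.Println("error:", err)
				break
			}
			fmt.Println(v)
		}
	}
}
```

Because the callback's return type is an iterator, the compiler rules out the case where the appended opeach would receive a plain value.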

@wader
Contributor Author

wader commented May 16, 2021

I'm sorry for keeping this pull request open for months. I wasn't sure about the correctness of the patch, especially in execute.go, or whether to accept the iterator utilities and REPL code.

No worries

  • I changed the type of the option argument to func(interface{}, []interface{}) gojq.Iter to force the user to return an iterator in every case. This is necessary because the appended instruction will crash if an iterator function returns a non-iterator value. Also, an iterator can emit an error.
  • I added NewIter to create an iterator from values. It can be used instead of SliceIter and EmptyIter, and is also useful for converting an error to an iterator. I liked IterFn, but it is not very useful in real use cases.

Both good ideas i think, API feels cleaner

  • I didn't include the REPL CLI, but I confirmed the behavior with my implementation. Including REPL code in this repository would likely add new dependencies for a better prompt.

Understand, i mostly included it to show my use case.

I've been thinking about how to handle debug with iterator functions. In my case this happens when using debug inside a (possibly multiply nested) evaluated expression. Should the debug value be passed through to the outermost iterator, or should it be up to the iterators in the middle to filter them out so they don't end up being used as values?

In my version I went for the pass-through, but now I'm having second thoughts; maybe it's better to do something like this:

func eval(c interface{}, a []interface{}) gojq.Iter {
	src, ok := a[0].(string)
	if !ok {
		return gojq.NewIter(fmt.Errorf("%v: src is not a string", a[0]))
	}
	iter, err := replRun(c, src)
	if err != nil {
		return gojq.NewIter(err)
	}

	return IterFn(func() (interface{}, bool) {
		for {
			v, ok := iter.Next()
			if _, ok := v.([2]interface{}); ok {
				// custom debug handling here
				continue
			}
			return v, ok
		}
	})
}
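For readers without the branch at hand, the IterFn helper used above can be sketched as a function type with a Next method. This is a guess at its shape based on how it is called here, not the code from the PR. `Iter` copies the shape of gojq.Iter, `fromSlice`, `dropDebug` and `drain` are hypothetical helpers, and the contents of the `[2]interface{}` pair are illustrative.

```go
package main

import "fmt"

// Iter copies the shape of gojq.Iter for this standalone sketch.
type Iter interface {
	Next() (interface{}, bool)
}

// IterFn adapts a plain function to the Iter interface
// (assumed shape of the helper referenced in the thread).
type IterFn func() (interface{}, bool)

func (f IterFn) Next() (interface{}, bool) { return f() }

// fromSlice is a hypothetical helper that iterates over fixed values.
func fromSlice(vs ...interface{}) Iter {
	return IterFn(func() (interface{}, bool) {
		if len(vs) == 0 {
			return nil, false
		}
		v := vs[0]
		vs = vs[1:]
		return v, true
	})
}

// dropDebug wraps an iterator, hands every [2]interface{} value to
// onDebug, and forwards only the remaining values.
func dropDebug(it Iter, onDebug func([2]interface{})) Iter {
	return IterFn(func() (interface{}, bool) {
		for {
			v, ok := it.Next()
			if !ok {
				return nil, false
			}
			if d, isDebug := v.([2]interface{}); isDebug {
				onDebug(d)
				continue
			}
			return v, true
		}
	})
}

// drain collects every value from a finite iterator.
func drain(it Iter) []interface{} {
	var vs []interface{}
	for {
		v, ok := it.Next()
		if !ok {
			return vs
		}
		vs = append(vs, v)
	}
}

func main() {
	src := fromSlice(1, [2]interface{}{"DEBUG:", "x"}, 2)
	out := drain(dropDebug(src, func(d [2]interface{}) {
		fmt.Println("debug:", d[1])
	}))
	fmt.Println(out)
}
```

The wrapper checks the ok flag before inspecting the value, so exhaustion is never mistaken for a debug pair.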

Again, thanks for taking the time to review and work on this

@itchyny
Owner

itchyny commented May 16, 2021

Hmm, it seems that it was a mistake to implement the debug and stderr functions in the library and to have the result iterator return [2]interface{}. They were implemented in the early days of gojq, so I added a special instruction, opdebug, and used [2]interface{} to distinguish them from normal values. But now we can use WithFunction to implement them. I'm considering removing them from the library completely in the next release.

@wader
Contributor Author

wader commented May 17, 2021

Aha, OK, you mean use WithFunction so that it just prints to stderr etc. and passes the value through? So there's no need for the gojq library to know about it at all? That seems nicer. Would we lose something by doing it? Would the order of debug messages and Next() values be different?

@itchyny
Owner

itchyny commented May 17, 2021

Right, library users can implement them with the option and choose how they print the value. The call order does not change.
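A minimal sketch of that approach, assuming only the WithFunction callback shape `func(interface{}, []interface{}) interface{}`: the function writes a debug line to a writer chosen by the library user and returns its input unchanged, so the interpreter never sees a special value. `newDebug` and `debugOnce` are hypothetical names, and the `["DEBUG:", value]` output format is just one possible choice.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// newDebug returns a function with the shape accepted by WithFunction.
// It logs the input to w as JSON and passes the input through.
func newDebug(w io.Writer) func(interface{}, []interface{}) interface{} {
	return func(c interface{}, _ []interface{}) interface{} {
		b, err := json.Marshal([]interface{}{"DEBUG:", c})
		if err != nil {
			return err
		}
		fmt.Fprintln(w, string(b))
		return c
	}
}

// debugOnce runs the function against an in-memory writer and returns
// what was logged together with the value that was passed through.
func debugOnce(v interface{}) (string, interface{}) {
	var log strings.Builder
	debug := newDebug(&log)
	out := debug(v, nil)
	return log.String(), out
}

func main() {
	log, v := debugOnce(1)
	fmt.Print(log)
	fmt.Println(v)
}
```

In a real program the writer would typically be os.Stderr; the in-memory writer here only keeps the sketch easy to check.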

@wader wader deleted the custom-iter branch August 7, 2021 10:44