Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parse_smt2_string() wastes time creating copies of symbols and functions #6123

Closed
kosarev opened this issue Jun 30, 2022 · 9 comments
Closed

Comments

@kosarev
Copy link

kosarev commented Jun 30, 2022

The definition of parse_smt2_string() reads like it calls _dict2sarray() for the passed symbols and functions, which causes significant delays on large dictionaries -- a huge problem when you have hundreds of expressions to parse. Would it possible to introduce a Dict2DArray kind of class that implements storing such arrays in a ready-to-use form and allow its instances along with ordinary dictionaries for the sorts and decls parameters so parse_smt2_string() doesn't need to translate them on every call?

@NikolajBjorner
Copy link
Contributor

sorry, this is very vague. To start with what is dict2sarray?
Is the comment based on running an instance in a profiler and observing where you see overhead?

@kosarev
Copy link
Author

kosarev commented Jul 1, 2022

I didn't use any profilers, but I think from just reading the definition of parse_smt2_string() it's pretty obvious what's going on.

Here's where we call _dict2sarray() and _dict2darray():

ssz, snames, ssorts = _dict2sarray(sorts, ctx)

And here's a reproducer that should hopefully demonstrate the problem:

$ python3
Python 3.8.10 (default, Mar 15 2022, 12:22:08) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import z3
>>> decls = {f'x{i}': z3.Bool(f'x{i}') for i in range(10000)}
>>> def foo():
...   z3.parse_smt2_string('(assert true)', decls=decls)
... 
>>> timeit.timeit("foo()", globals=globals(), number=5)
1.0481350090003616
>>> decls = {f'x{i}': z3.Bool(f'x{i}') for i in range(100)}
>>> timeit.timeit("foo()", globals=globals(), number=5)
0.017926508000073227
>>> 

@NikolajBjorner
Copy link
Contributor

NikolajBjorner commented Jul 1, 2022

Alright, this is about the python API (and not inside of z3) and the ask is to cache results.

Maybe use attributes, setattr, getattr on the dictionary objects to cache the result and retrieve it if the dictionary didn't change.

@kosarev
Copy link
Author

kosarev commented Jul 1, 2022

Caching translated symbols and functions unfortunately wouldn't help where we have to amend the dictionary. A better solution might be to introduce a couple classes that store symbols and functions in a form that can be passed to Z3_parse_smtlib2_string() without any additional translation.

@NikolajBjorner
Copy link
Contributor

In that case, the underlying API is not suitable.
It would be more suitable with an API that took a vector of sort and declarations.
The ast-vector object supports mutation.

NikolajBjorner added a commit that referenced this issue Jul 2, 2022
Adding new API object to maintain state between calls to parser.
The state is incremental: all declarations of sorts and functions are valid in the next parse. The parser produces an ast-vector of assertions that are parsed in the current calls.

The following is a unit test:

```
from z3 import *

pc = ParserContext()

A = DeclareSort('A')

pc.add_sort(A)
print(pc.from_string("(declare-const x A) (declare-const y A) (assert (= x y))"))
print(pc.from_string("(declare-const z A) (assert (= x z))"))

print(parse_smt2_string("(declare-const x Int) (declare-const y Int) (assert (= x y))"))

s = Solver()
s.from_string("(declare-sort A)")
s.from_string("(declare-const x A)")
s.from_string("(declare-const y A)")
s.from_string("(assert (= x y))")
print(s.assertions())
s.from_string("(declare-const z A)")
print(s.assertions())
s.from_string("(assert (= x z))")
print(s.assertions())
```

It produces results of the form

```
[x == y]
[x == z]
[x == y]
[x == y]
[x == y]
[x == y, x == z]
```
Thus, the set of assertions returned by a parse call is just the set of assertions added.
The solver maintains state between parser calls so that declarations made in a previous call are still available when declaring the constant 'z'.
The same holds for the parser_context_from_string function: function and sort declarations either added externally or declared using SMTLIB2 command line format as strings are valid for later calls.
@NikolajBjorner
Copy link
Contributor

I have added a separate mechanism for parsing incrementally.
It allows adding sort and function declarations (but not with alternative names, just the names used by the sorts and functions).
It handles the symbol table additions incrementally.
It should address use cases like yours (I don't really like it as it implies that you are parsing a string instead of using the programmatic API to build expressions, so your downstream infrastructure is probably "an area for improvement").

@kosarev
Copy link
Author

kosarev commented Jul 2, 2022

Will give a try, thanks!

Regarding using the API for building expressions, the aim is to serialise expressions and then load them back. I know there is z3.deserialize(), but it's painfully slow when the number of expressions to parse is large and then traversing expressions by means of using BoolRef.arg() is even slower. So I eventually ended up with z3.parse_smt2_string(f'(assert {expr})', decls=decls)[0] as the fastest approach discovered so far.

If I take it right that parse_smt2_string() is able to parse multiple assertions, that might help in some cases. But in other cases this is just not an option, either because you have to deal with individual expressions or because it's not practical to put all expressions into a single string.

@NikolajBjorner
Copy link
Contributor

you can parse multiple expressions by adding multiple assertions. Each assertion is an element in the vector that is returned.

@NikolajBjorner
Copy link
Contributor

Closing, as feature is there to support incremental parsing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants