Manual control of simplification and evaluation #327
Comments
No, the data structures are different in SymbolicUtils.jl. The no-evaluation form is the simplified form in this case, by design, and that's why it's fast. No call to simplify is required to make that work, and it would be slower to hold the result as a tree instead of as 3. So this seems to be a major misunderstanding of the design, which is similar to SymEngine's.
That automatic simplification is exactly the design that I'm concerned about. Quoting another CAS developer in that thread:
Instead of automatically canonicalizing
It's not canonicalized to either.
julia> using SymbolicUtils # v0.13.3
julia> @syms a
(a,)
julia> a + 1
1 + a
Currently
No, it does not. That's just the printing. Look at the data structure: it is in neither order, because it isn't ordered.
julia> a + 1
1 + a
help?> SymbolicUtils.Add
Add(T, coeff, dict::Dict)
Represents coeff + (key1 * val1) + (key2 * val2) + ...
where keys and values come from the dictionary (dict), coeff and the vals are <:Number, and keys are symbolic.
• operation(::Add) – returns +.
• symtype(::Add) – returns T.
• arguments(::Add) – returns a totally ordered vector of arguments, i.e. [coeff, keyM*valM, keyN*valN, ...]
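To illustrate the "it's just the printing" point, here is a minimal self-contained sketch of a coeff-plus-Dict sum like the one the Add docstring describes. It is not SymbolicUtils' implementation: ToyAdd and toy_add are hypothetical names, variables are plain Symbols, and multipliers are Ints. Because the terms live in an unordered Dict, building the sum from a + 1 or from 1 + a yields equal values, and the displayed order is chosen entirely by show.

```julia
# Hypothetical toy sum mirroring the docstring: coeff + Σ key * val,
# with variable names (Symbols) as keys and Int multipliers as values.
struct ToyAdd
    coeff::Int
    dict::Dict{Symbol,Int}
end

# Build a sum from numbers and variable names: numbers accumulate into
# coeff, repeated variables accumulate their multipliers in the Dict.
function toy_add(xs...)
    coeff = 0
    dict = Dict{Symbol,Int}()
    for x in xs
        if x isa Integer
            coeff += x
        else
            dict[x] = get(dict, x, 0) + 1
        end
    end
    return ToyAdd(coeff, dict)
end

# Equality and hashing ignore input order because the Dict is unordered.
Base.:(==)(p::ToyAdd, q::ToyAdd) = p.coeff == q.coeff && p.dict == q.dict
Base.hash(p::ToyAdd, h::UInt) = hash(p.dict, hash(p.coeff, h))

# Only the printer imposes an order: coefficient first, then names sorted.
function Base.show(io::IO, p::ToyAdd)
    parts = String[]
    p.coeff != 0 && push!(parts, string(p.coeff))
    for k in sort!(collect(keys(p.dict)))
        v = p.dict[k]
        push!(parts, v == 1 ? string(k) : string(v, "*", k))
    end
    print(io, isempty(parts) ? "0" : join(parts, " + "))
end

p1 = toy_add(:a, 1)     # built as "a + 1"
p2 = toy_add(1, :a)     # built as "1 + a"
println(p1 == p2)       # true
println(p1)             # 1 + a
println(p2)             # 1 + a
```

The design choice being illustrated: no sorting happens at construction time, so any ordering cost is paid only when something (here, printing) asks for an order.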
The canonicalization of
The symbolic semantics I'm describing are useful for
CCing some other people who may have thoughts: @0x0f0f0f @fredrik-johansson @JacquesCarette @oscarbenjamin. It might be helpful to have examples of the performance benefit if somebody knows a good one offhand.
There is an obvious tension between taking expressions as-is and having efficient data structures that can make some assumptions. For variables of numeric type (by default in
The benefit of the multiple orders of magnitude of speedup gained by this cannot be overstated: it makes new things possible. Still, it seems like there is room for a mode where you can treat expressions as-is. We could easily add a macro for this:
@macroexpand @symquote a + 1
:(term(+, a, 1))
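The thread shows @symquote only through its expansion. As a rough sketch of the idea, assuming nothing about how that macro is actually implemented, a macro can rewrite every call in the quoted expression into a node that records the function and its arguments without ever calling the function. Term and @termquote below are hypothetical stand-ins, not SymbolicUtils' term or @symquote.

```julia
# Hypothetical unevaluated call node: the function and its arguments, untouched.
struct Term
    f::Function
    args::Vector{Any}
end

# Turn every call f(x, y, ...) in the quoted expression into code that builds
# Term(f, Any[x, y, ...]). Leaves (variables, literals) are escaped so they are
# evaluated in the caller's scope, but no call in the expression is executed.
function _termify(ex)
    if ex isa Expr && ex.head == :call
        f = esc(ex.args[1])
        args = map(_termify, ex.args[2:end])
        return :(Term($f, Any[$(args...)]))
    else
        return esc(ex)
    end
end

macro termquote(ex)
    return _termify(ex)
end

a = :a                   # a plain Symbol standing in for a symbolic variable
t = @termquote a + 1     # Term(+, Any[:a, 1]): nothing is added or reordered
u = @termquote 1 + 2     # Term(+, Any[1, 2]) rather than 3
```

Nested calls become nested Term nodes, so whatever order the user wrote is exactly what is stored.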
This is fixed now (will be releasing) if you pass in
You can check the old PRs on this, which documented the performance. It was a good 1000x speedup or so on the examples we had on hand, and similar speedups were seen on real-world biopharma examples. At this point it's pretty much the other way around: there is so much evidence of the acceleration it gives in real examples that we would need to see very compelling evidence to change to another form. I was actually pretty reluctant about it at first, but the results don't lie. So try something on, say, the BCR reaction from ReactionNetworkImporters, or on the robot mass matrix example, and see whether another form does better.
Do you have any concrete examples of when canonicalization is bad? It's not like change
You can also try
@shashi Thanks for recognizing the representation tension.
On master (efaa595) right now it returns 3:
julia> substitute(a+b, Dict(a=>1,b=>2), fold=false)
3
This function is more like an
julia> @syms a::Any b::Any
(a, b)
julia> a + b
ERROR: MethodError: no method matching +(::SymbolicUtils.Sym{Any, Nothing}, ::SymbolicUtils.Sym{Any, Nothing})
That's not what I have in mind: the result I want is just
The
I'll leave that to the CAS experts I mentioned.
For some applications you really want control over how an expression is manipulated. Symbolic manipulation has applications that are not just about computing things and in which fine control is needed. For example, you might use a CAS to run through the steps of a complicated derivation that is used in a mathematical text. Then, if your expressions can be converted to LaTeX/MathML/etc., you can have the equations of a document generated semi-automatically. In this context you really don't want canonicalisation to mess up your equations. If canonicalisation is unavoidably built into core operations like substitution, then the CAS is unusable for this. Canonicalisation is also very much a slippery slope. If implicit/automatic canonicalisation is expected by users, then there will be a long tail of different kinds of canonicalisation that some of them will want, e.g.
If you want to think about expensive computation, then you need to think about large expressions and/or operations that can happen many times. If your sum has millions of terms, do they need to be sorted every time any substitution is performed? What if the terms themselves are complicated expressions and comparing two terms under the sorting order is nontrivial?
It can also waste time for no reason in situations that don't need it. What if you have an algorithm that works by repeatedly making many different substitutions? Perhaps a single canonicalisation at the end would have been fine, but instead the sort has to be recomputed at every step because canonicalisation has been made automatic and unavoidable. Ultimately high-level routines built on top of primitive routines like
I would advocate for isolating primitives as much as possible so e.g.
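A minimal sketch of isolating the two primitives, using a hypothetical toy tree (TermNode, subs, and canonicalize are illustrative names, not SymbolicUtils functions). Here subs only replaces leaves, so substituting a=>1, b=>2 into a + b leaves a sum of 1 and 2 rather than 3, while canonicalize (fold numeric arguments, sort the rest) is a separate pass that an algorithm can run once after many substitutions instead of after each one. Sorting by printed form is just a convenient total order for the toy, not a recommendation.

```julia
# Hypothetical toy expression tree: a function (+ or * here) and its
# arguments, which are Symbols, numbers, or nested TermNodes.
struct TermNode
    f::Function
    args::Vector{Any}
end

# Primitive 1: purely structural substitution. No folding, no reordering.
subs(x, d::AbstractDict) = x                          # numbers are left alone
subs(x::Symbol, d::AbstractDict) = get(d, x, x)       # replace mapped variables
subs(t::TermNode, d::AbstractDict) = TermNode(t.f, Any[subs(a, d) for a in t.args])

# Primitive 2: canonicalization, run only when explicitly requested.
# Folds numeric arguments with t.f and sorts the remaining arguments by their
# printed form (an arbitrary but fixed total order, good enough for a toy).
canonicalize(x) = x
function canonicalize(t::TermNode)
    args = Any[canonicalize(a) for a in t.args]
    nums = Any[a for a in args if a isa Number]
    rest = Any[a for a in args if !(a isa Number)]
    isempty(rest) && return reduce(t.f, nums)   # e.g. 1 + 2 folds to 3
    sort!(rest; by = string)
    isempty(nums) || pushfirst!(rest, reduce(t.f, nums))
    return length(rest) == 1 ? rest[1] : TermNode(t.f, rest)
end

# An algorithm making several substitutions in a row pays for no sorting...
expr = TermNode(+, Any[:c, :a, TermNode(*, Any[:b, 2])])   # c + a + b*2
step1 = subs(expr, Dict(:b => 3))                          # c + a + 3*2
step2 = subs(step1, Dict(:c => 1))                         # 1 + a + 3*2
# ...and canonicalizes once at the end, only because it chose to.
result = canonicalize(step2)                               # 7 + a
```

A routine that wants today's behaviour can simply compose the two, calling canonicalize after subs; one that does many substitutions can defer the cost.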
This discussion does not seem to be relevant to the actual data structures of the repository. Could you give a real-world example, say an SBML file, that is handled better with a different data structure?
This is a very complex problem. It's not that dissimilar to choosing between call-by-value and call-by-name (or call-by-need); there are examples of each that are a clear win or loss (depending on what you're trying to prove). [And @oscarbenjamin's answer came in while I was writing... I agree with all he says. So I'll assume that, and comment further.] There is definite tension between naive usability and large-scale computation. Having
The default would be to use the user-level
I'm not personally experienced enough with Julia to do this (I don't even know what an SBML file is). I've been trying out Symbolics/SymbolicUtils, but so far I'm running into quite basic things like JuliaSymbolics/Symbolics.jl#328 (comment)
Oh right... This is because of how
In Julia there are not multiple
could give you a different type of symbol. Then the
If you go through the history of this package, you will notice that we did start off with this. But we found that it is really awful for systems with 30000 equations in 10000 variables, and for generating code for them: it was unreasonable to wait for them to simplify. Switching to the canonical form led to saving hours. I have some thoughts about the social aspects and about organizing symbolic programming to support all use cases, and I may write a post about this. But I would like to quickly note here: in other ecosystems, we are conditioned to think "one package, one representation", and any misstep with it means decades of lock-in. But reality is much more forgiving when you are in a language with dynamic multiple dispatch.
Closed by #429
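A small self-contained sketch of the multiple-dispatch point, under the assumption that a second, non-simplifying symbol type is what is meant. LazySym and LazyTerm are hypothetical names, not types from SymbolicUtils or Symbolics. Their arithmetic methods record each call as written, so which behaviour you get is decided by the type of the symbols you create, and this can coexist with a canonicalizing representation in the same session.

```julia
# Hypothetical symbol type whose arithmetic never simplifies.
struct LazySym
    name::Symbol
end

# Hypothetical unevaluated call node produced by LazySym arithmetic.
struct LazyTerm
    f::Function
    args::Vector{Any}
end

const Lazy = Union{LazySym, LazyTerm}

# Dispatch: any + or * with a Lazy operand records the call in written order.
for op in (:+, :*)
    @eval begin
        Base.$op(x::Lazy, y::Lazy) = LazyTerm($op, Any[x, y])
        Base.$op(x::Lazy, y::Number) = LazyTerm($op, Any[x, y])
        Base.$op(x::Number, y::Lazy) = LazyTerm($op, Any[x, y])
    end
end

# Print terms exactly as built, fully parenthesised.
Base.show(io::IO, s::LazySym) = print(io, s.name)
function Base.show(io::IO, t::LazyTerm)
    print(io, "(", join(map(x -> sprint(show, x), t.args), " $(t.f) "), ")")
end

a, b = LazySym(:a), LazySym(:b)
e1 = a + 1            # shows as (a + 1)
e2 = 1 + a            # a different term: (1 + a)
e3 = 2 * (a + b)      # nothing distributed: (2 * (a + b))
```

Because the methods are added only for the Lazy types, ordinary numbers and any other symbolic types keep their own semantics.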
Currently SymbolicUtils uses constructor-level simplification and automatic evaluation of expressions. This may be convenient for numerical work, but makes SymbolicUtils inappropriate for purely symbolic manipulation. It also eliminates algorithmic speedup opportunities.
There is an asymmetry: a raw symbolic expression can always be simplified for numerical work, but once simplified an expression cannot be unsimplified for symbolic work. To support both kinds of users and ensure high performance, I would like to propose that SymbolicUtils adopt a posture of not simplifying or evaluating without explicit user request.
For example, with @syms a b:
a + 1 would stay a + 1, not changing to 1 + a as it currently does.
substitute(a+b, Dict(a=>1, b=>2)) would go to Term(+, [1,2]), not 3 as it currently does.
The current SymPy maintainer has warned of this issue's critical importance. He says [emphasis added]:
I hope we can get these decisions built into the early foundation of Symbolics.jl so we don't follow the wrong path.