-
-
Notifications
You must be signed in to change notification settings - Fork 398
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Compatibility layer with SymbolicRegression.jl #3790
Comments
It's not obvious to me how you intend the end-user to combine JuMP and SymbolicRegression. Assuming we can magic any internals, what is your ideal API? Can you provide a small hypothetical example of what you would like to build? |
Great questions! I'm honestly not even sure myself. It will probably evolve as I figure out what sorts of things are possible. (This is effectively an unexplored area of research, after all!) For an example, let's take constraining constants in an expression to be integers. The current way a user would do this is as follows: using SymbolicRegression
function my_loss(expr, dataset::Dataset{T,L}, options::Options) where {T,L}
y_pred, completed = eval_tree_array(expr, dataset.X, options)
!completed && return L(Inf) # Bad expression (e.g., 1/0)
y = dataset.y
loss = sum(i -> abs2(y[i] - y_pred[i]), eachindex(y, y_pred)) / length(y)
# Now, add soft integer constraint:
integral_penalty = sum(expr) do node # Sum over nodes of expression
if node.degree == 0 && node.constant
val = node.val
abs(x - round(x)) # Distance from nearest integer
else
zero(L)
end
end
return loss + 100 * integral_penalty
end the user passes this to an MLJ-style regressor class with a bunch of other available options: using MLJ, Optim
model = SRRegressor(
loss_function=my_loss,
unary_operators=(cos, exp),
binary_operators=(+, -, *, /),
optimizer_algorithm=Optim.BFGS(),
)
mach = machine(model, X, y)
fit!(mach) However, this user-provided loss function is basically just a hacky way of penalizing non-integral constants and hoping that the genetic algorithm + BFGS takes care of them (it cycles through many rounds of genetic operations, like swapping nodes or perturbing constants, followed by explicit BFGS optimization). i.e., nothing like proper MINLP! So what would be nice is if there was a way to declare a constraint on the constants and use a MINLP algorithm interfaced by JuMP/MOI to properly do this sort of thing – maybe just where P.S., here are some related requests from PySR users:
(and many many more related ones in that discussions page – which handles both the Julia and Python side) |
I still don't really understand where JuMP would fit into your example. Would the user write a JuMP model? Or is this some backend feature and you really just want access to a MINLP solver? Not to dissuade you, but see: It's not obvious that JuMP is a good choice here. |
Good questions and please don't hesitate to dissuade, I know next to nothing about MINLP.
I think this would be one possibility. The user would declare a JuMP model, and there would be an additional feature available that allows the user to specify a functional form (a DynamicExpressions.jl expression). Sort of like (modifying the quickstart example) julia> @variable(model, x >= 0)
x
julia> @variable(model, 0 <= y <= 3)
y
julia> @function(model, f, inputs=1, internal_constants=false)
f
julia> @objective(model, Min, abs(f(x) - y) + abs(f(2 * x) - 5) + 0.1 * complexity(f))
abs(f(x) - y) + abs(f(2 * x) - 5) + 0.1 * complexity(f)
julia> @constraint(model, c1, 6x + 8y >= 100)
c1 : 6 x + 8 y ≥ 100
julia> @constraint(model, c2, 7x + 12y >= 120)
c2 : 7 x + 12 y ≥ 120 SymbolicRegression.jl would evolve that The You could set up the optimizers like the following: ipopt = optimizer_with_attributes(Ipopt.Optimizer)
highs = optimizer_with_attributes(HiGHS.Optimizer)
juniper = optimizer_with_attributes(
Juniper.Optimizer,
"nl_solver" => ipopt,
"mip_solver" => highs,
)
optimizer = optimizer_with_attributes(
SymbolicRegressionJuMP.Optimizer,
"minlp_solver" => juniper,
"options" => SymbolicRegression.Options(unary_operators=(cos, exp), binary_operators=(+, -, *), maxsize=25)
) The However, this would obviously a massive project... So, in the meantime, maybe:
This is also something I am interested in regardless of (1). It's probably much easier too. I've looked a bit and have found a lot of general NLP solvers, but I am specifically looking for a MINLP for blackbox functions (which DO have gradients, as well as Hessians). DynamicExpressions.jl can also provide info on convexity if needed. But the expressions are dynamic and might be linear or nonlinear – and the optimizer would not be able to know the exact expression (since then it would likely have a huge overhead, since the expression is runtime-generated). |
Let me see if I can summarize what you'd like to do, and you tell me which bits I get wrong. I have only a vague understanding of how SymbolicRegression works.
I assume you want to do something like this: function optimize_expression_in_step_4(expression, dataset, options, optimizer)
_, references = get_scalar_constants(expression)
model = Model(optimizer)
@variable(model, p[1:length(references)])
@variable(model, residuals[1:length(dataset)])
set_scalar_constants!(expression, p, references)
y, _ = eval_tree_array(expression, dataset.X, options)
@constraint(model, residuals .== y .- dataset.y)
@objective(model, Min, sum(residuals.^2))
# START-USER-CONSTRAINTS
set_integer.(p)
# END-USER-CONSTRAINTS
optimize!(model)
set_scalar_constants!(expression, value.(p), references)
return objective_value(model)
end To get it working, you need to be able to evaluate I have no idea how you can get the user to write constraint like " I don't think we need to do anything special in JuMP. |
Thanks! Yes, your description of 1-8 is what I had in mind. There are only a couple possible differences. One is that the optimization is typically decoupled from the user-specified loss function, in order for the genetic algorithm itself to be aware of any constraints – it can optimize things even if BFGS (or an MINLP solver) is disabled. So if the The second is about how this works for the expression evaluation. Does |
It seems like it works, but it's really slow. DynamicExpressions.jl requires that the element type of the tree and the data is the same – for performance and preallocation reasons. So for this to work I had to type everything as an Also, it seems like JuMP records the entire expression tree (?), which is in opposition to the use of DynamicExpressions which is designed around fast runtime-generated expressions. Needing to walk the expression tree dynamically with other data structures will likely use a lot of memory. Is there any way I can force JuMP to be content with julia> using DynamicExpressions
julia> using JuMP
julia> using Ipopt: Ipopt
julia> x1 = Node{Float64}(; feature=1)
x1
julia> x2 = Node{Float64}(; feature=2)
x2
julia> operators = OperatorEnum(
binary_operators=(+, -, *, /),
unary_operators=(sin, cos),
)
OperatorEnum{Tuple{typeof(+), typeof(-), typeof(*), typeof(/)}, Tuple{typeof(sin), typeof(cos)}}((+, -, *, /), (sin, cos))
julia> tree = cos(x1 * 2.1) - x2
cos(x1 * 2.1) - x2
julia> p0, references = get_scalar_constants(tree)
([2.1], Base.RefValue{Node{Float64}}[Base.RefValue{Node{Float64}}(2.1)])
julia> num_variables = length(p0)
1
julia> optimizer = Ipopt.Optimizer
Ipopt.Optimizer
julia> model = Model(optimizer)
A JuMP Model
Feasibility problem with:
Variables: 0
Model mode: AUTOMATIC
CachingOptimizer state: EMPTY_OPTIMIZER
Solver name: Ipopt
julia> @variable(model, p[1:num_variables])
1-element Vector{VariableRef}:
p[1]
julia> function convert_to_variable_ref(tree, params, ::Type{R}) where {R}
i = Ref(0)
tree_mapreduce(
leaf -> if leaf.constant
Node{R}(; val=params[i[] += 1])
else
Node{R}(; feature=leaf.feature)
end,
identity,
(parent, children...) -> Node{R}(; parent.op, children),
tree,
)
end
convert_to_variable_ref (generic function with 1 methods)
julia> tree_ref = convert_to_variable_ref(tree, p, AbstractJuMPScalar)
cos(x1 * (p[1])) - x2
julia> X = randn(Float64, 2, 32)
2×32 Matrix{Float64}:
-0.0746526 0.364179 -0.31086 -0.27601 -0.111302 -0.655292 0.79584 0.849459 -1.58574 -0.132552 -0.999372 … 0.815621 -1.0162 0.877786 1.82748 -0.977439 0.25327 -1.5235 0.151345 -0.134643 1.6589 0.825483
1.79968 -0.725315 0.0160492 -0.0165158 -1.52329 -0.179642 -0.434624 -1.00659 0.875373 -0.750719 -1.78184 -1.09908 0.253982 0.822796 -0.620156 -1.76847 -0.507072 -1.18368 -0.522273 0.922352 2.16542 0.109187
julia> @variable(model, converter)
converter
julia> @constraint(model, converter == 1)
converter = 1
julia> X_ref = Matrix{AbstractJuMPScalar}(converter .* X)
2×32 Matrix{AbstractJuMPScalar}:
-0.07465259226587102 converter 0.36417901289093735 converter -0.31085990947789494 converter … 0.15134495266926729 converter -0.1346426329962789 converter 1.6589021930524135 converter 0.8254827460019445 converter
1.799683617390219 converter -0.7253151845805736 converter 0.016049244525629433 converter -0.5222730762965133 converter 0.9223524172130876 converter 2.1654162971520536 converter 0.10918691593060266 converter
julia> eval_tree_array(tree_ref, X_ref, operators)
(AbstractJuMPScalar[cos(-0.07465259226587102 converter*p[1]) - (1.799683617390219 converter), cos(0.36417901289093735 converter*p[1]) - (-0.7253151845805736 converter), cos(-0.31085990947789494 converter*p[1]) - (0.016049244525629433 converter), cos(-0.27600982416010994 converter*p[1]) - (-0.01651584685659015 converter), cos(-0.11130200072339379 converter*p[1]) - (-1.5232920468979532 converter), cos(-0.6552919255545688 converter*p[1]) - (-0.1796421756636822 converter), cos(0.795840390388174 converter*p[1]) - (-0.4346236223235513 converter), cos(0.8494586131074104 converter*p[1]) - (-1.0065930799112859 converter), cos(-1.585735689597626 converter*p[1]) - (0.8753733720730349 converter), cos(-0.13255195039681938 converter*p[1]) - (-0.7507187114742331 converter) … cos(-1.0162002402114096 converter*p[1]) - (0.2539823566729166 converter), cos(0.8777861446417738 converter*p[1]) - (0.822795814625541 converter), cos(1.827476863381431 converter*p[1]) - (-0.6201558037364908 converter), cos(-0.9774388479783699 converter*p[1]) - (-1.768471457571797 converter), cos(0.25327022257524606 converter*p[1]) - (-0.5070717696587647 converter), cos(-1.5234961669454194 converter*p[1]) - (-1.183675456014598 converter), cos(0.15134495266926729 converter*p[1]) - (-0.5222730762965133 converter), cos(-0.1346426329962789 converter*p[1]) - (0.9223524172130876 converter), cos(1.6589021930524135 converter*p[1]) - (2.1654162971520536 converter), cos(0.8254827460019445 converter*p[1]) - (0.10918691593060266 converter)], true) Perhaps I need to have some type of new type that simply records the application of |
It's a little tricky to reply to each of your points, but here's an attempt
JuMP doesn't return solutions that violate constraints. If you want soft constraints, these need to be modeled explicitly. You could do: if !is_solved_and_feasible(model)
return Inf
end
return objective_value(model)
We implement operator overloading that returns new expression trees.
I'm assuming here that set_scalar_constants!(expression, p, references)
y, _ = eval_tree_array(expression, dataset.X, options) has set the JuMP variables in the tree, so
Yes, your code snippet is pretty much what I expected to happen. You could perhaps do
Not as much as actually solving the MINLP 😄
So this depends on what MINLP solver you want to use. In cases like Apline, BARON, and Couenne, they require the expression graph, and do not support arbitrary user-defined functions (because they use a priori knowledge of convexity, monotonicity, etc for spatial branching). If you use Juniper or KNITRO, you will get only a local solution to the MINLP, but they can support user-defined functions. You could do something like: function optimize_expression_in_step_4(expression, dataset, options, optimizer)
_, references = get_scalar_constants(expression)
model = Model(optimizer)
M, N = length(dataset), length(references)
@variable(model, p[1:N])
@variable(model, residuals[1:N])
last_x = nothing
function eval_fn(i::Int, x::T...) where {T<:Real}
if last_x !== x
set_scalar_constants!(expression, collect(x), references)
last_x = x
end
y, _ = eval_tree_array(expression, dataset.X[i], options)
return y
end
ops = map(1:M) do i
return add_nonlinear_operator(
model,
N,
(x...) -> eval_fn(i, x...),
# Will use ForwardDiff for gradient automatically, otherwise provide
# (g::AbstractVector{Float64}, x::Float64...) -> nothing
# which fills in g` with the gradient.
name = Symbol("op_$i"),
)
end
@constraint(model, [i in 1:N], residuals[i] == ops[i](p...) - dataset.y[i])
@objective(model, Min, sum(residuals.^2))
# START-USER-CONSTRAINTS
set_integer.(p)
# END-USER-CONSTRAINTS
optimize!(model)
set_scalar_constants!(expression, value.(p), references)
return objective_value(model)
end but I don't think it will work very well if you have large Note that I'm still not convinced JuMP is a good fit for this. If I had to summarize, it seems like SymbolicRegression is a local-search based algorithm that is made to evaluate many candidates fast and efficiently with minimal overhead. JuMP (for MINLPs at least) is the opposite: given a lot of structure, find the best possible answer. Why can't you just take the output from BFGS and round the outputs? Or do local-search rounding some values up and some values down? |
Thanks, Indeed even a local search would be useful, so Juniper or KNITRO sound like potential options. The outer loop of SymbolicRegression would effectively turn this into a global search.
It’s a more subtle than this, it’s hard to put it in either camp. Internally SR.jl wraps different “genetic operators” / “mutations” that modify a symbolic expression. It randomly applies these operators to expressions according to some categorical distribution. (It also has some population and multi-population evolutionary algorithms.) But those mutations can be simple perturbations to the expressions, or local search algorithms like BFGS, or even global search algorithms. The user controls the frequency at which a mutation occurs, so there isn’t really a strict need for a mutation to be a given speed. For example, one of the default mutations is a full BFGS optimization run. Evolutionary algorithms are non-convergent so SR.jl just runs for the user-specified number of iterations or exit criteria. What I am imagining is that the user would specify a JuMP model, and there could be a way to quickly evaluate whether the constraints are violated or not, which could be used in the frequently-called loss function. Is it possible to do this check in JuMP, for some user-specified initial parameters? Or would they need to do an explicit Then, should the user want to use some integral constraints, one of the mutations would run an explicit MINLP solver. This would be called much less frequently so wouldn’t need to be quite as fast.
This is one approach for this type of thing. But since it’s not a real MINLP solver it doesn’t get the correct solutions, and we instead have to rely on the genetic algorithm randomly perturbing the constants until it gets lucky. Another hack that can work for simple problems is to provide integers as features to the model. But this relies on the genetic algorithm randomly finding the right combination of features — which is very inefficient. It would be much better to have an explicit solver for this type of thing. |
No, scalar output only.
Somewhat: https://jump.dev/JuMP.jl/stable/manual/solutions/#Checking-feasibility-of-solutions. But it isn't optimized to evaluate this many times quickly.
I include GAs as a local-search. You can't generate a proof of global optimality. Let's step back a bit: what is the dimension of input and output? How big are the expressions? It seems like you have small problems with simple constraints, and a very complicated objective function. Overly generalizing: JuMP is really structured for problems with complicated constraints and simpler objective functions. |
I see. Is this something that does not exist because there is not a need for it, or is it fundamentally incompatible with the internals? If the former, how much effort would it be to add this type of feature?
Got it, indeed that is true.
This varies by multiple orders of magnitude across the community's use-cases. Some users might have inputs with size (100,000 rows, 100 features), and want to search for expressions with hundreds of operators, while some others might have inputs with size (10, 1), and the ground truth expression would be something simple like But across all of these cases, sometimes users will want to impose constraints that seem appropriate for an MINLP solver to optimize. I don't want to generalize the community's use-cases (there seems to be broad interest in integral and other discrete constraints) but for the problems I work on, it's usually size (1000 rows, 10 features), and I am looking for expressions that have maybe a complexity of 30 (30 total operators or constants or instances of variables in the expression). |
It requires a fairly major redesign of some of the most complicated code in JuMP... but it's on our roadmap: For input and output, I mean JuMP is built to solve problems that have O(10^3)-O(10^5) inputs (which will become decision variables) and outputs (which will become constraints here), and solving just this one problem may take seconds to minutes to hours. Coming back to the "should I use JuMP" page, I think the answer for your use-case might be "no". Couldn't you just get the user to provide you a function that returns true/false for violated constraints? They're unlikely to write very complicated constraints right? |
So,
As I described above, this is the current approach that I recommend users to try if they have integral constraints. However, (1) it's a bit hacky, and (2) it doesn't work that well because BFGS is not an MINLP solver. This is why I am interested in some kind of JuMP integration. |
Ah, I see https://jump.dev/JuMP.jl/stable/tutorials/nonlinear/tips_and_tricks/#User-defined-operators-with-vector-outputs. The memoization approach sounds like it could work. Are there any solvers that have multiple parameter searches running in parallel? Otherwise I could just pre-allocate a single buffer for the vector output, and update it whenever the hash of the parameters change. Seems pretty straightforward on my end. |
Have you tried solving some of these as MINLPs? How long do they take? I would assume that even the small problems take seconds.
I don't think so, mainly because for a long time Julia was not thread-safe to call back into from C, so we typically disable multithreading if there are callback oracles in the solver wrappers. |
I guess we're overlooking the more obvious: function optimize_expression_in_step_4(expression, dataset, options, optimizer)
_, references = get_scalar_constants(expression)
function loss_fn(p...)
set_scalar_constants!(expression, collect(p), references)
y_pred, completed = eval_tree_array(expr, dataset.X, options)
if !completed
# This is likely problematic. They should be encoded as constraints
# instead.
return Inf
end
return sum(abs2(a - b) for (a, b) in zip(dataset.y, y_pred))
end
function loss_fn_gradient(g::AbstractVector, p...)
g .= loss_fn'(p...) # Or something?
return nothing
end
model = Model(optimizer)
@variable(model, p[1:length(references)], Int)
@operator(model, op_loss_f, length(references), loss_fn, loss_fn_gradient)
@objective(model, Min, op_loss_fn(p...))
optimize!(model)
set_scalar_constants!(expression, value.(p), references)
return objective_value(model)
end |
Is there anything actionable to do here on the JuMP side of things? |
Sorry I haven't had a chance to look at this again the past few days due to other things, will try to get to it on the weekend. |
I think regardless, I don't know if there is anything that we need to do here with respect to changes in JuMP or MathOptInterface (other than jump-dev/MathOptInterface.jl#2402). |
I'm going to close this issue as "won't-fix" because there is nothing actionable that we plan to change in JuMP or MOI. If you have more questions, please post on the forum, https://discourse.julialang.org/c/domain/opt/13, or the |
I would like to explore using JuMP.jl to set up mixed-integer programming optimization problems in SymbolicRegression.jl and PySR. This would open up an entirely new area of optimization problems, because you could potentially evolve the symbolic objective itself without needing to re-compile. For example, this means we could do things like adding constraints that constants in a discovered symbolic expressions within PySR are equal to integers.
To do this, I think what we basically need to do is set up some kind of compatibility layer between DynamicExpressions.jl and MathOptInterface.
Describe the solution you'd like
I'm not sure where to even start. Perhaps I can explain a bit about the DynamicExpressions.jl API.
DynamicExpressions API
Basically there is this
Node
type that stores references to operators in a lightweight struct:(It's a binary tree now, but there is a working implementation which generalises things to n-arity operators)
The reason it is written like this is so that the expression can be completely runtime-generated, type stable, and memory-efficient. There is no extra compilation needed besides the kernels defined for the user's required operators.
To use this, we define a space of operators and variable names:
We can then create a variable (indicating the first column of some dataset)
We can build up more complex expressions using these basic building blocks:
where the
3
indicates*
and1
indicates+
– referencing theOperatorEnum
we created.This can get turned into an
Expression
object that stores both theOperatorEnum
and the rawNode
type which references the operators:For example,
These are nice since, with the new feature here, you can use regular Julia math on them, and they will lookup the right operator in the enum for you and generate the corresponding
Node
:This expression includes its own metadata about the space of operators, so you can print this expression, or evaluate it directly:
This also allows for gradients with respect to features of the data:
Or the constants of the expression:
again, which is a single evaluation kernel for any expression; no extra compilation needed.
Interface
Now, the interface. So, expressions have this general
tree_mapreduce
operation which is used to implement basically everything. It's like a mapreduce except the reduction is structured like(parent, children...)
. It also has built-in support for graphs that have shared subexpressions via theGraphNode
type – used for avoiding double-counting as well as for caching.One of the main utilities for optimizing expressions is
get_scalar_constants
andset_scalar_constants!
, which are implemented like so:My guess is that this interface of getting and setting constants somehow is probably the right interface for connecting DynamicExpressions to MathOptInterface... What do you think?
The other thing (as mentioned above) is
which gets the gradients using Zygote-generated kernels for each operator. This also has an interface with ChainRulesCore.jl so should work if you differentiate with respect to the expression object.
(The downside of this is that it is forward-mode, although:
So... What is the right way to optimize DynamicExpressions.jl expression objects with JuMP?
Expected question: Why not post this issue in MathOptInterface?
I think for a specific implementation case we could probably have an issue there as well, or maybe in DynamicExpressions.jl if that makes more sense. I set up an issue here for tracking the overall goal of having a compatibility layer between JuMP and SymbolicRegression.jl, regardless of how that takes shape.
The text was updated successfully, but these errors were encountered: