
[breaking] return config struct from each layer #67

Closed
odow opened this issue Jun 28, 2024 · 13 comments

odow commented Jun 28, 2024

As discussed with Russell and Kaarthik


odow commented Aug 13, 2024

This is a breaking change because it changes the return value of add_predictor. I'm going to wait a bit longer before adding it (e.g., once the repo is ready to be made public and we've had a bit more feedback on the current design).

odow changed the title Return config struct from each layer → [breaking] return config struct from each layer Aug 13, 2024

odow commented Aug 14, 2024

So, playing around with this, one option is:

```julia
struct ModelComponents
    variables::Vector{Any}
    constraints::Vector{Any}
end

function add_predictor(model::JuMP.Model, ::Tanh, x::Vector)
    y = JuMP.@variable(model, [1:length(x)], base_name = "moai_Tanh")
    _set_bounds_if_finite.(y, -1.0, 1.0)
    cons = JuMP.@constraint(model, y .== tanh.(x))
    return ModelComponents(y, cons), y
end

function add_predictor(::JuMP.Model, ::ReducedSpace{Tanh}, x::Vector)
    # `ModelComponents` has no zero-argument constructor, and we still need
    # to return the outputs, so return empty lists alongside `tanh.(x)`:
    return ModelComponents(Any[], Any[]), tanh.(x)
end
```

with syntax

```julia
components, y = add_predictor(model, predictor, x)
```

Another option is

```julia
struct PredictionModel{T}
    output::Vector{T}
    variables::Vector{Any}
    constraints::Vector{Any}
end

function add_predictor(model::JuMP.Model, ::Tanh, x::Vector)
    y = JuMP.@variable(model, [1:length(x)], base_name = "moai_Tanh")
    _set_bounds_if_finite.(y, -1.0, 1.0)
    cons = JuMP.@constraint(model, y .== tanh.(x))
    return PredictionModel(y, y, cons)
end

function add_predictor(::JuMP.Model, ::ReducedSpace{Tanh}, x::Vector)
    return PredictionModel(tanh.(x), Any[], Any[])
end
```

with syntax

```julia
ml_model = add_predictor(model, predictor, x)
y = ml_model.output
```

Yet another option is:

```julia
y = add_predictor(model, predictor, x)
components, y = add_predictor(model, predictor, x; return_components = true)
```
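A minimal sketch of how that keyword-argument option could be implemented; `_build_formulation` is an assumed internal helper, not an existing function:

```julia
# Hypothetical sketch of the `return_components` keyword option.
function add_predictor(
    model::JuMP.Model,
    predictor::AbstractPredictor,
    x::Vector;
    return_components::Bool = false,
)
    components, y = _build_formulation(model, predictor, x)  # assumed helper
    # Existing callers keep getting just `y`; opting in returns both.
    return return_components ? (components, y) : y
end
```

One drawback of this form is that the return type depends on a runtime keyword value, which is type-unstable in Julia.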


odow commented Aug 14, 2024

Thoughts @Robbybp?


Robbybp commented Aug 14, 2024

The idea is that there is no overlap between output (or y) and variables? For now, I like wrapping everything into one struct.

Are you thinking that variables/constraints are completely unstructured, or somehow mimic the structure of the network (e.g. are indexed by nodes/layers)? Asking because I think this would be nice, but I'm not sure if it can/should be generalized.


odow commented Aug 14, 2024

The idea is that there is no overlap between output (or y) and variables?

Nah, with overlap.

Perhaps each predictor should have a dedicated Result struct that documents how to inspect the formulation.


Robbybp commented Aug 17, 2024

Personally, I'd prefer no overlap.

Perhaps each predictor should have a dedicated Result struct that documents how to inspect the formulation.

Maybe, although I could see this getting complicated quickly. Maybe a default Result struct that keeps internal variables and constraints in lists, with an option to return a "more structured" result if the predictor supports it?
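One way to sketch that "default list-based result, structured when the predictor supports it" idea (all type names here are hypothetical, not proposed API):

```julia
# Hypothetical sketch: a generic fallback result that stores everything in
# flat lists, plus a richer per-layer result for predictors that opt in.
abstract type AbstractResult end

struct DefaultResult <: AbstractResult
    variables::Vector{Any}
    constraints::Vector{Any}
end

# A network-like predictor could instead return one entry per layer,
# mimicking the structure of the network:
struct LayeredResult <: AbstractResult
    layers::Vector{DefaultResult}
end
```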


odow commented Aug 17, 2024

Personally, I'd prefer no overlap.

One issue is that the type of our result y can depend on the input, so I think there is value in having all variables in one place, and a list of the outputs (which may or may not be a new vector of variables).
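To illustrate the point about the output type (a sketch, not library output): in a full-space formulation y is a vector of newly created JuMP variables, while in reduced space it is a vector of nonlinear expressions in the inputs, so the element type genuinely differs:

```julia
# Full space: `y` is a Vector of new decision variables (VariableRef).
y_full = JuMP.@variable(model, [1:length(x)])
# Reduced space: `y` is a Vector of nonlinear expressions in `x`.
y_reduced = tanh.(x)
```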

Maybe a default Result struct that keeps internal variables and constraints in lists, with an option to return a "more structured" result if the predictor supports it?

Yip. That is exactly what this is: #80

odow mentioned this issue Aug 26, 2024

odow commented Aug 28, 2024

Here are @pulsipher's thoughts from #82

I definitely think it is a good idea to have access to the variables and constraints created for the predictor. This helps to demystify the transformations and I believe provides a way to delete a predictor (i.e., manually deleting the variables and constraints).

Here are some of my thoughts on syntax to accomplish this. I prefer options 2 or 3.

1. Using the approach proposed in #80

I don't have any major issues with this approach, except that I think add_predictor should return y, formulation instead of formulation, y, since I imagine the formulation will often not be used and the user would just want y, _ = add_predictor. Though, admittedly, it might be a little annoying to have to deal with the second return value when it is often not needed.

One side question would be why is Vector{Any} required for the variables field?

2. Tweaking #80 to return only one object

Instead of returning y and formulation, we could return a reformulation block of the form:

```julia
struct SimpleFormulation{P<:AbstractPredictor} <: AbstractFormulation
    predictor::P
    outputs::Array{Any} # new field for `y`
    variables::Vector{Any}
    constraints::Vector{Any}
end
```

Then the user can just extract the outputs y from the formulation object as wanted. Going one step further, one could even overload getindex such that:

```julia
Base.getindex(f::SimpleFormulation, inds...) = getindex(f.outputs, inds...)
```

3. Store the formulation info in the model and use references

Adding a little more complexity, we could store the formulation objects in model.ext and have add_predictor return a predictor reference object that points to it. Then the reference can be used to simplify the user experience in like manner to VariableRefs and ConstraintRefs (and similar to what DisjunctiveProgramming does with disjunctions). A rough API might look like:

```julia
predictor = add_predictor(model, nn, x)
y = outputs(predictor)
cons = transformation_constraints(predictor)
vars = transformation_variables(predictor)
predictor_obj = predictor_object(predictor)
set_transformation(predictor, TransformType()) # change the transformation used in-place
delete(model, predictor) # removes the predictor and anything it added to the model
```

Most of the above API could also be added with option 2. Moreover, we could also overload getindex to index the output variables.


odow commented Aug 28, 2024

My thought for choosing option 1 is that most people don't actually want the formulation.

Most code will do:

```julia
_, y = MathOptAI.add_predictor(model, predictor, x)
# or
y, _ = MathOptAI.add_predictor(model, predictor, x)
# or
y = first(MathOptAI.add_predictor(model, predictor, x))
```

I get that a single return simplifies things, but then I assume most people will immediately do:

```julia
formulation = MathOptAI.add_predictor(model, predictor, x)
y = formulation.output
```

This violates my design principle, and is one of the things I don't like about OMLT: https://github.com/lanl-ansi/MathOptAI.jl/blob/main/docs/src/developers/design_principles.md#omlt

Option 3 just seems like option 2 with a layer of indirection. I don't think we really need it.

I think we can make delete work with option 1.
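For example, deletion could be a small helper over the lists in the struct returned by option 1 (a sketch; `delete_formulation` is a hypothetical name, not existing API):

```julia
# Hypothetical helper: remove everything a predictor added to the model,
# using the `constraints` and `variables` lists from option 1's struct.
function delete_formulation(model::JuMP.Model, components::ModelComponents)
    JuMP.delete.(model, components.constraints)
    JuMP.delete.(model, components.variables)
    return
end
```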

I'm not in a hurry to merge #80. At minimum, I'll wait until we can make the repo public and set up CI etc. It's a pretty big change to the library.


pulsipher commented Aug 28, 2024

Another API idea would be to treat an added predictor as an operator-like object. For a NN this might look like:

```julia
NN = add_predictor(model, flux_chain) # creates a modelling object that can be treated as an operator
@objective(model, Min, sum(NN(x)))
```

where NN(x) generates the necessary expressions/constraints/variables as needed. Some sort of cache could be used to avoid making duplicate constraints/variables/expressions. Then that cache could be used to query NN for any desired information such as formulation constraints/variables.

In my research, we commonly use a single NN in an optimal control problem that steps forward in time, so the same NN is used over n-1 different variable inputs. The above API would enable us to only have to "add" the predictor once.
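The caching idea could be sketched as a callable wrapper that memoizes formulations per input vector (hypothetical names, not an existing API):

```julia
# Hypothetical callable wrapper: calling `nn(model, x)` twice with the
# same `x` reuses the formulation built the first time.
struct CachedPredictor{P}
    predictor::P
    cache::Dict{Vector,Any}
end

CachedPredictor(p) = CachedPredictor(p, Dict{Vector,Any}())

function (nn::CachedPredictor)(model::JuMP.Model, x::Vector)
    return get!(nn.cache, x) do
        add_predictor(model, nn.predictor, x)
    end
end
```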

What is nice about this approach is that NN(x) resembles how I often see this formulated mathematically and creating the model object this way more closely follows the typical JuMP workflow for creating expressions, parameters, variables, and nonlinear operators. Taking it one step further, you could even have a JuMP-like macro:

```julia
@predictor(model, NN, flux_chain)

# containers of predictors could also be supported
@predictor(model, NN[i = 1:10], flux_chain[i])
```


odow commented Aug 28, 2024

At one point I played around with y = NN(x) and y = NN(model, x). (I can't find it in the logs, so maybe I never pushed.)

I also considered the macros. But I don't know that they add much functionality. They're really just macros for the sake of it.

I decided to go with add_predictor for simplicity.

I'm open to revisiting this in the future, but it would require some large-scale practical examples that clearly demonstrate the benefit.


pulsipher commented Aug 29, 2024

Enabling NN(x) could just be a matter of defining:

```julia
function (predictor::AbstractPredictor)(x::Vector)
    model = _get_model_or_error(x)
    return add_predictor(model, predictor, x)
end
```

where predictor can be created via build_predictor. This makes it more convenient to reuse the same predictor over different sets of inputs. For #80, we could have:

```julia
function (predictor::AbstractPredictor)(x::Vector)
    model = _get_model_or_error(x)
    return first(add_predictor(model, predictor, x))
end
```

which provides a natural way to avoid getting the formulation information if you don't want it.

I still would like to also avoid making redundant formulations by using the same nonlinear operator over a set of inputs and by using the same formulation if NN(x) is called more than once with the same inputs (admittedly, the last point is easy for the user to handle).

A large-scale example that exists in the optimal control community (and my own research) is an MPC formulation that uses a NN model that steps in time. The NN takes the states $x$ and control variables $u$ at time $t_k$ and outputs the states at time $t_{k+1}$. Hence the same NN is used $K-1$ times where $K$ is the number of time points. An ideal syntax for the NN portion of the optimal control formulation would be something like:

```julia
K = 100
raw_NN = # put raw Flux/Lux/PyTorch model here

@variable(model, x_lb[i] <= x[i = 1:10, 0:K] <= x_ub[i])
@variable(model, u_lb[j] <= u[j = 1:2, 0:K] <= u_ub[j])

NN = build_predictor(raw_NN, gray_box = true)
@constraint(model, [k = 0:K-1], NN(vcat(x[:, k], u[:, k])) .== x[:, k+1])
```

where only one new nonlinear operator is created for NN over all the inputs it receives (this point is probably best discussed in #90).


odow commented Oct 15, 2024

Closing as won't-fix for now. Let's see how far our current setup gets us.

odow closed this as completed Oct 15, 2024