Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proof of concept more-MOIified implementation #393

Closed
wants to merge 26 commits into from
Closed

Conversation

ericphanson
Copy link
Collaborator

@ericphanson ericphanson commented Jul 31, 2020

This is a proof of concept implementation of a different internal structure for Convex.jl (edit: which has started to turn into a bit of a rewrite).

Currently, just the minimal amount of methods are implemented to solve testproblem/testproblem.jl. Probably any other problem will error! The end-user access point is solve2! which works just like solve! (except much of it isn't implemented).

Some design notes

template basically has the same motivation as conic_form!: it recurses through the problem, generating a conic reformulation of it. It does so in a different way, however. It takes a context parameter which acts very similarly to how unique_conic_forms currently works by holding problem-local state. In this case, it holds a MOI model. When constructing the conic reformulation, any Convex AbstractVariables are associated to MOI variables added to the model. Any constraints are simply immediately added to the model. Then a MOI function is returned. In the end, we recover an MOI function which is the objective function for the problem.

This implementation reuses the current atoms and DCP properties, but the conic reformulations are entirely rewritten (template instead of conic_form!) which means that all the objects in conic_form.jl and solution.jl are not used.

Some more notes:

  • We get access to both the Convex.jl object (and it's children), as well as the MOI objects returned from calling template on the children. We should think about when to use one versus the other.
  • It's crucial that a redesign allows defining atoms in terms of partially specified problems (Partially specified problems #310) (but this isn't done here yet! edit: actually this isn't a big deal and can be done as easily in the new implementation).
  • I've used the convention of not including the ! when modifying the context object, under the assumption that it is often being modified.
  • It might be quite interesting to multithread problem formulation. To do so, we could use the branching tree structure of the problem. One issue is that adding variables to the MOI model is a global operation we only want to do once.
  • Convex.jl currently passes around tuples of real and imaginary parts of objects. I did not do this here, partly in the hopes that MOI can get native complex support and we can re-use it here. This I imagine would be key for Hypatia to not need the big complex-to-real SDP reformulation used in Convex.jl (assuming it can natively support the complex PSD cone).

On the problem in testproblem/testproblem.jl, we get much improved results:

┌ Info: Running with classic Convex.jl setup...
└   (m, n, k) = (150, 150, 5)
 42.958454 seconds (77.45 M allocations: 13.555 GiB, 6.44% gc time)
┌ Info: Running with `Convex.conic_form_problem_solve`...
└   (m, n, k) = (150, 150, 5)
 13.723807 seconds (19.08 M allocations: 2.313 GiB, 1.29% gc time)
┌ Info: Running with JuMP...
└   (m, n, k) = (150, 150, 5)
 14.912025 seconds (41.56 M allocations: 2.344 GiB, 2.98% gc time)

It should also be much easier to support other cones to support e.g. geomean for vectors (ref #388).

To do in order to think about merging this

  • Figure out how to handle complex variables (answer: introduce separate ComplexVariable, ComplexConstant, and ComplexTape types)
  • support satisfy
  • duals
  • warmstarts
  • Think about the vexities for problems, or revert those changes
  • decide if the expression tree caching is useful and re-add it if so
  • look at perf impact of not using an atom for dot product (esp with fusion); maybe revert
  • update all atoms (so far: affine, linear, and exp are done, except for duals and the dot product issue)
  • update all constraints (so far LtConstraint, GtConstraint, EqConstraint, SDPConstraint, ExponentialCone)
  • pass tests
  • write devdocs describing how Convex works (now that I really know!)

New features that this enables

src/solve2!.jl Outdated
model = MOIB.full_bridge_optimizer(MOIU.CachingOptimizer(MOIU.UniversalFallback(MOIU.Model{T}()),
optimizer), T)

return (var_id_to_moi_indices=OrderedDict{UInt64,Vector{MOI.VariableIndex}}(),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could be a struct instead of a NamedTuple, but for prototyping I think it's easier as a NamedTuple

src/solve2!.jl Outdated
Comment on lines 28 to 29
scalar_fn(x) = only(MOIU.scalarize(x))
scalar_fn(v::MOI.AbstractScalarFunction) = v
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this should really be needed. It came about because I ended up passing things that were nomially vectors to the objective function, or adding scalars and vectors. I think part of the problem is that Convex.jl does not distinguish between scalars and vectors (or matrices!); e.g. atoms are not parametric types. This is an inherent source of type instability and means that we can't use dispatch always to route things correctly (i.e. one might imagine that the objective function would always end up as a scalar function just by the layout of the problem, but that doesn't happen currently).

src/solve2!.jl Outdated
Comment on lines 31 to 37
function solve2!(problem::Problem{T}, optimizer; kwargs...) where {T}
if Base.applicable(optimizer)
return solve!(problem, optimizer(); kwargs...)
else
throw(ArgumentError("MathProgBase solvers like `solve!(problem, SCSSolver())` are no longer supported. Use instead e.g. `solve!(problem, SCS.Optimizer)`."))
end
end
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copied from solve!

if primal_status != MOI.NO_SOLUTION
for (id, var_indices) in var_to_indices
var = id_to_variables[id]
vexity(var) == ConstVexity() && continue
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this means a variable has been fix!'d and should be treated as a constant; in particular, we should not update its value


function template(a::AbstractVariable, context)
var_inds = get!(context.var_id_to_moi_indices, a.id_hash) do
return add_variables!(context.model, a::Variable)
Copy link
Collaborator Author

@ericphanson ericphanson Jul 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are two leaf types: Constants and AbstractVariables. When we reach a variable, we add it to our local state (in context) by adding it to the MOI model and to our dictionary. I've used Convex.jl's convention of an explicit hash field, but I think a usual dictionary that does its own hashing might be cleaner.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment doesn't make sense as written - did you mean "does its own hashing"?

Copy link
Collaborator Author

@ericphanson ericphanson Aug 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, sorry about that. By the way, I am unsure what to do with the hashing stuff in Convex; it's used to avoid reformulating branches of the expression tree if they are identical and have already been formulated, but I don't entirely understand why it works (sometimes the head symbol of an atom is hashed in, sometimes it's not; seems like it should be a source of bugs but it hasn't been yet...) and think it should be revamped.

In this PR, only the id_hash field of an AbstractVariable is used to recognize the same variable in different branches of the expression tree, etc, but no other hashes are used. So in particular, in this PR currently identical branches are reformulated multiple times.

@@ -0,0 +1,126 @@
using ECOS
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This problem is from #390

@@ -154,6 +154,16 @@ function conic_form!(c::GtConstraint, unique_conic_forms::UniqueConicForms)
return get_conic_form(unique_conic_forms, c)
end

function add_constraints_to_context(lt::GtConstraint, context)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've only implemented GtConstraint so far, but the idea is to have such a method for each Convex.Constraint (like conic_form!). I did not use template because I think constraints should not return an objective function value but instead simply update the model (by adding constraints).

@@ -48,6 +48,29 @@ function conic_form!(x::EucNormAtom, unique_conic_forms::UniqueConicForms)
return get_conic_form(unique_conic_forms, x)
end


function template(A::EucNormAtom, context)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a case where a partially specified problem might come in handy

var = var * ones(1, size(coeff, 1))
end

const_multiplier = Diagonal(vec(coeff))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied this method from the conic_form! one (but using Diagonal instead of spdiagm, which I thought might be more efficient here?). Convex.jl uses a lot of these "vectorized" methods where we create a big matrix to act on the vector of variables. I wonder if there are better ways that don't involve constructing giant matrices.

@codecov
Copy link

codecov bot commented Jul 31, 2020

Codecov Report

Merging #393 into master will decrease coverage by 4.38%.
The diff coverage is 0.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #393      +/-   ##
==========================================
- Coverage   88.42%   84.04%   -4.39%     
==========================================
  Files          75       77       +2     
  Lines        3508     3691     +183     
==========================================
  Hits         3102     3102              
- Misses        406      589     +183     
Impacted Files Coverage Δ
src/Convex.jl 100.00% <ø> (ø)
src/VectorAffineFunctionAsMatrix.jl 0.00% <0.00%> (ø)
src/atoms/affine/add_subtract.jl 79.59% <0.00%> (-13.27%) ⬇️
src/atoms/affine/multiply_divide.jl 59.82% <0.00%> (-21.57%) ⬇️
src/atoms/affine/reshape.jl 72.72% <0.00%> (-7.28%) ⬇️
src/atoms/affine/sum.jl 80.00% <0.00%> (-10.91%) ⬇️
src/atoms/lp_cone/abs.jl 72.00% <0.00%> (-22.74%) ⬇️
src/atoms/second_order_cone/norm2.jl 57.69% <0.00%> (-42.31%) ⬇️
src/constant.jl 94.59% <0.00%> (-5.41%) ⬇️
src/constraints/constraints.jl 80.43% <0.00%> (-6.63%) ⬇️
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dd2f6c9...f4a5e0a. Read the comment docs.

@ericphanson
Copy link
Collaborator Author

In this PR, I introduce a structure

struct VectorAffineFunctionAsMatrix{M,B,V}
    matrix::M
    vector::B
    variables::V
end

to represent an MOI.VectorAffineFunction in a way more conducive to composing affine functions of the variables. Why is this important? The extended formulations that Convex does can be seen as applying a sequence of affine functions to the variables, and adding constraints to the model. The variables are represented by a symbolic vector x (a vector of MOI.VariableIndex), and the affine functions are therefore of the form Ax + b where A is a matrix and b is a vector. By keeping A as a matrix, we can use specialized sparse matrix * sparse matrix methods, or BLAS for dense matrices, etc. Note that a priori we cannot avoid the cubic complexity of matmuls because our vector is symbolic (i.e. ideally we would just do a sequence of matvecs, doing A*x to get a vector, then applying the next transformation to that vector, and so on).

However, the final matrix should be 1 x n dimensional, since we end up with a scalar objective function. Therefore, we should perform the matmuls in the other direction: we have, forgetting about the additive part, things like y^T * D * C * B * A * x, and we should do y^T * D to recover an 1 x n dimensional matrix again, then (y^T * D) * C), and so forth. (Maybe we should then transpose everything to do the multiplications if there aren't specialized dispatches for 1xn instead of nx1?).

@ericphanson
Copy link
Collaborator Author

I've pushed a commit that implements the ideas from the previous comment (and then one that fixes perf bugs), which improves the benchmarks quite a bit:

julia> include("testproblem\\testproblem.jl")
┌ Info: Running with classic `Convex.solve!`...
└   (MAX_ITERS, m, n, k) = (2, 150, 150, 5)
 15.400637 seconds (1.40 M allocations: 10.077 GiB, 14.19% gc time)
┌ Info: Running with `Convex.solve2!`...
└   (MAX_ITERS, m, n, k) = (2, 150, 150, 5)
  2.386677 seconds (121.63 k allocations: 785.039 MiB, 6.23% gc time)
┌ Info: Running with JuMP...
└   (MAX_ITERS, m, n, k) = (2, 150, 150, 5)
  5.013906 seconds (15.27 M allocations: 1.047 GiB, 3.48% gc time)

and

julia> include("testproblem\\testproblem.jl")
┌ Info: Running with `Convex.solve2!`...
└   (MAX_ITERS, m, n, k) = (2, 300, 300, 5)
 16.237871 seconds (148.79 k allocations: 4.973 GiB, 5.62% gc time)
┌ Info: Running with JuMP...
└   (MAX_ITERS, m, n, k) = (2, 300, 300, 5)
 29.201845 seconds (55.55 M allocations: 3.898 GiB, 8.59% gc time)

I could even solve it with n=m=500, for which JuMP and the usual Convex OOM on my desktop with 16 gb of ram:

julia> include("testproblem\\testproblem.jl")
┌ Info: Running with `Convex.solve2!`...
└   (MAX_ITERS, m, n, k) = (1, 500, 500, 5)
 43.958516 seconds (133.30 k allocations: 10.646 GiB, 4.62% gc time)

(note that I set MAX_ITERS to 1 for that last one, for the sake of time).

One thing to note is that the performance of this approach is sensitive to missing specialized dispatches.

@ericphanson
Copy link
Collaborator Author

To achieve type stability with this approach (or most others), we will need to make atoms parametric on the number of dimensions, because we often have different implementations for vectors vs scalars vs matrices. Here, the implementation leads to a choice of AffineOperation and thus often a different output type. This must not depend on runtime values, in order to have type stability.

One other thought is that the VAFTape I introduce here could give the compiler problems if they get too long, since they are implemented as tuples, but we could collapse them down whenever they exceed length 15, say. This could even be type stable because the length of a tuple is not a runtime value.

@ericphanson
Copy link
Collaborator Author

The latest commit adds the unregistered dependency https://github.com/ericphanson/MatrixChainMultiply.jl to look at the impact of using that method for choosing how to evaluate the VAFTape objects. In the test problem it doesn't really make a difference, but maybe in other problems it might. (CI will start failing now due to the dependency, which is fine because I might start stripping out the old functionality here anyway).

@ericphanson
Copy link
Collaborator Author

I'm starting to see how this can all work together. This rewrite is actually not very different than how Convex.jl works now, but I think will actually end up a lot simpler. The idea is the following (note this is mostly for myself to put into the dev docs at some point..):

There are two "levels" at which Convex operates: the high-level that users use, and the MOI/VAFTape level. At the high-level, we have atoms and constraint objects which provide dispatches for usual functions (like *, == , +, norm, etc) to generate a Convex.jl object (an atom or a constraint).

  • The function template(obj, context::Context) lowers an atom to the "MOI level", returning its MOI-level-representation. At the "MOI level", all variables and values are vectorized (instead of being matrices or scalars), and each object is turned into either a VAFTape (or a SparseVAFTape, depending if USE_SPARSE() == true), or a vector of numbers, and the MOI model living in the context object may be updated (say, by adding a constraint).
  • The function add_constraint_to_context(constr, context::Context) adds MOI constraints to the model living in context (possibly adding variables and doing other operations to the model as well), returning nothing.

As a design point, I will try to do as much as possible at the high-level, since this makes the package most accessible to contributions from Convex.jl users, and simplifies the codebase. The functions many atoms implement do not actually need to be represented by atoms; e.g. I've already removed the DotMultiplicationAtom and the PartialTransposeAtom, in favor of just writing out the transformation in Convex.jl's high level syntax. We only need to implement a small set of primitives (+, -, *, reshape, and some constraints, and possibly some others) in order to recover most of the functionality.

In order to make use of more MOI features, however, (e.g. another cone like for vector geomean), a Convex.jl atom needs to be created, and a template dispatch added.

A note on SparseVAFTape vs VAFTape: these are two different ways of lazily representing a sequence of affine transformations to a vector of variables (and all of Convex.jl's reformulations are such transformations + adding constraints). SparseVAFTape represents each transformation by a single sparse matrix, which is nice in that we have the exact same type for every transformation, and the perf model is pretty simple to understand (this model is copied from CVXPY). VAFTape tries to be fancier and use structured types like UniformScaling and zero vectors that do not allocate, and separately stores a matrix multiplication and vector addition component. This has the disadvantage that each operation is possibly of a different type, so they are stored as a tuple, and type stability is more complicated. However, it might provide better performance in some situations. My plan for now is to implement both (they should only need a limited set of primitive methods) and benchmark to decide which to keep (or possibly keep both).

@ericphanson ericphanson marked this pull request as draft August 4, 2020 00:33
@ericphanson
Copy link
Collaborator Author

ericphanson commented Aug 9, 2020

I’m gonna leave this for now; I had it timeboxed to a week and couldn’t push it through but I think the idea works (the idea is very simple, mostly following the existing structure; just recurse through the expression tree, lazily building the affine transformations to the variables and eagerly adding constraints to the model. That being said this is the ~4th reimplementation I’ve written code for, so somehow it wasn’t obvious to me from the start). What’s left is just more implementation and some fixes (there’s some test failures that indicate I’ve made some mistakes), and cleanup to remove all the old code. Hopefully I or someone else can pick this up at some point soon! Also, I have tried out four slightly different ways of holding the tape of affine operations; I think SparseTape2 is the best so far (hold a sparse matrix A and dense vector b to represent Ax + b).

@ericphanson
Copy link
Collaborator Author

Just had a stray thought for how to do the hashing for deeply nested trees that repeat themselves. Currently we can have hash collisions (#78) and the hashing step can be a performance bottleneck (there is a discourse post somewhere). In this PR I removed it / didn’t implement it for the new way because of these issues. But I think we could do something simple like every atom is a mutable struct (so they are semantically distinguishable) and we cache their templates via an IdDict (already doing this for constraints to populate the duals). Then if you have a problem with this structure in which a bunch of subtrees are reused, the user just has to reuse the individual atoms and the cache will be hit. It requires the user to explicitly do this, but I think that’s ok because it avoids the spectre of incorrectness via hash collisions and the performance penalty of hashing everything just to help for this edge case. And I think requiring the user to reuse the atom in order to reuse the template is reasonable and transparent.

@chriscoey
Copy link

I haven't looked in detail at what is happening here yet but I saw you mentioned complex variables in the initial post, and there now exists https://github.com/jump-dev/ComplexOptInterface.jl.
@ericphanson have you had any recent thoughts about this PR?

@ericphanson
Copy link
Collaborator Author

https://github.com/jump-dev/ComplexOptInterface.jl

I did see that, but it was/is still pretty early for that effort. My thought in this PR was to just continue supporting complex variables for the current atoms by reformulating as we already doing but that future work might be able to involve that.

@ericphanson have you had any recent thoughts about this PR?

No recent ones, but to summarize my thoughts on this:

  • We have one interface point with MOI which is when we load the model. We should make an MOI model at the start and allow individual atoms to add constraints directly to the model as is done in this PR. This would allow using more general cones.
  • Convex's current system of using ordered dictionaries makes it hard to have type stability unless we have a uniform output type (e.g. "everything is a spare array" like it seems CVXPY does), and even then that might not be the right structure because...
  • ...We deal with lots of expressions that are essentially A*B*C*D*E*F*x after reformulation, where x is a vector of variables. In the objective, A is 1 x n (to get a scalar objective out), which means we should probably multiply these left-to-right. The current implementation makes this is a bit hidden but if I remember correctly, I think we essentially are multiplying right-to-left, i.e. starting with F, the deepest bit of the expression tree, and building upward to A. For many models it will be better to store a tape as we build the expression tree and once we have all the expressions decide the order to evaluate it.
  • this touches on the fact that Convex represents everything as vectors/matrices but MOI currently uses scalar variables and variable types. This mismatch likely causes performance issues, but we can avoid those by only scalarizing at the end, making use of as many optimized matmuls as we can before then (e.g. BCDEF). Though these operations are often sparse and the product of sparse matrices may not be, which adds its own difficulties
  • for hashing I think Proof of concept more-MOIified implementation #393 (comment) is probably the way to go, since we mostly use mutable structs anyway. (this relates to the previous point, that in this paradigm you want vector/matrix operations, not scalar ones, since you need to allocate for everything you do but that's usually negligible if it doesn't scale with problem size).
  • In general, I think it would be great to have more MOI frontends. E.g. "operator-overloading" vs "macro DSL" is a completely orthogonal choice AFAIK to "convex + extended formulations" vs "only direct formulations", which itself seems largely orthogonal to vectorized operations vs scalar. But out of all the combinations, I think currently you can only choose either (macro DSL, direct formulations, scalar operations), i.e. JuMP, or (operator overloading, extended formulations, vectorized operations), i.e. Convex. (There's actually another choice, Parametron.jl, which treats a more narrow set of problems in a zero-allocation way which is pretty cool). Perhaps some kind of compositional arrangement would let users efficiently choose the combination that fits their problem/domain.

Also, unfortunately I probably won't be working much on this stuff for the forseeable future, but I'm happy to share thoughts / knowledge of convex internals with anyone interested in working on it.

@ericphanson
Copy link
Collaborator Author

closing in favor of #504

@ericphanson ericphanson closed this Jul 6, 2023
@odow odow deleted the eph/moi2 branch May 8, 2024 02:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants