Skip to content
This repository has been archived by the owner on Oct 25, 2023. It is now read-only.

Add progress information during builds #169

Merged
merged 6 commits into from
Apr 19, 2023
Merged

Add progress information during builds #169

merged 6 commits into from
Apr 19, 2023

Conversation

gebner
Copy link
Member

@gebner gebner commented Apr 6, 2023

[354/1978] Building Mathlib.Logic.IsEmpty
[354/1978] Building Mathlib.Data.Sum.Basic
[354/1978] Building Mathlib.Data.Sigma.Basic
[354/1978] Building Mathlib.Logic.Function.Conjugate
[354/1978] Building Mathlib.Algebra.Group.Defs
[354/1978] Building Mathlib.Logic.Pairwise
[355/1978] Building Mathlib.Init.Data.Int.Bitwise
[355/1978] Building Mathlib.Data.Num.Basic
[356/1978] Building Aesop.Search.Expansion.Simp.Basic
[356/1978] Building Aesop.Frontend.Extension.Init
[356/1978] Building Aesop.Frontend.RuleExpr

This is an extremely simple implementation, the two counters (finished / total) are passed around as IO.Ref. When starting a facet job, we increment the total count, when it's finished we increment the total count.

Unfortunately it's a bit hard in Lake to figure out if a Lean module needs to rebuilt before its dependencies are built, so it might begin with [1500/1978] if you have already built 1500 files before calling Lake. (To be clear, I'd prefer [0/478] in that case.)

@gebner gebner added the enhancement New feature or request label Apr 6, 2023
@tydeu
Copy link
Member

tydeu commented Apr 6, 2023

While I appreciate the progress information, I sadly have a number of issues with how it was accomplished. For one, you altered how Lean modules are compiled, weaken C file hashing, and broke the lean-only flag. The IO.Ref is also a hard break of the functional, monadic approach I was going for in Lake. I take its necessity as a good indication that this not the most principled solution to this problem.

As progress information was on my TODO list, I have given some thought about how best to do this. My ideal solution was to collect jobs in the the scheduler monad (and logs in the jobs). Then, after scheduling the jobs, Lake would iterate through them, awaiting and printing the results and logs for each in turn. Since that is a reasonably large refactor, I sadly did not have time to do it last summer, but was planning on doing it this one (and the way things are looking, that is still the plan).

For the reasons above, I am thus not inclined to accept this PR as is. There is still a question of whether it is worth accepting this tech debt now to get this feature and then do a more principled refactor (like my plan) later. But, I would imagine this feature is not that worth it. Sorry for the downer. :(

@gebner
Copy link
Member Author

gebner commented Apr 10, 2023

I agree that a bigger refactor of the build scheduler is probably a better and more principled solution for this feature here, but I don't think it's a significant technical debt. It's trivial to back out again but extremely helpful in the meantime.

you altered how Lean modules are compiled

Yes, for very good reasons.

  1. The refactor brings it in line with how everything else is built, by simply declaring a module facet for it (instead of manipulating the store directly).
  2. It removes a lot of duplicate (and untested) code paths.
  3. It was possible that two Lean processes for the same module could run concurrently (IIUC), potentially corrupting the output files.
  4. It significantly reduces build overhead. A no-op lake build on mathlib4 goes from 1190ms to 790ms, a 33% speedup.

weaken C file hashing

Can you explain that comment? The trace still contains the hash of the C file, so it's dependencies would still be recompiled when the C file changes. There is very little information on what precise effects the traces have, so I'd be curious to know more.

broke the lean-only flag.

The Lean-only flag was unused. Its only use would be for pure formalization projects like mathlib, and we don't set it. More generally, we want to make building C files / code generation a separate step at some point. And then the Lean-only flag won't make any sense at all.

Lake/Build/Module.lean Outdated Show resolved Hide resolved
mkFacetJobConfigSmall fun mod => do
(← mod.leanBin.fetch).bindSync fun _ _ =>
-- do content-aware hashing so that we avoid recompiling unchanged C files
return (mod.cFile, ← computeTrace mod.cFile)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like there was a reason I included getLeanTrace in the C file's trace, but sadly I do not remember what it was.

@tydeu
Copy link
Member

tydeu commented Apr 15, 2023

@gebner Okay, I think you have convinced me. I do have some minor code comments. Otherwise, LGTM.

@tydeu
Copy link
Member

tydeu commented Apr 15, 2023

Also, in addition to the code tweaks, we should probably warn on the isLeanOnly option (and remove it from the README) since it no longer does anything. EDIT: Done.

@tydeu tydeu added the performance Timing and resource usage label Apr 15, 2023
@gebner
Copy link
Member Author

gebner commented Apr 17, 2023

we should probably warn on the isLeanOnly option (and remove it from the README) since it no longer does anything.

Oh, I forgot about the option. Should we just delete it entirely?

@tydeu
Copy link
Member

tydeu commented Apr 18, 2023

@gebner I have made some changes to the original PR (deprecating the isLeanOnly flag and modifying trace computation). You may take a look at these changes if you like. If you find them acceptable, I will rebase and merge the PR.

@gebner
Copy link
Member Author

gebner commented Apr 19, 2023

You may take a look at these changes if you like. If you find them acceptable, I will rebase and merge the PR.

Sounds good to me!

@tydeu tydeu merged commit 257fc59 into master Apr 19, 2023
@tydeu
Copy link
Member

tydeu commented Apr 19, 2023

@gebner If you want the changes soon, feel free to merge master into lean4-master and update Lean accordingly.

@tydeu tydeu deleted the progress branch April 19, 2023 01:00
Julian added a commit to leanprover-community/lean4-metaprogramming-book that referenced this pull request Jul 13, 2023
Julian added a commit to Julian/lean4-metaprogramming-book that referenced this pull request Jul 13, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request performance Timing and resource usage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants