Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

preprocessFile performance issues #10489

Open
parsonsmatt opened this issue Oct 29, 2024 · 1 comment
Open

preprocessFile performance issues #10489

parsonsmatt opened this issue Oct 29, 2024 · 1 comment
Labels
type: RFC Requests for Comment

Comments

@parsonsmatt
Copy link
Collaborator

Hello! We've noticed that the Preprocessing library for ... part of setting up cabal repl in our very large package is taking a long time - somewhere between 10s and 60s. We have 10k+ modules and I don't think we have any files to preprocess by this mechanism (though we do have a few that use the -pgmF formatters - unsure if this is contributing).

I decided to dig into the source code and have identified some potential issues -

The code does for_ mods $ pre .... This is probably the easiest place to get a win by introducing some concurrency.

The main part of the code in preprocessFile does a file lookup for each of the suffixes in each of the directories in the search path. We have two search paths, and there are seven extensions: that's 14 calls to doesFileExist that will (in the common case) fail. I think this could be refactored to avoid the wasteful lookups in the more common case of "there's a .hs file".

Likewise, prepending buildAsSrcLoc : searchLoc is going to trigger an extra file lookup in the common case of "the module is in a hs-source-dirs." Doing searchLoc ++ [buildAsSrcLoc] should save a lookup.

In findFileCwdWithExtension' , we call ordNub on the search path and the extensions every time. While the two lists are very small and ordNub is efficient, we're hitting it ~20k times, meaning we're allocating 20k sets of 2 and 20k sets of 7. Using a newtype Nubbed a = Nubbed [a] with an mkNubbed :: (Ord a) => [a] -> Nubbed a would save this work from being repeated in a type-safe way, since the paths and extensions are pretty much shared in each invocation.

I'm happy to prepare a PR to do some of these performance improvements.

@ulysses4ever
Copy link
Collaborator

Hey! These sound like good ideas. I am not entirely sure how concurrency should be controlled. But the rest sounds like no-brainer.

@ulysses4ever ulysses4ever added type: RFC Requests for Comment and removed type: user-question labels Oct 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: RFC Requests for Comment
Projects
None yet
Development

No branches or pull requests

2 participants