`preprocessFile` performance issues #10489

parsonsmatt · 2024-10-29T21:54:14Z

Hello! We've noticed that the Preprocessing library for ... part of setting up cabal repl in our very large package is taking a long time - somewhere between 10s and 60s. We have 10k+ modules and I don't think we have any files to preprocess by this mechanism (though we do have a few that use the -pgmF formatters - unsure if this is contributing).

I decided to dig into the source code and have identified some potential issues -

The code does for_ mods $ pre .... This is probably the easiest place to get a win by introducing some concurrency.

The main part of the code in preprocessFile does a file lookup for each of the suffixes in each of the directories in the search path. We have two search paths, and there are seven extensions: that's 14 calls to doesFileExist that will (in the common case) fail. I think this could be refactored to avoid the wasteful lookups in the more common case of "there's a .hs file".

Likewise, prepending buildAsSrcLoc : searchLoc is going to trigger an extra file lookup in the common case of "the module is in a hs-source-dirs." Doing searchLoc ++ [buildAsSrcLoc] should save a lookup.

In findFileCwdWithExtension' , we call ordNub on the search path and the extensions every time. While the two lists are very small and ordNub is efficient, we're hitting it ~20k times, meaning we're allocating 20k sets of 2 and 20k sets of 7. Using a newtype Nubbed a = Nubbed [a] with an mkNubbed :: (Ord a) => [a] -> Nubbed a would save this work from being repeated in a type-safe way, since the paths and extensions are pretty much shared in each invocation.

I'm happy to prepare a PR to do some of these performance improvements.

The text was updated successfully, but these errors were encountered:

ulysses4ever · 2024-10-30T00:47:22Z

Hey! These sound like good ideas. I am not entirely sure how concurrency should be controlled. But the rest sounds like no-brainer.

parsonsmatt added the type: user-question label Oct 29, 2024

ulysses4ever added type: RFC Requests for Comment and removed type: user-question labels Oct 30, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`preprocessFile` performance issues #10489

`preprocessFile` performance issues #10489

parsonsmatt commented Oct 29, 2024

ulysses4ever commented Oct 30, 2024

preprocessFile performance issues #10489

preprocessFile performance issues #10489

Comments

parsonsmatt commented Oct 29, 2024

ulysses4ever commented Oct 30, 2024

`preprocessFile` performance issues #10489

`preprocessFile` performance issues #10489