Hello! We've noticed that the `Preprocessing library for ...` part of setting up `cabal repl` in our very large package takes a long time: somewhere between 10s and 60s. We have 10k+ modules, and I don't think we have any files to preprocess by this mechanism (though a few do use `-pgmF` formatters; I'm unsure whether that contributes).
I decided to dig into the source code and have identified some potential issues:
The code does `for_ mods $ pre ...`. This is probably the easiest place to get a win by introducing some concurrency.
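To illustrate the concurrency idea, here is a minimal sketch using only `base`. The helper name `forConcurrentlyN_` is hypothetical and not part of Cabal, and the sketch assumes the per-module action is safe to run from several threads at once, which I have not verified for `pre`. Any real gain would also depend on the threaded runtime being in use.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Concurrent.QSem (newQSem, signalQSem, waitQSem)
import Control.Exception (SomeException, bracket_, throwIO, try)
import Data.Foldable (for_)
import Data.Traversable (for)

-- | Hypothetical helper: like 'for_', but runs up to @n@ actions at once.
forConcurrentlyN_ :: Int -> [a] -> (a -> IO ()) -> IO ()
forConcurrentlyN_ n xs act = do
  sem <- newQSem n
  dones <- for xs $ \x -> do
    done <- newEmptyMVar
    _ <- forkIO $ do
      -- The semaphore limits how many actions run at the same time.
      r <- try (bracket_ (waitQSem sem) (signalQSem sem) (act x))
      putMVar done (r :: Either SomeException ())
    pure done
  -- Wait for every worker, then rethrow the first failure (if any).
  results <- traverse takeMVar dones
  for_ results (either throwIO pure)
```

With something like this, the loop would become `forConcurrentlyN_ jobs mods $ pre ...`, where `jobs` is an assumed parallelism setting. Output from concurrently running preprocessors could interleave, so logging would need some care.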
The main part of the code in `preprocessFile` does a file lookup for each of the suffixes in each of the directories in the search path. We have two search paths, and there are seven extensions: that's 14 calls to `doesFileExist` that will (in the common case) fail. I think this could be refactored to avoid the wasteful lookups in the more common case of "there's a `.hs` file".
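One possible shape for that refactor is sketched below: probe for the plain `.hs` file first, and only fall back to the full directory × extension product when that fails. The names `Found`, `firstExisting` and `locateModule` are made up for this sketch and are not Cabal's API. The existence check is passed in as a parameter so that `doesFileExist` could be supplied in real use. Note that this changes which file wins when both a `.hs` file and a preprocessable source exist, so it would need checking against the current behaviour.

```haskell
import System.FilePath ((<.>), (</>))

-- | Where a module's source was found (hypothetical type for this sketch).
data Found
  = PlainHaskell FilePath    -- an ordinary .hs file; nothing to preprocess
  | NeedsPreprocess FilePath -- a file with a preprocessor extension
  deriving (Eq, Show)

-- | Return the first candidate for which the existence check succeeds.
firstExisting :: (FilePath -> IO Bool) -> [FilePath] -> IO (Maybe FilePath)
firstExisting _ [] = pure Nothing
firstExisting exists (p : ps) = do
  ok <- exists p
  if ok then pure (Just p) else firstExisting exists ps

locateModule
  :: (FilePath -> IO Bool) -- existence check, e.g. doesFileExist
  -> [FilePath]            -- search directories
  -> [String]              -- preprocessor extensions
  -> FilePath              -- module path without extension, e.g. "Data/Foo"
  -> IO (Maybe Found)
locateModule exists dirs ppExts modPath = do
  -- Fast path: one probe per directory for the plain .hs file.
  plain <- firstExisting exists [d </> modPath <.> "hs" | d <- dirs]
  case plain of
    Just p -> pure (Just (PlainHaskell p))
    Nothing ->
      -- Slow path: the full directory x extension product.
      fmap NeedsPreprocess
        <$> firstExisting exists [d </> modPath <.> e | d <- dirs, e <- ppExts]
```

In the common case this costs at most one probe per search directory instead of one per directory per extension.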
Likewise, prepending `buildAsSrcLoc : searchLoc` is going to trigger an extra file lookup in the common case of "the module is in a `hs-source-dirs`." Doing `searchLoc ++ [buildAsSrcLoc]` should save a lookup.
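The effect of the ordering can be shown with a small counting sketch. `probesUntilHit` is a hypothetical helper, and the directory names in the usage note are placeholders rather than Cabal's real paths.

```haskell
-- | Number of directories probed up to and including the first one that
-- contains the module; if none does, every directory is probed.
probesUntilHit :: (FilePath -> Bool) -> [FilePath] -> Int
probesUntilHit hasModule dirs =
  case break hasModule dirs of
    (misses, []) -> length misses
    (misses, _ : _) -> length misses + 1
```

For a module that lives in `"src"`, `probesUntilHit (== "src") ("build" : ["src"])` gives 2, while `probesUntilHit (== "src") (["src"] ++ ["build"])` gives 1. Reordering also changes which copy takes priority when a module exists in both places, so that would need to be acceptable.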
In `findFileCwdWithExtension'`, we call `ordNub` on the search path and the extensions every time. While the two lists are very small and `ordNub` is efficient, we're hitting it ~20k times, meaning we're allocating 20k sets of 2 and 20k sets of 7. Using a `newtype Nubbed a = Nubbed [a]` with an `mkNubbed :: (Ord a) => [a] -> Nubbed a` would save this work from being repeated in a type-safe way, since the paths and extensions are pretty much shared in each invocation.
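A rough sketch of what the `Nubbed` idea could look like follows. The deduplication here is written with `Data.Set` as a stand-in for Cabal's own `ordNub`, and `getNubbed` and `candidates` are names invented for illustration. In a real module the `Nubbed` constructor would not be exported, so `mkNubbed` would be the only way to build one.

```haskell
import qualified Data.Set as Set

-- | A list known to contain no duplicates.
newtype Nubbed a = Nubbed [a]
  deriving (Eq, Show)

-- | Remove duplicates once, keeping the first occurrence of each element.
mkNubbed :: Ord a => [a] -> Nubbed a
mkNubbed = Nubbed . go Set.empty
  where
    go _ [] = []
    go seen (x : xs)
      | x `Set.member` seen = go seen xs
      | otherwise = x : go (Set.insert x seen) xs

getNubbed :: Nubbed a -> [a]
getNubbed (Nubbed xs) = xs

-- | Candidate (directory, extension) pairs; no nubbing happens here, since
-- the types already guarantee both inputs are duplicate-free.
candidates :: Nubbed FilePath -> Nubbed String -> [(FilePath, String)]
candidates dirs exts = [(d, e) | d <- getNubbed dirs, e <- getNubbed exts]
```

The caller would build the `Nubbed` search path and extension list once, outside the per-module loop, and pass them to each lookup.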
I'm happy to prepare a PR to do some of these performance improvements.