Performance issues #27
Again looking through the code I can see that instead of using a …
Hello @mingodad, thank you very much for your detailed report on this problem. You're totally right: the current implementation is a naive quadratic comparison of lists, mostly because the lists are self-implemented in libphorward. The focus so far has been on usability rather than performance. There are a couple of things inside unicc that could be improved, but due to its very rare use, lack of time and the priority of other projects, I currently can't deal with these issues, because resolving them properly would require a deeper examination.
Normally there won't be any duplicates, so your approach sounds useful. There's also the parray structure, which might be useful for specific use cases (it was used in several places in the unfinished unicc2 code base to increase speed and decrease memory usage), but rewriting the current implementation from plist to parray would take a lot of effort. If you have any proposal, or a quick working solution in the form of a pull request, it is very welcome!
Adding a cached sorted list array, the total time to parse was reduced by around 40%, and the profile output is now:
(profile output not shown)
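For illustration, here is a minimal sketch of the cached-sorted-array idea in plain C. It is not the actual patch and does not use the libphorward plist API; `sorted_cache`, `cache_contains` and `cache_insert` are hypothetical names. The point is that a sorted, contiguous array of element keys kept next to the list lets membership be tested with `bsearch()` in O(log n) instead of scanning the whole list for every candidate element.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical cache: a sorted, contiguous array of element keys kept
   alongside the list, so membership can be tested in O(log n) instead
   of scanning the whole list for every insertion. */
typedef struct
{
	const void**	keys;	/* sorted array of element keys */
	size_t			count;	/* number of cached keys        */
	size_t			size;	/* allocated capacity           */
} sorted_cache;

/* Here the key pointers themselves are compared; a real implementation
   would compare element contents the same way the list does. */
static int key_cmp( const void* a, const void* b )
{
	const void* ka = *(const void* const*)a;
	const void* kb = *(const void* const*)b;

	return ( ka > kb ) - ( ka < kb );
}

/* O(log n) membership test via binary search on the sorted cache. */
static int cache_contains( const sorted_cache* c, const void* key )
{
	return bsearch( &key, c->keys, c->count,
						sizeof( *c->keys ), key_cmp ) != NULL;
}

/* Insert a key while keeping the cache sorted. The memmove() is O(n)
   in the worst case, but this stays far below the cost of a full
   list-against-list comparison. Error handling omitted for brevity. */
static void cache_insert( sorted_cache* c, const void* key )
{
	size_t	lo	= 0;
	size_t	hi	= c->count;

	if( c->count == c->size )
	{
		c->size = c->size ? c->size * 2 : 64;
		c->keys = realloc( c->keys, c->size * sizeof( *c->keys ) );
	}

	while( lo < hi )						/* find insertion position */
	{
		size_t mid = ( lo + hi ) / 2;

		if( key_cmp( &c->keys[ mid ], &key ) < 0 )
			lo = mid + 1;
		else
			hi = mid;
	}

	memmove( &c->keys[ lo + 1 ], &c->keys[ lo ],
				( c->count - lo ) * sizeof( *c->keys ) );
	c->keys[ lo ] = key;
	c->count++;
}
```

In unicc the comparison would of course have to use the same element comparison that plist performs, and the cache must be kept in sync with the list; the actual change behind the reported 40% figure may look different, but this is roughly the shape such an approach could take.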
Profiling with optimization enabled, I'm now getting this:
(profile output not shown)
Trying to build the grammar from PostgreSQL mentioned here in #22 (comment), it takes a long time to generate the parser, and looking at the profile it seems that most of the time is spent in
size_t plist_union( plist* all, plist* from )
(see below), which is doing a naive quadratic comparison/insertion between the two lists.
Looking through the code, I think that creating a sorted list first and then doing a binary search on it to decide whether to push/add a list element could improve things (not ideal, but it's something). I'm not sure about it, though, because I don't know whether the lists can have duplicates. Any thoughts on this?