Performance issues #27
Again looking through the code I can see that instead of using a …
Hello @mingodad, thank you very much for your detailed report on this problem. You're totally right: the current implementation is a naive quadratic comparison of lists, mostly because the lists are self-implemented in libphorward. The focus so far has been on usability rather than performance. There are a couple of things inside unicc that could be improved, but due to its very rare use, lack of time and the priority of other projects, I currently can't deal with these issues, because resolving them properly would require a deeper examination.
Normally there won't be any duplicates, so your approach sounds useful. There's also the parray structure, which might be useful for specific use cases (it was used in several places in the unfinished unicc2 code base to increase speed and decrease memory usage), but rewriting the current implementation from plist to parray would take a lot of effort. If you have any proposal, or a quick working solution in the form of a pull request, it is very welcome!
Adding a cached sorted list array, the total time to parse was reduced by around 40%, and the profile output is now:
(profile output not shown)
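For illustration, here is a minimal sketch of the cached-sorted-array idea in plain C. It is not the actual patch and does not use the libphorward plist API; `sorted_cache`, `cache_contains` and `cache_insert` are hypothetical names. The point is that a sorted, contiguous array of element keys kept next to the list lets membership be tested with `bsearch()` in O(log n) instead of scanning the whole list for every candidate element.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical cache: a sorted, contiguous array of element keys kept
   alongside the list, so membership can be tested in O(log n) instead
   of scanning the whole list for every insertion. */
typedef struct
{
	const void**	keys;	/* sorted array of element keys */
	size_t			count;	/* number of cached keys        */
	size_t			size;	/* allocated capacity           */
} sorted_cache;

/* Here the key pointers themselves are compared; a real implementation
   would compare element contents the same way the list does. */
static int key_cmp( const void* a, const void* b )
{
	const void* ka = *(const void* const*)a;
	const void* kb = *(const void* const*)b;

	return ( ka > kb ) - ( ka < kb );
}

/* O(log n) membership test via binary search on the sorted cache. */
static int cache_contains( const sorted_cache* c, const void* key )
{
	return bsearch( &key, c->keys, c->count,
						sizeof( *c->keys ), key_cmp ) != NULL;
}

/* Insert a key while keeping the cache sorted. The memmove() is O(n)
   in the worst case, but this stays far below the cost of a full
   list-against-list comparison. Error handling omitted for brevity. */
static void cache_insert( sorted_cache* c, const void* key )
{
	size_t	lo	= 0;
	size_t	hi	= c->count;

	if( c->count == c->size )
	{
		c->size = c->size ? c->size * 2 : 64;
		c->keys = realloc( c->keys, c->size * sizeof( *c->keys ) );
	}

	while( lo < hi )						/* find insertion position */
	{
		size_t mid = ( lo + hi ) / 2;

		if( key_cmp( &c->keys[ mid ], &key ) < 0 )
			lo = mid + 1;
		else
			hi = mid;
	}

	memmove( &c->keys[ lo + 1 ], &c->keys[ lo ],
				( c->count - lo ) * sizeof( *c->keys ) );
	c->keys[ lo ] = key;
	c->count++;
}
```

In unicc the comparison would of course have to use the same element comparison that plist performs, and the cache must be kept in sync with the list; the actual change behind the reported 40% figure may look different, but this is roughly the shape such an approach could take.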
Profiling with optimization enabled, I'm now getting this:
(profile output not shown)
Trying to build the grammar from PostgreSQL mentioned here in #22 (comment), it takes a long time to generate the parser, and looking at the profile it seems that most of the time is spent in
size_t plist_union( plist* all, plist* from )
(see below), which is doing a naive quadratic comparison/insertion between the two lists.
Looking through the code, I think that creating a sorted list first and then doing a binary search on it to decide whether to push/add a list element could improve things (not ideal, but it's something). I'm not sure about it, though, because I don't know whether the lists can have duplicates. Any thoughts on this?