Remplace slower function for pseudo_legal evaluation. #2581

OuaisBla · 2020-03-12T22:09:58Z

Commit	pseudo_Legal_speed_up_test (Remplace slower function for pseudo_legal evaluation. Fix bench.)
Info	Test with 2 threads to look for race possible conditions.
Submitter	OuaisBla
TC	10+0.1
SPRT	elo0: -0.50 alpha: 0.05 elo1: 1.50 beta: 0.05 (logistic)
LLR	2.94 [-2.94,2.94] (accepted)
Elo	6.05 [2.06,10.01] (95%)
LOS	99.8%
Games	9964 [w:18.1%, l:16.3%, d:65.6%]
Link	/tests/view/5e69b273e42a5c3b3ca2e442

Commit	pseudo_Legal_speed_up_test (Remplace slower function for pseudo_legal evaluation. Fix bench.)
Info	Following a successfull STC.
Submitter	OuaisBla
TC	60+0.6
SPRT	elo0: 0.25 alpha: 0.05 elo1: 1.75 beta: 0.05 (logistic)
LLR	-1.62 [-2.94,2.94]
Elo	0.72 [-0.43,1.93] (95%)
LOS	89.1%
Games	84188 [w:11.8%, l:11.6%, d:76.6%]
Link	/tests/view/5e69b797e42a5c3b3ca2e44b

(cherry picked from commit dd94898)

vondele · 2020-03-14T15:40:28Z

@OuaisBla, YI, the LTC test failed, which makes it less likely to apply such a patch.

This could be a burden for those people maintaining chess variants of SF. For example, did you consider chess960?
How did you prove the code correct, for arbitrary input moves?
There were some code reviews by @protonspring, did you consider those?

OuaisBla · 2020-03-14T16:32:43Z

@vondele Do you really consider a 0.72 ELO Gain a fail at LTC? It didn't failed because its a yellow over more than 100k games played. Plus at STC it's a huge success of 6 ELO gain.

I run a unittest to make sure that the function behaves properly in either good or bad cases. I.e. I ran a test over many benchmark position and look that if the moves is valid, to track for false positive or false negative as well. The google unit test is in attachement for reference.

pseudo_legal_test.cpp.txt

On top of the standard perft FEN position, I use all those position specialized for not normal moves: https://gist.github.com/peterellisjones/8c46c28141c162d1d8a0f0badbc9cff9

From this reference: https://peterellisjones.com/posts/generating-legal-chess-moves-efficiently/

Yes, I consider the code review of @protonspring. But like you said it yourself, this patch is a big one that can impact SF more that simple fine tune. So it would be unwise at this point to refact it purely on speculation of unlikely speed up gain. That applies to the comment made by @Sopel97 as well for the refact of the switch case portion. I consider it, but decide to not to change the code. Over all the games played by fishtest, there was 0 crash. Unless a bug is found on top of all the tests done, I won't touch it and recommend to integrate it as it is. The quality of the patch has been demonstrated to qualify for a integration on the master. Further improvement could be made afterward, but we need a baseline for this and this patch is a real canditate.

For chess960, the only possible concern is the CASTLING section. The normal case is the same as before. The PROMOTION and ENPASSANT are the same. In the code, no square are hardcoded and regular functions were used like everybody doing it without specific concerns regarding 960. I see no problems and also portion of the benchmark that has some 960 position and the bench number is exactly the same in all position.

So, for me that patch is a speed up that has measurable ELO gain in both STC (a lot) and even in LTC.

As for the future of SF, I would bet a lot that everybody will like to use a more faster pseudo_legal function that doesn't rely on the heavy MoveList< LEGAL > (*this) call.

protonspring · 2020-03-14T16:50:47Z

I think it's important to note that Yellow = failure. This means that the patch doesn't verify any ELO gain at all. Be cautious of interpreting noise as ELO. Your elo test at STC was most likely noise. This happens quite frequently. My suggestions are an effort to help the test actually pass SPRT. If that happens, then this is a different discussion. In short, if your test doesn't pass (passing = GREEN), it's not likely going to go anywhere. We're VERY careful around here. We do things for specific reasons and unfortunately, this patch doesn't look like (yet), it's going to make it. I appreciate your contributions, so please keep trying.

OuaisBla · 2020-03-14T18:01:59Z

There is measurable ELO gain of 0.72 at 95% confidence level. The LLR failed because of the more agressive bounfs of [0 .25, 1.75]. Less agressive bounds would have pass easily.

What's the point of having a 3 color code, including yellow. if the verdict is binary pass/failed status.

Yellow means not converging fast enough so you could either change the bounds or use you judgment to accept it or reject if. Measurable ELO gain is not linked to the LLR status.

So far, no argument unless the color code of the LTC run has been provide to reject this patch. ELO gain has been mesured factually without a doubts. Anyone at this point could integrate it and fork a new SF that will constantly beat that actual slower SF on the master at any given TC and hapilly leave with it. At the rate we are now discovering new upgrades, which is really low, I would have assumed that many people would have been happy with that ELO gain yet. I wonder why there is still much to do to prove it.

I'll keep it as it is because its too risky to change it at the moment without to redo all the tests.

vondele · 2020-03-14T18:04:04Z

@OuaisBla don't get it wrong. Yellow is a fail. It is indicative of being stronger than master, and can be seen as an encouragement to further improve things.

protonspring · 2020-03-14T18:07:22Z

Dude. I get it. This has happened to me a dozen times. The process has worked pretty well for a long time and probably won't change anytime soon.

ianfab · 2020-03-14T18:20:25Z

There is a good reason why the SPRT bounds are asymmetric. A patch not only has to prove it gains elo, but also that it gains enough elo that it outweighs the added code complexity. Especially in this case where the patch adds a lot of complexity and possibly also fragility the elo improvement has to be significant.

OuaisBla · 2020-03-14T18:47:22Z

I understand all your point.

One can argue that snce the code deosn't depends on movelist anymore the complexiity is in fact lower overall. So I may have remove complexity, specially if we speak about cyclical complexity.

That speed up is more relevant in STC. But significant too in LTC. Its not a functionnal change, so that's why the LTC isn't as spectacular as the STC. To longer the TC the lesser this patch works. A functionak changes works differently. Well, i simply can't improve it more.

Mixing functionnal change in speed up improvement to further improve is too risky. Pseudo_legal is rooted deep into the movepicker class with requires refactoring is the pseudo legal logic is changed.

I did lot of work to make that patch works and I understand its borderline. But for thoses reasons its either take it or leave it. I'll check some cyclical complexity metrics to be sure.

OuaisBla · 2020-03-17T01:19:49Z

Here's some cyclomatic numbers:

Metrics Details For File 'movegen.cpp' and Position::pseudo_legal()

Functions: Complexity
generate_all() 9
generate_moves() 10
generate_pawn_moves() 30
make_promotions() 10
Position::pseudo_legal() 21

The newer function remove all the complexity from movegen.cpp to add some in the function pseudo_legal.

Functions and Methods in 1 Class(es): Complexity
Position::pseudo_legal() 30

Overall the code is less complex by removing a huge dependency but the function itself is more complex.

For fun, please note that search() in search.cpp is a record of high cyclomatic complexity up to 216.

Metrics Details For File 'search.cpp'

Functions and Methods in 8 Class(es): Complexity
search() 216

OuaisBla · 2020-03-17T22:48:46Z

Standard bench give a 1.4% increase in speed.

Results for 500 tests for each version:

        Base      Test      Diff      
Mean    1564443   1542154   22289     
StDev   14624     14134     6484

p-value: 0
speedup: -0,014

protonspring · 2020-03-17T23:25:46Z

If you are on a linux machine, could you also do:

sudo perf stat -r 30 -a -B -e cyles:u,instructions:u ./stockfish bench > /dev/null

for both this patch and master? This is probably a much more accurate bench test.

OuaisBla · 2020-03-18T01:40:38Z

I don't have access to a linux machine. I could try on another machine though with a different processor.

OuaisBla · 2020-03-18T18:03:07Z

Nevermind,

vondele · 2020-03-23T21:07:52Z

I'll also refer to the open issue #2593 in this context

Remplace slower function for pseudo_legal evaluation. Fix bench.

616ca9e

(cherry picked from commit dd94898)

OuaisBla closed this Mar 18, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remplace slower function for pseudo_legal evaluation. #2581

Remplace slower function for pseudo_legal evaluation. #2581

OuaisBla commented Mar 12, 2020

vondele commented Mar 14, 2020

OuaisBla commented Mar 14, 2020 •

edited

Loading

protonspring commented Mar 14, 2020 via email

OuaisBla commented Mar 14, 2020

vondele commented Mar 14, 2020

protonspring commented Mar 14, 2020 via email •

edited

Loading

ianfab commented Mar 14, 2020

OuaisBla commented Mar 14, 2020

OuaisBla commented Mar 17, 2020

OuaisBla commented Mar 17, 2020

protonspring commented Mar 17, 2020

OuaisBla commented Mar 18, 2020

OuaisBla commented Mar 18, 2020

vondele commented Mar 23, 2020

Remplace slower function for pseudo_legal evaluation. #2581

Remplace slower function for pseudo_legal evaluation. #2581

Conversation

OuaisBla commented Mar 12, 2020

vondele commented Mar 14, 2020

OuaisBla commented Mar 14, 2020 • edited Loading

protonspring commented Mar 14, 2020 via email

OuaisBla commented Mar 14, 2020

vondele commented Mar 14, 2020

protonspring commented Mar 14, 2020 via email • edited Loading

ianfab commented Mar 14, 2020

OuaisBla commented Mar 14, 2020

OuaisBla commented Mar 17, 2020

Metrics Details For File 'movegen.cpp' and Position::pseudo_legal()

Metrics Details For File 'search.cpp'

OuaisBla commented Mar 17, 2020

protonspring commented Mar 17, 2020

OuaisBla commented Mar 18, 2020

OuaisBla commented Mar 18, 2020

vondele commented Mar 23, 2020

OuaisBla commented Mar 14, 2020 •

edited

Loading

protonspring commented Mar 14, 2020 via email •

edited

Loading