Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remplace slower function for pseudo_legal evaluation. #2581

Conversation

OuaisBla
Copy link

Commit pseudo_Legal_speed_up_test (Remplace slower function for pseudo_legal evaluation. Fix bench.)
Info Test with 2 threads to look for race possible conditions.
Submitter OuaisBla
TC 10+0.1
SPRT elo0: -0.50  alpha: 0.05  elo1: 1.50  beta: 0.05 (logistic)
LLR 2.94 [-2.94,2.94] (accepted)
Elo 6.05 [2.06,10.01] (95%)
LOS 99.8%
Games 9964 [w:18.1%, l:16.3%, d:65.6%]
Link /tests/view/5e69b273e42a5c3b3ca2e442
Commit pseudo_Legal_speed_up_test (Remplace slower function for pseudo_legal evaluation. Fix bench.)
Info Following a successfull STC.
Submitter OuaisBla
TC 60+0.6
SPRT elo0: 0.25  alpha: 0.05  elo1: 1.75  beta: 0.05 (logistic)
LLR -1.62 [-2.94,2.94]
Elo 0.72 [-0.43,1.93] (95%)
LOS 89.1%
Games 84188 [w:11.8%, l:11.6%, d:76.6%]
Link /tests/view/5e69b797e42a5c3b3ca2e44b

@vondele
Copy link
Member

vondele commented Mar 14, 2020

@OuaisBla, YI, the LTC test failed, which makes it less likely to apply such a patch.

  • This could be a burden for those people maintaining chess variants of SF. For example, did you consider chess960?
  • How did you prove the code correct, for arbitrary input moves?
  • There were some code reviews by @protonspring, did you consider those?

@OuaisBla
Copy link
Author

OuaisBla commented Mar 14, 2020

@vondele Do you really consider a 0.72 ELO Gain a fail at LTC? It didn't failed because its a yellow over more than 100k games played. Plus at STC it's a huge success of 6 ELO gain.

I run a unittest to make sure that the function behaves properly in either good or bad cases. I.e. I ran a test over many benchmark position and look that if the moves is valid, to track for false positive or false negative as well. The google unit test is in attachement for reference.

pseudo_legal_test.cpp.txt

On top of the standard perft FEN position, I use all those position specialized for not normal moves: https://gist.github.com/peterellisjones/8c46c28141c162d1d8a0f0badbc9cff9

From this reference: https://peterellisjones.com/posts/generating-legal-chess-moves-efficiently/

Yes, I consider the code review of @protonspring. But like you said it yourself, this patch is a big one that can impact SF more that simple fine tune. So it would be unwise at this point to refact it purely on speculation of unlikely speed up gain. That applies to the comment made by @Sopel97 as well for the refact of the switch case portion. I consider it, but decide to not to change the code. Over all the games played by fishtest, there was 0 crash. Unless a bug is found on top of all the tests done, I won't touch it and recommend to integrate it as it is. The quality of the patch has been demonstrated to qualify for a integration on the master. Further improvement could be made afterward, but we need a baseline for this and this patch is a real canditate.

For chess960, the only possible concern is the CASTLING section. The normal case is the same as before. The PROMOTION and ENPASSANT are the same. In the code, no square are hardcoded and regular functions were used like everybody doing it without specific concerns regarding 960. I see no problems and also portion of the benchmark that has some 960 position and the bench number is exactly the same in all position.

So, for me that patch is a speed up that has measurable ELO gain in both STC (a lot) and even in LTC.

As for the future of SF, I would bet a lot that everybody will like to use a more faster pseudo_legal function that doesn't rely on the heavy MoveList< LEGAL > (*this) call.

@protonspring
Copy link

protonspring commented Mar 14, 2020 via email

@OuaisBla
Copy link
Author

There is measurable ELO gain of 0.72 at 95% confidence level. The LLR failed because of the more agressive bounfs of [0 .25, 1.75]. Less agressive bounds would have pass easily.

What's the point of having a 3 color code, including yellow. if the verdict is binary pass/failed status.

Yellow means not converging fast enough so you could either change the bounds or use you judgment to accept it or reject if. Measurable ELO gain is not linked to the LLR status.

So far, no argument unless the color code of the LTC run has been provide to reject this patch. ELO gain has been mesured factually without a doubts. Anyone at this point could integrate it and fork a new SF that will constantly beat that actual slower SF on the master at any given TC and hapilly leave with it. At the rate we are now discovering new upgrades, which is really low, I would have assumed that many people would have been happy with that ELO gain yet. I wonder why there is still much to do to prove it.

I'll keep it as it is because its too risky to change it at the moment without to redo all the tests.

@vondele
Copy link
Member

vondele commented Mar 14, 2020

@OuaisBla don't get it wrong. Yellow is a fail. It is indicative of being stronger than master, and can be seen as an encouragement to further improve things.

@protonspring
Copy link

protonspring commented Mar 14, 2020 via email

@ianfab
Copy link

ianfab commented Mar 14, 2020

There is a good reason why the SPRT bounds are asymmetric. A patch not only has to prove it gains elo, but also that it gains enough elo that it outweighs the added code complexity. Especially in this case where the patch adds a lot of complexity and possibly also fragility the elo improvement has to be significant.

@OuaisBla
Copy link
Author

I understand all your point.

One can argue that snce the code deosn't depends on movelist anymore the complexiity is in fact lower overall. So I may have remove complexity, specially if we speak about cyclical complexity.

That speed up is more relevant in STC. But significant too in LTC. Its not a functionnal change, so that's why the LTC isn't as spectacular as the STC. To longer the TC the lesser this patch works. A functionak changes works differently. Well, i simply can't improve it more.

Mixing functionnal change in speed up improvement to further improve is too risky. Pseudo_legal is rooted deep into the movepicker class with requires refactoring is the pseudo legal logic is changed.

I did lot of work to make that patch works and I understand its borderline. But for thoses reasons its either take it or leave it. I'll check some cyclical complexity metrics to be sure.

@OuaisBla
Copy link
Author

Here's some cyclomatic numbers:

Metrics Details For File 'movegen.cpp' and Position::pseudo_legal()

Functions: Complexity
generate_all() 9
generate_moves() 10
generate_pawn_moves() 30
make_promotions() 10
Position::pseudo_legal() 21

The newer function remove all the complexity from movegen.cpp to add some in the function pseudo_legal.


Functions and Methods in 1 Class(es): Complexity
Position::pseudo_legal() 30

Overall the code is less complex by removing a huge dependency but the function itself is more complex.

For fun, please note that search() in search.cpp is a record of high cyclomatic complexity up to 216.

Metrics Details For File 'search.cpp'

Functions and Methods in 8 Class(es): Complexity
search() 216

@OuaisBla
Copy link
Author

Standard bench give a 1.4% increase in speed.

Results for 500 tests for each version:

        Base      Test      Diff      
Mean    1564443   1542154   22289     
StDev   14624     14134     6484      

p-value: 0
speedup: -0,014

@protonspring
Copy link

If you are on a linux machine, could you also do:

sudo perf stat -r 30 -a -B -e cyles:u,instructions:u ./stockfish bench > /dev/null

for both this patch and master? This is probably a much more accurate bench test.

@OuaisBla
Copy link
Author

I don't have access to a linux machine. I could try on another machine though with a different processor.

@OuaisBla
Copy link
Author

Nevermind,

@OuaisBla OuaisBla closed this Mar 18, 2020
@vondele
Copy link
Member

vondele commented Mar 23, 2020

I'll also refer to the open issue #2593 in this context

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants