-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remplace slower function for pseudo_legal evaluation. #2581
Remplace slower function for pseudo_legal evaluation. #2581
Conversation
OuaisBla
commented
Mar 12, 2020
Commit | pseudo_Legal_speed_up_test (Remplace slower function for pseudo_legal evaluation. Fix bench.) |
---|---|
Info | Test with 2 threads to look for race possible conditions. |
Submitter | OuaisBla |
TC | 10+0.1 |
SPRT | elo0: -0.50 alpha: 0.05 elo1: 1.50 beta: 0.05 (logistic) |
LLR | 2.94 [-2.94,2.94] (accepted) |
Elo | 6.05 [2.06,10.01] (95%) |
LOS | 99.8% |
Games | 9964 [w:18.1%, l:16.3%, d:65.6%] |
Link | /tests/view/5e69b273e42a5c3b3ca2e442 |
Commit | pseudo_Legal_speed_up_test (Remplace slower function for pseudo_legal evaluation. Fix bench.) |
---|---|
Info | Following a successfull STC. |
Submitter | OuaisBla |
TC | 60+0.6 |
SPRT | elo0: 0.25 alpha: 0.05 elo1: 1.75 beta: 0.05 (logistic) |
LLR | -1.62 [-2.94,2.94] |
Elo | 0.72 [-0.43,1.93] (95%) |
LOS | 89.1% |
Games | 84188 [w:11.8%, l:11.6%, d:76.6%] |
Link | /tests/view/5e69b797e42a5c3b3ca2e44b |
(cherry picked from commit dd94898)
@OuaisBla, YI, the LTC test failed, which makes it less likely to apply such a patch.
|
@vondele Do you really consider a 0.72 ELO Gain a fail at LTC? It didn't failed because its a yellow over more than 100k games played. Plus at STC it's a huge success of 6 ELO gain. I run a unittest to make sure that the function behaves properly in either good or bad cases. I.e. I ran a test over many benchmark position and look that if the moves is valid, to track for false positive or false negative as well. The google unit test is in attachement for reference. On top of the standard perft FEN position, I use all those position specialized for not normal moves: https://gist.github.com/peterellisjones/8c46c28141c162d1d8a0f0badbc9cff9 From this reference: https://peterellisjones.com/posts/generating-legal-chess-moves-efficiently/ Yes, I consider the code review of @protonspring. But like you said it yourself, this patch is a big one that can impact SF more that simple fine tune. So it would be unwise at this point to refact it purely on speculation of unlikely speed up gain. That applies to the comment made by @Sopel97 as well for the refact of the switch case portion. I consider it, but decide to not to change the code. Over all the games played by fishtest, there was 0 crash. Unless a bug is found on top of all the tests done, I won't touch it and recommend to integrate it as it is. The quality of the patch has been demonstrated to qualify for a integration on the master. Further improvement could be made afterward, but we need a baseline for this and this patch is a real canditate. For chess960, the only possible concern is the CASTLING section. The normal case is the same as before. The PROMOTION and ENPASSANT are the same. In the code, no square are hardcoded and regular functions were used like everybody doing it without specific concerns regarding 960. I see no problems and also portion of the benchmark that has some 960 position and the bench number is exactly the same in all position. So, for me that patch is a speed up that has measurable ELO gain in both STC (a lot) and even in LTC. As for the future of SF, I would bet a lot that everybody will like to use a more faster pseudo_legal function that doesn't rely on the heavy MoveList< LEGAL > (*this) call. |
I think it's important to note that Yellow = failure. This means that the
patch doesn't verify any ELO gain at all. Be cautious of interpreting
noise as ELO. Your elo test at STC was most likely noise. This happens
quite frequently.
My suggestions are an effort to help the test actually pass SPRT. If that
happens, then this is a different discussion. In short, if your test
doesn't pass (passing = GREEN), it's not likely going to go anywhere.
We're VERY careful around here. We do things for specific reasons and
unfortunately, this patch doesn't look like (yet), it's going to make it.
I appreciate your contributions, so please keep trying.
|
There is measurable ELO gain of 0.72 at 95% confidence level. The LLR failed because of the more agressive bounfs of [0 .25, 1.75]. Less agressive bounds would have pass easily. What's the point of having a 3 color code, including yellow. if the verdict is binary pass/failed status. Yellow means not converging fast enough so you could either change the bounds or use you judgment to accept it or reject if. Measurable ELO gain is not linked to the LLR status. So far, no argument unless the color code of the LTC run has been provide to reject this patch. ELO gain has been mesured factually without a doubts. Anyone at this point could integrate it and fork a new SF that will constantly beat that actual slower SF on the master at any given TC and hapilly leave with it. At the rate we are now discovering new upgrades, which is really low, I would have assumed that many people would have been happy with that ELO gain yet. I wonder why there is still much to do to prove it. I'll keep it as it is because its too risky to change it at the moment without to redo all the tests. |
@OuaisBla don't get it wrong. Yellow is a fail. It is indicative of being stronger than master, and can be seen as an encouragement to further improve things. |
Dude. I get it. This has happened to me a dozen times. The process has worked pretty well for a long time and probably won't change anytime soon.
|
There is a good reason why the SPRT bounds are asymmetric. A patch not only has to prove it gains elo, but also that it gains enough elo that it outweighs the added code complexity. Especially in this case where the patch adds a lot of complexity and possibly also fragility the elo improvement has to be significant. |
I understand all your point. One can argue that snce the code deosn't depends on movelist anymore the complexiity is in fact lower overall. So I may have remove complexity, specially if we speak about cyclical complexity. That speed up is more relevant in STC. But significant too in LTC. Its not a functionnal change, so that's why the LTC isn't as spectacular as the STC. To longer the TC the lesser this patch works. A functionak changes works differently. Well, i simply can't improve it more. Mixing functionnal change in speed up improvement to further improve is too risky. Pseudo_legal is rooted deep into the movepicker class with requires refactoring is the pseudo legal logic is changed. I did lot of work to make that patch works and I understand its borderline. But for thoses reasons its either take it or leave it. I'll check some cyclical complexity metrics to be sure. |
Here's some cyclomatic numbers: Metrics Details For File 'movegen.cpp' and Position::pseudo_legal()Functions: Complexity The newer function remove all the complexity from movegen.cpp to add some in the function pseudo_legal. Functions and Methods in 1 Class(es): Complexity Overall the code is less complex by removing a huge dependency but the function itself is more complex. For fun, please note that search() in search.cpp is a record of high cyclomatic complexity up to 216. Metrics Details For File 'search.cpp'Functions and Methods in 8 Class(es): Complexity |
Standard bench give a 1.4% increase in speed. Results for 500 tests for each version:
p-value: 0 |
If you are on a linux machine, could you also do: sudo perf stat -r 30 -a -B -e cyles:u,instructions:u ./stockfish bench > /dev/null for both this patch and master? This is probably a much more accurate bench test. |
I don't have access to a linux machine. I could try on another machine though with a different processor. |
Nevermind, |
I'll also refer to the open issue #2593 in this context |