Intermittent failures in pkg test #16555
Comments
It's a connectivity issue. You cannot expect 100% uptime from services. I do not see any solution to that. Maybe we could run the tests in a controlled environment (some of our machines). |
We need to make this more robust; it's actively disruptive when CI is unreliable. Can we make these operations retry several times? |
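The retry idea raised in this comment can be sketched at the shell level as follows. This is only an illustration of the pattern, not the project's fix: the failing operations in this thread go through libgit2 inside Julia, so a real mitigation would presumably live there. The `retry` function name, the `RETRY_DELAY` variable, and the doubling backoff are all assumptions made for this sketch.

```shell
#!/bin/sh
# retry MAX_ATTEMPTS COMMAND [ARGS...]
# Hypothetical helper (not part of Julia or libgit2): reruns COMMAND until it
# succeeds or MAX_ATTEMPTS is reached, doubling the pause between attempts.
retry() {
    max_attempts=$1
    shift
    attempt=1
    delay=${RETRY_DELAY:-1}   # assumed default: 1 second before the first retry
    while :; do
        status=0
        "$@" || status=$?     # capture the exit code without tripping `set -e`
        if [ "$status" -eq 0 ]; then
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "retry: giving up after $attempt attempts (exit $status)" >&2
            return "$status"
        fi
        echo "retry: attempt $attempt failed (exit $status); sleeping ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
        delay=$((delay * 2))
    done
}
```

For example, `retry 3 git ls-remote https://github.com/JuliaLang/Example.jl` would try the remote listing up to three times. Retrying blindly can hide a genuine bug, so it may be worth limiting retries to the specific transport errors reported in this thread.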
I believe this is an AppVeyor problem. It could be a configuration issue with IIS on Win x86_64 machines. I found this one: http://stackoverflow.com/questions/20682621/using-iis-and-arr-to-reverse-proxy-returns-the-server-returned-an-invalid-or-un. |
cc @FeodorFitsner any ideas/suggestions? |
This is a connectivity issue from where to what? Could you please elaborate? |
Sorry - we're testing our package manager, which is doing git clones, fetches, checkouts etc, via libgit2, from the AppVeyor VM to github repos. Usually JuliaLang/METADATA.jl (which is pretty big) and JuliaLang/Example.jl (which is small). I believe these are WinHTTP errors from the https transport that libgit2 tries to use. |
So, those are connectivity issues with github.com? |
Yes. |
OK, can you give a sample of error request/response/status? |
Usually "Failed to receive response: The server returned an invalid or unrecognized response". Example logs: https://ci.appveyor.com/project/JuliaLang/julia/build/1.0.2056/job/73245nbqo4lq4rf7 There are others further back. Most of the other failures have been timeouts due to #16556, which is probably a libuv bug, not an AppVeyor/GitHub problem. |
Well, this might be a connectivity issue, of course, but we haven't received any reports from others about connectivity issues with github.com. The good news is that in the coming weeks we are moving our build infrastructure to the same hosting provider used by github.com 😉 Hope that will make things better. |
Cool. Well, I hope that migration goes smoothly and we only notice via fewer failed builds. I'm not sure there's any simple way of debugging this. Maybe we could create a C reproduction of the same library calls we're using, remote in, and try to catch it in gdb, but that would take a bit of work. |
You can do a PowerShell script. |
I was able to trigger some Pkg/LibGit2 errors locally that looked a lot like this, repeatably, by calling
|
Open/comment if this still happens. |
This happens often on AppVeyor and I've seen it locally occasionally too, but I also think I've seen it on Travis so it may not be Windows-specific. We get either status code 500, or "Failed to receive response: The server returned an invalid or unrecognized response," etc.
We have made recent changes in libgit2, adding a few new tests and making it throw a few more errors to avoid catching unintended things like typos. Whatever is happening here seems like it may be due to intermittent connectivity problems, or maybe an underlying bug/race condition/undefined behavior in the C library interfaces? Either way we should try to pinpoint the cause and add a mitigation (retries on some operations? specific handling of this expected failure mode?) or fix it if it's a real bug.
What's the best way to debug this? Does anyone else see this locally if you run
make test-pkg
in a loop? cc @wildart
edit: this looks a lot like #13436, but apparently that had been more common on 32-bit?
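One way to run `make test-pkg` in a loop and keep evidence from the failing runs is sketched below. The `run_loop` function, the `LOG_DIR` variable, and the `./flaky-logs` default are hypothetical names chosen for illustration; nothing here is part of the Julia build system.

```shell
#!/bin/sh
# run_loop COUNT COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND COUNT times, keeps the combined
# stdout/stderr of each failing run, and reports how many runs failed.
run_loop() {
    runs=$1
    shift
    log_dir=${LOG_DIR:-./flaky-logs}   # assumed location for saved logs
    mkdir -p "$log_dir"
    failures=0
    i=1
    while [ "$i" -le "$runs" ]; do
        log="$log_dir/run-$i.log"
        if "$@" >"$log" 2>&1; then
            rm -f "$log"               # passing runs are not interesting
        else
            failures=$((failures + 1))
            echo "run $i failed; output kept in $log" >&2
        fi
        i=$((i + 1))
    done
    echo "$failures of $runs runs failed"
    [ "$failures" -eq 0 ]              # non-zero exit if any run failed
}
```

With this defined, `run_loop 20 make test-pkg` would give a rough local failure rate, and the saved logs could be compared against the AppVeyor messages quoted above.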