ACA-Py tests failing in AATH - Investigate #2264

Closed
swcurran opened this issue Jun 12, 2023 · 17 comments

@swcurran
Contributor

For the last two runs of AATH, a number of tests are failing that had been working before. Please investigate the runs to determine the source of the failures and fix what is needed to address the problem (ACA-Py, ACA-Py backchannel, the tests, etc.).

@usingtechnology
Contributor

Looks like a platform issue with GitHub: all those failures (and there is a success in the middle of those runs) are Docker image build failures. It's not that the code can't be built, but that the GitHub Docker image registry, or whatever it is, is struggling.

@swcurran swcurran self-assigned this Jun 12, 2023
@swcurran
Contributor Author

I don’t think so. I think it is a problem with the ACA-Py backchannel and the change to the thread ID. Take a look at this page that shows just the assertion error from the test that failed: https://allure.vonx.io/allure-docker-service-ui/projects/acapy-aip10/reports/latest

Looking further, but I’m guessing it is something with that change.

@swcurran swcurran assigned usingtechnology and unassigned swcurran Jun 12, 2023
@swcurran
Contributor Author

I’ll run the tests locally and let you know what I find. First I’ll verify that the results are consistent, and if so, I’ll check a prior commit.

@usingtechnology
Contributor

One thing in common with all the latest failed runs is dotnet. I am just looking at the actions in AATH

@usingtechnology
Contributor

usingtechnology commented Jun 12, 2023

in docker action: test-harness-findy-javascript-dotnet

Features/BasicMessage/IBasicMessageService.cs(4,50): error CS1514: { expected [/aries-framework-dotnet/src/Hyperledger.Aries/Hyperledger.Aries.csproj]
Features/BasicMessage/IBasicMessageService.cs(19,2): error CS1513: } expected [/aries-framework-dotnet/src/Hyperledger.Aries/Hyperledger.Aries.csproj]
The command '/bin/sh -c dotnet publish "DotNet.Backchannel.Master.csproj" -c Release -o /app/publish' returned a non-zero code: 1
Docker image build failed.

locally running AATH : ./manage build -a acapy -a dotnet

#15 3.013 Features/BasicMessage/IBasicMessageService.cs(4,50): error CS1514: { expected [/aries-framework-dotnet/src/Hyperledger.Aries/Hyperledger.Aries.csproj]
#15 3.013 Features/BasicMessage/IBasicMessageService.cs(19,2): error CS1513: } expected [/aries-framework-dotnet/src/Hyperledger.Aries/Hyperledger.Aries.csproj]
------
executor failed running [/bin/sh -c dotnet publish "DotNet.Backchannel.Master.csproj" -c Release -o /app/publish]: exit code: 1
Docker image build failed.

@usingtechnology
Contributor

So the dotnet tests/actions have not been successful since they added BasicMessageService - 2 months ago.

I don't know what those Allure dashboard tests are, and they look like a completely different level of detail, so maybe all my comments are meaningless for the actual problem.

@WadeBarnes
Contributor

The Allure workflows are used to upload the test results to the Allure servers:

@swcurran
Contributor Author

The place to look is here: https://aries-interop.info for a summary of the tests and links to Allure.

I update that page based on the runs, which happen every second day, except when the results suddenly change, as happened in the last few days.

If you then navigate into the per-framework page (e.g. clicking on ACA-Py on the main page), you can see the per-runset results, and from there navigate to Allure to see the results from the last 10 runs.

You’ll see, for example, that the ACA-Py to ACA-Py (runset “acapy-aip10”) suddenly started getting failures two test runs ago. Those are the ones I’m interested in — why are the ACA-Py to ACA-Py tests failing? I’ve just tried to run locally the “main” and “0.8.2-rc2” branches and they fail the same way. Trying 0.8.1 as I type. I was sure it was going to be one of the two most recent merges, but evidently not…

About .NET — yes, it has been failing for some time. I’m planning on seeing if we can drop it entirely from AATH.

@swcurran
Contributor Author

Interesting…0.8.1 is passing, 0.8.2-rc2 has the failures. Sigh. I’ll try to narrow it to a merge.

@usingtechnology
Contributor

thanks @WadeBarnes and @swcurran for the context, that https://aries-interop.info/ is super helpful with understanding what is going on.

@swcurran
Contributor Author

OK — after some messing around with Docker, I’ve confirmed that #2261 is the change that broke the tests. Doesn’t mean that it is wrong — it could be in the Backchannel.

Process I used was to:

  • Update the requirements-main.yml file to be the particular branch or commit of interest.
  • Remove the backchannel in docker: docker image rm acapy-main-agent-backchannel:latest
  • Run the runset ./manage runset acapy-aip10 -r (-r is rebuild)
    • For a single test that passes/fails depending on the branch/tag/commit:
      • ./manage build -a acapy-main; ./manage run -d acapy-main -t @T006-RFC0037 -t @AIP10 -t @minor -t @AcceptanceTest -t @Schema_Health_ID -t @Indy -t @ProofProposal
  • Check the errors:
    • Errors happening on aries-cloudagent-python@main and commit aries-cloudagent-python@88769c9a3e6044ca4b22f08d83520f1553c2f97e
    • Errors not happening on aries-cloudagent-python@0.8.1
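
The rebuild-and-rerun part of the process above could be wrapped in a small script, roughly like this. It is only a sketch: the `run` and `check_ref` functions and the `DRY_RUN` switch are made-up names for illustration (not part of AATH), editing requirements-main.yml is left as a manual step, and the two commands are the ones listed above, assumed to be run from the AATH repository root.

```shell
#!/usr/bin/env bash
# Sketch only: run, check_ref and DRY_RUN are hypothetical helpers, not part of AATH.

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Rebuild and run the acapy-aip10 runset against whatever branch/commit is
# currently set in requirements-main.yml (edit that file by hand first).
check_ref() {
  # Drop the cached backchannel image; ignore the error if it does not exist.
  run docker image rm acapy-main-agent-backchannel:latest || true
  # -r forces a rebuild before the runset executes.
  run ./manage runset acapy-aip10 -r
}
```

Calling `DRY_RUN=1 check_ref` only prints the two commands, which is a cheap way to check the sequence before committing to a full (slow) runset.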

@usingtechnology — can you please take a look? FYI — with AATH, the logs are at .logs.

I’m sure there are ways to debug, but I don’t know them...

@usingtechnology
Contributor

ok, thanks for the process. i'll dig in.

@swcurran
Contributor Author

I’ve been scanning the logs from the passing and failing runs, and I’m not seeing anything. My bet is that the backchannel is expecting the empty ~thread item, but I can’t see it in the logs :-). I assume it is Bob that would be having the problem, but who knows :-).
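
One way to make that comparison less manual is to count the ~thread mentions per log file for a passing run and a failing run, then compare the two listings. The helper below is a rough sketch: `count_thread_mentions` is a made-up name, and it assumes the logs are plain-text files under .logs (or whatever directory is passed in).

```shell
# Sketch only: count_thread_mentions is a hypothetical helper, not part of AATH.
# Prints "<count> <file>" for every file under the given log directory
# (default: .logs), where <count> is the number of lines mentioning "~thread".
count_thread_mentions() {
  log_dir="${1:-.logs}"
  find "$log_dir" -type f | sort | while IFS= read -r f; do
    # grep -c prints the match count but exits non-zero on 0 matches, hence || true.
    n="$(grep -c -F -- '~thread' "$f" || true)"
    echo "$n $f"
  done
}
```

Saving the output once on a passing tag and once on a failing commit, then diffing the two files, would show which agent's log gained or lost the decorator.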

@usingtechnology
Contributor

@swcurran - do you have a set of tests that I can run to hit all the other failures? running it wide open takes too long, and there do appear to be irrelevant failures.

@swcurran
Contributor Author

If you run ./manage runset acapy-aip10, all the tests should pass — they had been before this change. Takes a long time, but you can do other things, hopefully (that’s why I have two machines :-) ).

With runset, you can add a -b build or -r rebuild to the end of the command.

@usingtechnology
Contributor

Thanks for the runset information. I have added a PR to AATH.

Fix was very simple but time-consuming to track down and regression test. But I know a lot more about AATH now!

@swcurran
Contributor Author

Nice work! Closing this. Thanks.
