Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

buildsystem/CI: Use of BOARD_INSUFFCIENT_MEMORY becomes a maintainace burdon #11128

Closed
maribu opened this issue Mar 7, 2019 · 12 comments
Closed
Labels
Area: build system Area: Build system Area: CI Area: Continuous Integration of RIOT components State: stale State: The issue / PR has no activity for >185 days Type: cleanup The issue proposes a clean-up / The PR cleans-up parts of the codebase / documentation Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation

Comments

@maribu
Copy link
Member

maribu commented Mar 7, 2019

Description

Currently for every test and application the BOARD_INSUFFICIENT_MEMORY variable needs to be manually maintained. This makes adding new boards with low RAM/FLASH a nightmare. Also, boards will practically never be removed from BOARD_INSUFFCIENT_MEMORY, even though newer toolchains and improvements in code could result in lower RAM/ROM requirements.

Additionally, the BOARD_INSUFFICIENT_MEMORY approach reduces compilation test coverage. E.g. a test will not be compiled at all when blacklisted via BOARD_INSUFFCIENT_MEMORY, but only the linking stage will fail because of insufficient RAM/flash. Any possible issue that the compilation stage would uncover remain unrevealed.
Update: Only the linking step is skipped when blacklisted via BOARD_INSUFFICIENT_MEMORY

Brainstorming of Possible Alternatives

  1. Let make fail with a custom exit code upon linking stage. The CI can handle treat this return code not as an error
    • Pros:
      • Relatively straight forward
    • Cons:
      • No obvious disadvantages
  2. Add a special Make target that overrides link time checks
    • Pros:
      • Relatively straight forward
      • No changes in the CI required except for using that specific target
    • Cons:
      • A really ugly hack
      • A user might not read the doc, notice that the make target "ci_build" magically makes the error go away and will try to flash the result
  3. Add something like RAM_PROVIDED and FLASH_PROVIDED to every MCU, let boards override those in case of bootloaders. Add RAM_REQUIRED and FLASH_REQUIRED to every test and example
    • Pros:
      • No obvious advantages
    • Cons:
      • A lot of effort to add those to every board
      • A maintenance burden to keep them up to date
      • RAM/flash requirements depend on the toolchain, the CPU/board specific code, the name of the git branch baked into the "hello message" upon boot, and the alignment of the stars
@maribu maribu added Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation Area: build system Area: Build system Type: cleanup The issue proposes a clean-up / The PR cleans-up parts of the codebase / documentation Area: CI Area: Continuous Integration of RIOT components labels Mar 7, 2019
@maribu
Copy link
Member Author

maribu commented Mar 7, 2019

@RIOT-OS/maintainers: Please have a look and extend the brainstorming above as you see fit. (Both new possible approaches and pros/cons I didn't thought about.)

@smlng
Copy link
Member

smlng commented Mar 7, 2019

Funny, we had the same discussion over lunch today 😄

My conclusion was that all these lists, i.e., BOARD_WHITELIST, BOARD_BLACKLIST, BOARD_INSUFFICIENT_MEMORY, TEST_ON_CI_WHITELIST, and so on, are only used by Murdock-CI and to make that pass without errors by avoid building+testing stuff that will likely fail. And also to safe exec time on the CI.

So ideally (IMHO) we should get rid of such manually maintained lists in RIOT, and e.g. let compiling or linking fail. Because, as you already pointed out, the lists are hard to maintain and once a board is on such a list it likely stays there, even if issues are fixed. Again, these lists mostly serve one purpose: to make Murdock not fail and to me that's not how a CI should run.

I'm in favour of option 1. My 2 cents ...

@jcarrano
Copy link
Contributor

jcarrano commented Mar 7, 2019

I like (1). I'm trying to investigate what is the best way to reliably report that certain step in make failed, and also if there is a reliable way of knowing how much space it would have taken.

@kaspar030
Copy link
Contributor

Additionally, the BOARD_INSUFFICIENT_MEMORY approach reduces compilation test coverage. E.g. a test will not be compiled at all when blacklisted via BOARD_INSUFFCIENT_MEMORY, but only the linking stage will fail because of insufficient RAM/flash.

Not true, currently for all boards in BOARD_INSUFFICIENT_MEMORY, all compilation is done, just the linking step is skipped.

@kaspar030
Copy link
Contributor

So ideally (IMHO) we should get rid of such manually maintained lists in RIOT, and e.g. let compiling or linking fail. Because, as you already pointed out, the lists are hard to maintain and once a board is on such a list it likely stays there, even if issues are fixed.

Again, these lists mostly serve one purpose: to make Murdock not fail and to me that's not how a CI should run.

I don't think it is that simple. The lists represent what we expect to succeed. In these things, it is beneficial to be explicit.

In case of the "BOARDS_INSUFFICIENT_MEMORY", if we'd come up with a way to determine whether the link failed because - well, it failed, or because of insufficient memory, and treat the latter as an "OK fail", boards would cross the line without anyone noticing. A change could increase code size so much that a couple of low end boards suddenly don't fit anymore. The currently used size determination wouldn't catch this, as it only works with completed binaries.

In case of BOARD_BLACKLIST, there are many reasons why a board might be listed there.
Failing compilation is just one of the reasons. Others are e.g., known runtime problems, ....
While compilation could eaily be cought by CI, others are not.
So we'd end up with:

  • CI always failing. Unless we have a list of which compiles are "OK to fail".

This is important, as "CI succeeded" is a prerequisite for merging a PR.

  • CI producing binaries that don't work at runtime and the developer knows. Would be nice to document that somewhere. Maybe in a list that both devs and CI can use?
  • CI having much longer build times, because it wastes substantial amount of time building known failures.

I propose two things:

  1. reduce maintenance burden. There are a couple of tries to make boards groupable, e.g., make: add board grouping #8062 and make: add architecture features and feature blacklisting #9081. Maybe we can have BOARD_INSUFFICIENT_MEMORY be populated by a group, e.g.,
    BOARD_INSUFFICIENT_MEMORY += board_has_little_ram.

  2. Report if the link would actually succeed. Thus, instead of just skipping the link if "CI_NO_LINK" is set, try, and if it succeeds, make that an error.

@maribu
Copy link
Member Author

maribu commented Mar 8, 2019

A change could increase code size so much that a couple of low end boards suddenly don't fit anymore.

The current approach is hardly solving this issue. Most tests and examples are already blacklisted for the lower end hardware anyway. And e.g. if a board still has 512B RAM free and one PR adds 511B more RAM requirements, this PR will likely get merged without anyone noticing. The next PR that adds 2B more and will however get all the blame.

To actually get feedback on which PRs do bloat RIOT, it would be better to provide statistics at the end of Murdock. E.g. did the number of link failures increase/decrease compared to the current master HEAD? What is the impact on the size .bss, .data and .text in average, what is the standard derivation and what are the outliers?

I'm aware this is something that will not happen over night. But considering the man power invested to keep BOARDS_INSUFFICIENT_MEMORY up to date, the time to develop this seems to be well spent.

@kaspar030
Copy link
Contributor

To actually get feedback on which PRs do bloat RIOT, it would be better to provide statistics at the end of Murdock. E.g. did the number of link failures increase/decrease compared to the current master HEAD?

The sizes are already collected. There are no comparisons ATM because it would require a re-build of master after each merge in order to get a proper baseline.

@kaspar030
Copy link
Contributor

if there is a reliable way of knowing how much space it would have taken.

AFAIK the ld flag --print-memory-usage also outputs stats if the link fails due to size.

@maribu
Copy link
Member Author

maribu commented Mar 9, 2019

it would require a re-build of master after each merge in order to get a proper baseline.

This might be a good idea regardless of this discussion. E.g. lets say for PR A and PR B the CI tests pass individually, but when both are merged they fail. Currently both could end up being merged with noone noticing the issue.

@jcarrano
Copy link
Contributor

Currently both could end up being merged with noone noticing the issue.

True, but that's what nightly is for.

If we keep all the sizes from nighly and keep track of them we could spot any jump in FLASH usage.

The "real" solution would be to have reliable incremental builds. We'll get there eventually.

@cladmi
Copy link
Contributor

cladmi commented Mar 13, 2019

The size are saved and accessible already:

Just look at the nightlies and for any run replace "output.html" by "sizes.json"

https://ci.riot-os.org/RIOT-OS/RIOT/master/d562af40e625485bc3e2eefdca8659c5b942ffc5/sizes.json

It was monitored for some time by @bergzand here https://riot-graphs.snt.utwente.nl/d/000000004/full-grid-diff-boards-vertical?orgId=1

@stale
Copy link

stale bot commented Sep 14, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want me to ignore this issue, please mark it with the "State: don't stale" label. Thank you for your contributions.

@stale stale bot added the State: stale State: The issue / PR has no activity for >185 days label Sep 14, 2019
@stale stale bot closed this as completed Oct 15, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: build system Area: Build system Area: CI Area: Continuous Integration of RIOT components State: stale State: The issue / PR has no activity for >185 days Type: cleanup The issue proposes a clean-up / The PR cleans-up parts of the codebase / documentation Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation
Projects
None yet
Development

No branches or pull requests

5 participants