-
Notifications
You must be signed in to change notification settings - Fork 6.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
twister: ignore serial device SerialException. #44976
twister: ignore serial device SerialException. #44976
Conversation
d4ed510
to
5496bcc
Compare
scripts/twister
Outdated
@@ -608,6 +608,9 @@ structure in the main Zephyr tree: boards/<arch>/<board_name>/""") | |||
(twister.csv and twister.xml). | |||
""") | |||
|
|||
parser.add_argument("--monitor-serial-later", action="store_true", | |||
help="Monitor device serial after running flash process.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not explain well what is going on here, the option itself is also awkward. Can you please explain the situation when this is needed and why is it needed, i.e. why do we need a special option when it works without the option most of time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nashif This was due to that I got many "mismatch errors" on a x86 platform due to recent twister change "verify ztest suite name in output", but this is a positive change, because it did find some potential issues, like below:
when running test A, it opened serial and started to monitor it, but the monitor thread met a serial exception "device reports readiness to read but returned no data (device disconnected or multiple access on port?)" during flash process,
then the current test A will be judged as failed immediately without waiting because the monitor thread was already exited due to serial exception with ser.close(), the handler time was only 3.135s, like below:
DEBUG - Flash command: ['west', 'flash', '--skip-rebuild', '-d', '/home/chenp1/github-zephyr-intel/zephyr/twister-out/board/tests/kernel/fatal/exception/kernel.common.stack_sentinel', '--', '/home/chenp1/test.sh']
DEBUG - DEVICE:
DEBUG - serial exception
DEBUG - -- west flash: using runner misc-flasher
Test binary updated successfully!
DEBUG - exception: device reports readiness to read but returned no data (device disconnected or multiple access on port?)
DEBUG - run status: board/tests/kernel/fatal/exception/kernel.common.stack_sentinel failed
INFO - 163/168 board tests/kernel/fatal/exception/kernel.common.stack_sentinel FAILEDUnknown (device 3.135s)
then when running next test B, after it started the monitor thread, it would catch the output form previous test A, finished the test judgement and did ser.close(), because all tests' pass/fail judgement were same, it won't wait flash process to update the test binary on board, thus causing a mismatch error, like below,
we could see that the monitor thread already finished before doing "west flash" command.
DEBUG - DEVICE: *** Booting Zephyr OS build zephyr-v3.0.0-2764-ge295b9643f0a ***
DEBUG - DEVICE: Running test suite obj_tracking
DEBUG - DEVICE:
DEBUG - DEVICE: START - test_obj_tracking_sanity
DEBUG - DEVICE: PASS - test_obj_tracking_sanity in 0.1 seconds
DEBUG - DEVICE: ===================================================================
DEBUG - DEVICE: Test suite obj_tracking succeeded
DEBUG - DEVICE: ===================================================================
DEBUG - DEVICE: RunID: 12ddf21c1b4d4f960be9d9d1c91e9582
DEBUG - DEVICE: PROJECT EXECUTION SUCCESSFUL
DEBUG - -- west flash: using runner misc-flasher
Test binary updated successfully!
DEBUG - Test suite names were not printed or some of them in output do not correspond with expected: ['tickless_concept']
DEBUG - run status: board/tests/kernel/tickless/tickless_concept/kernel.tickless.concept failed
INFO - 130/168 board tests/kernel/tickless/tickless_concept/kernel.tickless.concept FAILED RunID mismatch (device 18.446s)
and all tests later after this error would be judged as failed with "mismatch error".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so I wanted to monitor thread after running flash process, because I could make sure the binary updated successfully during flash process, so the current test B won't be influenced by previous test A even if met this serial exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, I am curious why we need to invoke ser.close() when met a serial exception "device reports readiness to read but returned no data (device disconnected or multiple access on port?)", it seems that this exception maybe caused by other reasons, not only serial disconnected.
try:
serial_line = ser.readline()
except TypeError:
pass
except serial.SerialException:
ser.close()
break
because I found I could still got all expected results from serial even I did change as below:
except serial.SerialException:
pass
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in NXP platform, this will introduce another issue i.e. missing leading log.
this is mainly caused by SerialException, I opened an issue #45009 to give more details. |
5496bcc
to
c2fcc29
Compare
c2fcc29
to
13156b5
Compare
13156b5
to
a7cde6b
Compare
@hakehuang @PerMac @gopiotr Could you help take a look of this? thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the most generic Exception
and then ignoring it with pass
looks like an example from "mistakes to avoid" guide. I read you description of the issue, but couldn't fully follow on what is the root of it. Could you try to refine the except part to something more reasonable?
@PerMac I think no matter what exceptions we got from serial, we shouldn't close serial directly in the monitor serial thread, it will be closed by main thread anyway finally,
because if we closed serial in monitor serial thread when met an exception, but now only got half results of test instance A, the main thread won't wait at |
I think I start following. So the issue is that we start serial connection at higher level than monitoring serial thread. Therefore serial connection should be managed on this higher level, and shouldn't be closed inside the thread. Shouldn't then instead of ignoring the serial exception, have the exception passed to the higher level and make it be handled there? TBH I have some problems understanding the logic of the twister code responsible for serial connection. I am worried it might require some more thoughtful refactoring. E.g. I believe that serial connection should be open using context menager ( @chen-png Will you join the upcoming testing wg meeting? Maybe we could discuss the issue there? |
For some boards, like up_squared and ehl_crb, they need do a power off/on operation and use bios to load and execute zephyr test binary, during the power off, it may meet this SerialException, but the serial object still works well after power on, we can continue to get all results from serial, so we don't need to close it, it's ok to ignore this exception. BTW, even if we ignore this exception here, it doesn't influence twister function, because if it was really caused by disconnection, the test would still be judged as failed due to timeout and serial would be closed later in main thread, it doesn't impact results. On the contrary, close serial directly in the monitor serial thread, it will cause later tests failed due to mismatch errors. Signed-off-by: Chen Peng1 <[email protected]>
a7cde6b
to
2d2463e
Compare
original issue for this code in twister: #13541 |
said 'Anyway, I will dismiss my request for change. ', but never happened
twister: ignore serial device SerialException.
For some boards, like up_squared and ehl_crb, they need do a
power off/on operation and use bios to load and execute zephyr
test binary, during the power off, it may meet this SerialException,
but the serial object still works well after power on, we can continue
to get all results from serial, so we don't need to close it,
it's ok to ignore this exception.
BTW, even if we ignore this exception here, it doesn't influence twister
function, because if it was really caused by disconnection, the test would
still be judged as failed due to timeout and serial would be closed later
in main thread, it doesn't impact results.
On the contrary, close serial directly in the monitor serial thread, it
will cause later tests failed due to mismatch errors.
fixes #45009
Signed-off-by: Chen Peng1 [email protected]