[BUG] Serial connection deadlocks due to wrong protocol order (STM32F103RET6_creality) #21244
Comments
You have #define SERIAL_PORT 1 and #define SERIAL_PORT_2 3. Can you disable SERIAL_PORT_2 and see if anything changes? (I have a suspicion that the Resend: command is going to the wrong port.) |
The discussion (and potential fix) seems like it could be related: #21010 (comment) Here’s the referenced commit from @rhapsodyv you’d need to cherry pick to test: rhapsodyv@4bfb5cd |
@ellensp good question. This is the file I just copied out of the config Repo without changing much. I wasn't even aware that my board has two serials. I will check if it changes something when I disable one, and see what is the right one. |
@thisiskeithb I suspect this is the same bug. I was suspecting the missing "Resend:" was sent to the wrong port, but looks like the port number was corrupted. (same result) |
Not sure it's related here. The issue in #21010 is linked with multiserial usage, and the OP does not use the 2 serial ports simultaneously (as far as I understand). |
I did not, yes. Only the USB serial connection to an OctoPrint instance. I recompiled the firmware with the second serial port disabled. Looks good so far: the current calibration cube print has more layers than any of the attempts I made yesterday. However, there were also some successful prints in the past with both ports enabled, so I can neither confirm nor deny that the suggestion from @ellensp helped. But I keep my fingers crossed ;) |
@X-Ryl669 I suggested turning off SERIAL_PORT_2 so that multiserial was disabled, as the multiserial code is the only thing I can see that could stop Resend: from being sent on the same serial port, which is the issue in the initial report: https://community.octoprint.org/t/print-freezes-due-to-checksum-mismatch/31425/4 |
Right. What Victor's spotted is a "use of uninitialized value" error. When not in multiserial, there's a default path that always returns 0 for the serial port index. When in multiserial but only using the first serial port, the ring buffer's value is default-initialized to 0, so even if the read happens at the wrong place, it'll still read 0 for the serial index. That would not explain the behavior described here. Obviously, if he's using the second serial port, then the above no longer holds, and that could explain the bug. |
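To make the argument above concrete, here is a minimal sketch of a command ring that stores a port index per slot. It is only a model: `Slot`, `CommandRing`, `push` and `port_of` are hypothetical names, not Marlin's actual queue code. It shows why a read from the wrong slot is harmless when only port 0 is in use, and why it can misdirect a reply once a second port is involved.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical model, not Marlin's real queue: each slot remembers the
// index of the serial port its command arrived on.
constexpr std::size_t kRingSize = 4;

struct Slot {
  std::string command;
  std::uint8_t port = 0;  // default-initialized to 0, i.e. the first port
};

class CommandRing {
public:
  // Store a command together with the port it came from.
  bool push(const std::string& command, std::uint8_t port) {
    if (count_ == kRingSize) return false;  // ring is full
    slots_[head_].command = command;
    slots_[head_].port = port;
    head_ = (head_ + 1) % kRingSize;
    ++count_;
    return true;
  }

  // Port recorded in a given slot; a reply for that command would go here.
  std::uint8_t port_of(std::size_t slot) const {
    return slots_[slot % kRingSize].port;
  }

private:
  std::array<Slot, kRingSize> slots_{};
  std::size_t head_ = 0;
  std::size_t count_ = 0;
};
```

With a command from port 1 stored in slot 0, `port_of(0)` gives 1, but a read of the not-yet-written slot 1 gives 0, so a reply routed by that value would land on the first port instead of the sender's.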
@simonszu Can you try with the original configuration (2 serial ports) and the latest bugfix branch, and report whether the issue is solved? Thanks! |
Hi, I've been having the same issue. So far, with #define SERIAL_PORT_2 3 disabled, the resend ratio is 0/100K lines. (This is a good sign.) Still waiting to see what happens if there is a resend request from the printer. Printer: Ender 3 with V4.2.2 board |
@jvitali I did extensive tests on LPC with 2 serial ports enabled: -1 and 0 (one USB serial and the other HW serial). Your comments gave me another hint. I will do the same tests on STM32 with 2 HW serial ports at the same time. I will post the results soon. |
Good to see that @jvitali is also able to test. I currently have a problem with a badly warped bed and am waiting for a BLTouch delivery. I do not want to produce too much spaghetti, so I will check back here once my problem is solved. |
Can you set RX_BUFFER_SIZE to 128 in Configuration_adv.h? Also, can you post your Configuration.h/adv files and the serial log (so we know what was configured and what led to the failure)? Thanks! |
@X-Ryl669 Just to be clear (I'm new to Marlin)
to:
|
@jvitali are you sure you are testing the latest bugfix? I'm running tests right now using an MKS nano v2 with the default serial buffer size and two serial ports (1 and 3) receiving and replying to data, and everything is working fine. Not a single byte lost. And if I force an error, it doesn't hang, it just recovers fine. What are you using to print? OctoPrint? |
@rhapsodyv And yes I'm printing from Octoprint. After commenting #define SERIAL_PORT_2 in Configuration.h I'm receiving 0 resend requests. I'm using Creality V4.2.2 board on the printer |
@jvitali what do you have connected to the serial_2? the serial tft? |
@jvitali can you share the serial log of a failed printing? |
@jvitali Yes. |
@X-Ryl669 will try. |
@rhapsodyv I have the LCD, micro USB connected to the Pi and the SD card. Unfortunately I reinstalled Raspbian and forgot to turn on serial log in Octoprint. |
I think I found the issue. It's related to Keep Alive + Multi serial. When Marlin is executing some slow commands, it may stay in a loop waiting for them to complete. Inside that loop, Marlin may call idle periodically, and idle checks for new serial commands (on the other serial port) to enqueue. When it receives a command, it replies "busy" (in the keep alive function), but it sends the reply to the wrong serial port. Call stack:
I could simulate resend commands on OctoPrint this way. I fixed it by making the keep alive reply "busy" on all serial ports, which is in fact correct: Marlin only handles one command at a time, so it needs to warn all serial ports to stop sending data until it can handle it. @jvitali can you test this branch, keeping both serial ports enabled? https://github.com/rhapsodyv/Marlin/tree/multi-serial-and-keep-alive-hang |
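The difference between the two behaviors described above can be sketched roughly as follows. This is a toy model, not the code from the linked branch: `Ports`, `busy_to_current` and `busy_to_all` are made-up names, each port is just a list of lines written to it, and the message text is only illustrative.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: each serial port collects the lines written to it.
struct Ports {
  std::vector<std::vector<std::string>> out;
  explicit Ports(std::size_t n) : out(n) {}
};

// Behavior described as the bug: the busy notice goes to a single
// "current" port, which may not be the port that is waiting for it.
void busy_to_current(Ports& ports, std::size_t current) {
  ports.out[current].push_back("echo:busy: processing");
}

// Behavior described as the fix: every port is told to hold off,
// since only one command is handled at a time.
void busy_to_all(Ports& ports) {
  for (auto& port : ports.out) port.push_back("echo:busy: processing");
}
```

In the first variant a host on port 0 sees nothing when the notice goes to port 1, and may time out or misread the next reply. In the second variant no host is left guessing, at the cost of some extra chatter on idle ports.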
@X-Ryl669 With #define RX_BUFFER_SIZE 128, #define SERIAL_PORT 1 and #define SERIAL_PORT_2 3 (both serial ports enabled), so far I'm having no issues (and no resend requests from the printer either). Still waiting for a resend request to see how it behaves. |
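For anyone following along, the settings discussed in this thread would look roughly like this. It is a fragment only, with the values taken from the comments above. The file placement follows what was said in the thread and may differ between Marlin versions.

```cpp
// Configuration.h (values as reported in this thread)
#define SERIAL_PORT 1
#define SERIAL_PORT_2 3     // comment this line out to test with a single serial port

// Configuration_adv.h
#define RX_BUFFER_SIZE 128  // value suggested above for testing
```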
Please test the |
This issue has had no activity in the last 60 days. Please add a reply if you want to keep this issue active, otherwise it will be automatically closed within 10 days. |
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
Bug Description
When sending data over the serial connection, the protocol's command order gets confused, which causes a deadlock: both host and printer wait for the other side to continue. This happens on the 2.0.7 release as well as on the bugfix-2.x branch cloned 2 hours ago.
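For context, here is a minimal sketch of the line-number and checksum framing behind the "Resend:" requests mentioned in this thread. It assumes the usual RepRap-style convention, in which the checksum is the XOR of every byte before the `*`. The names `gcode_checksum`, `frame_line` and `receive` are hypothetical, and the receiver is far simpler than real firmware.

```cpp
#include <cstddef>
#include <exception>
#include <string>

// XOR of every byte of the payload (everything before the '*').
unsigned char gcode_checksum(const std::string& payload) {
  unsigned char checksum = 0;
  for (unsigned char c : payload) checksum ^= c;
  return checksum;
}

// Host side: wrap a command as "N<line> <command>*<checksum>".
std::string frame_line(long line_no, const std::string& command) {
  const std::string payload = "N" + std::to_string(line_no) + " " + command;
  return payload + "*" + std::to_string(gcode_checksum(payload));
}

// Toy receiver: accepts the line only if number and checksum match,
// otherwise asks for the line it expected.
std::string receive(const std::string& framed, long& expected) {
  const std::string resend = "Resend: " + std::to_string(expected);
  const std::size_t star = framed.rfind('*');
  if (framed.empty() || framed[0] != 'N' || star == std::string::npos)
    return resend;
  const std::string payload = framed.substr(0, star);
  long line_no = 0;
  int sent = -1;
  try {
    line_no = std::stol(payload.substr(1));       // digits after 'N'
    sent = std::stoi(framed.substr(star + 1));    // digits after '*'
  } catch (const std::exception&) {
    return resend;                                // unparsable line
  }
  if (line_no != expected || sent != gcode_checksum(payload)) return resend;
  ++expected;
  return "ok";
}
```

If the resend request never reaches the host that sent the bad line, the host keeps waiting for a reply while the printer waits for the repeated line, which matches the deadlock described here.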
Configuration Files
Marlin.zip
Steps to Reproduce
Expected behavior:
The print completes and the full list of G-code commands is worked through.
Actual behavior:
The print starts, runs for a random amount of time/commands and then stops. There are lines in OctoPrint's serial.log that indicate the deadlock. Will link to them below.
This does not always happen. Some prints run completely fine.
Additional Information
The affected printer is a Creality Ender 3 Pro with the 4.2.2 board, so referenced as Ender 3 1.5 in the configuration repository.
Old related issue in the OctoPrint repo: OctoPrint/OctoPrint#3917
Forum thread on the OctoPrint forum with the serial.log as seen from OctoPrint: https://community.octoprint.org/t/print-freezes-due-to-checksum-mismatch/31425/7, also with some analysis by @foosel of the steps OctoPrint expects and the steps it gets from Marlin.