[BUG] Serial connection deadlocks due to wrong protocol order (STM32F103RET6_creality) #21244

Closed
simonszu opened this issue Mar 3, 2021 · 26 comments


@simonszu

simonszu commented Mar 3, 2021

Bug Description

When sending data over the serial connection, the order of protocol commands gets confused, which causes a deadlock: both the host and the printer wait for the other side to continue. This happens on the 2.0.7 release as well as on the bugfix-2.0.x branch cloned 2 hours ago.
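
For readers unfamiliar with the protocol involved: the host sends numbered lines with a trailing checksum, and the printer answers each one with either an acknowledgement or a resend request. The sketch below shows only the framing and the XOR checksum. The helper names `gcode_checksum`, `frame_line` and `line_is_valid` are made up for illustration and are not taken from Marlin or OctoPrint.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// XOR of every byte that precedes the '*' separator.
uint8_t gcode_checksum(const std::string& payload) {
  uint8_t cs = 0;
  for (unsigned char c : payload) cs ^= c;
  return cs;
}

// Host side (hypothetical helper): build "N<line> <command>*<checksum>".
std::string frame_line(long line_number, const std::string& command) {
  assert(line_number >= 0);
  const std::string payload = "N" + std::to_string(line_number) + " " + command;
  return payload + "*" + std::to_string(static_cast<unsigned>(gcode_checksum(payload)));
}

// Printer side (hypothetical helper): true when the trailing checksum matches.
bool line_is_valid(const std::string& framed) {
  const std::size_t star = framed.rfind('*');
  if (star == std::string::npos || star + 1 >= framed.size()) return false;
  unsigned value = 0;
  for (std::size_t i = star + 1; i < framed.size(); ++i) {
    const char c = framed[i];
    if (c < '0' || c > '9') return false;  // checksum must be decimal digits
    value = value * 10 + static_cast<unsigned>(c - '0');
    if (value > 255) return false;         // cannot fit in one byte
  }
  return value == gcode_checksum(framed.substr(0, star));
}
```

If the printer rejects a line but its resend request never reaches the host, the host keeps waiting for an answer and the printer keeps waiting for the line, which is the kind of deadlock described here.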

Configuration Files

Marlin.zip

Steps to Reproduce

  1. Compile the firmware without any significant modifications. For the test with the release firmware, the only modification was to include support for M73 commands; for the test with the bugfix branch, no modification was made.
  2. Install it on the printer (Ender 3 Pro)
  3. Start a print via Octoprint
  4. Wait

Expected behavior:

The print completes and the full list of G-code commands is worked through.

Actual behavior:

The print starts, runs for a random amount of time/commands and then stops. There are lines in OctoPrint's serial.log that indicate the deadlock; they are linked below.

This does not always happen. Some prints run completely fine.

Additional Information

The affected printer is a Creality Ender 3 Pro with the 4.2.2 board, so referenced as Ender 3 1.5 in the configuration repository.

Old related issue in the OctoPrint repo: OctoPrint/OctoPrint#3917
Forum thread on the OctoPrint forum with the serial.log as seen from OctoPrint: https://community.octoprint.org/t/print-freezes-due-to-checksum-mismatch/31425/7, which also includes some analysis by @foosel of the steps OctoPrint expects and the steps it actually gets from Marlin.

@ellensp
Contributor

ellensp commented Mar 3, 2021

You have #define SERIAL_PORT 1 and #define SERIAL_PORT_2 3
Which port is connected to octoprint and what is on the other serial port?

Can you disable SERIAL_PORT_2 and see if anything changes? (I have a suspicion that the Resend: command is going to the wrong port.)
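
As a sketch of the suggested test, assuming the two values quoted above and that both defines live in Configuration.h, the change would look roughly like this:

```cpp
#define SERIAL_PORT 1
//#define SERIAL_PORT_2 3   // commented out for the test, so only one serial port is active
```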

@thisiskeithb
Member

thisiskeithb commented Mar 4, 2021

The discussion (and potential fix) seems like it could be related: #21010 (comment)

Here’s the referenced commit from @rhapsodyv you’d need to cherry pick to test: rhapsodyv@4bfb5cd

@simonszu
Author

simonszu commented Mar 4, 2021

@ellensp good question. This is the file I just copied out of the config Repo without changing much. I wasn't even aware that my board has two serials. I will check if it changes something when I disable one, and see what is the right one.
Should a bug report against the config Repo be opened?

@ellensp
Contributor

ellensp commented Mar 4, 2021

@thisiskeithb I suspect this is the same bug. I was suspecting the missing "Resend:" was sent to the wrong port, but it looks like the port number was corrupted (same result).

@X-Ryl669
Contributor

X-Ryl669 commented Mar 4, 2021

Not sure it's related here. The issue in #21010 is linked with multiserial usage, and the OP does not use the 2 serial ports simultaneously (as far as I understand).

@simonszu
Author

simonszu commented Mar 4, 2021

Correct, I did not. Only the USB serial connection to an OctoPrint instance is in use.

I recompiled the firmware with the second serial port disabled. It looks good so far: the current calibration cube print has more layers than any of the attempts I made yesterday. However, there were also some successful prints in the past with both ports enabled, so I can neither confirm nor deny that the suggestion from @ellensp helped. But I'm keeping my fingers crossed ;)

@ellensp
Contributor

ellensp commented Mar 4, 2021

@X-Ryl669 I suggested turning off SERIAL_PORT_2 so that multiserial was disabled, as the multiserial code is the only thing I can see that could stop Resend: from being sent on the same serial port, which is the issue in the initial report: https://community.octoprint.org/t/print-freezes-due-to-checksum-mismatch/31425/4

@X-Ryl669
Contributor

X-Ryl669 commented Mar 4, 2021

Right. What Victor spotted is a use-of-uninitialized-value error. When not in multiserial mode, there's a default path that always returns 0 for the serial port index. When in multiserial mode but only using the first serial port, the ring buffer's value is default-initialized to 0, so even if the read happens at the wrong place, it will still read 0 for the serial index. That should not explain the behavior described here.

Obviously, if he's using the second serial port, then the above does not hold, and that could explain the bug.
I observed that the printer could not resume correctly after an error, but since it happens rarely, I thought it was fixed. You are probably right that there is something else going on.
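
The default-initialization argument above can be illustrated with a toy ring buffer. This is only a sketch under stated assumptions: `QueuedCommand`, `CommandRing` and `port_at` are hypothetical names, and the layout is not Marlin's actual command queue. It shows that a read from the wrong slot happens to return port 0, which is harmless when only port 0 is in use and wrong as soon as a command arrived on another port.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// One queued command plus the index of the serial port it arrived on.
struct QueuedCommand {
  std::string text;
  int port = 0;  // a slot that was never written reports port 0
};

// Toy fixed-size ring buffer (hypothetical, for illustration only).
class CommandRing {
 public:
  static constexpr std::size_t kSize = 4;

  void push(const std::string& text, int port) {
    assert(count_ < kSize);  // caller must not overflow the ring
    QueuedCommand& slot = slots_[(head_ + count_) % kSize];
    slot.text = text;
    slot.port = port;
    ++count_;
  }

  QueuedCommand pop() {
    assert(count_ > 0);  // caller must not pop an empty ring
    QueuedCommand out = slots_[head_];
    head_ = (head_ + 1) % kSize;
    --count_;
    return out;
  }

  // Port stored in an arbitrary slot, including one that was never written.
  int port_at(std::size_t slot) const { return slots_[slot % kSize].port; }

 private:
  std::array<QueuedCommand, kSize> slots_{};
  std::size_t head_ = 0;
  std::size_t count_ = 0;
};
```

With a command pushed from port 1 into slot 0, `port_at(0)` returns 1 while `port_at(1)` returns 0, so a reply routed using the wrong slot would go to port 0 instead of the port that sent the command.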

@X-Ryl669
Contributor

X-Ryl669 commented Mar 5, 2021

@simonszu Can you try with the original configuration (2 serial ports) and the latest bugfix branch, and report whether the issue is solved? Thanks!

@jvitali

jvitali commented Mar 7, 2021

Hi, I've been having the same issue.
I was testing the resend messages from the printer with a 3D Benchy.
With both serial ports enabled (#define SERIAL_PORT 1 and #define SERIAL_PORT_2 3) the print failed around layer 25.
I tried with the bugfix and main branches and the issue persisted.

So far, with #define SERIAL_PORT_2 3 disabled, the resend ratio is 0/100K lines (this is a good sign). I'm still waiting to see what happens if there is a resend request from the printer.

Printer: Ender 3 with V4.2.2 board
Raspberry Pi 4 8GB with Octoprint 1.5.3
3D Benchy sliced with Cura 4.7.1 Standard Quality 0.2mm (No modifications to the default profile)

@rhapsodyv
Member

@jvitali I did extensive tests on LPC with 2 serial ports enabled: -1 and 0 (one USB serial and the other hardware serial).

Your comments gave me another hint. I will do the same tests on STM32 with 2 hardware serial ports at the same time.

I will post the results soon.

@simonszu
Author

simonszu commented Mar 7, 2021

Good to see that @jvitali is also able to test. I currently have a problem with a badly warped bed and am waiting for a BLTouch delivery. I do not want to produce too much spaghetti, so I will check back here once my problem is solved.

@X-Ryl669
Contributor

X-Ryl669 commented Mar 7, 2021

> tried with bugfix and main branch and the issue persisted.

Can you set RX_BUFFER_SIZE to 128 in Configuration_adv.h? Also, can you post your Configuration.h and Configuration_adv.h files and the serial log? (So we know what was happening and what led to the failure.)

Thanks!

@jvitali

jvitali commented Mar 7, 2021

@X-Ryl669 Just to be clear (I'm new to Marlin),
you want me to change

//#define RX_BUFFER_SIZE 1024

#if RX_BUFFER_SIZE >= 1024
  // Enable to have the controller send XON/XOFF control characters to
  // the host to signal the RX buffer is becoming full.
  //#define SERIAL_XON_XOFF
#endif

to:

#define RX_BUFFER_SIZE 128

#if RX_BUFFER_SIZE >= 1024
  // Enable to have the controller send XON/XOFF control characters to
  // the host to signal the RX buffer is becoming full.
  //#define SERIAL_XON_XOFF
#endif

@rhapsodyv
Member

@jvitali Are you sure you are testing the latest bugfix?

I'm running tests right now using an MKS Nano v2, with the default serial buffer size and two serial ports receiving and replying to data (1 and 3), and everything is working fine. Not a single byte is lost. And if I force an error, it doesn't hang; it just recovers fine.

What are you using to print? OctoPrint?

@jvitali

jvitali commented Mar 7, 2021

@rhapsodyv
Yes, I'm using https://github.com/MarlinFirmware/Marlin/tree/bugfix-2.0.x (I downloaded it this morning as soon as I saw this post)
with Configuration.h and Configuration_adv.h from the up-to-date bugfix version config folder.
I edited the config lines to get it to work with BLTouch.

And yes, I'm printing from OctoPrint.

After commenting out #define SERIAL_PORT_2 in Configuration.h I'm receiving 0 resend requests.

I'm using Creality V4.2.2 board on the printer

@rhapsodyv
Member

@jvitali What do you have connected to SERIAL_PORT_2? The serial TFT?

@rhapsodyv
Member

@jvitali Can you share the serial log of a failed print?

@X-Ryl669
Contributor

X-Ryl669 commented Mar 7, 2021

@jvitali Yes.

@jvitali

jvitali commented Mar 7, 2021

@X-Ryl669 will try.

@jvitali

jvitali commented Mar 7, 2021

@rhapsodyv I have the LCD, micro USB connected to the Pi and the SD card.

Unfortunately I reinstalled Raspbian and forgot to turn on serial log in Octoprint.

@rhapsodyv
Member

I think I found the issue. It's related to Keep Alive + Multi serial.

When Marlin is executing a slow command, it may stay in a loop waiting for that command to complete. Inside that loop, Marlin may call idle periodically, and idle checks for new serial commands (on the other serial port) to enqueue. When it receives such a command, it replies "busy" (in the keep-alive function), but sends that reply to the wrong serial port.

Call stack:

idle
  process_next_command
    Gxxx
      while (something)
        idle
          keep_alive (sent to the current serial port)
          the other serial port tries to send data and thinks it succeeded
          data loss and wrong behavior on the first serial port (this causes a resend on serial 1)

I could simulate resend commands in OctoPrint this way, and I fixed it by making the keep-alive reply "busy" to all serial ports. That is in fact the correct behavior: Marlin only handles one command at a time, so it needs to warn all serial ports to stop sending data until it can handle it.

@jvitali Can you test this branch, keeping both serial ports enabled?
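
The difference between the two keep-alive behaviors can be sketched as follows. This is a minimal model, not the code from the branch: `FakePort`, `keep_alive_current_only` and `keep_alive_all_ports` are hypothetical names, and the ports simply record what was written to them.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a serial port: records every message written to it.
struct FakePort {
  std::vector<std::string> sent;
  void write(const std::string& msg) { sent.push_back(msg); }
};

// Behavior described as problematic: only the port whose command is
// currently executing is told that the printer is busy.
void keep_alive_current_only(std::vector<FakePort>& ports, std::size_t active) {
  assert(active < ports.size());
  ports[active].write("busy");
}

// Behavior described as the fix: every port is told that the printer is busy,
// so no host keeps sending while a slow command is still running.
void keep_alive_all_ports(std::vector<FakePort>& ports) {
  for (FakePort& port : ports) port.write("busy");
}
```

In the first variant a host on the other port never sees "busy" and may keep sending; in the second, both hosts receive the same notice.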

https://github.com/rhapsodyv/Marlin/tree/multi-serial-and-keep-alive-hang

@jvitali

jvitali commented Mar 8, 2021

@X-Ryl669 With #define RX_BUFFER_SIZE 128, #define SERIAL_PORT 1 and #define SERIAL_PORT_2 3 (both serial ports enabled), I'm having no issues so far (and also receiving no resend requests from the printer). I'm still waiting for a resend request to see how it behaves.
I will test @rhapsodyv's branch during the week.

@thinkyhead
Member

Please test the bugfix-2.0.x branch to see where it stands. If the problem has been resolved then we can close this issue. If the issue isn't resolved yet, then we should investigate further.

@github-actions

github-actions bot commented Jul 7, 2021

This issue has had no activity in the last 60 days. Please add a reply if you want to keep this issue active, otherwise it will be automatically closed within 10 days.

@github-actions

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked and limited conversation to collaborators Sep 16, 2021