Serial communication errors: checksum and line number not +1 #3680
Comments
PR "Encapsulate Stepper, Planner, Endstops in singleton classes" #3631 could potentially affect the ISR runtime.
@Sebastianv650 If you are using RepetierHost without ping-pong mode, try ping-pong mode, or reduce 'Receive Cache Size' to about 75%. If you suspect your LIN_ADVANCE code takes a lot of extra time, as when the 'normal' stepper ISR goes into double-step/quad-step mode, try putting in: …
I'm using Pronterface at 250000 baud; I changed it to 115200 with no success.
Not if the compiler is smart. Firstly, using … The compiler should also be smart about this: `ISR(TIMER1_COMPA_vect) { stepper.isr(); }` It should recognize that it should not do this…
…but that it should do this…
…and, frankly, it should recognize that: …
Am I putting too much trust in the compiler?
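To make the question concrete, here are two ways of encapsulating stepper state. Both are hypothetical illustrations (`StepperObj`, `StepperStatic` and the `timer_isr_*` wrappers are not Marlin code). In the first, `stepper_obj.isr()` is a non-virtual call on a global object whose address is fixed at link time, so an optimizing compiler may inline it; in the second, all members are static, so no hidden `this` pointer exists at all. Whether avr-gcc really emits identical code for both can only be confirmed by looking at the disassembly (for example with `avr-objdump -d`).

```cpp
#include <cassert>
#include <cstdint>

// Variant 1 (hypothetical): a global singleton object. stepper_obj.isr() is a
// plain non-virtual member call on an object at a fixed address, so an
// optimizing compiler is free to inline it into the ISR.
class StepperObj {
 public:
  void isr() { ++step_count_; }
  uint32_t steps() const { return step_count_; }

 private:
  uint32_t step_count_ = 0;
};
StepperObj stepper_obj;

// Variant 2 (hypothetical): all-static members. No hidden `this` pointer is
// passed, so this behaves like free functions grouped in a namespace.
class StepperStatic {
 public:
  static void isr() { ++step_count_; }
  static uint32_t steps() { return step_count_; }

 private:
  static uint32_t step_count_;
};
uint32_t StepperStatic::step_count_ = 0;

// Stand-ins for the body of ISR(TIMER1_COMPA_vect); on an AVR build these
// would be the interrupt handlers themselves.
void timer_isr_obj() { stepper_obj.isr(); }
void timer_isr_static() { StepperStatic::isr(); }
```

Both variants behave identically; the open question in this thread is only whether the generated code is equally fast.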
I just don't agree with …
To me, using Pronterface means using ping-pong mode, which makes overfilling the RX buffer unlikely. Hard to say what is going on.
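The difference between the two host modes is easiest to see as host-side bookkeeping. In ping-pong mode the host sends one line and waits for the firmware's "ok"; in the other mode it keeps sending while the bytes it believes are still queued in the firmware's receive buffer stay below the configured 'Receive Cache Size'. The sketch below models only that second mode, under two assumptions: each "ok" frees the oldest outstanding line, and the firmware RX buffer is 128 bytes (so 75% is 96 bytes). `HostSendWindow` is a hypothetical name, not Repetier-Host code.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical model of a host that streams G-code without ping-pong:
// it keeps sending while its estimate of the bytes still waiting in the
// firmware's receive buffer stays within `capacity`.
// Assumption: each "ok" from the firmware frees the oldest outstanding line.
class HostSendWindow {
 public:
  explicit HostSendWindow(std::size_t capacity) : capacity_(capacity) {}

  // +1 accounts for the '\n' terminating each line.
  bool can_send(const std::string& line) const {
    return in_flight_bytes_ + line.size() + 1 <= capacity_;
  }

  void on_sent(const std::string& line) {
    assert(can_send(line));
    in_flight_.push_back(line.size() + 1);
    in_flight_bytes_ += line.size() + 1;
  }

  // Called when the firmware answers "ok" for the oldest outstanding line.
  void on_ok() {
    if (in_flight_.empty()) return;
    in_flight_bytes_ -= in_flight_.front();
    in_flight_.pop_front();
  }

  std::size_t in_flight_bytes() const { return in_flight_bytes_; }

 private:
  std::size_t capacity_;
  std::size_t in_flight_bytes_ = 0;
  std::deque<std::size_t> in_flight_;  // byte count of each unacknowledged line
};
```

With `HostSendWindow window(96);`, two 39-character lines fit in flight and a third has to wait for an "ok". Ping-pong mode is the degenerate case of at most one line in flight, which is why it makes overfilling the receive buffer unlikely.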
To me this more or less rules out ISR delays; otherwise we would need to be really screwing it up to still affect 115.2 kbps. Are we sure this is not hardware? Could it be noise on the serial line? Noise could increase with the higher stepping rates of the motors at higher speeds. Do you have a scope?
Sorry, no scope available. I can't imagine it's hardware. I'm in the process of printing 70 identical parts, so all tests are done with the exact same G-code. RC4 runs "smoothly" even with advance (maybe 4-5 errors per hour). RCBugFix, on the other hand, floods the Pronterface status window with error messages, and once or twice per hour the printer even makes wrong "print" moves to one edge of the print bed (only limited by the software endstops, I guess).
I might be wrong, but do you really think so? Say we need to transfer 10 commands for a print and each one takes 0.5 s at 115200 baud (just for easy calculation) and 0.25 s at 250000 baud, while the ISR runs at a fixed rate. Then shouldn't it be more likely that an ISR "hits" the 0.5 s transfer at 115200 baud than the 0.25 s transfer at 250000 baud? I will build AnHardt#32 into my code and run some tests with it, but it will take some time to get results, as my printer is busy for the next 9 h.
Yes, but when lowering the baud rate you'd expect to see a difference in the number of errors. If there is no difference, then one of two things could be true:
But the ISR changes were minimal. So the last good version was RC4, right?
OK, to be honest I can't say for sure whether lowering the baud rate made it worse. In both cases the errors come one after the other. To know precisely, I would have to do several runs at both baud rates and compute an average. Yes, I'm also seeing errors in RC4, but without advance maybe 1 in a 10 h print. Don't nail me down on the number, but it's a rate I'm not worried about.
@Sebastianv650 Are you talking about "host keepalive" messages like "busy:processing", or line number / checksum errors?
All kinds of error messages: mostly checksum, some line number mismatch, some "no checksum with line number".
I made @AnHardt's debug changes; here is an example:
Does anybody have an idea? The line looks OK to me.
The actual data exchange looks good, although we don't see all the checksums. But I am puzzled by the error message: is it 806 or 803 that has the error? And why does it then resync at 807? What really happened to 803 to 806... odd.
Looks bizarre to me. It's telling us that when it tries to interpret line "806" it is seeing the text of line "803". Did I already ask which version of Arduino you are using to build with?
I'm using 1.6.8.
Sorry.
@Sebastianv650 Well, that is interesting. I cannot (quite) imagine what would make the stepper ISR significantly slower between RC4 and RC6, but it sure seems to be the case. I'll do more research on optimizing C++ code and see if there's a way to keep the encapsulated …
@AnHardt @Sebastianv650
Thanks for the hints, @AnHardt, @thinkyhead! I made the modification and it now prints the corrupted line; here is one of the many I get:
It's missing single chars in commands; in this example, the "3" in the X coordinate. Now it also makes sense that my change from yesterday (also calling customizedSerial.checkRx() in the stepper ISR) made it less bad: the input is checked more frequently, so missing chars don't happen as often any more. But I'm not smart enough to know the real answer: why does it miss chars when the main stepper ISR takes more time?
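The missing "3" is exactly what the line checksum is meant to catch. A sketch of the RepRap-style checksum (XOR of every byte before the `*`), using a hypothetical line `N806 G1 X133.5 Y42.1 E2.3`; the function names are illustrative, not Marlin's. Dropping any single printable character changes the XOR, so one lost char is always detected, but two identical lost characters (for example "33") cancel out and slip through.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// RepRap-style checksum: XOR of every byte before the '*'.
uint8_t gcode_checksum(const std::string& payload) {
  uint8_t cs = 0;
  for (unsigned char c : payload) cs ^= c;
  return cs;
}

// Host side: "<payload>*<checksum>".
std::string frame_line(const std::string& payload) {
  return payload + "*" + std::to_string(static_cast<int>(gcode_checksum(payload)));
}

// Firmware side: recompute over everything before the last '*' and compare.
bool checksum_ok(const std::string& line) {
  const std::size_t star = line.rfind('*');
  if (star == std::string::npos || star + 1 == line.size()) return false;  // no checksum
  int received = 0;
  for (std::size_t i = star + 1; i < line.size(); ++i) {
    if (line[i] < '0' || line[i] > '9') return false;
    received = received * 10 + (line[i] - '0');
    if (received > 255) return false;
  }
  return received == gcode_checksum(line.substr(0, star));
}
```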
After thinking about what @AnHardt wrote:
and looking at how Marlin handles available chars, I would sum it up as follows. Please correct me if I'm wrong: … So the reason I get many more errors with advance enabled is clear: I'm doing a few more calculations inside the stepper ISR, which may bring the ISR close to the 40µs limit. To prove this, I set the baud rate to 115200 again. I had already tried this without success when I first saw the errors while starting the advance coding, but I have optimized the code since those days. At 115200 baud the ATmega has twice the time to get a char from the buffer. With the latest advance code, this eliminated 100% of the error messages! :-) (Remember: I'm using the RC4 version. I have to see whether that's also true for RC6.) This leads to the following two options: … Your opinion?
I'm considering that option, which basically means reverting all the Singletons back to (still better-encapsulated) flat C code with … Honestly, I do not know how much the compiler output truly differs. But my expectation was that the compiler would be smart enough to see the Singletons as basically just being used as "namespaces" to encapsulate some functions. But linkage is something compilers are weirdly religious about sometimes. I wish we had better profiling so we could see where all our cycles are being spent. That Marlin Simulator could probably help with this, since it would also be affected by the linkage.
If the performance issues and communication errors are manifest in … Again, proper profiling would be most helpful!
Something worth reading… http://www.drdobbs.com/implementing-interrupt-service-routines/184401485
It really points to starvation, that is, the processor is running out of time and doing little besides the ISR. I'm not sure whether the code generated from C++ is slower than C, though I must say the probability of that is quite high. It does depend on the compiler flags/settings. Note that the Arduino IDE already uses a quite optimized set of these flags. It won't get (much) better than that.
I love it when a theory can be confirmed by a test: 250000 vs. 115200 baud. Receiving at 115200 baud should be OK. Sending at 115200 baud may be a problem, because Marlin busy-waits while writing. (https://github.com/MarlinFirmware/Marlin/blob/RC/Marlin/MarlinSerial.h#L122 https://github.com/MarlinFirmware/Marlin/blob/RC/Marlin/MarlinSerial.h#L152 ++)
It already has. But no (software) interrupt can interrupt another interrupt. The interrupt priority only decides which one runs next when more than one interrupt request is pending after the current interrupt has finished.
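A toy model of that point: priority only decides the order in which pending requests are served after the current handler returns; it never lets a request preempt a running handler. On AVR the lower vector number has the higher priority. `InterruptController` and the vector numbers used with it are hypothetical; on real hardware the pending flags live in the peripheral registers and the vector table defines the order.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of AVR-style interrupt dispatch: a running handler is
// never preempted; when it returns, the pending request with the lowest
// vector number (highest priority) is dispatched next. Each source has a
// single pending flag, so repeated requests from the same source that arrive
// while it is still pending are served only once.
class InterruptController {
 public:
  void request(unsigned vector) {
    assert(vector < 32);
    pending_ |= (uint32_t{1} << vector);
  }

  // Returns the vector to run after the current handler finishes, or -1.
  int next() {
    for (unsigned v = 0; v < 32; ++v) {
      const uint32_t bit = uint32_t{1} << v;
      if (pending_ & bit) {
        pending_ &= ~bit;
        return static_cast<int>(v);
      }
    }
    return -1;
  }

 private:
  uint32_t pending_ = 0;
};
```

The collapsing of repeated requests into one flag is the interrupt-level counterpart of a lost serial character: if the same source fires twice while it waits, the second event has nowhere to go.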
Thank you, testers and feedback providers! Without you, we who know code well, but not the platform or the compiler, often feel a little lost in the dark. But through this kind of experience, much is learned. Deep gratitude to all of you!
Or you could put transmission into a TX interrupt routine....
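The TX-interrupt idea could look roughly like this: `write()` only copies the byte into a ring buffer and returns instead of busy-waiting, and the "data register empty" interrupt later hands bytes to the UART one at a time. `TxRing` and `on_tx_ready()` are hypothetical names; on real hardware `on_tx_ready()` would be the interrupt handler and `write()` would also have to enable that interrupt. Here both are ordinary functions so the logic can be exercised on a PC.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical interrupt-driven transmit buffer. One slot is always left
// empty so that head == tail unambiguously means "empty".
// 8-bit indices keep reads and writes atomic on an 8-bit AVR.
template <uint16_t N>
class TxRing {
  static_assert(N >= 2 && N <= 256 && (N & (N - 1)) == 0,
                "N must be a power of two between 2 and 256");

 public:
  // Producer (main code): returns false instead of busy-waiting when full.
  bool write(uint8_t c) {
    const uint8_t next = static_cast<uint8_t>((head_ + 1) & (N - 1));
    if (next == tail_) return false;
    buf_[head_] = c;
    head_ = next;
    return true;
  }

  // Consumer (TX interrupt): hands one byte to the UART, if any is queued.
  // Real code would disable the interrupt when the buffer runs empty.
  bool on_tx_ready(uint8_t& out) {
    if (head_ == tail_) return false;
    out = buf_[tail_];
    tail_ = static_cast<uint8_t>((tail_ + 1) & (N - 1));
    return true;
  }

  uint16_t size() const { return static_cast<uint16_t>((head_ - tail_) & (N - 1)); }

 private:
  uint8_t buf_[N] = {};
  volatile uint8_t head_ = 0;  // written only by write()
  volatile uint8_t tail_ = 0;  // written only by on_tx_ready()
};
```

What to do when `write()` returns false (wait, drop, or grow the buffer) is a policy decision this sketch leaves to the caller.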
Maybe we should start a new thread; this one has been confused with other things.
@VanessaE I bet that's the reason: MINIMUM_STEPPER_PULSE increases the duration of each stepper ISR, and therefore a missed char becomes much more likely. @ruggb I'm afraid that if we create a new thread we will start from scratch, while a lot of investigation has already been done here. Based on that work, I can also say that Marlin is missing single chars of a command that can't be received because the stepper ISR is executing at that moment.
Chars can get lost if … Maybe a

```cpp
#ifndef USBCON
  customizedSerial.checkRx(); // Check for serial chars.
#endif
```

inside the loop helps, ideally between starting and stopping the pulses.
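To see why the placement of the checkRx() call matters, a rough timing model with all assumptions labeled: one ISR call does some setup, then `steps` iterations of "work, then hold the STEP pulse"; polling itself is assumed to take no time; and what counts is the longest stretch inside the ISR during which received bytes are not read. `IsrTiming` and both functions are hypothetical, and the numbers in the usage are illustrative, not measurements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical timing model of one stepper ISR that emits `steps` pulses
// (1, 2 or 4 for single/double/quad stepping). All times in microseconds.
struct IsrTiming {
  uint32_t setup_us;  // work before the step loop
  uint32_t work_us;   // per-iteration work before the pulse starts
  uint32_t pulse_us;  // time the STEP pin is held high (minimum pulse width)
  uint32_t steps;     // pulses per ISR call
};

// Longest stretch inside the ISR without reading serial,
// if serial is not polled inside the step loop at all.
uint32_t max_rx_gap_no_poll(const IsrTiming& t) {
  return t.setup_us + t.steps * (t.work_us + t.pulse_us);
}

// Same, if serial is polled right after each pulse starts (during the pulse
// wait). Gaps: entry -> first poll, poll -> next poll, last poll -> exit.
uint32_t max_rx_gap_poll_in_pulse(const IsrTiming& t) {
  assert(t.steps >= 1);
  const uint32_t first = t.setup_us + t.work_us;
  const uint32_t between = t.steps > 1 ? t.pulse_us + t.work_us : 0;
  const uint32_t last = t.pulse_us;
  return std::max({first, between, last});
}
```

With `IsrTiming t{10, 8, 2, 4}` (quad stepping), the unpolled ISR blocks for 50µs, longer than one 40µs character at 250000 baud, while polling during each pulse cuts the longest gap to 18µs. Polling during the pulse wait costs nothing extra because that time would otherwise be spent idling.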
I think the obvious question is: does the hardware have enough horsepower to run the ISR and handle the I/O data stream simultaneously? If it does not, then there are three options:
I don't see this mainly as a problem of processing power; @Blue-Marlin gets to the point. If we could tell the ATmega that serial communication has a higher priority than the stepper ISR and that it should interrupt the stepper ISR, there would be no problem. But that's impossible. Putting some … With Blue-Marlin's calculation, there is a limit of 87µs at 115200 baud and 40µs at 250000 baud. @ruggb Only disabling features that do calculations inside the stepper ISR can reduce the error rate. There aren't many: …
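The 87µs and 40µs figures follow from the UART framing: with 8N1 (one start bit, eight data bits, one stop bit) each character takes 10 bit times. A small helper to reproduce them, assuming 8N1 framing; `char_time_us` and `isr_fits_rx_window` are hypothetical names. The window check follows the single-byte-buffer model used in this thread; real USART buffering may give a little more slack.

```cpp
#include <cassert>
#include <cstdint>

// Duration of one UART character in microseconds:
// 1 start bit + data bits + parity bits + stop bits, each lasting 1/baud s.
// Defaults describe 8N1 framing (10 bits per character).
double char_time_us(uint32_t baud, unsigned data_bits = 8, unsigned parity_bits = 0,
                    unsigned stop_bits = 1) {
  assert(baud > 0);
  const unsigned frame_bits = 1 + data_bits + parity_bits + stop_bits;
  return frame_bits * 1e6 / baud;
}

// Single-byte-buffer model: code that blocks the RX interrupt must finish
// within roughly one character time, or the next byte overwrites the last.
bool isr_fits_rx_window(double blocked_us, uint32_t baud) {
  return blocked_us < char_time_us(baud);
}
```

`char_time_us(115200)` is about 86.8µs and `char_time_us(250000)` is exactly 40µs, so a 50µs ISR fits the window at 115200 baud but not at 250000 baud.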
I was under the impression that there was something like DSR in the communication, so that the host waits until the printer is ready to accept data. Is that the response string, and if so, is Marlin sending it prematurely? I.e., instead of sending it as soon as it receives the command, it should wait until it is ready to receive another command; this has to do with filling up the buffer.
In the old days, we used RTS/CTS for flow control on old 8-bitters with fast modems; does that exist here?
No.
In one of these threads, someone said that when the buffer was full, a message was sent to the host to wait.
@ruggb I think you are mixing up the buffers. A message is sent when the buffer for incoming commands (not single chars!) is full. Marlin isn't sending messages after every single char.
@Sebastianv650 I understand that. I wasn't speaking about characters; I was implying messages or commands, I think.
Uhh. That hurts. You couldn't be more wrong. Please look that up. The problem is the UART's one-byte buffer. If you do not pick up the byte before the next one arrives, it's lost. When a new byte arrives, an interrupt is issued and queued. The (standard) interrupt scheme of the AVRs does not allow interrupts to be interrupted. So if a (stepper/advance) interrupt takes too long, a char is lost. For that reason we already check for new chars in the stepper interrupt (code cited above). It could make sense to do the same in the extruder/advance interrupt if that takes too long.
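A small simulation of the mechanism described in this comment, under stated assumptions: bytes arrive back to back every `char_us`; a stepper ISR with interrupts disabled starts every `period_us` and runs for `isr_us`; and the UART holds exactly one unread byte (the single-byte model above; real hardware may buffer slightly more). `count_lost_bytes` is a hypothetical helper, and the numbers in the usage are illustrative, not measurements.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical model: byte k finishes arriving at k * char_us. While the
// stepper ISR runs, the first byte to arrive waits in the single-byte
// holding register; any further byte arriving before the ISR ends overruns
// it and is lost. Outside the ISR, the RX interrupt reads bytes immediately.
unsigned count_lost_bytes(unsigned n_bytes, double char_us, double period_us,
                          double isr_us) {
  assert(isr_us < period_us);
  unsigned lost = 0;
  bool held = false;         // holding register full?
  double held_until = 0.0;   // when the RX interrupt finally gets to run
  for (unsigned k = 1; k <= n_bytes; ++k) {
    const double t = k * char_us;
    if (held && t >= held_until) held = false;  // RX ISR ran after the stepper ISR
    const double phase = std::fmod(t, period_us);
    if (phase >= isr_us) continue;              // not blocked: read immediately
    if (held) { ++lost; continue; }             // second byte while register full
    held = true;
    held_until = t - phase + isr_us;            // end of the current stepper ISR
  }
  return lost;
}
```

With a 40µs character time (250000 baud), a 60µs blocking ISR every 100µs loses bytes in this model, while the same ISR loses none at 86.8µs per character (115200 baud), which is consistent with the baud-rate experiment reported earlier in the thread.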
The extruder ISR takes only a few µs. In fact, it is so short that I couldn't measure it.
It wouldn't be as chaotic as a real non-blocking ISR, but no char should be lost.
@Blue-Marlin That was my perception too. The question is: what happens if Marlin does not send an OK? Is there a timeout? Host sends cmd, Marlin sends OK, host sends cmd, Marlin misses it, timeout, host sends another cmd, Marlin sends OK, and an error is generated because the line number was for the missed one. If there is a timeout, then what I am saying is that maybe just ensuring Marlin can accept the next command before it sends the OK, rather than sending the OK just because it has queued the command, would solve the issue. If this doesn't make sense, it is because I have no clue how this really works.
There always seems to be more room for optimizations: pre-calculating values in the planner, replacing division with multiplication in the ISR, things like that. It's going to be helpful to keep profiling the stepper ISR to see where improvements can be made, and where it would be best to call into … On that subject, I see that … Too bad it can't be called between pulse start and pulse stop. It would eliminate the need for the …
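On the "replace division with multiplication" idea, a sketch: the planner, which runs outside the ISR, precomputes a fixed-point reciprocal once, and the ISR then needs only a multiply and a shift. The 16.16 format and the function names are assumptions for illustration, not Marlin's code. With a floored reciprocal the result is either exact or one too small; whether it is actually faster on an 8-bit AVR depends on operand widths, so it should be measured before adopting it.

```cpp
#include <cassert>
#include <cstdint>

// Planner side (outside the ISR): floor(2^16 / d) as a 16.16 reciprocal.
uint32_t precompute_reciprocal(uint16_t d) {
  assert(d != 0);
  return 65536UL / d;
}

// ISR side: approximate n / d with one multiply and a shift.
// For n < 2^16 the product stays below 2^32, and the result is
// floor(n / d) or one less than it.
uint16_t fast_divide(uint16_t n, uint32_t recip) {
  return static_cast<uint16_t>((static_cast<uint32_t>(n) * recip) >> 16);
}
```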
If the character missed is within the line number, that will do it.
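A sketch of why a lost digit inside the `N` field shows up as a line-number error rather than a checksum error: if the firmware compares the line number with the last good one plus one before it verifies the checksum, a corrupted number is rejected first. The check order here is an assumption consistent with the error messages quoted in this thread; `validate_line`, `LineStatus` and `with_checksum` are hypothetical names, not Marlin's.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

enum class LineStatus { Ok, LineNumberMismatch, NoChecksum, ChecksumMismatch };

// Host side: append "*<xor checksum>" to a payload such as "N803 G1 X10".
std::string with_checksum(const std::string& payload) {
  uint8_t cs = 0;
  for (unsigned char c : payload) cs ^= c;
  return payload + "*" + std::to_string(static_cast<int>(cs));
}

// Firmware side (hypothetical): validate "N<num> ...*<checksum>".
// The line number is checked first, then the checksum. last_n only advances
// on success; on failure the host would be asked to resend last_n + 1.
LineStatus validate_line(const std::string& line, long& last_n) {
  assert(!line.empty() && line[0] == 'N');  // sketch handles numbered lines only
  std::size_t i = 1;
  long n = 0;
  while (i < line.size() && line[i] >= '0' && line[i] <= '9')
    n = n * 10 + (line[i++] - '0');
  if (n != last_n + 1) return LineStatus::LineNumberMismatch;

  const std::size_t star = line.rfind('*');
  if (star == std::string::npos || star + 1 == line.size())
    return LineStatus::NoChecksum;

  uint8_t cs = 0;
  for (std::size_t j = 0; j < star; ++j) cs ^= static_cast<uint8_t>(line[j]);

  int received = 0;
  std::size_t digits = 0;
  for (std::size_t j = star + 1; j < line.size() && digits < 3; ++j, ++digits) {
    if (line[j] < '0' || line[j] > '9') return LineStatus::ChecksumMismatch;
    received = received * 10 + (line[j] - '0');
  }
  if (received != cs) return LineStatus::ChecksumMismatch;

  last_n = n;
  return LineStatus::Ok;
}
```

For example, if "N804 G1 X20" loses its "0" and arrives as "N84 ...", the sketch reports a line-number mismatch and keeps the last good line at 803, so a resend of 804 would be requested.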
I had the same issue recently due to a bad USB cable. I bought a shielded one and everything runs nicely now.
Hmm. I've been using Cura 4.0.0 beta and Slic3r 1.3.1-dev to send jobs over serial (USB) to an Ender 3. Maybe a few dozen prints, no errors. In Slic3r, I noticed the "Line Number is not Last Line Number+1" message in the serial text, but it would only display it once upon connecting to the printer, and it seemed to work fine after that. It has never failed mid-print yet. While researching this error, it seemed like a big potential problem, so I decided to build and upload the latest Marlin 1.1.9 bugfix firmware. Now I'm getting an endless supply of "Line Number is not Last Line Number+1" in Slic3r and it refuses to print anything. It prints fine from Cura, though!
@mj1911
In case it helps: when this error happens, I see it continuously; every message fails at 115200. I find that closing the slicer and resetting the printer clears the issue. This seems to occur only when I start the slicer program, under some as yet undetermined conditions. I have seen it start just after a new print starts: the head will be at some random spot on the surface, and every message sent by the slicer fails. So far this has only happened on the first layer.
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
As mentioned in the advance dev thread, I'm getting communication errors when printing over USB. The error messages are:
and
Printer: Lulzbot TAZ 5 (Rambo board). I'm also writing on behalf of others from the Lulzbot forum, where threads about communication problems come up from time to time.
What I know through tests:
I don't have the skill to track this down in detail without help. My guess is that the main ISR can interrupt something in the USB communication, and if that happens, the transfer gets corrupted. There are more errors as the ISR rate or duration increases, because it becomes more likely that an ISR fires while communication is in progress.
If there are already ideas to test in some branches or forks, I would be happy to test them.