Optimal encoding of Maxicode #279
Why does Wikipedia claim that MaxiCode is public domain?
Yeah needs a "not in citation given" tag, but true according to Benjamin Nelson's Punched Cards To Bar Codes p.303:
The ISO/IEC spec is copyrighted, as is all their stuff (which they then charge exorbitant amounts to access).
Re example, that's a huge saving, from 51 pads to 63.
Here's an A/B (and NS) example which could be one codeword less (immediate LCHB on "b" between NS segments instead of SHB and then LCHB after 2nd NS):
A pure A/B example that could be one codeword less (should SHB instead of LCHB then SHA)
I had to modify some test cases to suit the output produced by the modified encoder. Are they still valid?
Yes, all looks good here (the Zero-pad US postcodes one and the Figure B2 are the same as Zint). Can't wait for the big reveal!
I'm working on a stand-alone C header (just of the encoder) to match the PostScript version. I don't know the Zint source well enough to plug it in. Looking at a function called "maxi_text_process", it seems Zint breaks text up into segments, each potentially starting with an ECI. That may just be the way Zint deals with ECI. The specs - :-) - say
I hope it's okay to briefly mention Zint in a discussion about BWIPP.
Of course! BWIPP and Zint are good playmates.
Okay so I guess this is the big reveal - except there are a lot of things I want to do differently now. |
Just an initial review to check the conversion to feature toggle.
Will review properly once it is ready, and perhaps @gitlost will too.
src/maxicode.ps.src
1 1 4 {dup optimal_length n get exch get dup 3 index lt {4 2 roll} if pop pop} for
/j exch def
/state exch def
% State D would be okay too, right? It has its own pad character. Could save a code word.
@gitlost Note this if you choose to review.
I've reworked the tests to support GitHub-based review, so the updated patches will need to be rebased on top of this (filename changes in tests).
Okay let's see if I can figure this out on my own.
Force-pushed 8b8de3a to 8f3bc4c.
I don't know how to avoid that force-push - it happens when I discard the 9 commits of my GitHub branch. I'm rebasing on my local clone and then syncing that. I bet it works, but the pull request is still closed.
It's synced back up.
Oh no what did I do. Argh.
@gitlost Can I open a pull request for Zint so we can discuss how (if?) to port this? |
Re Zint, please do! Re padding and final latch:
If the codewords fully fit then yes, a final latch isn't necessary. I don't think there's any other possible interpretation of the phrase
from the standard, and its various variants. So this is relevant to this discussion, I think - see below. Re using the code set E pad: yes, that's totally valid and a nice optimization. As for using latches etc. for code sets C and D, apart from not being mentioned in the standard, it doesn't seem to achieve anything over latching to a code set that has a pad, does it? Anyway, I've tested the PR and everything looks great. The only hiccup was with that final latch when encoding
which fails with
I've still to do some performance analysis, which I'll post when done, but a big thumbs up from me.
The filtering of operations would generate a lot of arrays that were then immediately chucked. I suspect that this would hurt performance, seeing as it was done in the inner loop. I think this is also a bit more readable, especially because I commented it a bit.
The performance results I'm getting aren't too bad, about 80% slower (the mode 4 figures are skewed by the all-numeric example, which is unlikely data but left in as a worst-case scenario):
I made two slight changes to the code, 1) to not put an unnecessary final latch/pad, for both
All in all it's still a big thumbs up from me. (P.S. I ported your Code 128
I think there's still a fair bit of opportunity for optimization (with z or s?) but I'm running out of steam. I trust you noticed the commit that avoids generating a bunch of filtered lists of "operations"?
Oops, that was unintentional...
If you want to check and integrate the changes proposed by @gitlost during testing, then I'm happy to accept the PR. That way we get something acceptable over the line (initially behind the flag) and then you can improve things as you find the time and inclination to. |
I had some local changes and I couldn't figure out how to apply the diff, so I did it manually. I hope it's correct. Tests ran, at least...
Merged manually. |
I squashed things to clean up the history. Most of the attribution ends up in the right place, but with @gitlost gaining credit for reverting the ECI tables to what was an earlier version by @lyngklip. (I think that life's too short to worry about that.) Anyway, thank you @lyngklip for a couple of great contributions and @gitlost for testing and stewardship!
I see I missed the party, great stuff @lyngklip. There was another tweak that I did Sunday which filters the states by iterating through the data upfront, and lessens the slowdown for typical data to around 50%. A nice thing is it reduces the all-numeric case to more or less the same as the others. The figures are:
(Note the 10% increase in the mode 4 legacy case which shows these figures are very approximate.) The diff is below, which maybe @lyngklip you could use or if understandably you want to give it a rest, I can make into a PR:
That's clever, I think you should add it if you like. My main interest right now is optimal encoding (optimal in the sense of shortest output for any given input) and the feasibility of applying the cookie-cutter algorithm to other types of bar codes. I'm looking at Data Matrix right now, which seems like another step up in complexity from MaxiCode. I can't even tell if the current encoder is already optimal. I've looked a bit at a Rust implementation which is allegedly optimal, but I have a hard time reading Rust. I've been wondering about something with regard to the Data Matrix C40 and Text character sets. Am I allowed to pad with shift1 to reach a multiple of 3 - for the purpose of unlatching - not only in the corner cases close to the end but anywhere else? I suspect the answer is no. It seems BWIPP and other libraries only use C40 and Text if multiples of 3 code words can be mapped. I don't know if it could ever improve encoding length; it's just something I thought about.
Some random comments (off the top of my head):
@lyngklip If you contact me directly via email then I might be able to help further. |
My First Direct Commit 8f07935!
I can retire now :-)
I haven't heard about that one - but then I haven't studied barcodes for long. I stumbled upon this MMCC - is that on your radar? Or is it not ready for prime time; I can't tell. I'll probably spend time on Data Matrix and then the other common 2D codes. Not necessarily writing PostScript for them - just understanding them. I don't have any priority or agenda other than procrastinating.
That's very kind :-)
I haven't looked at it in detail but from a very brief reading it seems academic and lacks mitigations for certain physical phenomena that standardised polychromatic codes have.
That's definitely the best way to keep motivated over the long term... :-)
A direct adaptation of this algorithm is now in Zint (commit e94727), works like a charm, thanks Bue!
What do you say - is this test case correct? I produced it by hacking the encoder slightly. My observation was that A and B would only latch to C, D, and E after 4 consecutive characters belonging to the same set. However, a mix of C, D, and E could still benefit from a latch to either C, D or E. Shall MaxiCode receive the "optimal encoding" makeover?