Optimal encoding of Maxicode #279

Closed
wants to merge 15 commits into from


lyngklip
Contributor

@lyngklip lyngklip commented Oct 30, 2024

What do you say - is this test case correct? I produced it by hacking the encoder slightly. My observation was that A and B would only latch C, D, and E after 4 consecutive characters belonging to the same set. However a mix of C, D, and E could still benefit from a latch of either C, D or E. Shall Maxicode receive the "optimal encoding" makeover?
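To make the observation concrete, here is a minimal dynamic-programming sketch in Python (not the PR's PostScript). It tracks the shortest encoding that ends in each code set and compares it with staying in A and shifting every character. The costs are assumptions for illustration: a latch to A or B is taken as one codeword, a latch to C, D or E as two, and a shift as one extra codeword. NS, ECI, SHA2/SHA3 and padding are left out, and `latch_cost`, `char_cost`, `shortest_length` and `shift_only_length` are hypothetical names.

```python
INF = float("inf")
SETS = "ABCDE"

# Sets reachable by a one-character shift from each resident set.
SHIFT_TO = {"A": "BCDE", "B": "ACDE", "C": "DE", "D": "CE", "E": "CD"}


def latch_cost(src: str, dst: str) -> int:
    """Assumed latch cost in codewords (illustrative, not from the PR)."""
    if src == dst:
        return 0
    return 1 if dst in "AB" else 2


def char_cost(resident: str, home: set[str]) -> float:
    """Cost of one character whose code sets are `home`, while in `resident`."""
    if resident in home:
        return 1                      # encoded directly
    if any(s in home for s in SHIFT_TO[resident]):
        return 2                      # shift codeword + character
    return INF                        # not reachable from this set


def shortest_length(homes: list[set[str]]) -> float:
    """Minimum codewords over all latch/shift choices, starting in set A."""
    best = {s: (0 if s == "A" else INF) for s in SETS}
    for home in homes:
        # Cheapest way to be resident in each set before this character.
        entered = {s: min(best[p] + latch_cost(p, s) for p in SETS) for s in SETS}
        best = {s: entered[s] + char_cost(s, home) for s in SETS}
    return min(best.values())


def shift_only_length(homes: list[set[str]]) -> int:
    """Stay in A for the whole message, shifting whenever a character is not in A."""
    return sum(1 if "A" in home else 2 for home in homes)


# Toy input: one set-C character followed by three set-D characters, repeated.
toy = ([{"C"}] + [{"D"}] * 3) * 5 + [{"C"}]
```

Under these assumed costs the 21-character toy input comes out at 29 codewords, against 42 when every character is shifted from A.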

@lyngklip lyngklip changed the title First test case exposing sub-optimal encode - up for discussion Maxicode: test case exposing sub-optimal encode - for discussion Oct 30, 2024
@lyngklip lyngklip marked this pull request as draft October 30, 2024 15:05
@lyngklip
Contributor Author

Why does Wikipedia claim that MaxiCode is public domain?

@gitlost
Contributor

gitlost commented Oct 30, 2024

Yeah needs a "not in citation given" tag, but true according to Benjamin Nelson's Punched Cards To Bar Codes p.303:

A version of the code called Code 6 was placed in the public domain, and the name was later changed to Maxicode.

The ISO/IEC spec is copyrighted, as is all their stuff (which they then charge exorbitant amounts to access).

@gitlost
Contributor

gitlost commented Oct 30, 2024

Re example, that's a huge saving, from 51 pads to 63

@gitlost
Contributor

gitlost commented Oct 30, 2024

Here's an A/B (and NS) example which could be one codeword less (immediate LCHB on "b" between NS segments instead of SHB and then LCHB after 2nd NS):

10 200 moveto (A123456789b123456789bbbA) (parse mode=4) /maxicode /uk.co.terryburton.bwipp findresource exec
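For anyone following the codeword arithmetic: a Numeric Shift (NS) packs nine digits into six codewords, the NS codeword 31 followed by five 6-bit values, which is what `enc_ns` in the PR does. The Python sketch below packs one run and tallies the two strategies for this example. The tallies assume each shift or latch costs one codeword and ignore padding; `pack_ns` and the two tally names are hypothetical.

```python
NS = 31  # Numeric Shift codeword


def pack_ns(digits: str) -> list[int]:
    """Pack exactly nine ASCII digits as NS plus five 6-bit values."""
    if len(digits) != 9 or not all("0" <= d <= "9" for d in digits):
        raise ValueError("NS needs exactly nine digits")
    value = int(digits)  # below 10**9, so it fits in 30 bits
    return [NS] + [(value >> shift) & 63 for shift in (24, 18, 12, 6, 0)]


NS_LEN = len(pack_ns("123456789"))  # six codewords per nine-digit run

# A, NS, SHB+b, NS, LCHB, bbb, SHA+A
shift_then_latch = 1 + NS_LEN + 2 + NS_LEN + 1 + 3 + 2
# A, NS, LCHB+b, NS, bbb, SHA+A
latch_immediately = 1 + NS_LEN + 2 + NS_LEN + 3 + 2
```

The difference between the two tallies is the single codeword mentioned above.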

@gitlost
Contributor

gitlost commented Oct 30, 2024

A pure A/B example that could be one codeword less (should SHB instead of LCHB then SHA)

10 200 moveto (b..A) (mode=4) /maxicode /uk.co.terryburton.bwipp findresource exec
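A quick way to check a small A/B case like this is to enumerate every choice of resident set per character. The Python sketch below does that for `b..A`. It assumes `.` is available in both A and B, that a latch costs one codeword and a shift one extra codeword, and that encoding starts in A; `HOME`, `cost` and `best` are hypothetical names.

```python
from itertools import product

# Assumed code-set membership for the characters in this example.
HOME = {"b": {"B"}, ".": {"A", "B"}, "A": {"A"}}


def cost(text: str, residents) -> int:
    """Codewords used when character i is encoded while resident in residents[i]."""
    total, current = 0, "A"          # encoding starts in code set A
    for ch, resident in zip(text, residents):
        if resident != current:
            total += 1               # latch to the other set
            current = resident
        # One codeword if the character is in the resident set, else shift + character.
        total += 1 if resident in HOME[ch] else 2
    return total


def best(text: str) -> int:
    """Brute-force minimum over all A/B resident-set assignments."""
    return min(cost(text, r) for r in product("AB", repeat=len(text)))
```

With these assumptions `cost("b..A", "BBBB")` (latch B, then shift A at the end) is 6 and `best("b..A")` is 5, matching the one-codeword saving.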

@lyngklip
Contributor Author

lyngklip commented Nov 5, 2024

I had to modify some test cases to suit the output produced by the modified encoder. Are they still valid?

@gitlost
Contributor

gitlost commented Nov 6, 2024

Yes, all looks good here (the Zero-pad US postcodes one and the Figure B2 are the same as Zint). Can't wait for the big reveal!

@lyngklip
Contributor Author

lyngklip commented Nov 6, 2024

Can't wait for the big reveal!

I'm working on a stand-alone C header (just of the encoder) to match the PostScript version. I don't know the Zint source well enough to plug it in. Looking at a function called "maxi_text_process" it seems Zint breaks text up in segments each potentially starting with an ECI. That may just be the way Zint deals with ECI. The specs - :-) - say

"A specific Extended Channel Interpretation may be invoked anywhere in the encoded message, except where special rules apply in Mode 2 and 3 symbols"

I hope it's okay to briefly mention Zint in a discussion about BWIPP.

@terryburton
Member

I hope it's okay to briefly mention Zint in a discussion about BWIPP.

Of course! BWIPP and Zint are good playmates.

@lyngklip lyngklip marked this pull request as ready for review November 7, 2024 14:45
@lyngklip lyngklip marked this pull request as draft November 7, 2024 14:49
@lyngklip
Contributor Author

lyngklip commented Nov 7, 2024

Okay so I guess this is the big reveal - except there are a lot of things I want to do differently now.

Member

@terryburton terryburton left a comment


Just an initial review to check the conversion to feature toggle.

Will review properly once it is ready, and perhaps @gitlost will too.

src/maxicode.ps.src
1 1 4 {dup optimal_length n get exch get dup 3 index lt {4 2 roll} if pop pop} for
/j exch def
/state exch def
% State D would be okay too, right? It has its own pad character. Could save a code word.
Member


@gitlost Note this if you choose to review.

@terryburton
Member

I've reworked the tests to support GitHub-based review, so the updated patches will need to be rebased on top of this (filename changes in tests).

@lyngklip
Contributor Author

lyngklip commented Nov 7, 2024

I've reworked the tests to support GitHub-based review, so the updated patches will need to be rebased on top of this (filename changes in tests).

Okay let's see if I can figure this out on my own.

@lyngklip
Contributor Author

lyngklip commented Nov 7, 2024

I don't know how to avoid that force-push - it happens when I discard the 9 commits of my github branch. I'm rebasing on my local clone and then syncing that. I bet it works, but the pull request is still closed.

@lyngklip
Contributor Author

lyngklip commented Nov 7, 2024

It's synced back up.

@lyngklip lyngklip marked this pull request as ready for review November 8, 2024 10:20
@lyngklip
Contributor Author

lyngklip commented Nov 8, 2024

Oh no what did I do. Argh.

@lyngklip
Contributor Author

lyngklip commented Nov 8, 2024

@gitlost Can I open a pull request for Zint so we can discuss how (if?) to port this?

@gitlost
Contributor

gitlost commented Nov 8, 2024

Re Zint, please do!

Re padding and final latch:

There is something imperfect about that final latch but it's a different discussion. I'll leave it for now.

If the codewords fully fit then yes, a final latch isn't necessary. I don't think there's any other possible interpretation of the phrase

Pad characters are added as needed to fill out the capacity of the symbol

from the standard, and its various variants. So this is relevant to this discussion I think! - see below.

Re using code set E pad, yes that's totally valid and a nice optimization. As for using latches etc. for code sets C and D, apart from not being mentioned in the standard it doesn't seem to achieve anything over latching to a code set that has a pad, does it?

Anyway I've tested the PR and everything looks great. The only hiccup was with that final latch when encoding

10 200 moveto (^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128^128) (parse mode=4 newencoder) /maxicode /uk.co.terryburton.bwipp findresource exec

which fails with /bwipp.maxicodeMode56TooLong. But that's easily worked around by making it conditional on whether the symbol's full or not, as you've pointed out.
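The conditional final latch can be sketched as follows, in Python for illustration. It assumes the capacities used in the PR (77 codewords in mode 5, 84 in modes 2 and 3, 93 otherwise), pad codeword 33 in sets A and B, pad codeword 28 in set E, and codeword 58 as the latch to A from C or D; `finish_codewords` and `PAD` are hypothetical names.

```python
# Assumed pad codewords per code set; C and D have none.
PAD = {"A": 33, "B": 33, "E": 28}
LATCH_A = 58  # assumed latch to A from sets C, D and E


def finish_codewords(codewords: list[int], final_set: str, mode: int) -> list[int]:
    """Pad a message to the symbol capacity, latching to A only if there is room."""
    maxlen = 77 if mode == 5 else (84 if mode <= 3 else 93)
    out = list(codewords)
    pad = PAD.get(final_set)
    if pad is None and len(out) < maxlen:
        # The final set has no pad codeword and padding is needed: latch to A first.
        out.append(LATCH_A)
        pad = PAD["A"]
    if len(out) > maxlen:
        raise ValueError("message too long for this mode")
    # When the message exactly fills the symbol, nothing is appended.
    out.extend([pad] * (maxlen - len(out)))
    return out
```

A message that ends in set C and exactly fills the symbol is returned unchanged, which is the case that previously overflowed.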

I've still to do some performance analysis which I'll post when done, but a big thumbs up from me.

The filtering of operations would generate a lot of arrays that were then immediately chucked.
I suspect that this would hurt performance seeing as it was done in the inner loop.
I think this is also a bit more readable. Especially because I commented it a bit.
@gitlost
Contributor

gitlost commented Nov 10, 2024

The performance results I'm getting aren't too bad, about 80% slower (the mode 4 figures are skewed by the all numeric example which is unlikely data but left in as a worst case scenario):

mode 2
len data
 81 (339010000^029840^029001^0291Z34567890^029UPSN^029102562^029034^029^0291/1^029^029Y^0292201 Second St^029Ft Myers^029FL^030^004)
 97 ([^041>^03001^02996152382802^029840^029001^0291Z00004951^029UPSN^02906X610^029159^0291234567^0291/1^029^029Y^029634 ALPHA DR^029PITTSBURGH^029PA^030^004)

1
    newencoder difn  300  330
    legacy     difl  170  170
2
    newencoder difn  290  330
    legacy     difl  170  170
3
    newencoder difn  290  330
    legacy     difl  170  170

    newencoder tots  880  990 totn 1870
    legacy     tots  510  510 totl 1020

mode 4
len data
 19 (MaxiCode \(19 chars\))
 44 (ABCDabcdAabcABabcABCabcABCDaABCabABCabcABC^233a)
 93 (THIS IS A 93 CHARACTER CODE SET A MESSAGE THAT FILLS A MODE 4, UNAPPENDED, MAXICODE SYMBOL...)
138 (999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999)

1
    newencoder difn  170  200  300  400
    legacy     difl  120  130  130  130
2
    newencoder difn  160  210  290  410
    legacy     difl  120  130  130  130
3
    newencoder difn  160  200  300  400
    legacy     difl  130  130  130  120

    newencoder tots  490  610  890 1210 totn 3200
    legacy     tots  370  390  390  380 totl 1530

I made two slight changes to the code, 1) to not put an unnecessary final latch/pad, for both legacy and newencoder, and 2) to go back to what you had before with one ECI entry in the ops table, as opposed to four separate ones, which improves the performance slightly (maybe 5%). The diff is

diff --git a/src/maxicode.ps.src b/src/maxicode.ps.src
index e95a3cb..945df94 100644
--- a/src/maxicode.ps.src
+++ b/src/maxicode.ps.src
@@ -278,6 +278,8 @@ begin
     /sete charvals 4 get def
 } ctxdef
 
+    /maxlen mode 5 eq {77} {mode 3 le {84} {93} ifelse} ifelse def
+
     encoding (legacy) eq {
 
     % Compute numeric runlengths
@@ -311,7 +313,7 @@ begin
     { % loop
         % Exit when no characters remain latching back to A if necessary
         i msglen eq {
-            cset (seta) ne cset (setb) ne and {
+            cset (seta) ne cset (setb) ne and out length maxlen lt and {
                 la cset load enc
                 /cset (seta) def
             } if
@@ -497,13 +499,13 @@ begin
                                                  {32768 lt {4} {5} ifelse} ifelse} def
 
         % Operations that don't fit directly into the list below
-        /enc_eci1 {c neg 1000000 sub                                    } def   % Extended
-        /enc_eci2 {c neg 1000000 sub dup  -6 bitshift 32 or  exch 63 and} def   % Channel
-        /enc_eci3 {c neg 1000000 sub dup -12 bitshift 48 or  exch               % Interpretation
-                                     dup  -6 bitshift 63 and exch 63 and} def
-        /enc_eci4 {c neg 1000000 sub dup -18 bitshift 56 or  exch
-                                     dup -12 bitshift 63 and exch
-                                     dup  -6 bitshift 63 and exch 63 and} def
+        /enc_eci  {c neg 1000000 sub dup 32 ge  {dup  1024 lt {
+                        dup  -6 bitshift 32 or  exch 63 and} {dup 32768 lt {
+                        dup -12 bitshift 48 or  exch
+                        dup  -6 bitshift 63 and exch 63 and} {
+                        dup -18 bitshift 56 or  exch
+                        dup -12 bitshift 63 and exch
+                        dup  -6 bitshift 63 and exch 63 and} ifelse} ifelse} if} def
         /enc_ns   {0 msg n 9 getinterval {48 sub exch 10 mul add} forall        % Numeric
                                      dup -24 bitshift        exch               % Sequence
                                      dup -18 bitshift 63 and exch
@@ -514,24 +516,21 @@ begin
                               seta msg n 2 add get get} def                     % Shift 3 A
 
         % Table of operations - operating table?
-        /op_tab [   % predicate                        applicable sets      encoding
-            << /can {2 eci eq    } /intake 1 /output 2 /sets 2#11111 /enc {27 enc_eci1  } >>    % ECI1 ABCDE
-            << /can {3 eci eq    } /intake 1 /output 3 /sets 2#11111 /enc {27 enc_eci2  } >>    % ECI2 ABCDE
-            << /can {4 eci eq    } /intake 1 /output 4 /sets 2#11111 /enc {27 enc_eci3  } >>    % ECI3 ABCDE
-            << /can {5 eci eq    } /intake 1 /output 5 /sets 2#11111 /enc {27 enc_eci4  } >>    % ECI4 ABCDE
-            << /can {digits 9 ge } /intake 9 /output 6 /sets 2#11111 /enc {31 enc_ns    } >>    % NS   ABCDE
-            << /can {seta c known} /intake 1 /output 1 /sets 2#00001 /enc {   seta c get} >>    % A    A
-            << /can {setb c known} /intake 1 /output 1 /sets 2#00010 /enc {   setb c get} >>    % B     B
-            << /can {setc c known} /intake 1 /output 1 /sets 2#00100 /enc {   setc c get} >>    % C      C
-            << /can {setd c known} /intake 1 /output 1 /sets 2#01000 /enc {   setd c get} >>    % D       D
-            << /can {sete c known} /intake 1 /output 1 /sets 2#10000 /enc {   sete c get} >>    % E        E
-            << /can {num_a 1 ge  } /intake 1 /output 2 /sets 2#00010 /enc {59 seta c get} >>    % SHA   B
-            << /can {num_a 2 ge  } /intake 2 /output 3 /sets 2#00010 /enc {56 enc_sha2  } >>    % SHA2  B
-            << /can {num_a 3 ge  } /intake 3 /output 4 /sets 2#00010 /enc {57 enc_sha3  } >>    % SHA3  B
-            << /can {setb c known} /intake 1 /output 2 /sets 2#00001 /enc {59 setb c get} >>    % SHB  A
-            << /can {setc c known} /intake 1 /output 2 /sets 2#11011 /enc {60 setc c get} >>    % SHC  ABCDE
-            << /can {setd c known} /intake 1 /output 2 /sets 2#10111 /enc {61 setd c get} >>    % SHD  ABCDE
-            << /can {sete c known} /intake 1 /output 2 /sets 2#01111 /enc {62 sete c get} >>    % SHE  ABCDE
+        /op_tab [   % predicate                                 applicable sets      encoding
+            << /can {eci 1 ge     } /intake 1 /output {out_eci} /sets 2#11111 /enc {27 enc_eci   } >>    % ECI1 ABCDE
+            << /can {digits 9 ge  } /intake 9 /output {6      } /sets 2#11111 /enc {31 enc_ns    } >>    % NS   ABCDE
+            << /can {seta c known } /intake 1 /output {1      } /sets 2#00001 /enc {   seta c get} >>    % A    A
+            << /can {setb c known } /intake 1 /output {1      } /sets 2#00010 /enc {   setb c get} >>    % B     B
+            << /can {setc c known } /intake 1 /output {1      } /sets 2#00100 /enc {   setc c get} >>    % C      C
+            << /can {setd c known } /intake 1 /output {1      } /sets 2#01000 /enc {   setd c get} >>    % D       D
+            << /can {sete c known } /intake 1 /output {1      } /sets 2#10000 /enc {   sete c get} >>    % E        E
+            << /can {num_a 1 ge   } /intake 1 /output {2      } /sets 2#00010 /enc {59 seta c get} >>    % SHA   B
+            << /can {num_a 2 ge   } /intake 2 /output {3      } /sets 2#00010 /enc {56 enc_sha2  } >>    % SHA2  B
+            << /can {num_a 3 ge   } /intake 3 /output {4      } /sets 2#00010 /enc {57 enc_sha3  } >>    % SHA3  B
+            << /can {setb c known } /intake 1 /output {2      } /sets 2#00001 /enc {59 setb c get} >>    % SHB  A
+            << /can {setc c known } /intake 1 /output {2      } /sets 2#11011 /enc {60 setc c get} >>    % SHC  ABCDE
+            << /can {setd c known } /intake 1 /output {2      } /sets 2#10111 /enc {61 setd c get} >>    % SHD  ABCDE
+            << /can {sete c known } /intake 1 /output {2      } /sets 2#01111 /enc {62 sete c get} >>    % SHE  ABCDE
         ] def
 
         % Add idx to each entry
@@ -550,7 +549,7 @@ begin
                     /org_ best_origin m get state get def   % Get the best prior code set
                     best_length m get org_ get              % Get the corresponding length of encoding
                     latch_length_s org_ get add             % Add latch length
-                    op_ /output get add                     % Add output length to yield resulting length
+                    op_ /output get exec add                % Add output length to yield resulting length
                     2 copy gt {
                         exch /op op_ def /org org_ def      % Pick the shortest length and make a note of it
                     } if pop                                % Pop off the longer (or equal) length
@@ -616,7 +615,7 @@ begin
         % If the message codewords exactly fill out all the message
         % regions of the symbol then the latch to A would be unnecessary
         % and would in fact overflow the buffer and cause an error.
-        pad_code state get 0 eq {j 1 add array dup j 58 put} {j array} ifelse
+        pad_code state get 0 eq j maxlen lt and {j 1 add array dup j 58 put} {j array} ifelse
         /padval pad_code final_code_set get def
         /len j def
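
The merged `enc_eci` procedure in the diff above picks one of four layouts by the size of the ECI number. A Python reading of the same bit layout, for illustration only (`encode_eci` is a hypothetical name, and the range check on the ECI value is an added assumption):

```python
ECI = 27  # ECI codeword that introduces the value


def encode_eci(eci: int) -> list[int]:
    """Return the ECI codeword followed by the 1 to 4 codewords carrying the value."""
    if not 0 <= eci <= 999999:
        raise ValueError("ECI value out of range")
    if eci < 32:
        body = [eci]
    elif eci < 1024:
        body = [32 | (eci >> 6), eci & 63]
    elif eci < 32768:
        body = [48 | (eci >> 12), (eci >> 6) & 63, eci & 63]
    else:
        body = [56 | (eci >> 18), (eci >> 12) & 63, (eci >> 6) & 63, eci & 63]
    return [ECI] + body
```

The total lengths (2, 3, 4 or 5 codewords including the leading 27) correspond to what the `/output {out_eci}` entry feeds into the length calculation.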

All in all it's still a big thumbs up from me.

(P.S. I ported your Code 128 newencoder to Zint and got excellent results, as fast or faster than the current code, so will after some more testing and cleanup be committing this - thanks very much!)

@lyngklip
Contributor Author

I think there's still a fair bit of opportunity for optimization (with z or s?) but I'm running out of steam. I trust you noticed the commit that avoids generating a bunch of filtered lists of "operations"?

@lyngklip lyngklip closed this Nov 11, 2024
@lyngklip
Contributor Author

Oops, that was unintentional..

@terryburton terryburton reopened this Nov 11, 2024
@terryburton terryburton changed the title Maxicode: test case exposing sub-optimal encode - for discussion Optimal encoding of Maxicode Nov 11, 2024
@terryburton
Member

I think there's still a fair bit of opportunity for optimization (with z or s?) but I'm running out of steam.

If you want to check and integrate the changes proposed by @gitlost during testing, then I'm happy to accept the PR.

That way we get something acceptable over the line (initially behind the flag) and then you can improve things as you find the time and inclination to.

@lyngklip
Contributor Author

I had some local changes and I couldn't figure out how to apply the diff, so I did it manually. I hope it's correct. The tests ran, at least...

@terryburton
Member

Merged manually.

@terryburton
Member

I squashed things to clean up the history. Most of the attribution ends up in the right place, but with @gitlost gaining credit for reverting the ECI table code to what was an earlier version by @lyngklip. (I think that life's too short to worry about that.)

Anyway, thank you @lyngklip for a couple of great contributions and @gitlost for testing and stewardship!

@gitlost
Contributor

gitlost commented Nov 12, 2024

I see I missed the party, great stuff @lyngklip.

There was another tweak that I did Sunday which filters the states by iterating through the data upfront, and lessens the slowdown for typical data to around 50%. A nice thing is it reduces the all-numeric case to more or less the same as the others. The figures are:

mode 2
len data
 81 (339010000^029840^029001^0291Z34567890^029UPSN^029102562^029034^029^0291/1^029^029Y^0292201 Second St^029Ft Myers^029FL^030^004)
 97 ([^041>^03001^02996152382802^029840^029001^0291Z00004951^029UPSN^02906X610^029159^0291234567^0291/1^029^029Y^029634 ALPHA DR^029PITTSBURGH^029PA^030^004)

1
    newencoder difn  230  270
    legacy     difl  170  160
2
    newencoder difn  240  260
    legacy     difl  170  170
3
    newencoder difn  240  260
    legacy     difl  160  170

    newencoder tots  710  790 totn 1500
    legacy     tots  500  500 totl 1000

mode 4
len data
 19 (MaxiCode \(19 chars\))
 44 (ABCDabcdAabcABabcABCabcABCDaABCabABCabcABC^233a)
 93 (THIS IS A 93 CHARACTER CODE SET A MESSAGE THAT FILLS A MODE 4, UNAPPENDED, MAXICODE SYMBOL...)
138 (999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999)

1
    newencoder difn  150  180  200  210
    legacy     difl  140  140  140  140
2
    newencoder difn  150  170  210  210
    legacy     difl  130  150  140  140
3
    newencoder difn  150  170  200  210
    legacy     difl  140  140  150  140

    newencoder tots  450  520  610  630 totn 2210
    legacy     tots  410  430  430  420 totl 1690

(Note the 10% increase in the mode 4 legacy case which shows these figures are very approximate.)
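
For reference, the slowdown percentages follow directly from the `totn` and `totl` totals in the two sets of figures. A small Python sketch of the arithmetic (timing units are as posted; `slowdown_pct` and `TOTALS` are hypothetical names):

```python
def slowdown_pct(new: float, legacy: float) -> float:
    """Percentage by which the new encoder is slower than the legacy one."""
    return 100.0 * (new - legacy) / legacy


# (totn, totl) pairs copied from the tables in this thread.
TOTALS = {
    "mode 2, before state filtering": (1870, 1020),
    "mode 4, before state filtering": (3200, 1530),
    "mode 2, after state filtering": (1500, 1000),
    "mode 4, after state filtering": (2210, 1690),
}

for label, (totn, totl) in TOTALS.items():
    print(f"{label}: {slowdown_pct(totn, totl):.0f}% slower")

# Run-to-run noise: the mode 4 legacy total moved between the two measurements.
legacy_drift = slowdown_pct(1690, 1530)
```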

The diff is below, which maybe @lyngklip you could use or if understandably you want to give it a rest, I can make into a PR:

diff --git a/src/maxicode.ps.src b/src/maxicode.ps.src
index 91240c8..180aa9d 100644
--- a/src/maxicode.ps.src
+++ b/src/maxicode.ps.src
@@ -223,7 +223,7 @@ begin
         [ (Z)  (z)  218  250   26 ]  % 26
         [ eci  eci  eci  eci  eci ]  % 27
         [  28   28   28   28  pad ]  % 28
-        [  29   29   29   29  pad ]  % 29
+        [  29   29   29   29  pd2 ]  % 29
         [  30   30   30   30   27 ]  % 30
         [  ns   ns   ns   ns   ns ]  % 31
         [ ( )  ({)  219  251   28 ]  % 32
@@ -469,6 +469,27 @@ begin
 
     encoding (new) eq {
 
+        % Check states needed (apart from A, which for typical data is always used)
+        /haveb false def /havec false def /haved false def /havee false def
+        /havei false def  % ECI
+        0 1 msglen 1 sub {
+            msg exch get /c exch def
+            c -1000000 le {
+                /havei true def
+            } { c 28 lt c 30 gt or c 32 ne and {  % FS, GS, RS and space in all code sets
+                    setb c known { /haveb true def } if
+                    setc c known { /havec true def } if
+                    setd c known { /haved true def } if
+                    sete c known { /havee true def } if
+                } if
+            } ifelse
+        } for
+
+        % Code sets with a pad code are first pick
+        %          A        B            E            C            D
+        /priority [0 haveb {1} if havee {4} if havec {2} if haved {3} if] def
+        /priority_rest priority 1 priority length 1 sub getinterval def  % May be empty
+
         %                           Prior code set                  Later code set
         %                   A      B      C      D      E
         /latch_sequence [[[     ][63   ][58   ][58   ][58   ]]          % A
@@ -516,21 +537,21 @@ begin
                               seta msg n 2 add get get} def                     % Shift 3 A
 
         % Table of operations - operating table?
-        /op_tab [   % predicate                                 applicable sets      encoding
-            << /can {eci 1 ge    } /intake 1 /output {out_eci} /sets 2#11111 /enc {27 enc_eci   } >>    % ECI  ABCDE
-            << /can {digits 9 ge } /intake 9 /output {6      } /sets 2#11111 /enc {31 enc_ns    } >>    % NS   ABCDE
-            << /can {seta c known} /intake 1 /output {1      } /sets 2#00001 /enc {   seta c get} >>    % A    A
-            << /can {setb c known} /intake 1 /output {1      } /sets 2#00010 /enc {   setb c get} >>    % B     B
-            << /can {setc c known} /intake 1 /output {1      } /sets 2#00100 /enc {   setc c get} >>    % C      C
-            << /can {setd c known} /intake 1 /output {1      } /sets 2#01000 /enc {   setd c get} >>    % D       D
-            << /can {sete c known} /intake 1 /output {1      } /sets 2#10000 /enc {   sete c get} >>    % E        E
-            << /can {num_a 1 ge  } /intake 1 /output {2      } /sets 2#00010 /enc {59 seta c get} >>    % SHA   B
-            << /can {num_a 2 ge  } /intake 2 /output {3      } /sets 2#00010 /enc {56 enc_sha2  } >>    % SHA2  B
-            << /can {num_a 3 ge  } /intake 3 /output {4      } /sets 2#00010 /enc {57 enc_sha3  } >>    % SHA3  B
-            << /can {setb c known} /intake 1 /output {2      } /sets 2#00001 /enc {59 setb c get} >>    % SHB  A
-            << /can {setc c known} /intake 1 /output {2      } /sets 2#11011 /enc {60 setc c get} >>    % SHC  ABCDE
-            << /can {setd c known} /intake 1 /output {2      } /sets 2#10111 /enc {61 setd c get} >>    % SHD  ABCDE
-            << /can {sete c known} /intake 1 /output {2      } /sets 2#01111 /enc {62 sete c get} >>    % SHE  ABCDE
+        /op_tab [   %          predicate                                 applicable sets      encoding
+            havei { << /can {c -1000000 le} /intake 1 /output {out_eci} /sets 2#11111 /enc {27 enc_eci   } >> } if  % ECI1 ABCDE
+                    << /can {digits 9 ge  } /intake 9 /output {6      } /sets 2#11111 /enc {31 enc_ns    } >>       % NS   ABCDE
+                    << /can {seta c known } /intake 1 /output {1      } /sets 2#00001 /enc {   seta c get} >>       % A    A
+            haveb { << /can {setb c known } /intake 1 /output {1      } /sets 2#00010 /enc {   setb c get} >> } if  % B     B
+            havec { << /can {setc c known } /intake 1 /output {1      } /sets 2#00100 /enc {   setc c get} >> } if  % C      C
+            haved { << /can {setd c known } /intake 1 /output {1      } /sets 2#01000 /enc {   setd c get} >> } if  % D       D
+            havee { << /can {sete c known } /intake 1 /output {1      } /sets 2#10000 /enc {   sete c get} >> } if  % E        E
+                    << /can {num_a 1 ge   } /intake 1 /output {2      } /sets 2#00010 /enc {59 seta c get} >>       % SHA   B
+                    << /can {num_a 2 ge   } /intake 2 /output {3      } /sets 2#00010 /enc {56 enc_sha2  } >>       % SHA2  B
+                    << /can {num_a 3 ge   } /intake 3 /output {4      } /sets 2#00010 /enc {57 enc_sha3  } >>       % SHA3  B
+            haveb { << /can {setb c known } /intake 1 /output {2      } /sets 2#00001 /enc {59 setb c get} >> } if  % SHB  A
+            havec { << /can {setc c known } /intake 1 /output {2      } /sets 2#11011 /enc {60 setc c get} >> } if  % SHC  ABCDE
+            haved { << /can {setd c known } /intake 1 /output {2      } /sets 2#10111 /enc {61 setd c get} >> } if  % SHD  ABCDE
+            havee { << /can {sete c known } /intake 1 /output {2      } /sets 2#01111 /enc {62 sete c get} >> } if  % SHE  ABCDE
         ] def
 
         % Add idx to each entry
@@ -560,15 +581,14 @@ begin
             } forall
         } def
 
-        % Unrolled loop to get the best prior code set using a row of
+        % Loop to get the best prior code set using a row of
         % best encoded lengths and a row of latch sequence lengths.
         /get_best_origin {
             /latch_length_s latch_length state get def
-            best_length_0 0 get latch_length_s 0 get add                /orglen exch def     0
-            best_length_0 1 get latch_length_s 1 get add dup orglen lt {/orglen exch def pop 1} {pop} ifelse
-            best_length_0 2 get latch_length_s 2 get add dup orglen lt {/orglen exch def pop 2} {pop} ifelse
-            best_length_0 3 get latch_length_s 3 get add dup orglen lt {/orglen exch def pop 3} {pop} ifelse
-            best_length_0 4 get latch_length_s 4 get add dup orglen lt {/orglen exch def pop 4} {pop} ifelse
+            best_length_0                 0 get latch_length_s 0 get add                /orglen exch def     0
+            priority_rest {
+                /p exch def best_length_0 p get latch_length_s p get add dup orglen lt {/orglen exch def pop p} {pop} ifelse
+            } forall
         } def
 
         /digits 0 def           % Number of contiguous digits seen
@@ -579,10 +599,9 @@ begin
             /n exch def         % Input index
             /c msg n get def    % Input character
 
-            % Keep tabs on digits, characters in code set a, and ECI type
+            % Keep tabs on digits and characters in code set A
             /digits c 48 ge c 58 lt and {digits 1 add} {0} ifelse def
             /num_a  seta c known        {num_a  1 add} {0} ifelse def
-            /eci    c -1000000 le       {out_eci     } {0} ifelse def
 
             % Get rows of interest
             /path_op_0 path_op n get def
@@ -591,14 +610,13 @@ begin
             /best_origin_0 best_origin n 15 and get def
 
             % Get best encoded lengths, then best prior code sets
-            0 1 4 {/state exch def best_length_0 state get_best_length put} for
-            0 1 4 {/state exch def best_origin_0 state get_best_origin put} for
+            priority {/state exch def best_length_0 state get_best_length put} forall
+            priority {/state exch def best_origin_0 state get_best_origin put} forall
         } for
 
         /n msglen def
 
         % Get the best code set to end with. Code sets with a pad code are first pick
-        /priority [0 1 4 2 3] def
         0 max_int priority {dup best_length_0 exch get dup 3 index lt {4 2 roll} if pop pop} forall
         /j exch def
         /state exch def
diff --git a/tests/ps_tests/maxicode.ps.test b/tests/ps_tests/maxicode.ps.test
index 3e02b24..7220ec2 100644
--- a/tests/ps_tests/maxicode.ps.test
+++ b/tests/ps_tests/maxicode.ps.test
@@ -50,7 +50,7 @@
    33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33
    33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33
    33 33 33 33 33 33 33 33 55 56 56 45 53 41 57  9 17 61  9 41 13 12  3 23 10 25 23 42 54 28 13  1
-   20 36 43 17 24 60  6 50 50 29 36 49 11 43 19 33]
+   20 36 43 17 24 60  6 50 50 29 36 49 11 43 19 33] debugIsEqual
 
 { % Don't stay in mode A when mode D or even C would be more efficient.
     (^192^224^224^224^192^224^224^224^192^224^224^224^192^224^224^224^192^224^224^224^192)
@@ -68,7 +68,7 @@
    33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33
    33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 33
    33 33 33 33 33 33 33 33 60 60 40 40  9  9 43 43 14 14 50 50 12 12 53 53 57 57 58 58 36 36 28 28
-   10 10 53 53 37 37 30 30 14 14 5 5 31 31 40 40] debugIsEqual
+   10 10 53 53 37 37 30 30 14 14  5  5 31 31 40 40] debugIsEqual
 
 % Figures
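
The upfront scan in the diff above can be read as follows, in Python for illustration. It assumes, as in the PostScript, that ECI entries are stored as values at or below -1000000 and that FS, GS, RS and space (28 to 30 and 32) exist in every code set; `needed_states` and the `sets` argument are hypothetical.

```python
def needed_states(msg: list[int], sets: dict[str, set[int]]) -> tuple[list[int], bool]:
    """Return (priority list of state indices, whether any ECI is present).

    `sets` maps "B", "C", "D", "E" to the character values each code set holds.
    State indices are A=0, B=1, C=2, D=3, E=4; A is always kept.
    """
    have = {name: False for name in "BCDE"}
    have_eci = False
    for c in msg:
        if c <= -1000000:
            have_eci = True
        elif not (28 <= c <= 30 or c == 32):
            # Characters common to all sets never force an extra state.
            for name in "BCDE":
                if c in sets[name]:
                    have[name] = True
    # Code sets with a pad codeword (A, B, E) are tried before C and D.
    order = (("B", 1), ("E", 4), ("C", 2), ("D", 3))
    priority = [0] + [index for name, index in order if have[name]]
    return priority, have_eci
```

For all-numeric or plain upper-case data the priority list collapses to `[0]`, which is why the worst case in the figures drops so sharply.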
 

@lyngklip
Contributor Author

That's clever, I think you should add it if you like. My main interest right now is optimal encoding (optimal in the sense most input/shortest output) and the feasibility of applying the cookie-cutter algorithm to other types of bar codes. I'm looking at Data Matrix right now which seems like another step up in complexity from MaxiCode. I can't even tell if the current encoder is already optimal. I've looked a bit at a Rust implementation which allegedly is optimal, but I have a hard time reading Rust.

I've been wondering about something with regard to Data Matrix C40 and Text character sets. Am I allowed to pad with shift1 to reach a multiple of 3 - for the purpose of unlatching - not only in the corner cases close to the end but anywhere else? I suspect the answer is no. It seems BWIPP and other libraries only use C40 and Text if multiples of 3 code words can be mapped. I don't know if it could ever improve encoding length; it's just something I thought about.
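
For context on why multiples of three matter: C40 and Text pack three character values into one 16-bit number, emitted as two codewords. The Python sketch below shows only that packing, for the basic set (space, digits, upper-case letters); it does not settle whether padding with Shift 1 mid-message is permitted. `c40_value` and `pack_c40` are hypothetical names.

```python
def c40_value(ch: str) -> int:
    """C40 basic-set value for space, a digit or an upper-case letter."""
    if ch == " ":
        return 3
    if "0" <= ch <= "9":
        return ord(ch) - ord("0") + 4
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 14
    raise ValueError(f"{ch!r} needs a shift, which this sketch does not cover")


def pack_c40(values: list[int]) -> list[int]:
    """Pack C40 values, three at a time, into pairs of codewords."""
    if len(values) % 3 != 0:
        raise ValueError("C40 values must come in groups of three")
    out = []
    for i in range(0, len(values), 3):
        v = 1600 * values[i] + 40 * values[i + 1] + values[i + 2] + 1
        out += [v >> 8, v & 255]   # high byte, low byte
    return out
```

For example, `pack_c40([c40_value(c) for c in "AIM"])` gives the two codewords 91 and 11. Shift 1 has value 0 in this scheme, so the question above amounts to whether zeros may be inserted to complete a triple anywhere in the message.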

@terryburton
Member

The diff is below, which maybe @lyngklip you could use or if understandably you want to give it a rest, I can make into a PR:

That's clever, I think you should add it if you like.

@gitlost PR or just commit it. I don't mind which!

@terryburton
Member

My main interest right now is optimal encoding (optimal in the sense most input/shortest output) and the feasibility of applying the cookie-cutter algorithm to other types of bar codes.

Some random comments (off the top of my head):

  • Aztec Code, PDF 417 and MicroPDF 417: I expect that these are amenable to your current backtracking method.
  • Data Matrix and Code One: Code One basically shares its encoding scheme with Data Matrix, including the various end-of-data conditions. In fact, Data Matrix ECC 200 was a retrofit of an earlier (long deprecated) design, undertaken by the author of Code One.
  • DataBar Expanded (incl. Stacked): Has a basic three-state "General Purpose Field" (byte, alphanum, numeric).
  • QR Code and Han Xin: TLV segmentation. Han Xin has more modes as described in contrib/development/hanxin_encoding.txt, and the BWIPP encoder needs a refresh based on the ISO spec and doesn't currently optimise.
  • DotCode: Currently a greedy algorithm that can probably be improved.
  • JAB Code: BWIPP has a preliminary encoder based on an old spec. However, the LDPC encoding process is likely far too expensive for a PostScript implementation to be practical. So I've lost interest.
  • Ultracode: Don't get too hung up on this yet. It's in the process of gaining votes to undergo international standardisation, so it's still subject to change. I'm sure that the author (Clive Hohberger) would appreciate any improvements on what exists today. But there's plenty of time before this ship sails.

@lyngklip If you contact me directly via email then I might be able to help further.

@gitlost
Contributor

gitlost commented Nov 12, 2024

My First Direct Commit 8f07935!

@terryburton
Member

My First Direct Commit 8f07935!

I can retire now :-)

@lyngklip
Contributor Author

  • Ultracode: Don't get too hung up on this yet. It's in the process of gaining votes to undergo international standardisation, so it's still subject to change. I'm sure that the author (Clive Hohberger) would appreciate any improvements on what exists today. But there's plenty of time before this ship sails.

I haven't heard about that one - but then I haven't studied barcodes for long. I stumbled upon this MMCC - is that on your radar? Or is it not ready for prime time; I can't tell.

I'll probably spend time on Data Matrix and then the other common 2D codes. Not necessarily writing PostScript for them - just understanding them. I don't have any priority or agenda other than procrastinating.

@lyngklip If you contact me directly via email then I might be able to help further.

That's very kind :-)

@terryburton
Member

I stumbled upon this MMCC - is that on your radar? Or is it not ready for prime time; I can't tell.

I haven't looked at it in detail but from a very brief reading it seems academic and lacks mitigations for certain physical phenomena that standardised polychromatic codes have.

I don't have any priority or agenda other than procrastinating.

That's definitely the best way to keep motivated over the long term... :-)

@gitlost
Contributor

gitlost commented Nov 25, 2024

A direct adaptation of this algorithm is now in Zint (commit e94727), works like a charm, thanks Bue!

@terryburton
Member

A direct adaptation of this algorithm is now in Zint (commit e94727), works like a charm, thanks Bue!

I've now switched BWIPP to using the new encoder by default: 414be1e
