Optimal encoding of Maxicode #279
Why does Wikipedia claim that MaxiCode is public domain?
Yeah needs a "not in citation given" tag, but true according to Benjamin Nelson's Punched Cards To Bar Codes p.303:
The ISO/IEC spec is copyrighted, as is all their stuff (which they then charge exorbitant amounts to access).
Re example, that's a huge saving, from 51 pads to 63.
Here's an A/B (and NS) example which could be one codeword less (immediate LCHB on "b" between NS segments instead of SHB and then LCHB after 2nd NS):
A pure A/B example that could be one codeword less (should SHB instead of LCHB then SHA)
I had to modify some test cases to suit the output produced by the modified encoder. Are they still valid?
Yes, all looks good here (the Zero-pad US postcodes one and the Figure B2 are the same as Zint). Can't wait for the big reveal!
I'm working on a stand-alone C header (just of the encoder) to match the PostScript version. I don't know the Zint source well enough to plug it in. Looking at a function called "maxi_text_process", it seems Zint breaks text up into segments, each potentially starting with an ECI. That may just be the way Zint deals with ECI. The specs - :-) - say
I hope it's okay to briefly mention Zint in a discussion about BWIPP.
Of course! BWIPP and Zint are good playmates.
Okay so I guess this is the big reveal - except there are a lot of things I want to do differently now. |
Just an initial review to check the conversion to feature toggle.
Will review properly once it is ready, and perhaps @gitlost will too.
src/maxicode.ps.src
1 1 4 {dup optimal_length n get exch get dup 3 index lt {4 2 roll} if pop pop} for
/j exch def
/state exch def
% State D would be okay too, right? It has its own pad character. Could save a code word.
@gitlost Note this if you choose to review.
I've reworked the tests to support GitHub-based review, so the updated patches will need to be rebased on top of this (filename changes in tests).
Okay let's see if I can figure this out on my own.
Force-pushed 8b8de3a to 8f3bc4c.
I don't know how to avoid that force-push - it happens when I discard the 9 commits of my GitHub branch. I'm rebasing on my local clone and then syncing that. I bet it works, but the pull request is still closed.
It's synced back up.
Oh no what did I do. Argh.
@gitlost Can I open a pull request for Zint so we can discuss how (if?) to port this? |
Re Zint, please do! Re padding and final latch:
If the codewords fully fit then yes, a final latch isn't necessary. I don't think there's any other possible interpretation of the phrase
from the standard, and its various variants. So this is relevant to this discussion, I think - see below. Re using the code set E pad: yes, that's totally valid and a nice optimization. As for using latches etc. for code sets C and D, apart from not being mentioned in the standard, it doesn't seem to achieve anything over latching to a code set that has a pad, does it? Anyway, I've tested the PR and everything looks great. The only hiccup was with that final latch when encoding
which fails with
I've still to do some performance analysis, which I'll post when done, but a big thumbs up from me.
The filtering of operations would generate a lot of arrays that were then immediately chucked. I suspect that this would hurt performance, seeing as it was done in the inner loop. I think this is also a bit more readable, especially because I commented it a bit.
The performance results I'm getting aren't too bad, about 80% slower (the mode 4 figures are skewed by the all-numeric example, which is unlikely data but left in as a worst-case scenario):
I made two slight changes to the code, 1) to not put an unnecessary final latch/pad, for both
All in all it's still a big thumbs up from me. (P.S. I ported your Code 128
I think there's still a fair bit of opportunity for optimization (with z or s?) but I'm running out of steam. I trust you noticed the commit that avoids generating a bunch of filtered lists of "operations"?
Oops, that was unintentional...
If you want to check and integrate the changes proposed by @gitlost during testing, then I'm happy to accept the PR. That way we get something acceptable over the line (initially behind the flag) and then you can improve things as you find the time and inclination to. |
I had some local changes and I couldn't figure out how to apply the diff, so I did it manually. I hope it's correct. Tests ran, at least...
Merged manually. |
I squashed things to clean up the history. Most of the attribution ends up in the right place, but with @gitlost gaining credit for reverting the ECI tables to what was an earlier version by @lyngklip. (I think that life's too short to worry about that.) Anyway, thank you @lyngklip for a couple of great contributions and @gitlost for testing and stewardship!
I see I missed the party, great stuff @lyngklip. There was another tweak that I did Sunday which filters the states by iterating through the data upfront, and lessens the slowdown for typical data to around 50%. A nice thing is it reduces the all-numeric case to more or less the same as the others. The figures are:
(Note the 10% increase in the mode 4 legacy case which shows these figures are very approximate.) The diff is below, which maybe @lyngklip you could use or if understandably you want to give it a rest, I can make into a PR:
That's clever, I think you should add it if you like. My main interest right now is optimal encoding (optimal in the sense of shortest output for any given input) and the feasibility of applying the cookie-cutter algorithm to other types of bar codes. I'm looking at Data Matrix right now, which seems like another step up in complexity from MaxiCode. I can't even tell if the current encoder is already optimal. I've looked a bit at a Rust implementation which is allegedly optimal, but I have a hard time reading Rust. I've been wondering about something with regard to the Data Matrix C40 and Text character sets. Am I allowed to pad with shift1 to reach a multiple of 3 - for the purpose of unlatching - not only in the corner cases close to the end but anywhere else? I suspect the answer is no. It seems BWIPP and other libraries only use C40 and Text if multiples of 3 code words can be mapped. I don't know if it could ever improve encoding length; it's just something I thought about.
Some random comments (off the top of my head):
@lyngklip If you contact me directly via email then I might be able to help further. |
My First Direct Commit 8f07935!
I can retire now :-)
I haven't heard about that one - but then I haven't studied barcodes for long. I stumbled upon this MMCC - is that on your radar? Or is it not ready for prime time; I can't tell. I'll probably spend time on Data Matrix and then the other common 2D codes. Not necessarily writing PostScript for them - just understanding them. I don't have any priority or agenda other than procrastinating.
That's very kind :-)
I haven't looked at it in detail but from a very brief reading it seems academic and lacks mitigations for certain physical phenomena that standardised polychromatic codes have.
That's definitely the best way to keep motivated over the long term... :-)
A direct adaptation of this algorithm is now in Zint (commit e94727), works like a charm, thanks Bue!
What do you say - is this test case correct? I produced it by hacking the encoder slightly. My observation was that A and B would only latch to C, D, and E after 4 consecutive characters belonging to the same set. However, a mix of C, D, and E could still benefit from a latch to either C, D or E. Shall MaxiCode receive the "optimal encoding" makeover?