
add an additional mixed-blitter? #1223

Closed
joseluis opened this issue Dec 17, 2020 · 33 comments
Labels: nonfix (closed without a successful fix: invalid, wontfix), userquestion (not quite bugs: inquiries from users)
Milestone: 2.2.0

Comments

@joseluis (Collaborator)

This C project seems to have an optimized algorithm for choosing the most appropriate symbols: https://github.com/csdvrx/derasterize. Maybe we could use something like that?

Examples:
[image: lemur]
[image: snake]
[image: snake_original]

PS: I found out about it in the interesting sixel support thread for Microsoft Terminal.

@joseluis added the documentation and enhancement labels on Dec 17, 2020
@csdvrx

csdvrx commented Dec 17, 2020

If you can wait a bit, I'm going to release a new version that's much faster. It was done to support videos with mpv in tmux-sixel

@dankamongmen (Owner)

Hey there, @csdvrx ! I've seen your work, and thought it good. It's nice to get all we terminal hackers 'round the fire.

I'll take a closer look at this soon. @joseluis , does its image quality beat ours? Like your pretty snake, I move more quickly when threatened =].

@dankamongmen (Owner)

dankamongmen commented Jan 8, 2021

@joseluis , running derasterize.c and notcurses-view on samples/snake.jpg seems to generate almost indistinguishable images, except notcurses-view takes .074s, while derasterize.c takes .303s, about 4x the notcurses-view time.

[schwarzgerat](127) $ time  notcurses-view -t0 samples/snake.jpg
Term: 45x80 vte-256color (VTE with xterm 256-colors)

 notcurses 2.1.4 by nick black et al 
  45 rows 80 cols (56.25KiB) 16B cells 256 colors+RGB
  compiled with gcc-10.2.1 20201224, little-endian
  terminfo from ncurses 6.2.20201114
  avformat 58.45.100 avutil 56.51.100 swscale 5.7.100
1 render, 80.53µs total (80.53µs min, 80.53µs max, 80.53µs avg)
1 write, 15.64ms total (15.64ms min, 15.64ms max, 15.64ms avg)
132.05KiB total (132.05KiB min, 132.05KiB max, 132KiB avg)
0 failed renders, 0 failed writes, 0 refreshes
RGB emits:elides: def 4:37 fg 3543:55 bg 3511:50
Cell emits:elides: 3600/0 (0.00%) 90.24% 1.53% 1.40%
real 0m0.074s
user	0m0.033s
sys	0m0.008s
[schwarzgerat](0) $ 

vs

[schwarzgerat](0) $ time ./derasterize.c samples/snake.jpg > /dev/null
real	0m0.303s
user	0m0.318s
sys	0m0.014s
[schwarzgerat](0) $

[screenshot: 2021-01-08-052321_1619x925_scrot]

if derasterize.c looked better when you filed this bug, it's probably because we weren't yet using sextants by default. now, derasterize.c does use things like THREE EIGHTHS BLOCK, which we don't, and maybe ought to consider. but i suspect that sexblitter is roughly equivalent. even where we can't use sexblitter (e.g. alacritty), quadblitter seems about as good:

[screenshot: 2021-01-08-053018_1611x1439_scrot]

so if you find a case where we're noticeably worse, i'm very interested, but for now i'm feeling like the fastest terminal cowboy in the west [blows smoke from guns]. =]

@csdvrx , please don't interpret this as a knock on your superb work! just proud of my own =].

@dankamongmen added the nonfix and userquestion labels and removed the documentation and enhancement labels on Jan 8, 2021
@dankamongmen added this to the 2.2.0 milestone on Jan 8, 2021
@joseluis (Collaborator, Author)

joseluis commented Jan 8, 2021

Although sexblitter has been an improvement, it does have less resolution due to the lack of vertical and horizontal eighth blocks, diagonal blocks, diagonal lines, crosses, etc. The raccoon image is the one that best shows the extended shapes palette.

I'd still add this new mixed blitter at some point, once Charlotte releases the new version. No need to rush in any case.

@dankamongmen (Owner)

I'll take a look at the raccoon image, thanks for the heads-up. @csdvrx , do you intend to expose your blitter as a library? It looks like it's only built as a binary at the moment.

it does have less resolution due to the lack of vertical and horizontal eight blocks
well, sexblitter gives you 3x vertical and 2x horizontal. they don't seem to be using the 1/8 horizontal blocks, though i'm sure it'd be trivial to add them. a bigger concern of mine is how to deal with the mixed mode. they're doing, so far as i can tell with a quick look at adjudicate(), a comparison of how similar the glyphs are to the desired output. does this work at different scales? hrmmm, i guess it does, probably

....i guess i'm thinking: we have a number of Y rows and X columns to which we're going to output, right? so we then resize the input to be Y * {height of blitter} and X * {width of blitter}, trusting ffmpeg/oiio to do a resize way better than we will, and then have a multiple of our comparison geometries. there's no guarantee of that here. which maybe isn't a problem? i'm unsure, will ponder it. this is why, for instance, we don't have a superblitter that utilizes both quads and sextants -- if we're mapping 3x2 pixels to a single cell, there's no way we ever use an UPPER HALF BLOCK nor LOWER HALF BLOCK, because we'd need 1.5-pixel matches. but hrmmm, that might actually be a better solution -- imagine you have input pixels 0x000000, 0x888888, and 0xffffff. the best blit of that may well be 0x444444 + 0xbbbbbb using half blocks, rather than what you'd get right now (not better than 0x444444 + 0x444444 + 0xffffff). HRMM. yeah, i think i've been assuming some false constraints in my sexblitter. HRMMMM. hrm!
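that's easy to sanity-check numerically. here's a toy brute force over every (fg, bg) pair for that exact grayscale column {0x00, 0x88, 0xff} -- just a sketch, not notcurses code, and it models the half block's middle pixel as the plain mean of fg and bg, which is an idealized rendering assumption:

/* toy check of the half-block idea for the pixel column {0x00, 0x88, 0xff}:
 * (a) a 3-cell sextant-style column where each cell is painted entirely fg
 * or bg, vs (b) a half block whose boundary splits the middle pixel, so that
 * pixel is modeled as showing the mean of fg and bg */
#include <stdio.h>

int main(void){
    const double px[3] = { 0x00, 0x88, 0xff };
    double bestsex = 1e18, besthalf = 1e18;
    int sfg = 0, sbg = 0, hfg = 0, hbg = 0;
    for(int fg = 0; fg < 256; fg++){
        for(int bg = 0; bg < 256; bg++){
            for(int mask = 0; mask < 8; mask++){ /* every fg/bg cell choice */
                double e = 0;
                for(int i = 0; i < 3; i++){
                    double c = (mask >> i & 1) ? fg : bg;
                    e += (px[i] - c) * (px[i] - c);
                }
                if(e < bestsex){ bestsex = e; sfg = fg; sbg = bg; }
            }
            double r[3] = { fg, (fg + bg) / 2.0, bg }; /* half-block render */
            double e = 0;
            for(int i = 0; i < 3; i++)
                e += (px[i] - r[i]) * (px[i] - r[i]);
            if(e < besthalf){ besthalf = e; hfg = fg; hbg = bg; }
        }
    }
    printf("3-cell column: best SSE %6.0f (fg 0x%02x bg 0x%02x)\n", bestsex, sfg, sbg);
    printf("half block:    best SSE %6.0f (fg 0x%02x bg 0x%02x)\n", besthalf, hfg, hbg);
    return 0;
}

under that model the half block wins big (SSE ~58 with fg 0x03 / bg 0xff, versus ~7081 for the best pure-cell column), which backs up the hunch that per-cell purity is a false constraint.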

oh hey, it looks like some of this code is due to my girl @jart. [wave!] (we worked together at Google NYC)

another worry is that the derasterize approach doesn't seem to compose easily with other planes, which it needn't worry about, but we do. see e.g. #1068, which would be severely exacerbated.

@jart

jart commented Jan 8, 2021

I think @csdvrx has textmode supremacy here. Look at that lemur.

@dankamongmen (Owner)

[screenshot: 2021-01-08-104108_1610x925_scrot]

i personally think the undetailed area on the left looks better, color-wise, in y'all's, and mine is better with the shadow on the eye and less weird banding above the eye / bottom of the top rim of the ear. they're certainly close.

@jart

jart commented Jan 8, 2021

If we judge better by how closely the terminal version resembles the original, then derasterize has the advantage.

[image]

What are you doing to the color? There appears to be significantly more green with notcurses. Are you applying the sRGB <-> Linear exp2.4 companding formula? What's your scaling algorithm?
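For reference, the piecewise sRGB companding in question looks like this; whether the green shift actually comes from averaging gamma-encoded bytes is only a guess, but that is its classic symptom (a sketch, not a claim about what notcurses does; link with -lm):

#include <math.h>

/* standard sRGB <-> linear transfer functions, channel values in [0,1] */
static double srgb_to_linear(double c){
    return c <= 0.04045 ? c / 12.92 : pow((c + 0.055) / 1.055, 2.4);
}
static double linear_to_srgb(double c){
    return c <= 0.0031308 ? c * 12.92 : 1.055 * pow(c, 1.0 / 2.4) - 0.055;
}

/* average two channel bytes in linear light: mixing gamma-encoded bytes
 * directly instead darkens the result and skews the hue */
static unsigned char avg_srgb(unsigned char a, unsigned char b){
    double la = srgb_to_linear(a / 255.0);
    double lb = srgb_to_linear(b / 255.0);
    return (unsigned char)(linear_to_srgb((la + lb) / 2.0) * 255.0 + 0.5);
}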

For what it's worth, here's what printimage.com produces. You'll notice it creates lines of detail around edges, since magic kernel sharp (magikarp) does sharpening, which can help things look a little less washed out at this scale.

[image]

@jart

jart commented Jan 8, 2021

Also, here are some useful torture images.

yellow and blue make pink (nearly all monitors fail this test due to subpixel layout):
[image: maxwell100]

scaling uses gamma correction test:
[image: gamma-1.0-or-2.2]

die welle: scaling blurs frequencies outside nyquist limit test:
[image: diewelle1920x1080]

@csdvrx

csdvrx commented Jan 13, 2021

Oops I only see that now!

Hi everybody!! Sorry I'm late to the party!

@joseluis , running derasterize.c and notcurses-view on samples/snake.jpg seems to generate almost indistinguishable images, except notcurses-view takes .074s, while derasterize.c takes .303s, about 4x the notcurses-view time.

Hmm, so you mean your code is currently faster than something we haven't touched in, like, a year? And where I only noticed a horrible bug 1 month ago? (oops)

Challenge accepted!

so if you find a case where we're noticeably worse, i'm very interested, but for now i'm feeling like the fastest terminal cowboy in the west [blows smoke from guns]. =]

Hmm, so Google cowboys are picking fights? I say let's meet in 2 days under the city clock :)

@csdvrx , please don't interpret this as a knock on your superb work! just proud of my own =].

Well, give me a day or two, and our work will crush yours :)

I think @csdvrx has textmode supremacy here. Look at that lemur.

@jart yes, visually, I agree: we do have textmode supremacy with derasterize.

But also, we could do so much better!! I've had a few other ideas since last year, like focusing on specific details that need improvement, in separate passes. And how to do that.

First, I think we should also think about the low-hanging fruit: for example, a Hough transform pass to find circles suitable for replacement with . ⴰ ⸰ ° ᴼ o ᴑ ○ O O ◯ ⚪, which may be cheap and fast, and could give us a better idea of how to handle multipass.

Another simple thing: combining characters. Go to https://onlineunicodetools.com/add-combining-characters and use the derasterize chars on the left:
█▄▀
▐▌▝▙▗▛▖▜▘▟▞▚
▔▁▁▂▃▄▅▆▇ █▉
▊▋▌▍▎
◢◣◥◤/╱\╲⋰⋱⤫xX╳⌿⍀
⑊﹨◺◿◸◹øØ⍉
⍁⍂━┉┃╋╹╺╻╸┏┛┓┗═⎻⎼__|¦

You will see that most of them don’t combine to anything helpful, at least with the font I’m using.... but some do!

And this depends on the font, so imagine a code generator that:

  • Takes a set of unicode chars
  • Takes a set of combining chars
  • Saves the resulting glyphs on some canvas (ex: 8x16) along with their “recipe” in a .h

This way, interesting combinations could be automatically identified, on a per-font basis, at compile time.

Take for example █͞▄: the first one is a dud, but the second one would contribute to a denser mapping along the Y axis, so we would keep it.

If we keep the current approach of "throw stuff at the wall and keep what sticks", this could be used to find the combining characters that "work well" and could be part of our set. If we don't want to do a compile-time optimization on a per-font basis, a statistical analysis could tell us which combining chars are the most likely to be supported by a font, and thus worth including, as their positive contribution to a denser mapping outweighs their negative contribution to the curse of dimensionality. A sketch of such a generator follows.
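Here render_cell() is a dummy stand-in for a real rasterizer (FreeType, or screenshotting the terminal), the base/combining lists are arbitrary examples, and the "adds coverage" test is simplistic, so treat it as pseudocode that happens to compile and run (gcc/clang, using the unsigned __int128 extension):

/* per-font "recipe" generator sketch: render base char + combining char on
 * an 8x16 canvas, keep pairs whose rendering differs from the base alone,
 * and emit them as initializers for a .h */
#include <stdint.h>
#include <stdio.h>

typedef unsigned __int128 canvas; /* one 8x16 cell = 128 bits */

/* dummy rasterizer: fabricates a deterministic pattern per (base, comb)
 * pair so the sketch runs end to end; real code would query the font */
static canvas render_cell(uint32_t base, uint32_t comb){
    canvas b = 0;
    uint64_t s = base * 2654435761u ^ comb * 40503u;
    for(int row = 0; row < 16; row++){
        s = s * 6364136223846793005ull + 1442695040888963407ull;
        b = (b << 8) | (uint8_t)(s >> 33);
    }
    return b;
}

static int popcount128(canvas b){
    return __builtin_popcountll((uint64_t)b)
         + __builtin_popcountll((uint64_t)(b >> 64));
}

int main(void){
    const uint32_t bases[] = { 0x2584 /* ▄ */, 0x2590 /* ▐ */ };
    const uint32_t combs[] = { 0x0305 /* combining overline */,
                               0x0332 /* combining low line */ };
    puts("/* generated glyph recipes: base codepoint, combining codepoint */");
    for(unsigned i = 0; i < sizeof bases / sizeof *bases; i++){
        canvas plain = render_cell(bases[i], 0);
        for(unsigned j = 0; j < sizeof combs / sizeof *combs; j++){
            canvas mixed = render_cell(bases[i], combs[j]);
            /* keep only combinations that visibly change the cell: the
             * duds render the same as the base and add nothing */
            if(popcount128(mixed ^ plain) >= 8)
                printf("{ 0x%04X, 0x%04X },\n",
                       (unsigned)bases[i], (unsigned)combs[j]);
        }
    }
    return 0;
}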

But this is just a first step. We could be more clever and do something better, a sort of reverse memoization. The goal would be to improve accuracy while keeping speed mostly the same (or at least controlling and limiting the slowness): after the first pass has selected which unicode chars work best (ex: ▄), a second pass could try to improve the bad fits (say the bottom b=20% of the fitness scores) by sprinkling a little magic dust made of unicode combining characters, the ones we already know in advance will play well with that char, so we save time and don't test everything and the kitchen sink.

This would not be very daring, but at least it would guarantee a non-regression of the fit (only improvements!) for a time cost that could be roughly estimated in advance from the size of the set we want to try.

This gives me ideas: if it's applied to a video, the b=20% could be a function of the time left until the next frame, given the frame rate; so if the frame is too complicated, don't bother: b=0%.

But that's just a very specific use case. Now think about the bigger picture (quite literally!): combining characters are generally thin, which means they could help us with the high frequencies (fine lines, etc.) where it's easy to see we are currently underperforming.

Do you know Zalgo?

I mean stuff like กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้ ก็็็็็็็็็็็็็็็็็็็็ ก็็็็็็็็็็็็็็็็็็็็ กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้ ก็็็็็็็็็็็็็็็็็็็็ กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้ กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้ ก็็็็็็็็็็็็็็็็็็็็ ก็็็็็็็็็็็็็็็็็็็็ กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้ ก็็็็็็็็็็็็็็็็็็็็ กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้ (check https://www.reddit.com/r/Unicode/comments/jd2d04/_/ for inspiration)

Some people want to sanitize Zalgo, like https://stackoverflow.com/questions/10414864/whats-up-with-these-unicode-combining-characters-and-how-can-we-filter-them

I say let's use this wonderful possibility instead! Zalgo could be super helpful with fine details, like the lemur hair, that we currently don't do right. Imagine a band pass isolating the zones of fine detail, and applying a hefty dose of stacked-up (and down, and sideways...) diacritics; that could give us the fine lines that are missing from the picture. And using the reverse memoization (like above: one accent is for vertical lines, another for 45 degrees at a given width, etc.), there is no need to compute each combination to select the best: just use whatever computer vision technique you like to isolate fine lines, and add some Zalgo to approximate them as best we can, piecewise.

Maybe it's a lame idea and it will crash and burn, but at least we will know why. And it could be fun :)

For what it's worth, here's what printimage.com produces. You'll notice it creates lines of details around edges since magic kernel sharp (magikarp)

Yeah I remember our discussions of the magic kernel :) It's cool you implemented it :)

Please consider backporting some stuff to derasterize, as it may then find its way into tmux!! (more on that below)

@csdvrx , do you intend to expose your blitter as a library? It looks like it's only built as a binary at the moment.

@dankamongmen no, but the license is super permissive (thanks to @jart!) so feel free to contribute that!

Personally, instead of doing a library, I have integrated derasterize into tmux. This gives a much improved version of tmux-sixel: the idea is to intercept and recode sixel images on the fly, with derasterize, for all the lame terminals (like, cough cough, a certain Microsoft Terminal) that don't support sixels, so people can still enjoy pictures in their terminals!

It's very early, but at the moment it does great in tests, and not just on static pictures: you can play a video to a sixel output, and on a terminal that doesn't support sixels, you get moving ascii art instead :)

I've used that to display the output of xorg-sixel into Microsoft Terminal (yeah I know I'm weird lolol)

It's taking longer than expected, because I want to release a full set of things, and use tmux-sixel to make a statement.

Long story short: its creation was inspired by https://bugzilla.gnome.org/show_bug.cgi?id=729204

What I read there was very concerning: some people think it's OK to strategize, play politics, and delay or block a feature that people want, and that good people have already submitted patches for, like sixels, just because adding it may reduce the chances of success of their "new" pet format, something that doesn't exist yet, won't exist for a while, and is currently depriving people of sixels.

I think that's super lame and that they need an attitude adjustment, John Cena style.

So tmux-sixel was made to bring the libvte maintainers back to planet earth: they refuse to integrate patches? They don't want the VTE terminals to support sixels? OK, I'll take that as a given, as I certainly can't fork it all and maintain patches.

But instead, I can give their users what they want: graphics in the terminal, through a tool that they can't stop, because it makes any terminal "somehow support" sixels (or at least a derasterize version of said sixels :) even if they really REALLY don't want to. That's the deviousness of tmux-sixel :)

They think "adding sixel support is straight harmful"?

Challenge accepted! tmux-sixel will deliver harm to them :)

[EDIT: typos]

@jart

jart commented Jan 13, 2021

Concerning performance, printvideo.com has some permissively licensed code worth borrowing. It can play hd opera videos in the terminal at 60fps.

[image: turandot]

It can also play 4k videos in the terminal from a single cpu in a single process without threads (no gpu) without breaking a sweat.

[image: love4k]

But the code that makes it possible isn't exactly simple.

@dankamongmen (Owner)

Hmm, so you mean your code is currently faster than something we haven't touched in, like, a year? And where I only noticed a horrible bug 1 month ago? (oops)

sorry, i wasn't trying to pick a fight at all, just saying i saw no reason to go replacing our code wholesale. as noted, it's impressive work!

@jart

jart commented Jan 13, 2021

Notcurses and Derasterize are both good projects written by some of the smartest terminal hackers on earth. I think competition between friends is fun. Notcurses is already perfect just the way it is. @csdvrx has some interesting ideas for improvement. I am convinced the color discrepancy is worth further attention. Please take that offline with me if there's factors re: color that I'm not considering here. I am always yours and at your service.

@dankamongmen (Owner)

one question i have for you two ladies: you're embracing a wide diversity of glyphs, whereas i'm working strictly with box-drawing characters (halfblocks, quadblocks, sextants). my main motivations here are:

(1) simple, completely regular mapping and,
(2) (more importantly) equivalence among fonts (indeed, several terminals draw the box drawing characters directly, rather than use the font, and indeed this is what i do for the linux console where we're unlikely to have them available in the font)

how are you modeling your glyphs, when you don't know what font is being used? or do you claim to know this? sure, there's only so many ways to draw say WHITE CIRCLE WITH LOWER LEFT QUADRANT, but there are a few ways, and while they're superficially similar, the small differences seem (to me) like they would eliminate much of the benefit of such fine glyph-matching. am i missing something?

@dankamongmen (Owner)

dankamongmen commented Jan 13, 2021

(also, @csdvrx , i'd be sad if you couldn't beat notcurses-view on speed -- even in that simple case, i've still got to drag around a bunch of code for implementing the full TUI thing -- multiplanes, efficient redraws, all that kind of garbage. ncdirect operates a bit faster (see ncls, for example), but yeah, no way am i going to beat a purpose-optimized single-pass image blitter. if you run perf on notcurses-view or ncls, about a third of the time for a typical image view is FFmpeg playing with itself anyway =].)

but i welcome the competitive pressure!

once you're competitive, anyway =] =] =].

hack on!

@jart

jart commented Jan 13, 2021

With derasterize the modeling is based on an 8x8 bitboard. It's biased towards PragmataPro.

      /* ▁ *\
    251b▕┛▎box drawings heavy up and left
      \* ▔ */
    G(0,1,1,0,
      0,1,1,0,
      0,1,1,0,
      1,1,1,0,
      1,1,1,0,
      0,0,0,0,
      0,0,0,0,
      0,0,0,0),

The code is generic enough that the array could be shortened easily to only those characters in IBM CP-437. However the extreme extended set of characters is specifically what makes derasterize so cool. It takes it further than any other project. It's also the most computationally expensive, since all you need for IBM CP-437 is a 4x4 board.

The Linux framebuffer console is certainly missing a lot of characters. I wouldn't be too concerned about supporting it, since (a) I'm not even sure how maintained it is anymore and (b) there's a relatively straightforward DMA pixel API for it. I made an attempt with printvideo.com to support it not too long ago.

Windows Consolas is one font that I feel really ought to support more characters. With printvideo.com, what I did was just have flags and keymappings for choosing between CP-437 and the extended Unicode set, since there really is no sane way to know, across terminals, what the underlying font is. For example, on MacOS Terminal.App, the user is generally required to go a step further and tune the kerning in its settings in order to get blocks to not have thin spaces between them!

@csdvrx

csdvrx commented Jan 13, 2021

Hmm, so you mean your code is currently faster than something we haven't touched in, like, a year? And where I only noticed a horrible bug 1 month ago? (oops)

sorry, i wasn't trying to pick a fight at all, just saying i saw no reason to go replacing our code wholesale. as noted, it's impressive work!

@dankamongmen it was a joke :) It's rare to find other people with my interests, so I was just trying to tickle your competitive spirit, to have good reasons for us all to release cool code!!

Notcurses and Derasterize are both good projects written by some of the smartest terminal hackers on earth. I think competition between friends is fun.

@jart that's totally the spirit of what I meant! Some linux terminal hackers give me the creeps, but here all I feel is good vibes!

one question i have for you two ladies: you're embracing a wide diversity of glyphs, whereas i'm working strictly with box-drawing characters (halfblocks, quadblocks, sextants).
how are you modeling your glyphs, when you don't know what font is being used? or do you claim to know this?

@jart already explained:

With derasterize the modeling is based on an 8x8 bitboard. It's biased towards PragmataPro.

For derasterize's latest version, it's currently an 8x16 bitboard (which caused a few issues with 128-bit stuff, but it was fun to learn about bitmasks and bit logic!).

For tmux-sixel, it's both: it has a fallback to a lower-res bitboard for characters where it doesn't matter (ex: half blocks).

sure, there's only so many ways to draw say WHITE CIRCLE WITH LOWER LEFT QUADRANT, but there are a few ways, and while they're superficially similar, the small differences seem (to me) like they would eliminate much of the benefit of such fine glyph-matching. am i missing something?

Consider what the differences will be: in the worst case, they will be off by a few pixels, but you will still have a circle. What matters more: being off by a few pixels, or losing a shape?

I postulate it won't matter, as the proportionality of the diameter of the circle will be kept inside the font (ex: o will always be about half of O). It even offers a good excuse to use less computationally intensive code: instead of trying each of your, say, 4 circles in each block, run 4 Hough transforms at the picture level, one for each diameter. Use the results to optimally "align" the chunks (if most of your circle's coverage area would fall on the border between 2 blocks, for a one-off picture, just introduce a margin). Then put the correct circle in the chunks where most of the circle's coverage area falls.

(also, @csdvrx , i'd be sad if you couldn't beat notcurses-view on speed

I'll try! derasterize is my fun Christmas time project, mostly to learn stuff. It's not super optimized either. tmux-sixel will be different, meaner, a software with a purpose :)

but i welcome the competitive pressure!

This, yey!

once you're competitive, anyway =] =] =].
hack on!

Damn, with that I've got to release today, not tomorrow!!

@csdvrx

csdvrx commented Jan 13, 2021

However the extreme extended set of characters is specifically what makes derasterize so cool. It takes it further than any other project. It's also the most computationally expensive, since all you need for IBM CP-437 is a 4x4 board.

The latest version uses 60 glyphs and an 8x16 board. It got a few optimizations to remain usable, so your cool math stuff is temporarily gone :(

I've stopped working on it to focus on tmux-sixel.

Still, I think it could beat notcurses. I just have to clean up and release. My public gloating will force me to act on that and stop procrastinating :)

@csdvrx has some interesting ideas for improvement.

I started discussing them offline with @hpjansson recently, who has similar interests to ours and wrote the Chafa library. We're mostly talking about making a tool that takes a font and dumps glyphs (and combined glyphs), so we can optimize the font-variant part. You could benefit from that too, given what you said about fonts. Would you like to join the discussion?

I am convinced the color discrepancy is worth further attention.

To me it looks like a math issue, possibly due to a rounding error when changing colorspace.

Please take that offline with me if there's factors re: color that I'm not considering here. I am always yours and at your service.

Thanks, likewise!!!

I really like what you did, if you need help on some part involving stats, just ask!

@dankamongmen (Owner)

i should look back into this now that 2.4.0 is out and see what all can be improved

@cloud11665

Hello, and first of all I want to apologize for necroposting on this thread.
I'm in the middle of rewriting the mpv tct backend, and this thread looks like the perfect place to share my knowledge and ask some questions.

Currently, only the quadblitter is implemented, and it's quite fast due to copious amounts of optimizations on the rendering side, but that's not what I'm here for. I'm here because I wanted to get some feedback on the rasterizer side of things.

I've come up with a fast and accurate solution for planar (every pixel combination supported), but color-bound, representations (quadrants and sextants). For starters, an implementation of the K-Means algorithm is O(n^k), but since there are only 2 colors per rune, k=2, meaning we don't have to fuck with any randomized heuristics. The kmeans I'm using is modified a bit: after finding the clusters, I compute the nearest neighbours and make them the color of the cluster, meaning we don't get washed-out colors due to linear interpolation. (I'm also working on an SSE4 implementation, but there are still some errors, and the speedup is negligible even for sextants.)
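A simplified sketch of that per-cell 2-means, with the nearest-pixel snap at the end (a reading of the description above, not the actual mpv code; n is at most the blitter geometry, e.g. 6 for sextants):

#include <stdint.h>

typedef struct { uint8_t r, g, b; } rgb;

static long dist2(rgb a, rgb b){
    long dr = a.r - b.r, dg = a.g - b.g, db = a.b - b.b;
    return dr*dr + dg*dg + db*db;
}

/* cluster a cell's n pixels (n <= 32) into fg/bg; returns the assignment
 * mask, bit i set = pixel i painted with fg */
static unsigned kmeans2(const rgb *px, int n, rgb *fg, rgb *bg){
    rgb c[2] = { px[0], px[n - 1] };     /* crude but deterministic seeds */
    unsigned mask = 0;
    for(int iter = 0; iter < 8; iter++){ /* k=2 settles almost immediately */
        long sum[2][3] = {{0}};
        int cnt[2] = {0};
        mask = 0;
        for(int i = 0; i < n; i++){
            int k = dist2(px[i], c[0]) <= dist2(px[i], c[1]) ? 0 : 1;
            if(k == 0) mask |= 1u << i;
            sum[k][0] += px[i].r; sum[k][1] += px[i].g; sum[k][2] += px[i].b;
            cnt[k]++;
        }
        for(int k = 0; k < 2; k++)
            if(cnt[k])
                c[k] = (rgb){ (uint8_t)(sum[k][0] / cnt[k]),
                              (uint8_t)(sum[k][1] / cnt[k]),
                              (uint8_t)(sum[k][2] / cnt[k]) };
    }
    /* the "no washed-out colors" trick: snap each centroid to its nearest
     * actual input pixel instead of keeping the interpolated mean */
    for(int k = 0; k < 2; k++){
        long best = -1; rgb bestpx = c[k];
        for(int i = 0; i < n; i++){
            long d = dist2(px[i], c[k]);
            if(best < 0 || d < best){ best = d; bestpx = px[i]; }
        }
        c[k] = bestpx;
    }
    *fg = c[0]; *bg = c[1];
    return mask;
}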

Anyways, here is my version of the lemur:
[image]

As for non-planar blitters, I was running with a 2x6 bit matrix per rune, supporting quadrants, sextants, and other glyphs. Since there are only 12 bits in the matrix, I have a 4096-element LUT that returns the glyph for masks defined in the runeset, and defaults to the nearest sextant for masks that are not defined. (I only have some "old" pictures, because the code is on my second computer and I still haven't pushed it.)
[image]
Forgive the hideous artifacts; they are probably some off-by-1 error in the rasterizer.

@jart

jart commented Apr 21, 2022

For an apples-to-apples comparison, here's @cloud11665's renderer cropped:

[image: mpv-2022]

Here's @csdvrx's derasterize at that size, which in this case appears to be configured to use the STB scaling algorithm:

[image: derasterize-2022]

Here's @jart's printimage.com at that size, which is using the Gyarados (Generalized Magic Kernel) scaling algorithm:

[image: printimage-2022]

Here's @jart's printvideo.com at that size, which is the same as printimage but it uses the Magikarp (John Costella's Magic Kernel) scaling algorithm:

[image: magikarp-2022]

Yours is certainly one of the best we've seen so far, aside from derasterize. How fast have you made it so far? For instance, derasterize achieves the highest quality by being the slowest. On the other hand, the printvideo magikarp renderer is designed to play 60fps high-def opera videos in the terminal without the assistance of a gpu.

  • derasterize takes 4,284 ms
  • printimage.com + gyarados takes 190 ms
  • printimage.com + magikarp takes 20 ms

Those measurements are for scaling/blitting/printing. In other words, (1) how long it takes to decimate the image to the terminal's size, plus (2) how long it takes to turn its pixels into ANSI+UNICODE text, plus (3) how long it takes to write to /dev/null.

In any case, very exciting stuff. Thank you for telling us you're working on it. I'll look forward to reading the code when you feel comfortable sharing it.

@cloud11665

Thanks for the quick response @jart !

In terms of performance, in the current implementation everything is up to the user, but I get the best results with --profile=sw-fast (cpu only). You can try it out for yourself by compiling my fork of mpv and running it with --vo=tct --vo-tct-algo=quad --profile=sw-fast.
As far as improvements go, I'm currently focusing on the quantizer / blitter, because my encoder is already very fast. I have also tried to approach the problem of blitting from a different, more heuristic angle, and I think I have found some very important information that I wanted to share.

  1. Matrices of size (k, 2k) are the optimal choice, because they make the scaling constant for both axes.

  2. We can give up the 2x6 matrix kernels for smaller, faster 2x4 kernels (see point 1) by introducing some nonuniformity.

[image: matrix_kernels]

2x4 kernels mean only 2^8 possible combinations, which is both small enough to be easily memoizable (at 8 bytes per bitmask) and fine-tunable by hand. Sextants give us 64 glyphs and quadrants give us 16, meaning that even with those alone, we have ~31% coverage of the whole matrix, which in and of itself is a great score that can be improved just by adding more glyphs. And just like last time, when a bitmask is neither a sextant nor a quadrant, we approximate it to the nearest sextant.

Another benefit of 8-piece matrices is that they can fit in an AVX2 256-bit int vector, but it is important not to get distracted with premature optimizations, because it may just be that the scalar quantizer is fast enough.

PS. When compiling the fork, there is a possibility of failure, because it uses an old version of libplacebo. You just have to fetch the upstream; there won't be any merge conflicts.

@hpjansson

Nice work. If you want another point of comparison, you may want to look at Chafa. It uses bigger matrices (8x8, and 16x8 for wide characters) and is intended to be very general. You can have hundreds of glyphs and still be fast enough for animation by filtering with the popcount builtin to get the hamming distances.

@cloud11665

@hpjansson Nice, very nice.
For starters, thank you for commenting. Thanks to chafa, today I have learned that vte codes are chainable.

While the idea of hamming distances using popcnt came to my mind, doesn't it require iterating over every glyph in the glyphset to find out which one has the lowest distance?
Hdist also lacks spatial coherence (please correct me if I am wrong), and that could pose some issues.

Another thing that may come off as controversial is my personal dislike for non-rectangular shapes. While other glyphs help increase the fidelity, after just a couple extra glyphs, it starts to come off as libcaca 2.0 (when running at a low resolution), which I try to avoid at all costs.

While it looks like chafa is focusing on the single-frame picture side of things, I am aiming for the smoothest video playback experience, which makes sense, because mpv is a video player.
I have also promised to make 720p (as in the number of character pixels) playback in the terminal feasible. It's a pipe dream, but a pipe dream that I want to work towards.

[image]
People on the dev irc seem to disagree... But I don't care 😃

@hpjansson

@hpjansson Nice, very nice. For starters, thank you for commenting. Thanks to chafa, today I have learned that vte codes are chainable.

While the idea of hamming distances using popcnt came to my mind, doesn't it require iterating over every glyph in the glyphset to find out which one has the lowest distance? Hdist also lacks spatial coherence (please correct me if I am wrong), and that could pose some issues.

It does, but if you pack the glyph bitmaps as an array of u64 (in the 8x8 case), you can xor them all with the shape you're looking for using SIMD and calculate the popcounts pretty quickly. When I've benchmarked this, the receiving terminal has always been the limiting factor. Not sure what you mean by spatial coherence; you do get the right shapes this way, not just coverage amount. E.g. you can use it to pick the correct sextants in a sextant-only scenario.
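In scalar code that filter is roughly the following; the SIMD version xors and popcounts several glyphs per iteration, but the idea is the same (a sketch, not Chafa's actual code):

#include <stdint.h>

/* glyph shapes packed one u64 per 8x8 bitmap, as described above */
typedef struct { uint64_t bits; const char *utf8; } glyph;

/* index of the glyph differing from the thresholded cell in the fewest
 * of the 64 pixels, i.e. minimum hamming distance via xor + popcount */
static int nearest_glyph(const glyph *set, int n, uint64_t cell){
    int best = 0, bestdist = 65; /* larger than any possible distance */
    for(int i = 0; i < n; i++){
        int d = __builtin_popcountll(set[i].bits ^ cell);
        if(d < bestdist){ bestdist = d; best = i; }
    }
    return best;
}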

Anyway, it's true that more specialized approaches such as your sextant one (or Notcurses' specialized blitters) are always going to be faster, and it looks pretty good too.

Another thing that may come off as controversial is my personal dislike for non-rectangular shapes. While other glyphs help increase the fidelity, after just a couple extra glyphs, it starts to come off as libcaca 2.0 (when running at a low resolution), which I try to avoid at all costs.

It's a matter of taste and use case. I think at a minimum, the 1/8 block symbols improve the output, but I also want to be able to print different flavors of retro art and support less capable terminals/printers. E.g. old-school ASCII/Baudot code teletypes should work with the same pipeline by just throwing a few switches. Or how about a C64, or a ZX81? Obviously it's not for everyone, and in some cases it's stepping out of the realm of the practical, but it feels more complete this way. Maybe it doubles as a homage to the character art genre(s)...

While it looks like chafa is focusing on the single-frame picture side of things, I am aiming for the smoothest video playback experience, which makes sense, because mpv is a video player. I have also promised to make 720p (as in the number of character pixels) playback in the terminal feasible. It's a pipe dream, but a pipe dream that I want to work towards.

People on the dev irc seem to disagree... But I don't care 😃

Sounds good to me, and probably achievable with the right terminal. Happy hacking.

@cloud11665

cloud11665 commented Apr 21, 2022

By spatial coherence, I meant this situation:

given an input of

input:        glyph a:        glyph b:
X             XX               
   X          X  X               X
  XX            XX              XX
 XXX           XXX            XXXX

a has a higher Hdist than b, but it's clear that a is a better choice, due to having more details.

An alternative approach would be the minimum sum of absolute differences (this heuristic is used in the PNG spec to determine the best postfilter for a given row) between the glyph and the input, but that would be VERY slow

edited to make it more of an edge case

@hpjansson

Oh, I see. I deal with that by using the hamming distance to pick a set of candidates with low distance (size of set adjustable with a performance switch), and then deciding between those using sum of squared differences on RGB or DIN99 pixels.
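Sketched out, that two-stage pick could look like this; the candidate set is approximated here as "within slack of the best hamming distance" rather than an adjustable N-best set, the final pass scores a fg/bg rendering with plain RGB SSD (DIN99 would slot in the same way), and all names are illustrative:

#include <stdint.h>

typedef struct { uint8_t r, g, b; } rgb;
typedef struct { uint64_t bits; const char *utf8; } glyph;

static long dist2(rgb a, rgb b){
    long dr = a.r - b.r, dg = a.g - b.g, db = a.b - b.b;
    return dr*dr + dg*dg + db*db;
}

/* stage 1: cheap hamming prefilter; stage 2: exact SSD of each surviving
 * candidate's fg/bg rendering against the cell's 64 pixels (px[i] maps to
 * bit i of the bitmap) */
static int pick_glyph(const glyph *set, int n, uint64_t cellmask,
                      const rgb px[64], rgb fg, rgb bg, int slack){
    int besth = 65;
    for(int i = 0; i < n; i++){
        int d = __builtin_popcountll(set[i].bits ^ cellmask);
        if(d < besth) besth = d;
    }
    long bestssd = -1; int best = 0;
    for(int i = 0; i < n; i++){
        if(__builtin_popcountll(set[i].bits ^ cellmask) > besth + slack)
            continue; /* not in the low-distance candidate set */
        long ssd = 0;
        for(int p = 0; p < 64; p++){
            rgb shown = (set[i].bits >> p) & 1 ? fg : bg;
            ssd += dist2(px[p], shown);
        }
        if(bestssd < 0 || ssd < bestssd){ bestssd = ssd; best = i; }
    }
    return best;
}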

@cloud11665

Damn, that's nice. Thanks for your insight!

Now I am left wondering which approach to choose...
2^8 is small enough to fit everything in the cache, and a cache lookup is like 20 cycles, so I will stick with it, but now I know how I would handle bigger matrices.

@cloud11665

@hpjansson
One more question: what color quantization algorithm are you using?

@hpjansson

Nothing special, just median cut with a shellsort. The sixel encoder has an interesting palette mapping approach, but for character cells it's much simpler, at least if you're using the universal 216-color cube + grays.
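Since the 216-color cube + grays palette is fixed, the nearest entry can be computed directly instead of searched. A sketch using the standard xterm-256 values (Chafa's actual mapping certainly differs in detail):

/* map 8-bit RGB onto xterm-256: the 6x6x6 cube (16..231, channel values
 * 0,95,135,175,215,255) or the 24-step gray ramp (232..255, values 8..238),
 * whichever is closer */
static int cube_level(int c){ /* nearest cube level for one channel */
    return c < 48 ? 0 : c < 115 ? 1 : (c - 35) / 40;
}
static int cube_value(int lvl){ return lvl ? 55 + lvl * 40 : 0; }

static int xterm256(int r, int g, int b){
    int ir = cube_level(r), ig = cube_level(g), ib = cube_level(b);
    int cr = cube_value(ir), cg = cube_value(ig), cb = cube_value(ib);
    int gray = (r + g + b) / 3; /* nearest gray-ramp entry */
    int gi = (gray - 8 + 5) / 10;
    if(gi < 0) gi = 0; else if(gi > 23) gi = 23;
    int gv = 8 + gi * 10;
    long dc = (long)(r-cr)*(r-cr) + (long)(g-cg)*(g-cg) + (long)(b-cb)*(b-cb);
    long dg = (long)(r-gv)*(r-gv) + (long)(g-gv)*(g-gv) + (long)(b-gv)*(b-gv);
    return dg < dc ? 232 + gi : 16 + 36*ir + 6*ig + ib;
}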

@cloud11665

Ok, I have implemented the special quad + sextant kernel, and here are the results:

without linear interpolation on a per-centroid basis:
[image]

with lerp:
[image]

Here is also the code for rounding to the nearest sextant:

struct { char data[5]; uint16_t width; } chrmap[256] = {
    [0b00000000] = {" ", 1},
    /* ... */
    [0b01010101] = {"▐", 3}, // U+2590  RIGHT HALF BLOCK
    [0b10100000] = {"▘", 3}, // U+2598  QUADRANT-1
    /* ... */
    [0b00001010] = {"▖", 3}, // U+2596  QUADRANT-4
    [0b10000000] = {"🬀", 4}, // U+1FB00 BLOCK SEXTANT-1
    /* ... */
    [0b01111111] = {"🬻", 4}, // U+1FB3B BLOCK SEXTANT-23456
};
/* fill every mask without a defined glyph: merging the two middle rows of
   each column (assuming row-major bit order) collapses any 2x4 mask onto
   one that does have a sextant/quadrant entry */
for (int i = 0; i < 256; i++) {
    if (chrmap[i].width) continue; /* already has a glyph */
    int newmask = i;
    if (i & 0b00101000) newmask |= 0b00101000;
    if (i & 0b00010100) newmask |= 0b00010100;
    chrmap[i] = chrmap[newmask];
}

@cloud11665

Now, the only thing left in terms of image quality is the scaling algorithm, because as far as I know, mpv uses Lanczos by default, which isn't the best in terms of reducing ringing artifacts. But for now, I'll experiment with the different downscalers (provided by libswscale) and their parameters.

Also, the code is long overdue for a refactor, and some standardization is very much needed. After the refactor I'll port the current drawing optimizations to the new "algo" (mpv's way of saying blitter) and implement more additional optimizations that I have thought up during the last month.

@csdvrx

csdvrx commented May 4, 2022

(late to the party as usual)

It's a very interesting direction, and totally worth the necroposting :) @cloud11665 I'll be eager to try your mpv when you're done!

Now, for the glyph difference in #1223 (comment), I think you should look beyond simple approaches, while still considering the computational cost: just fancy it up a little bit!

Looking at your example, glyph A is the better option because it keeps a piece of information: even if it's badly approximated, that's better than losing the detail. The lemur example is nice because you can quickly see that effect at play on the chin hairs: when they're gone, it's like having a low-pass filter on the picture, as only the big details remain. Please forgive this bad analogy, but hopefully it's good enough to drive home the point that you want to keep the lemur chin hair!

As for how, I would try to keep using Hdist, but with a twist: segment the glyph (here we have 3 regions) and compare the number of segments to that of the candidate glyph (which can be precomputed to save time). Here, glyph A has 3 regions, while glyph B only has 2.

This should act as a prefilter: outside pathological cases, no glyph with only 2 regions should be used to approximate something that has 3, at the risk of losing details such as the lemur chin hair.

This would also have the nice side benefit of limiting the input set: the best way to optimize is to avoid computing what you know won't be needed!

In a way, this would be akin to a bloom filter on the set of candidate glyphs, something you could easily improve beyond the number of regions by adding other metrics, like border angularity (here 45 degrees), to exclude even more glyphs (say, the square blocks that have vertical or horizontal borders).

If you use the angles, you may need to make that a set (as U+2598 QUADRANT UPPER LEFT has both a vertical and a horizontal border), but you get the idea: it has 2 regions and horizontal+vertical borders, meaning it is going to be a poor candidate anyway, so you might as well exclude it from the set you'll run computations on. It may not be a problem with a limited number of glyphs, but it will scale non-linearly as you add more glyphs, since every glyph in the set needs to be compared to every chunk of the image!

This simple approach is not going to be perfect, but it might help with other things, like emergent artifacts ("scaling blurs frequencies outside nyquist limit test", as reminded by @jart), by preserving the difference as long as possible given the set of glyphs available, which you will be able to grow without impacting the computation time too much thanks to this bloom-filter-like approach (and maybe even use combining characters?). A bitboard sketch of the region counting follows.
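Here is one way to count regions on 8x8 glyph bitmaps, using chess-style bitboard shifts (one bit per pixel, 8 bits per row; the row masks stop horizontal growth from wrapping between rows). The count is precomputed once per glyph, and the same function applied to the thresholded target chunk gates which candidates are worth scoring at all:

#include <stdint.h>

/* count 4-connected ink regions in an 8x8 bitmap via repeated flood fill */
static int regions8x8(uint64_t bits){
    int n = 0;
    while(bits){
        uint64_t region = bits & -bits; /* seed: lowest set bit */
        for(;;){ /* grow the region to a fixpoint */
            uint64_t vert  = (region << 8) | (region >> 8);
            uint64_t horiz = ((region << 1) & 0xfefefefefefefefeULL)
                           | ((region >> 1) & 0x7f7f7f7f7f7f7f7fULL);
            uint64_t grown = (region | vert | horiz) & bits;
            if(grown == region) break;
            region = grown;
        }
        bits &= ~region; /* erase the counted region */
        n++;
    }
    return n;
}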

joseluis added a commit to andamira/textos that referenced this issue Aug 29, 2023
- misc. updates.

----
# improve unsafe

- add #Safety section to unsafe functions
- add unsafe sub-features, like in devela THINK

-----------
- Add some pixel fonts a short selection, and option to import
  - (receiving some kind of slice, typed or interpreted
    - (array from one type to another as long as they both implement X
    - (this pattern can be reused throughout all libera

---
- implement From/To Char enum from devela (TODO)

---

- [ ] bring functions from devela
  - strcount
  - buffer fmt
  - strings …

----

- WIP TRY add TryFrom/ From?
  - DESIGN Decide whether to fail, (TryFrom) or return From converting up until capacity.

------
- add new search methods using memchr

----
- add new NulTerminatedString types
  - crate
  - https://github.com/CAD97/cstr8/

- take ideas from other crates
  - https://crates.io/crates/fstr (conversion of just the right size?)

-----

update Egc

- [ ] impl TryFrom Egc (using string) for both static
- … ops

----------

impl From<CharXX> for given string type aliases.

----
update unicode strings

- new type `StaticNonSpecificString`.
- make `StaticNonNulString` a type alias of `StaticNonSpecificString`.

- [ ] TODO generalize strings even more:
  - `StaticNonSpecificString<const CHAR: &str, const CAP: usize> {}`
    - specialize StaticNonNulString as a type alias:
      `pub type StaticNonNulString<const CAP: usize> = StaticNonSpecificString<'\x00', CAP>;`

----
- pop_egc
- push_egc (convenience)

- push_utf8_bytes (from a byte slice) (impl from functions from_utf8?
- pop utf8_bytes?

------
- graphemes? &[egc] ?

----
## IDEAS

chars:

- [ ] new constructor for `Chars` trait and for concrete types.

- add TryFrom<T: Chars>? or as a methods in chars? try_to_XXX

- new char types e.g.: CharMath, CharBoxDrawing
  - those would be... enums :)
    - macro to generate an enum from a list of correspondences.

- [ ] TODO: impl partialEq & partialOrd between all 4 char types

- [ ] TODO: add more constants?
  - [ ] NUL

strings:

- add more alias string sizes? 96 (32×3), 192 (64×3) 384 (128 * 3)

- [ ] MAYBE make counter_string a CounterString trait
  - [ ] impl over String, str, StaticStringU8, StaticNonNullString.

- add non-unicode strings (sixbit?) less bit? with custom translation
  using custom tables. (link to a slice of the specific const-generic size)
- generalize over unicode strings and non-unicode strings… single trait?

------
- TODO: add more methods?
    - escape_unicode
    - escape_debug
    - escape_default

- wrap fns that returns iterators: to_lowercase, to_uppercase
  - https://doc.rust-lang.org/nightly/std/primitive.char.html#method.to_lowercase

----
- make tests, and examples.
- impl push_str & try_push_str & try_push_str_complete for StaticStringU8

----

add indent format

- add own `Align` enum, & impl from fmt::Alignment, instead of using that.

-------

- TextCell
- TextGrid (via ladata/Grid2?) to libera type grid2d?? (bring back cuadra as grid2d?)
- TextLine (rows, or
- TextRope
- traits
  - Ascii
  - Utf8
  - Utf16
- struct Str<const BYTES: usize> {} // owns its buffer
-

---

alignment functions WIP examples

- think about using cow

-----
- THINK
  - no_std functions (how to group them)
  - a trait over String?

-------

- THINK alternatives
  - dirty utf versions?
  - stateful struct to customize these things?

-------
## improve box drawing chars

- separate lines
  - add dotted
  - add half
  - add thin to heavy
  - eight vertical & horizontal lines (U 15)
  - diagonal lines (U15)

- add block elements
  - shades
    - half block shades
  - quadrants
  - sextants
  - eight vertical & horizontal blocks
  - eight vertical & horizontal blocks upper & right (u15)
  - corners
  - diagonal blocks (U15)

- add a binary encoding

----

-----
## alt. unicode-width
- add an opinionated wrapper over unicode-width
  - maybe create traits depending on UnicodeWidthStr / UnicodeWidthChar
    - save custom tables for special symbols
    - for the rest of symbols, derive to unicode-width
  - deal with >2 width chars
  - deal with https://github.com/unicode-rs/unicode-width/issues/

----

- From rational to select unicode char on filling

--------------

I want to provide scaffolding for apunta bin + revela lib)
a text buffer that refreshes, that can be analyzed, strings positioned…
areas detected, layers... analysis at a certain frequency / Hz

---

- refactor cargo manifest

----
# functions beneficial to implement in a Rust library for text manipulation

- https://chat.openai.com/c/17b13959-8073-42ef-bcf4-0b9ef59eb97c

-----------
# rope
- example crop

alternatives
- https://docs.rs/crop
- https://docs.rs/jumprope
- https://docs.rs/ropey/*/ropey/struct.Rope.html
- https://docs.rs/any-rope/*/any_rope

----

# TODO
- unicode columns based on width + override specially long characters
  - (perfect ad-hoc hash-map associated)
  - https://github.com/AldaronLau/unicode-columns/blob/stable/src/lib.rs

-----

# UNICODE BLOCKS
- https://docs.rs/unicode-blocks/0.1.5/unicode_blocks/struct.UnicodeBlock.html

- `block` module. re-export crate.
- desired API:
  'a'.block() -> UnicodeBlock
  UnicodeBlock.name()
  UnicodeBlock.start()
  UnicodeBlock.end()
  UnicodeBlock.contains('a')

------

# UNICODE SEGMENTATION
- https://unicode-rs.github.io/unicode-segmentation/unicode_segmentation/index.html

------

# mixed blitter
- dankamongmen/notcurses#1223

----

add collections of unicode characters with metadata

- allow to support pixel ttf fonts, etc...

- e.g.: dots: … ‥ . · etc…

codepages
- https://crates.io/crates/codepage-strings
- https://crates.io/crates/oem-cp

# IDEAS
- recreate sixbit, in a more general way… choosing bw 64 & 128-bit codepages
  - https://crates.io/crates/sixbit/

--------
- no_std formatting?
  - https://doc.rust-lang.org/core/macro.format_args.html
  - https://doc.rust-lang.org/core/fmt/struct.Arguments.html

--------
- ASCII utilities
  - see: https://doc.rust-lang.org/std/ascii/trait.AsciiExt.html#tymethod.eq_ignore_ascii_case

---
- CASE utilities
  - https://stackoverflow.com/questions/38406793/why-is-capitalizing-the-first-letter-of-a-string-so-convoluted-in-rust

# ISSUES
- "Available on non-crate feature `safe` only."
  rust-lang/rust#43781 (comment)

## FONTS (TODO move to `trazas` crate?)
- https://news.ycombinator.com/item?id=34689213

# LEARN FFI cstring, etc.
- https://www.reddit.com/r/rust/comments/s3x1e3/const_const_u8/
  - u8 is always 8-bits, but it's a mistake to use u8 rather than std::os::raw::c_char

# LEARN unicode
- https://unicode.org/glossary/#byte_order_mark
- https://en.wikipedia.org/wiki/Homoglyph#Unicode_homoglyphs

# CRATES

- https://github.com/bcmyers/num-format (also for numera, ladata…)
- > https://docs.rs/memchr/2.5.0/memchr/

- https://crates.io/crates/texcore
- https://crates.io/crates/tectonic  :) a complete TeX/LaTeX engine

- https://crates.io/crates/character-set (for inspiration) (source not in github)
  - https://docs.rs/crate/character-set/0.4.0/source/src/builtin/general_category.rs

# WAITING for
- https://doc.rust-lang.org/nightly/core/ascii/enum.Char.html
  - rust-lang/rust#110998

- rust-lang/rust#109814 string.leak for 1.72.0

## DISCARDED IDEAS

- [ ] remove Char types, move to devela?
  - better not, because I need good trait coupling…