How to extract font name, font binary data, font 2D coordinates, image binary data, image 2D coordinates within a PDF page? #31

Closed
ChinaChenMingQuan opened this issue Apr 11, 2024 · 1 comment

Comments

@ChinaChenMingQuan

As shown in the title.

(Translated with DeepL.)

@arklumpus
Owner

Hi! Sorry for the delay, I missed this... With version 1.10.1, you can now access fonts and images from the StructuredTextPage object. For example:

// Initialise the MuPDF context. This is needed to open or create documents.
using MuPDFContext ctx = new MuPDFContext();

// Open a PDF document.
using MuPDFDocument doc = new MuPDFDocument(ctx, "path/to/file.pdf");

// Obtain a MuPDFStructuredTextPage from the first page of the document, preserving the images.
using MuPDFStructuredTextPage sTextPage = doc.GetStructuredTextPage(0, preserveImages: true);

// Enumerate the MuPDFStructuredTextPage
foreach (MuPDFStructuredTextBlock block in sTextPage)
{
    // Image block
    if (block is MuPDFImageStructuredTextBlock imageBlock)
    {
        // Access the image.
        MuPDFImage image = imageBlock.Image;

        // Save it to a file
        image.Save(@"C:\Users\Giorgio\Downloads\test.png", RasterOutputFileTypes.PNG);

        // Get an RGB representation
        byte[] pixelBytes = image.GetBytes(PixelFormats.RGB);
    }
    // Text block
    else if (block is MuPDFTextStructuredTextBlock textBlock)
    {
        // Enumerate the lines in the text block.
        foreach (MuPDFStructuredTextLine line in textBlock)
        {
            // Enumerate the characters in the line.
            foreach (MuPDFStructuredTextCharacter character in line)
            {
                // Each MuPDFStructuredTextCharacter contains a MuPDFFont
                MuPDFFont font = character.Font;

                // Useful properties you can access on the font.
                string fontName = font.Name;
                bool bold = font.IsBold;
                bool italic = font.IsItalic;
                bool monospaced = font.IsMonospaced;
                bool serif = font.IsSerif;
            }
        }
    }
}
