Fix issue 62 #63

hderms · 2012-11-08T20:54:38Z

This fixes an issue I opened on the main repo found at #62

The proposed feature adds a new option to Docsplit.extract_images. If you pass an additional argument of the form :and_return => :images, you receive an array of paths to the extracted images as the return value, instead of the path to the intermediate PDF that is created for PowerPoint files.
Example:

Docsplit.extract_images('/tmp/some_ppt.ppt', :size => '1000x', :format => [:png, :jpg])

With a return value of:

["/tmp/some_ppt.png", "/tmp/some_ppt.jpg"]

The default return-value behavior of this method is preserved when that option is not specified, which is to return an array of PDFs that are returned by ensure_pdfs.
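
To make the two behaviors concrete, here is a minimal sketch of the selection logic. It is not Docsplit's implementation: `pick_return_value`, its arguments, and the example paths are hypothetical names chosen only for illustration.

```ruby
# Hypothetical stand-in for the return-value selection described above.
# Not Docsplit's code; the method name and arguments are illustrative only.
def pick_return_value(pdfs, image_paths, opts = {})
  case opts[:and_return]
  when :images
    image_paths # paths of the extracted images
  else
    pdfs        # default: the PDFs, as before
  end
end

# Example inputs (made-up paths).
pdfs   = ['/tmp/some_ppt.pdf']
images = ['/tmp/some_ppt.png', '/tmp/some_ppt.jpg']

with_option    = pick_return_value(pdfs, images, :and_return => :images)
without_option = pick_return_value(pdfs, images)
```

With the option, the caller gets the image paths; without it, the PDF list comes back unchanged.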

The justification for this feature is that I feel like users would benefit from getting the paths to the images created immediately upon calling the extract_images method. Having them determine these paths themselves seems out of spirit with the nature of this gem.

KurtPreston · 2012-11-12T20:45:18Z

lib/docsplit/image_extractor.rb

          previous = size if @rolling
        end
      end
+      case @return_value
+      when :images


Perhaps this return type should be called "image_paths" instead of "images"?

hderms · 2013-02-22T16:48:04Z

Fixed inaccurate text in the body of the pull request. Just to be clear, I modified the return value behavior in every case I could find where it was appropriate, so that functions now return an array of file paths to the extracted data rather than the intermediate PDF they returned previously.

KurtPreston · 2013-03-27T20:37:41Z

Any idea when this might get merged in? This is a great feature.

antoinelyset · 2013-06-03T15:06:08Z

lib/docsplit/text_extractor.rb

@@ -29,18 +29,23 @@ def initialize
    def extract(pdfs, opts)
      extract_options opts
      FileUtils.mkdir_p @output unless File.exists?(@output)
+      pdfs = pdfs.is_a?(Array) ? pdfs : [pdfs]


You can do :

pdfs = Array(pdfs)

hderms · 2013-06-03

Thanks for the input. I was misinformed about the nature of Array().
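
For reference, a short sketch of how Kernel#Array compares with the ternary from the diff. The file paths are made up; the point is only the wrapping behavior, including two cases where the forms differ.

```ruby
# Kernel#Array wraps a single value and leaves an existing array alone.
single = Array('/tmp/a.pdf')
many   = Array(['/tmp/a.pdf', '/tmp/b.pdf'])

# nil becomes an empty array, whereas the ternary would produce [nil].
from_nil     = Array(nil)
pdfs         = nil
ternary_nil  = pdfs.is_a?(Array) ? pdfs : [pdfs]

# A Hash is converted to an array of pairs rather than wrapped,
# so the two forms are not interchangeable for every input.
from_hash = Array({ :a => 1 })
```

For strings and arrays of strings, which is what extract receives here, both forms give the same result.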

antoinelyset · 2013-06-03T15:06:41Z

👍

dmayer · 2013-08-16T15:27:00Z

Is there anything one can do to help get this merged? It is rather messy to write code around extract_images to determine the generated file names by "reverse engineering" the naming scheme and doing string manipulations. Thanks!
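
A rough sketch of the kind of workaround described above, for illustration only. `guess_image_paths` is a hypothetical helper, and the assumption that outputs are named after the source basename (for example `some_ppt_1.png`) is not verified against Docsplit here.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical workaround: look on disk for files that share the source
# document's basename and have one of the requested extensions.
# Assumes outputs are named after the source basename (unverified).
def guess_image_paths(source, output_dir, formats)
  base = File.basename(source, File.extname(source))
  formats.flat_map do |fmt|
    Dir.glob(File.join(output_dir, "#{base}*.#{fmt}"))
  end.sort
end

Dir.mktmpdir do |dir|
  # Fake outputs standing in for files an extraction run might leave behind.
  %w[some_ppt_1.png some_ppt_2.png other.png].each do |name|
    FileUtils.touch(File.join(dir, name))
  end
  puts guess_image_paths('/tmp/some_ppt.ppt', dir, [:png])
end
```

This kind of guessing breaks as soon as the naming scheme changes or unrelated files share the prefix, which is the argument for having the method return the paths itself.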

bridgway · 2013-08-16T15:49:46Z

+1 to @dmayer's comment. I also think it would be useful to have a similar feature for extract_pages, so one can easily determine the generated file names when a PDF is split into individual pages. Cheers

sandstrom · 2013-12-10T13:28:36Z

ping @knowtheory @jashkenas

antoinelyset · 2013-12-10T17:38:28Z

If you're interested, I wrote a Ruby gem for this:

https://github.com/antoinelyset/poleica

sandstrom · 2014-06-26T08:45:21Z

I agree with @bridgway, would be useful on extract_pages too.

@knowtheory, what are your thoughts on this?

steverob · 2016-04-05T18:55:39Z

Would love to see this merged.

Cleaning up code
Flatten the return value
Make it not add the path to the return value if an exception-worthy event occurred. Instead, merely raise that exception
Make text_extractor also return paths to processed files
Make function extract_images always return array of image paths
Refine specs
Fix tests
Add nil check
Refactor tests to better isolate functionality
remove debugger
remove logger
Add printf debugging
Sanity checking
Printfs
Remove puts
Remove annoying line
Cleanup
Fix unnecessary usage of ternary operation to 'wrap' an Array and replaced with Array() as it is more idiomatic
revert to original

hderms · 2016-04-05T19:08:32Z

@steverob I just squashed the old commits and will rebase against master in preparation for reconsideration by the maintainer

steverob · 2016-04-05T20:05:10Z

Thank you! :)

Regards
Steve Robinson


sandstrom mentioned this pull request Jun 26, 2014

Fix for Issue #83: Leading Zeros #97

Open

hderms force-pushed the fix_issue_62 branch 2 times, most recently from aea6533 to 9789dd5 Compare April 5, 2016 19:08
