-
Notifications
You must be signed in to change notification settings - Fork 215
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix issue 62 #63
base: master
Are you sure you want to change the base?
Fix issue 62 #63
Conversation
previous = size if @rolling | ||
end | ||
end | ||
case @return_value | ||
when :images |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps this return type should be called "image_paths" instead of "images"?
Fixed inaccurate text in body of pull request. Just to be clear I modified the return value behavior for every case I could find that was appropriate, causing functions to return an array of file paths to the extracted data rather than the intermediate PDF, as it was previously. |
Any idea when this might get merged in? This is a great feature. |
@@ -29,18 +29,23 @@ def initialize | |||
def extract(pdfs, opts) | |||
extract_options opts | |||
FileUtils.mkdir_p @output unless File.exists?(@output) | |||
pdfs = pdfs.is_a?(Array) ? pdfs : [pdfs] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can do :
pdfs = Array(pdfs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the input. I was misinformed about the nature of Array()
👍 |
Is there anything one can do to help getting this merged? It is rather messy to write code around extract_images to determine the generated file names based on "reverse engineering" the naming scheme and doing string manupulations. Thanks! |
+1 @dmayer's commet. I also think it would useful to have a similar feature for extract_pages as well so one can easily determine the generate file names when a PDF is split into individual pages. Cheers |
ping @knowtheory @jashkenas |
If you're interested I did a ruby gem for this : |
I agree with @bridgway, would be useful on @knowtheory, what are your thoughts on this? |
Would love to see this merged. |
Cleaning up code Flatten the return value Make it not add the path to the return value if an exception-worthy event occurred. Instead, merely raise that exception Make text_extractor also return paths to processed files Make function extract_images always return array of image paths Refine specs Fix tests Add nil check Refactor tests to better isolate functionality remove debugger remove logger Add printf debugging Sanity checking Printfs Remove puts Remove annoying line Cleanup Fix unnecessary usage of ternary operation to 'wrap' an Array and replaced with Array() as it is more idiomatic revert to original
aea6533
to
9789dd5
Compare
@steverob I just squashed the old commits and will rebase against master in preparation for reconsideration by the maintainer |
Thank you! :) Regards
|
This fixes an issue I opened on the main repo found at #62
The feature proposed adds a new option to extracting images via Docsplit.extract_images. If you pass an additional argument in the form :and_return => :images, you receive an array of the paths of extracted images as a return value, instead of the path to the intermediate PDF, in the case of powerpoints.
Example:
With a return value of:
The default return-value behavior of this method is preserved when that option is not specified, which is to return an array of PDFs that are returned by ensure_pdfs.
The justification for this feature is that I feel like users would benefit from getting the paths to the images created immediately upon calling the extract_images method. Having them determine these paths themselves seems out of spirit with the nature of this gem.