Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial zlib uncompress, compress support. Based partially on #553. #1206

Merged
merged 1 commit into from
Aug 28, 2012

Conversation

kmsquire
Copy link
Member

RIght now, this only includes compress and uncompress functions. These are a "high-level" interface in zlib. @choffstein had originally started writing a wrapper in C around these functions, but it turns out they are easily callable directly from Julia. Lower level C functions offering more control of the compression/decompression process are available, but require passing of structs, and therefore may be more challenging to interact with without a C wrapper.

I've exposed a number of zlib.h internal constants. Most of these are not used by compress/uncompress and related functions. Some are used by the gzip related functions, and some are used by lower-level functions that I'm not planning to implement right now (though someone else should feel free to if they have a need). I do have julia wrapper functions for almost all of the gzip file functions, along with an IOStream-like interface. I'll send a pull request for that once this gets through comments (or package it separately in its own repo).

Question of the day: this depends on zlib, another C library. It's a pretty core library in many places, but is it something you want as part of the main distribution, or should it be put into a separate module?

Kevin

@kmsquire
Copy link
Member Author

(I should add that I couldn't get his C wrapper code to work. Julia finds libz.so fine, but the c-wrapper code couldn't see any of the functions. I never did figure out why, but since these functions work fine in Julia, I didn't worry about it too much.)

Kevin


export compress, uncompress

_jl_zlib = dlopen("libz")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we can drop the _jl_ in front of library handles, now that it is in a module?

@ViralBShah
Copy link
Member

Does it make sense to have a CompressedByteArray type, and have convert methods as well?

@kmsquire
Copy link
Member Author

Make sense, and that was in #553 as well. I'll carry it over and address the other issues you brought up.

Also should note the discussion at #548 here as well, which I noticed but forgot to read.

@kmsquire
Copy link
Member Author

This should be ready for review.

Discussion on CompressedByteArray and related things here: https://groups.google.com/d/topic/julia-dev/PlZjZRBARck/discussion

@kmsquire
Copy link
Member Author

Finally figured out how to squash commits and have that appear on github!

@ViralBShah
Copy link
Member

Is it possible to have an iscompressed function that can tell whether a particular input is compressed or not?

@ViralBShah
Copy link
Member

The doubling of the allocated result in uncompress seems to be something that can lead to unforeseen performance issues. The uncompress routine should error out when given a size value that is too small (and not shrink if it is too large either). Without the destination array as an input argument, it could grow by powers of 2, and shrink eventually.

This may be a larger issue over and above the basic zlib interface - whether we should have some kind of a header in the compressed array that saves the size of the uncompressed data.

@kmsquire
Copy link
Member Author

On Saturday, August 25, 2012, Viral B. Shah wrote:

Is it possible to have an iscompressed function that can tell whether a
particular input is compressed or not?

I looked into that, and beyond creating our own format, the only way to
really tell if a buffer is compressed is to try to decompress it. Gzip uses
the same compression format and has a header, and I think a magic number,
that let you tell if the file is gzip compressed. But the basic
compression does not.

I can fix up the "CompressedByteArray" branch and submit that as a separate
pull, and the merits can be discussed over code, rather than the previous
discussion in the abstract. I guess it really comes down to a matter of
philosophy.

Kevin

@kmsquire
Copy link
Member Author

The doubling of the allocated result in uncompress seems to be something
that can lead to unforeseen performance issues. The uncompress routine in
my should error out when given a size value that is too small (and not
shrink if it is too large either). Without the destination array as an
input argument, it could grow by powers of 2, and shrink eventually.

This may be a larger issue over and above the basic zlib interface, if we
should have some kind of a header in a compressed array that saves the size
of the uncompressed data.

I should be able to do that. I think the best way is to create a separate
function, uncompress_to_buf, which does exactly that, and returns the
number of bytes uncompressed to the buffer (and errors if the buffer is too
small).

In reality, we need to wrap the streaming zlib interface, which will
decompress until the given buffer is full, and give a return value
indicating that there's more data available. It's a bit more flexible, and
could be hooked into the io framework or any of the streaming/serializing
interfaces people have been talking about. I don't have time to look into
it right now, though (and I was hoping that someone would develop a better
method for creating c-structs in julia (the streaming interface needs it,
the current one doesn't)).

Kevin

@kmsquire
Copy link
Member Author

@ViralBShah, let me know what you think of this and what else might be needed.

@kmsquire
Copy link
Member Author

One thought: when building zlib on a 64-bit machine, it would be nice (though not necessary) to specify -64. Is there an easy way of knowing if we're on a 64-bit machine? (OpenBLAS seems to have some tests, but they seemed a little too specific.)

@ViralBShah
Copy link
Member

I think we are in reasonable shape for a first cut. As we start using it more, we will have more clarity on the higher level interfaces (CompressedArray) and lower-level zlib functions.

As for building, doesn't zlib configure detect it is on a 64-bit machine? What are the benefits of -64?

@kmsquire
Copy link
Member Author

On Sun, Aug 26, 2012 at 12:10 AM, Viral B. Shah [email protected]:

I think we are in reasonable shape for a first cut. As we start using it
more, we will have more clarity on the higher level interfaces
(CompressedArray) and lower-level zlib functions.

As for building, doesn't zlib configure detect it is on a 64-bit machine?
What are the benefits of -64?

Ok, that should have been -m64, and it's not actually necessary. So, carry
on... :-)

Kevin

@kmsquire
Copy link
Member Author

BTW, I'm in the process of finalizing/documenting this. One final change I plan to make is to split out the header stuff (similar to glpk_h.jl), so that I can use it in a separate gzip module.

@kmsquire
Copy link
Member Author

Anything else? Should I squash this to one commit?

@ViralBShah
Copy link
Member

I am ok with merging this. Unless anyone says otherwise, I will merge it in a day or so.

@kmsquire
Copy link
Member Author

Okay, really done now. I switched the types to the expected corresponding julia types, and added a test to make sure they match the compile types of zlib. The functions are a bit cleaner this way, IMHO.

@pao
Copy link
Member

pao commented Aug 28, 2012

I think this makes more sense as a single, squashed commit. I don't think the refactoring history will be much use in master.

@kmsquire
Copy link
Member Author

Done!

StefanKarpinski added a commit that referenced this pull request Aug 28, 2012
Initial zlib uncompress, compress support.  Based partially on #553.
@StefanKarpinski StefanKarpinski merged commit 43ab984 into JuliaLang:master Aug 28, 2012
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants