-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for extracting hard links #102
Conversation
Codecov Report
@@ Coverage Diff @@
## master #102 +/- ##
==========================================
+ Coverage 96.30% 96.47% +0.17%
==========================================
Files 4 4
Lines 649 681 +32
==========================================
+ Hits 625 657 +32
Misses 24 24
Continue to review full report at Codecov.
|
For posterity, here's an untested cut at checking that a hardlink is safe in full generality: diff --git a/src/extract.jl b/src/extract.jl
index 0d72af0..8593f8a 100644
--- a/src/extract.jl
+++ b/src/extract.jl
@@ -351,6 +351,60 @@ function read_tarball(
path = isempty(path) ? part : "$path/$part"
end
paths[path] = hdr.type == :symlink ? hdr.link : hdr.type
+ # check that hard links refer to already-seen files
+ if hdr.type == :hardlink
+ link_parts = split(hdr.link, '/')
+ target_parts = String[]
+ while !isempty(link_parts)
+ part = popfirst!(link_parts)
+ (isempty(part) || part == ".") && continue
+ if part == ".."
+ isempty(target) && error("""
+ Refusing to extract hardlink with external target
+ * path: $(repr(hdr.path))
+ * hardlink: $(repr(hdr.link))
+ """)
+ pop!(target)
+ continue
+ end
+ target = join(target_parts, '/')
+ type = get(paths, target, nothing)
+ type === nothing && error("""
+ Refusing to extract hardlink with non-existent target
+ * path: $(repr(hdr.path))
+ * hardlink: $(repr(hdr.link))
+ """)
+ if !isempty(link_parts)
+ # prefix of final link target
+ if type == :symlink
+ link = paths[target]
+ link[1] == '/' && error("""
+ Refusing to extract hardlink with absolute symlink prefix
+ * path: $(repr(hdr.path))
+ * hardlink: $(repr(hdr.link))
+ """)
+ prepend!(link_parts, split(link, '/'))
+ elseif type != :directory
+ error("""
+ Refusing to extract hardlink with $type prefix
+ * path: $(repr(hdr.path))
+ * hardlink: $(repr(hdr.link))
+ """)
+ end
+ else
+ # final link target
+ type == :file || error("""
+ Refusing to extract hardlink to a $type
+ * path: $(repr(hdr.path))
+ * hardlink: $(repr(hdr.link))
+ """)
+ if hdr.link != target
+ hdr = Header(hdr; link=target)
+ end
+ end
+ end
+ end
+ # apply callback, checking that it consumes IO correctly
before = applicable(position, tar) ? position(tar) : 0
callback(hdr, split(path, '/', keepempty=false))
applicable(position, tar) || continue I've decided that this is too complex and not worth it. For a first cut, I will add support for extracting hard links but with the restriction that they can only refer to (already-extracted) file paths. I suspect this will be what actually occurs 99.99% of the time. What the above complex logic allows for is a hard link which refers to an already-extracted file via some already-extracted symlink prefix, which would work but makes it much harder to ensure that something bad isn't happening. |
d7c1233
to
a9140a8
Compare
Thanks, works great. The restriction doesn't pose a problem for the tarballs I'm using. |
I would be surprised if there's any way to get command-line |
f9bbb4e
to
411e8cf
Compare
Finally, this works. |
This adds support for hardlinks, including: - extracting them by copying the linked file (no hardlink created) - tree hashing them as they are extracted - rewriting by duplicating the linked file This only supports hardlinks whose target is a plain file that has already been seen in the tarball that is being processed. You cannot have a hardlink that appears before the file that is linked. If the target of a hardlink is overwritten later, the link copies the current version of the file at the time of extraction. Tree hashing and rewrite are both consistent with this behavior. It is not supported to extract hardlinks where the link involves symlinks, even if the link refers to a path that would be a file — the target must be a plain file. Close #101.
411e8cf
to
c4b28fb
Compare
It turns out that implementing hard links is easy: you just copy the hard link target instead of creating the file directly. The tricky part is making sure that a malicious tarball doesn't do something dangerous with a combination of hard links and symlinks, and this can get very tricky indeed. Closes #101.
I'm also including a couple of unrelated helper changes here:
Tar.list
with callback: allow passing raw header data — this is useful for analyzing wild tarballsHeader(hdr)
constructor with keywords to modifyhdr
— I need to copy a header and this is convenient