This is a Nix library that allows you to download files from the internet
without needing to provide an output hash. It even works in Nix's pure-eval
mode.
This library relies on Nix's support for SHA1, an unsafe hash function. It utilizes known SHA1 hash collisions in order to sneak single bits of data out of fixed-output derivations.
This library is comically inefficient, and should never be used in any actual codebase. But it is a fun trick!
This library provides an evilDownloadUrl
function, which takes a single URL
as an argument, and downloads the file.
WARNING:
evilDownloadUrl
is terribly inefficient. It may use up significant disk space, and DOS the site you're trying to download from. I don't recommend using it to download a file larger than 50 bytes or so (unless you really know what you're doing). The reasons for this inefficiency are explained in the next section.This section uses an example file that is only a few bytes long, so in general it should be safe to test out and play around with.
You can play around with this function in the Nix REPL:
$ nix repl ./nix
nix-repl> :b evilDownloadUrl "https://raw.githubusercontent.com/cdepillabout/small-example-text-files/177c95e490cf44bcc42860bf0652203d3dc87900/hello-world.txt"
This derivation produced the following outputs:
out -> /nix/store/jhyzz6l9ryjl1npdf4alqyi1fy2qx1f0-fetchBytes-6bba65f4567f4165109177a5dafd5972882643e15d454018586fed35b068acf5-12
You can confirm that this file actually contains the contents of the URL:
$ cat /nix/store/jhyzz6l9ryjl1npdf4alqyi1fy2qx1f0-fetchBytes-6bba65f4567f4165109177a5dafd5972882643e15d454018586fed35b068acf5-12
hello world
$ curl "https://raw.githubusercontent.com/cdepillabout/small-example-text-files/177c95e490cf44bcc42860bf0652203d3dc87900/hello-world.txt"
hello world
There is also a top-level default.nix
file that can be used to play around with
this function:
$ nix-build --argstr url "https://raw.githubusercontent.com/cdepillabout/small-example-text-files/177c95e490cf44bcc42860bf0652203d3dc87900/hello-world.txt"
$ cat ./result
hello world
You can also use the flake.nix
file to play around with this.
First, edit ./flake.nix
and replace url = "..."
with the URL you want to
download. Then, build the default package in the flake:
$ nix build
$ cat ./result
hello world
The neat (evil) thing about evilDownloadUrl
is that it even works in Nix's
pure-eval
mode. In theory, pure-eval
is supposed to require all downloaded files to
have a hash specified, but evilDownloadUrl
works around this limitation:
$ nix build --pure-eval
$ cat ./result
hello world
Due to the way this hack works, evilDownloadUrl
is extremely inefficient.
It performs one request to the URL for every bit (!!) of the file it is
trying to download. For instance, if you were trying to download a
10 byte file, evilDownloadUrl
would make 80 requests to the URL, and
download the file 80 times.
evilDownloadUrl
also makes a lot of garbage in your Nix store. Downloading a
50 byte file creates about 4MB of garbage in your Nix store. This scales
linearly, so for example a 100 byte would create about 8MB of garbage.
It is also very slow. Downloading a 50 byte file takes about 30 seconds on my machine.
While this is a fun library, you should never use this in any actual codebase.
After playing around with evilDownloadUrl
, you may want to clean up the
garbage in your Nix store. While you should always be able to use
nix-collect-garbage
to clean up your Nix store, you may want to specifically
only delete files created by evilDownloadUrl
.
First, make sure you don't have the result
output creating a GC root:
$ rm ./result
Then, delete files created by evilDownloadUrl
:
$ shopt -s nullglob # you may want to enable null globs in Bash to ignore globs that don't match
$ nix-store --delete /nix/store/*-bitvalue-* /nix/store/*BitNum-* /nix/store/*-fetchFileSize* /nix/store/*-fetchByte*
The evilDownloadUrl
function works by internally creating a fixed-output
derivation which outputs one of two known PDF files, both with the same SHA1 hash.
This fixed-output derivation is allowed to access the network, and outputs
one PDF file to represent a single 1
bit, and the other PDF file to represent a
single 0
bit. This effectively leaks one bit of information from the
internet in a non-reproducible manner.
evilDownloadUrl
combines many of these 1-bit-leaking fixed-output derivations
in order to download the entire specified file from the internet.
The next section is an introduction to Nix for a non-Nixer (or anyone that
needs a refresher), focusing on the concepts needed to explain how
evilDownloadUrl
works. The section after is a technical explanation
of how evilDownloadUrl
works.
This section introduces the Nix concepts required for understanding evil-nix
.
It does this mostly by drawing comparisons to other build tools, including
Docker. These will be rough comparisons, intending to give you an idea about
what is going on without having to dive head-first into the inner-workings of
Nix.
While these explanations are intended to give you some idea of what is going on, they may not be 100% completely technically accurate. If you're a long-time Nix user, you're recommended to just jump directly to the technical explanation in the following section.
The Nix ecosystem is comprised of quite a few different things:
-
Nix is a build tool and system daemon, somewhat similar to
docker build
anddockerd
. -
Nix is also a programming language, roughly comparable to the language used to write
Dockerfile
s (although Nix is more powerful and composable) -
There is a large set of packages defined using the Nix language, called Nixpkgs. Nixpkgs is somewhat similar to the Debian or Arch Linux package sets. Because of Nix's programability and composability, Nixpkgs feels much more flexible than other Linux distro's package sets.
-
There is a Linux distribution called NixOS, which uses the Nix programming language to define system settings (for example, the contents of files in
/etc
), and uses packages from Nixpkgs. NixOS feels like a cross between a normal Linux distribution and a tool like Ansible/Puppet/Chef.
In order to understand evil-nix
, we only need to look at Nix
the-programming-language and Nix the-build-tool.
The Nix programming language has a concept of a derivation. A derivation can
be thought of as a recipe to build a software package. It is roughly similar
to a Dockerfile
. Let's look at a simple derivation:
stdenv.mkDerivation {
name = "hello-2.12.1";
src = ./.;
nativeBuildInputs = [ gcc make ];
}
This is a derivation to build the
GNU Hello program. It declares its name
and version (name = "hello-2.12.1"
), and takes the input source code from the
current directory (src = ./.
). It declares two needed build tools, gcc
and
make
.
You may be wondering why there is no code explicitly running ./configure && make && make install
. The mkDerivation
function has internal support for
checking if there is an Automake build system, and automatically runs these
commands for us. Pretty convenient! Nix of course allows you to specify any
arbitrary build commands (similar to a Dockerfile
), but in this
case it is not needed.
If this derivation is saved to a file in the current directory called
hello.nix
(and the current directory also contains the source code for the
GNU Hello package), you should be able to build this derivation with a command
like the following:
$ nix-build ./hello.nix
This
./hello.nix
file is not a real derivation, and you can't save it to disk and run it as-is. It has been modified to be easier to understand for people new to Nix.If you're interesting in writing your first real Nix derivation, checkout one of the following tutorials:
This nix-build
tool passes off the derivation to a system daemon. The system
daemon starts up a sandbox using Linux namespaces and cgroups, and pulls in all
the declared build tools, system libraries, and source code. In our case, the
only build tools that have been declared and available in the sandbox are gcc
and make
. The system daemon then runs the specified build steps in the
sandbox environment (which in our case are ./configure && make && make install
). After running a build, the derivation is responsible for
outputting a file (or a directory containing multiple files). The output
file is is known as the output of the derivation (or just the "output"). The
output of a derivation is generally an ELF binary, HTML documentation, man
pages, etc.
The sandbox is a key element here. The Nix system daemon sandbox is very similar to the Docker daemon sandbox. However, the Nix system daemon goes one step further. It doesn't even allow network access. Since network access is not allowed during builds, you can be reasonably sure that your derivation is 100% reproducible. Regardless of what computer you run it on, it should always succeed or fail the same way.
At this point, you may have the question, "Well, that's good and all, but if I don't have network access, how do I download the source code I need to build? I don't want to always have to keep the source code I want to build in my current directory!"
To solve this problem, Nix provides a special type of derivation, called a fixed-output derivation. Here's an example of a fixed-output derivation:
fetchurl {
url = "https://ftp.gnu.org/pub/gnu/hello/hello-2.12.1.tar.gz";
sha256 = "sha256-jZkUKv2SV28wsM18tCqNxoCZmLxdYH2Idh9RLibH2yA=";
}
Fixed-output derivations are special in that they require the hash of the
output of the derivation to be specified in advance (the sha256 =
line above). In
exchange for specifying the hash, the Nix system daemon sandbox allows access
to the network. The above derivation is allowed to use curl
to download the
hello-2.12.1.tar.gz
file, and set it as the output of this fixed-output
derivation. This hello-2.12.1.tar.gz
file must have the SHA256 hash
sha256-jZkUKv2SV28wsM18tCqNxoCZmLxdYH2Idh9RLibH2yA=
.
Since the output hash is known, we can expect full 100% reproducibility of this derivation. If the derivation doesn't produce an output that exactly matches the hash, then the Nix system daemon will get angry and fail the build.
Just like non-fixed-output derivations, fixed-output derivations allow you to
specify any arbitrary build commands you want. The above fetchurl
function
is setup to use curl
, but you could potentially write a similar function that
internally uses wget
instead. Or maybe a derivation that uses ftp
to pull
a file from an FTP server.
You can see that the above derivation uses a SHA256 hash. Nix supports a few
different hash types, including SHA1. evil-nix
exploits the fact that SHA1
has known hash collisions.
In practice, the Nix language makes it very easy to combine multiple derivations together. For instance, the following Nix code is a normal derivation for GNU Hello, where the source code for GNU Hello is taken as the output of a fixed-output derivation:
stdenv.mkDerivation {
name = "hello-2.12.1";
src = fetchurl {
url = "https://ftp.gnu.org/pub/gnu/hello/hello-2.12.1.tar.gz";
sha256 = "sha256-jZkUKv2SV28wsM18tCqNxoCZmLxdYH2Idh9RLibH2yA=";
};
nativeBuildInputs = [ gcc make ];
}
This ability to programmatically and easily combine different derivations makes
Nix quite useful! (As an aside, both the gcc
and make
build inputs are
also just normal derivations, defined quite similarly to this GNU hello
derivation).
With this knowledge of normal derivations, fixed-output derivations, and
derivation outputs, you should be set to understand how evil-nix
exploits
fixed-output derivations.
The main trick in evilDownloadUrl
is a fixed-output derivation that returns
one bit of (non-hashed) data from the internet. This fixed-output derivation
works by first preparing two different output files. Let's call these output files
pdfA
and pdfB
.
These are special PDF files that have the same SHA1 hash. The hash of the fixed-output derivation is set to this SHA1 hash. This works because Nix still supports fixed-output derivations using SHA1 hashes (in the name of backwards compatibility).
This fixed-output derivation takes a URL and a bit index as input. It downloads
the input URL using curl
, and inspects the bit at the given input index of
the file. If the bit is a 1
, the fixed-output derivation sets pdfA
as the
output. If the bit is a 0
, it sets pdfB
as the output.
This is the critical trick. From an information-theoretic perspective, you would expect that a fixed-output derivation is not able to realistically produce any additional information that is not already accounted for in the hash of the output. However, by combining a fixed-output derivation and a hash function with known collisions, it is possible to sneak out a single bit of data by changing which PDF is output.
You can see what this fixed-output derivation looks like in the file
nix/evil/downloadBitNum.nix
. This
derivation is referred to as downloadBitNum
in the evil-nix
codebase.
downloadBitNum
is then wrapped with a simple (non-fixed-output) derivation
that inspects the output of downloadBitNum
. This simple derivation is
referred to as fetchBit
in the codebase.
fetchBit
inspects the output of downloadBitNum
and sees whether it matches
pdfA
or pdfB
. If pdfA
has been output, then fetchBit
will create a
single output file with the contents of an ASCII 1
character. If
downloadBitNum
has output pdfB
, then fetchBit
will create a single output
file with the contents of an ASCII 0
character.
You can see what fetchBit
looks like in the file
nix/evil/fetchBit.nix
.
fetchBit
is then repeated 8 times, and the subsequent outputs combined to
form a full byte of the input URL. This is done in the function fetchByte
,
which is defined in nix/evil/fetchByte.nix
.
fetchByte
is then repeated for each byte of the input file. This is done in
the function fetchBytes
, which is defined in
nix/evil/fetchBytes.nix
.
fetchBytes
outputs the full file we wanted to download from the input URL.
By utilizing fixed-output derivations with SHA1 collisions, we're able to download all the individual bits of the input URL, and carefully reassemble them to form the full file.
Due to the extreme inefficiency of evilDownloadUrl
, the main use-case is not
for Nix users, but actually for non-Nix users.
If you work in IT, I'm sure you have at least one coworker who is waaaaaaaaaay too into Nix. They likely bring up Nix in every conversation about your project's build system, CI, packaging, deployment, etc. You've probably heard them say the word "hermetic" at least 5 times in the last week.
The main use-case of evil-nix
is for you. Next time you hear your coworker
start to bring up Nix, hit them with "Eh, I heard Nix isn't that great. You
can trivially download unhashed files. Talk about lack of reproducibility,
haha"
Your coworker will likely start sputtering about sandboxes, unsafe hash functions, build purity, composability, etc. However, you can safely ignore them, and rest assured in your current build system's mishmash of Makefiles, Bash scripts, YAML files, and containers.
If your coworker still won't take the hint, suggest to them that they should learn a real build tool, like Docker.
No.
evil-nix
gives you a way to write derivations that are potentially less
reproducible, even in
pure-eval
mode (where you would expect that all downloaded files must be hashed).
However, reproducibility of Nix builds can be thwarted in many other ways
than just evil-nix
, so the techniques from evil-nix
are not something
to worry about in practice.
(You should, of course, be careful with evaluating any untrusted Nix code from the internet. You should be very careful with building any untrusted Nix derivations from the internet. And you should be extremely careful with running any untrusted binaries from the internet.)
Yes.
Nix currently supports many different hash types for fixed-output derivations, including insecure hash functions like MD5 and SHA1.
The technique used by evil-nix
relies on SHA1 collisions, but MD5
collisions could be used instead.
Does evilDownloadUrl
require import from derivation (IFD)?
No.
evilDownloadUrl
does currently makes use of IFD in order to read the length
of the file in bytes before downloading. However, it would be trivial to have
evilDownloadUrl
also take the file length as an input.
The end-user would have to specify the file length they want to download,
but then evilDownloadUrl
could work with the
--no-allow-import-from-derivation
option.
Nix provides a few built-in functions that enable you to download files
from the internet without needing to specify a hash. One example is
builtins.fetchTarball
.
The difference between builtins.fetchTarball
and evilDownloadUrl
is
that evilDownloadUrl
works even in Nix's
pure-eval
mode. If you try to use builtins.fetchTarball
in pure-eval mode without
specifying a hash, Nix will give you an error message.
Yes.
If Nix removed support for MD5 and SHA1 hashes for fixed-output
derivations, that would stop evilDownloadUrl
from working.
However, it appears that MD5 and SHA1 hashes are still supported
in the name of
backwards compatibility.
Here are two potential changes that could be made to Nix that would stop
evilDownloadUrl
from working, but wouldn't completely break backwards
compatibility:
-
Disallow MD5 and SHA1 hashes for fixed-output derivations in pure-eval mode.
If someone wanted to use a new version of Nix to evaluate old Nix code that contained MD5 or SHA1 hashes, they would have to turn off pure-eval mode. This seems like it could be a reasonable trade-off, especially since pure-eval mode is a relatively recent addition to Nix.
-
Completely disable MD5 and SHA1 support by default, and hide functionality behind a config option.
If someone wanted to use a new version of Nix to evaluate old Nix code that contained MD5 of SHA1 hashes, they would have to explicitly turn on the option that enables support for these weaker hash functions.
In practice, no recent Nix code uses MD5 or SHA1 hashes. I don't think I've ever seen an MD5 or SHA1 hash in Nix code in the wild, at least in the last 5 years or so.
Imagine you have a URL like http://example.com/random
that returns a
random number everytime it is called:
$ curl http://example.com/random
16
$ curl http://example.com/random
42
Is it possible have evilDownloadUrl
also return a completely random
number everytime it is called with this same URL?
Sort of.
If you run a command like the following, evilDownloadUrl
will
return the contents of the URL, and all the build artifacts will
be cached to the Nix store:
$ nix-build --argstr url "http://example.com/random"
$ cat ./result
16
If you have a friend run the same command on their computer, they will get
a different output (like 42
).
However, if you build it again on your own machine, since all the build outputs are already in the Nix store, you will get the same output as previously:
$ nix-build --argstr url "http://example.com/random"
$ cat ./result
16
One way to work around this is to collect the garbage in your Nix store, and re-run the build:
$ rm ./result
$ nix-collect-garbage
$ nix-build --argstr url "http://example.com/random"
$ cat ./result
77
Alternatively, you could modify evilDownloadUrl
to take an additional
argument that allows you to "bust" the cache:
$ rm ./result
$ nix-collect-garbage
$ nix-build --argstr url "http://example.com/random" --argstr cache-buster foo
$ cat ./result
48
$ nix-build --argstr url "http://example.com/random" --argstr cache-buster bar
$ cat ./result
3
This cache-busting value would need to be supplied by the end user.
As far as I can tell, @aszlig came up with the idea of using fixed-output derivations with hash collisions in order to impurely check whether or not given URLs can be downloaded. The first implementation was in this commit.
evil-nix
generalizes this technique to allow downloading arbitrary bits of
files from the internet. It then generalizes the approach even more to allow
downloading full files from the internet, without needing to specify the hash
of the file.
Thanks to @sternenseemann for originally linking me to the above commit, and suggesting it as a potential approach to this issue.