Commit

get: add example about checking out different artifact versions
per #611 (comment)
but for #487
jorgeorpinel committed Sep 14, 2019
1 parent af424ca commit 6fd5104
Showing 2 changed files with 70 additions and 7 deletions.
2 changes: 1 addition & 1 deletion static/docs/commands-reference/diff.md
@@ -169,7 +169,7 @@ diff for 'data/features'
 0 files deleted, size was increased by 2.9 MB
 ```
 
-## Examples: Confirming that a target has not changed
+## Example: Confirming that a target has not changed
 
 Let's use our example repo once again, which has several
 [available tags](https://github.com/iterative/example-get-started/tags) for
75 changes: 69 additions & 6 deletions static/docs/commands-reference/get.md
@@ -53,15 +53,16 @@ created in the current working directory, with its original file name.
 
 - `-v`, `--verbose` - displays detailed tracing information.
 
-## Examples
+## Example: Machine learning model deployment
 
-> Note that `dvc get` can be used form anywhere in the file system, as long as
+> Note that `dvc get` can be used from anywhere in the file system, as long as
 > DVC is [installed](/doc/get-started/install).
 
 We can use `dvc get` to download the resulting model file from our
 [get started example repo](https://github.com/iterative/example-get-started),
-which is a DVC project external to the current working directory). The desired
-file is located in the root of the external repo, and named `model.pkl`.
+which is a DVC repository external to the current working directory). The
+desired file is tracked in the root of the external <abbr>project</abbr>, and
+named `model.pkl`.
 
 ```dvc
 $ dvc get https://github.com/iterative/example-get-started model.pkl
@@ -72,8 +73,8 @@ model.pkl
 ```
 
 Note that the `model.pkl` file doesn't actually exist in the
-[data directory](https://github.com/iterative/example-get-started/tree/master/)
-of the external Git repo. Instead, the corresponding DVC-file
+[root directory](https://github.com/iterative/example-get-started/tree/master/)
+of the external Git repository. Instead, the corresponding DVC-file
 [train.dvc](https://github.com/iterative/example-get-started/blob/master/train.dvc)
 is found, which specifies `model.pkl` in its outputs (`outs`). DVC then
 [pulls](/doc/commands-reference/pull) the file from the default
@@ -91,3 +92,65 @@ can be automated leveraging DVC with
 The same example applies to raw or intermediate data files as well, of course,
 for cases where we want to download those files and perform some analysis on
 them.
+
+## Example: Compare different versions of the same experiment
+
+`dvc get` has the `--rev` option, to specify which version of the repository to
+download a <abbr>data artifact</abbr> from. It also has the `--out` option to
+specify the file or directory path and file name for the download. Combining
+these two options allows us to do something we can't achieve with the regular
+`git checkout` + `dvc checkout` process – see for example the
+[Get Older Data Version](/doc/get-started/older-versions) chapter of our _Get
+Started_ section.
+
+Let's use the
+[get started example repo](https://github.com/iterative/example-get-started)
+again, like in the previous example. But this time, clone it first to see
+`dvc get` in action inside a <abbr>DVC project</abbr>.
+
+```dvc
+$ git clone git@github.com:iterative/example-get-started.git
+$ cd example-get-started
+```
+
+If you are familiar with our [Get Started](/doc/get-started) example, you may
+know that each chapter has a corresponding
+[tag](https://github.com/iterative/example-get-started/tags). Tag `7-train` is
+where we train a first version of the example model, and tag `9-bigrams-model`
+has an improved model (trained using bigrams). What if we wanted to have both
+versions of the model "checked out" at the same time? `dvc get` provides an easy
+way to do this:
+
+```dvc
+$ dvc get . model.pkl --rev 7-train --out model.monograms.pkl
+```
+
+> Notice that the `url` provided to `dvc get` above is `.`. `dvc get` accepts
+> file system paths as a "URL" to the repository to get the data from for edge
+> cases.
+
+The `model.monograms.pkl` file now contains the older version of the model. To
+get the most recent one, we use a similar command, but with
+`-o model.bigrams.pkl` and `--rev 9-bigrams-model` or even without `--rev`
+(since it's the latest version anyway). In fact in this case using `dvc pull`
+should suffice, downloading the file as just `model.pkl`, which we can then
+rename to make it extra obvious:
+
+```dvc
+$ dvc pull train.dvc
+$ mv model.pkl model.bigrams.pkl
+```
+
+And that's it! Now we have both model files in the <abbr>workspace</abbr>, with
+different names, and not currently tracked by Git:
+
+```dvc
+$ git status
+...
+Untracked files:
+  (use "git add <file>..." to include in what will be committed)
+        model.bigrams.pkl
+        model.monograms.pkl
+```