get: add example about checking out different artifact versions
per #611 (comment)
but for #487
jorgeorpinel committed Sep 14, 2019
1 parent af424ca commit 8e9e947
Showing 2 changed files with 67 additions and 7 deletions.
2 changes: 1 addition & 1 deletion static/docs/commands-reference/diff.md
@@ -169,7 +169,7 @@ diff for 'data/features'
0 files deleted, size was increased by 2.9 MB
```

## Examples: Confirming that a target has not changed
## Example: Confirming that a target has not changed

Let's use our example repo once again, which has several
[available tags](https://github.com/iterative/example-get-started/tags) for
72 changes: 66 additions & 6 deletions static/docs/commands-reference/get.md
@@ -53,15 +53,16 @@ created in the current working directory, with its original file name.

- `-v`, `--verbose` - displays detailed tracing information.

## Examples
## Example: Machine learning model deployment

shcheklein (Member) commented on Sep 14, 2019 (marked as resolved):

let's use something more generic - Getting (or accessing? or downloading) model or data

Deployment scenario is too specific

jorgeorpinel (Author, Contributor) commented on Sep 17, 2019 (marked as resolved):

Replying in #585 (review).


> Note that `dvc get` can be used form anywhere in the file system, as long as
> Note that `dvc get` can be used from anywhere in the file system, as long as
> DVC is [installed](/doc/get-started/install).
We can use `dvc get` to download the resulting model file from our
[get started example repo](https://github.com/iterative/example-get-started),
which is a DVC project external to the current working directory). The desired
file is located in the root of the external repo, and named `model.pkl`.
which is a DVC repository external to the current working directory). The
desired file is tracked in the root of the external <abbr>project</abbr>, and
named `model.pkl`.

```dvc
$ dvc get https://github.com/iterative/example-get-started model.pkl
@@ -72,8 +73,8 @@ model.pkl
```

Note that the `model.pkl` file doesn't actually exist in the
[data directory](https://github.com/iterative/example-get-started/tree/master/)
of the external Git repo. Instead, the corresponding DVC-file
[root directory](https://github.com/iterative/example-get-started/tree/master/)
of the external Git repository. Instead, the corresponding DVC-file
[train.dvc](https://github.com/iterative/example-get-started/blob/master/train.dvc)
is found, which specifies `model.pkl` in its outputs (`outs`). DVC then
[pulls](/doc/commands-reference/pull) the file from the default
@@ -91,3 +92,62 @@ can be automated leveraging DVC with
The same example applies to raw or intermediate data files as well, of course,
for cases where we want to download those files and perform some analysis on
them.

## Example: Compare different versions of the same experiment

shcheklein (Member) commented on Sep 14, 2019 (marked as resolved):

Compare different versions of data or model?

jorgeorpinel (Author, Contributor) commented on Sep 17, 2019 (marked as resolved):

Replied in #585 (review).


`dvc get` has the `--rev` option, to specify which version of the repository to
download a <abbr>data artifact</abbr> from. It also has the `--out` option to
specify the file or directory path and file name for the download. Combining
these two options allows us to do something we can't achieve with the regular
`git checkout` + `dvc checkout` process – see for example the
[Get Older Data Version](/doc/get-started/older-versions) chapter of our _Get
Started_ section.

Let's use the
[get started example repo](https://github.com/iterative/example-get-started)
again, like in the previous example. But this time, clone it first to see
`dvc get` in action inside a <abbr>DVC project</abbr>.

```dvc
$ git clone git@github.com:iterative/example-get-started.git
$ cd example-get-started
```

If you are familiar with our [Get Started](/doc/get-started) example, you may
know that each chapter has a corresponding
[tag](https://github.com/iterative/example-get-started/tags). Tag `7-train` is
where we train a first version of the example model, and tag `9-bigrams-model`
has an improved model (trained using bigrams). What if we wanted to have both
versions of the model "checked out" at the same time? `dvc get` provides an easy
way to do this:

```dvc
$ dvc get git@github.com:iterative/example-get-started.git model.pkl \
          --rev 7-train --out model.monograms.pkl
```

shcheklein (Member) commented on the `dvc get` line, Sep 14, 2019 (marked as resolved):

since we cloned it, can we do dvc get . --rev ...?

jorgeorpinel (Author, Contributor) commented on Sep 17, 2019 (marked as resolved):

Replied in #585 (review).

The `model.monograms.pkl` file now contains the older version of the model. To
get the most recent one, we use a similar command, but with
`-o model.bigrams.pkl` and `--rev 9-bigrams-model` or even without `--rev`
(since it's the latest version anyway). In fact in this case using `dvc pull`
should suffice, downloading the file as just `model.pkl`, which we can then
rename to make it extra obvious:

```dvc
$ dvc pull train.dvc
$ mv model.pkl model.bigrams.pkl
```
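
For reference, the "similar command" mentioned in the text could look roughly like the sketch below. It is only an illustration: `fetch_model`, `REPO`, and `DRY_RUN` are hypothetical names that are not part of DVC, and the HTTPS URL is the one from the first example. Only the `--rev` and `--out` options come from the text. By default the sketch prints the command instead of running it, so it can be tried without DVC or network access.

```shell
#!/bin/sh
# Sketch only: REPO, DRY_RUN and fetch_model are illustrative names (assumptions).
REPO="https://github.com/iterative/example-get-started"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the command, 0 = actually run it

fetch_model() {
  # $1 = Git tag to read model.pkl from, $2 = local file name for the download
  if [ "$DRY_RUN" = "1" ]; then
    echo "dvc get $REPO model.pkl --rev $1 --out $2"
  else
    dvc get "$REPO" model.pkl --rev "$1" --out "$2"
  fi
}

# The bigrams model, fetched the same way as the monograms one
fetch_model 9-bigrams-model model.bigrams.pkl
```

Setting `DRY_RUN=0` before running the script would execute the download instead of printing it.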

And that's it! Now we have both model files in the <abbr>workspace</abbr>, with
different names, and not currently tracked by Git:

```dvc
$ git status
...
Untracked files:
  (use "git add <file>..." to include in what will be committed)
        model.bigrams.pkl
        model.monograms.pkl
```
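
With both files side by side, a quick sanity check is to confirm that they really are different versions. The following is a minimal sketch using the standard `cmp` utility; `same_or_different` is a hypothetical helper name, and the demo uses throwaway stand-in files rather than the real models.

```shell
#!/bin/sh
# Sketch only: same_or_different is an illustrative helper, not part of DVC.
same_or_different() {
  # cmp -s is silent and exits 0 only when both files are byte-for-byte equal.
  # Note: a missing or unreadable file is also reported as "different" here.
  if cmp -s "$1" "$2"; then
    echo "identical"
  else
    echo "different"
  fi
}

# Demo with stand-in files in a temporary directory
tmp="$(mktemp -d)"
printf 'monograms' > "$tmp/model.monograms.pkl"
printf 'bigrams' > "$tmp/model.bigrams.pkl"
same_or_different "$tmp/model.monograms.pkl" "$tmp/model.bigrams.pkl"
rm -rf "$tmp"
```

In the workspace from the example, the same helper would be called as `same_or_different model.monograms.pkl model.bigrams.pkl`.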
