diff --git a/static/docs/command-reference/import.md b/static/docs/command-reference/import.md
index 4dca2bd27f..13f67c2748 100644
--- a/static/docs/command-reference/import.md
+++ b/static/docs/command-reference/import.md
@@ -3,7 +3,7 @@
Download or copy file or directory from any DVC project in a Git
repository (e.g. hosted on GitHub) into the workspace, and track
changes in this [external dependency](/doc/user-guide/external-dependencies).
-Creates a DVC-file.
+Creates a special DVC-file, a.k.a. an _import stage_.
> See also `dvc get`, that corresponds to the first step this command performs
> (just download the data).
@@ -23,11 +23,11 @@ positional arguments:
DVC provides an easy way to reuse datasets, intermediate results, ML models, or
other files and directories tracked in another DVC repository into the
workspace. The `dvc import` command downloads such a data artifact
-in a way that it is tracked with DVC, so it can be updated when the external
-data source changes.
+in a way that it is tracked with DVC, so it can be updated when the data source
+changes.
The `url` argument specifies the address of the Git repository containing the
-external project. Both HTTP and SSH protocols are supported for
+source project. Both HTTP and SSH protocols are supported for
online repositories (e.g. `[user@]server:project.git`). `url` can also be a
local file system path to an "offline" repository.
@@ -35,31 +35,31 @@ The `path` argument of this command is used to specify the location of the data
to be downloaded within the source project. It should point to a data file or
directory tracked by that project – specified in one of the
[DVC-files](/doc/user-guide/dvc-file-format) of the repository at `url`. (You
-will not find these files directly in the source Git repository.) The source
+will not find these files directly in the external Git repository.) The source
project should have a default [DVC remote](/doc/command-reference/remote)
configured, containing them.
> See `dvc import-url` to download and track data from other supported URLs.
After running this command successfully, the imported data is placed in the
-current working directory with its original file name e.g. `data.txt`. An import
-stage (DVC-file) is then created extending the full file or directory name of
-the imported data e.g. `data.txt.dvc` – similar to having used `dvc run` to
-generate the same output.
+current working directory with its original file name, e.g. `data.txt`. An
+_import stage_ (DVC-file) is then created, named by appending `.dvc` to the full
+file or directory name of the imported data, e.g. `data.txt.dvc`. This is
+similar to having used `dvc run` to generate the same output.
DVC supports DVC-files that refer to data in an external DVC repository (hosted
-on a Git server). In such a DVC-file, the `deps` section specifies the `repo`
-URL and data `path`, and the `outs` section contains the corresponding local
-path in the workspace. It records enough data from the external file or
-directory to enable DVC to efficiently check it to determine whether the local
-copy is out of date.
+on a Git server), a.k.a. _import stages_. In such a DVC-file, the `deps` section
+specifies the `repo` URL and data `path`, and the `outs` section contains the
+corresponding local path in the workspace. It records enough information about
+the external file or directory for DVC to efficiently determine whether the
+local copy is out of date.
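The following minimal Python sketch models the import stage fields just described, assuming the DVC-file's YAML has already been parsed into a dictionary (e.g. with a YAML library, not shown). `ImportStage` and `parse_import_stage` are hypothetical names, not part of DVC; the `deps` values mirror the fixed-revision example on this page, while the `outs` path is an assumption for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ImportStage:
    """Hypothetical in-memory view of an import stage (DVC-file)."""

    source_url: str  # deps[0].repo.url: Git URL of the source DVC project
    source_path: str  # deps[0].path: data path inside the source project
    rev: Optional[str]  # deps[0].repo.rev: present only if --rev was used
    rev_lock: Optional[str]  # deps[0].repo.rev_lock: pinned commit
    local_path: str  # outs[0].path: location in this workspace


def parse_import_stage(stage: Dict[str, Any]) -> ImportStage:
    """Build an ImportStage from an already-parsed DVC-file dictionary."""
    dep = stage["deps"][0]
    repo = dep["repo"]  # the `repo` section is what marks an import stage
    return ImportStage(
        source_url=repo["url"],
        source_path=dep["path"],
        rev=repo.get("rev"),
        rev_lock=repo.get("rev_lock"),
        local_path=stage["outs"][0]["path"],
    )


# `deps` mirrors the fixed-revision example; the `outs` path is assumed.
example = {
    "deps": [
        {
            "path": "data/data.xml",
            "repo": {
                "url": "git@github.com:iterative/dataset-registry.git",
                "rev": "cats-dogs-v1",
                "rev_lock": "0547f5883fb18e523e35578e2f0d19648c8f2d5c",
            },
        }
    ],
    "outs": [{"path": "data.xml"}],
}

stage = parse_import_stage(example)
print(stage.source_url, stage.rev, stage.local_path)
```

A stage created without `--rev` simply has no `rev` key, which the sketch maps to `None`.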
To actually [track the data](https://dvc.org/doc/get-started/add-files),
-`git add` (and `git commit`) the import stage (DVC-file).
+`git add` (and `git commit`) the import stage.
Note that import stages are considered always "locked", meaning that if you run
`dvc repro`, they won't be updated. Use `dvc update` on them to update the
-downloaded data artifact from the external DVC repository.
+downloaded data artifact from the source DVC repository.
## Options
@@ -72,8 +72,10 @@ downloaded data artifact from the external DVC repository.
- `--rev` - specific
[Git revision](https://git-scm.com/book/en/v2/Git-Internals-Git-References)
(such as a branch name, a tag, or a commit hash) of the DVC repository to
- import the data from. The tip of the default branch is used by default when
- this option is not specified.
+  import the data from. The tip of the repository's default branch is used by
+  default when this option is not specified. Note that this adds a `rev` field
+  to the import stage that fixes it to this revision, which can affect the
+  behavior of `dvc update`.
- `-h`, `--help` - prints the usage/help message, and exit.
@@ -120,3 +122,35 @@ outs:
Several of the values above are pulled from the original stage file
`model.pkl.dvc` in the external DVC repo. `url` and `rev_lock` fields are used
to specify the origin and version of the dependency.
+
+## Example: fixed revisions & re-importing
+
+When the `--rev` option is used, the import stage
+([DVC-file](/doc/user-guide/dvc-file-format)) will include a `rev` field under
+`repo` like this:
+
+```yaml
+deps:
+  - path: data/data.xml
+    repo:
+      url: git@github.com:iterative/dataset-registry.git
+      rev: cats-dogs-v1
+      rev_lock: 0547f5883fb18e523e35578e2f0d19648c8f2d5c
+```
+
+If the Git revision is one that moves, such as a branch, this doesn't have much
+of an effect on the import/update workflow. However, for static refs such as
+tags (unless manually updated), or for commit SHAs, `dvc update` will not have
+any effect on the import. In these cases, in order to actually "update" an
+import, it's necessary to **re-import the data** instead, by running
+`dvc import` again with a different `--rev`, or without it. For example:
+
+```dvc
+$ dvc import --rev master \
+             git@github.com:iterative/dataset-registry.git \
+             use-cases/cats-dogs
+```
+
+This will overwrite the import stage (DVC-file), either removing or replacing
+the `rev` field. As long as the new `rev` is a moving ref such as a branch (or
+there is no `rev` at all), the resulting import stage can be updated normally
+with `dvc update` going forward.
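Since DVC does not record what kind of revision `rev` is, deciding between `dvc update` and re-importing is up to the user. The Python sketch below expresses that decision as a simple heuristic. It is hypothetical and not part of DVC; in particular, it assumes the list of branch names in the source repository is gathered separately (for example with `git ls-remote --heads <url>`).

```python
from typing import Optional, Set


def update_can_move(rev: Optional[str], remote_branches: Set[str]) -> bool:
    """Heuristic: could `dvc update` bring in newer data for this import?

    rev: the import stage's `rev` field, or None if --rev was not used.
    remote_branches: branch names in the source repository, assumed to be
        gathered separately (e.g. from `git ls-remote --heads <url>`).
    """
    if rev is None:
        # No --rev: the import follows the tip of the default branch.
        return True
    # Branches move, so re-resolving them can yield a newer commit. Anything
    # else (a tag or a commit SHA) is treated as static: re-import instead.
    return rev in remote_branches


def suggested_command(rev: Optional[str], remote_branches: Set[str]) -> str:
    """Map the heuristic onto the two workflows described above."""
    if update_can_move(rev, remote_branches):
        return "dvc update"
    return "dvc import (again, with a different --rev or none)"


branches = {"master"}
for rev in (None, "master", "cats-dogs-v1"):
    print(rev, "->", suggested_command(rev, branches))
```

The heuristic treats a tag as static, matching the "unless manually updated" caveat above; a tag that gets moved in the source repository would still be reported as needing a re-import.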
diff --git a/static/docs/command-reference/update.md b/static/docs/command-reference/update.md
index 9c99f92125..c7c8df6f44 100644
--- a/static/docs/command-reference/update.md
+++ b/static/docs/command-reference/update.md
@@ -1,6 +1,6 @@
# update
-Update data artifacts imported from other DVC repositories.
+Update data artifacts imported from external DVC repositories.
## Synopsis
@@ -15,16 +15,24 @@ positional arguments:
After creating import stages
([DVC-files](/doc/user-guide/dvc-file-format)) with `dvc import` or
-`dvc import-url`, the external data source can change. Use `dvc update` to bring
-these imported file, directory, or data artifact up to date.
+`dvc import-url`, the data source can change. Use `dvc update` to bring these
+imported files, directories, or data artifacts up to date.
+
+To indicate which import stages to update, we must specify the corresponding
+DVC-file `targets` as command arguments.
Note that import stages are considered always "locked", meaning that if you run
`dvc repro`, they won't be updated. `dvc update` is the only command that can
-update them. Also, for `dvc import` DVC-files, the `rev_lock` field is updated
-by `dvc update`.
+update them. Also, for import stages created with `dvc import`, the `rev_lock`
+field is updated by `dvc update`.
-To indicate which import stages to update, we must specify the corresponding
-DVC-file `targets` as command arguments.
+Another detail to note is that when an import stage was created using the
+`--rev` (revision) option of `dvc import`, DVC is not aware of what kind of
+[Git revision](https://git-scm.com/book/en/v2/Git-Internals-Git-References) it
+is, for example a branch or a tag. For static refs such as tags (unless manually
+updated), or for commit SHAs, `dvc update` will not have any effect on the
+import.
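Conceptually, the `rev_lock` bookkeeping can be modeled as in this simplified Python sketch, which is not DVC's actual implementation and ignores the data download itself. `simulate_update`, `make_stage`, and the `resolve` callable (standing in for asking the source Git repository which commit a revision currently points to) are hypothetical. The point it illustrates: every `rev` is re-resolved the same way, so a static ref resolves to the same commit and nothing changes.

```python
import copy
from typing import Any, Callable, Dict, Optional, Tuple

Resolver = Callable[[Optional[str]], str]


def simulate_update(stage: Dict[str, Any], resolve: Resolver) -> Tuple[Dict[str, Any], bool]:
    """Model only the rev_lock bookkeeping of updating an import stage.

    resolve: hypothetical callable returning the commit SHA a revision
        currently points to in the source repo (None = default branch).
    Returns an updated copy of the stage and whether rev_lock changed.
    """
    new_stage = copy.deepcopy(stage)  # leave the caller's dict untouched
    repo = new_stage["deps"][0]["repo"]
    new_lock = resolve(repo.get("rev"))  # re-resolve `rev` (or default branch)
    changed = new_lock != repo.get("rev_lock")
    repo["rev_lock"] = new_lock
    return new_stage, changed


# Hypothetical source repo state after a new commit on master:
# the tag still points to the old commit, the branch to the new one.
OLD, NEW = "a" * 40, "b" * 40
refs = {None: NEW, "master": NEW, "cats-dogs-v1": OLD}


def make_stage(rev: Optional[str]) -> Dict[str, Any]:
    """Build a minimal import stage pinned to OLD, with an optional `rev`."""
    repo = {"url": "git@github.com:iterative/dataset-registry.git", "rev_lock": OLD}
    if rev is not None:
        repo["rev"] = rev
    return {"deps": [{"path": "use-cases/cats-dogs", "repo": repo}]}


for rev in ("cats-dogs-v1", "master", None):
    _, changed = simulate_update(make_stage(rev), lambda r: refs[r])
    print(rev, "->", "updated" if changed else "no effect")
```

With these made-up refs, the stage fixed to the tag reports no effect, while the stages following `master` or the default branch get a new `rev_lock`.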
## Options
@@ -60,4 +68,7 @@ Output 'model.pkl' didn't change. Skipping saving.
Saving information to 'model.pkl.dvc'.
```
-This time nothing has changed, since the source repository is rather stable.
+This time nothing has changed, since the source project is rather stable.
+
+> Refer to this
+> [re-importing example](/doc/command-reference/import#example-fixed-revisions-re-importing)
+> for more information.
diff --git a/static/docs/use-cases/data-registry.md b/static/docs/use-cases/data-registry.md
index 842adb6c3a..e2b7bb79ec 100644
--- a/static/docs/use-cases/data-registry.md
+++ b/static/docs/use-cases/data-registry.md
@@ -47,22 +47,24 @@ containing 2800 images of cats and dogs. We partitioned the dataset in two for
our [Versioning Tutorial](/doc/tutorials/versioning), and backed up the parts on
a storage server, downloading them with `wget` in our examples. This setup was
then revised to download the dataset with `dvc get` instead, so we created the
-[dataset-registry](https://github.com/iterative/dataset-registry)) project, a
+[dataset-registry](https://github.com/iterative/dataset-registry) repository, a
DVC project hosted on GitHub, to version the dataset (see its
[`tutorial/ver`](https://github.com/iterative/dataset-registry/tree/master/tutorial/ver)
directory).
-However, there are a few problems with the way this dataset is structured (in 2
-parts). Most importantly, this single dataset is tracked by 2 different
+However, there are a few problems with the way this dataset is structured. Most
+importantly, this single dataset is tracked by 2 different
[DVC-files](/doc/user-guide/dvc-file-format), instead of 2 versions of the same
one, which would better reflect the intentions of this dataset... Fortunately,
we have also prepared an improved alternative in the
[`use-cases/`](https://github.com/iterative/dataset-registry/tree/master/use-cases)
directory of the same repository.
-As step one, we extracted the first part of the dataset into the
-`use-cases/cats-dogs` directory (illustrated below), and ran `dvc add use-cases/cats-dogs` to
+To create a
+[first version](https://github.com/iterative/dataset-registry/tree/cats-dogs-v1/use-cases)
+of our dataset, we extracted the first part into the `use-cases/cats-dogs`
+directory (illustrated below), and ran `dvc add use-cases/cats-dogs` to
[track the entire directory](https://dvc.org/doc/command-reference/add#example-directory).
```dvc
@@ -77,14 +79,11 @@ use-cases/cats-dogs
└── dogs [400 image files]
```
-This first version uses the
-[`cats-dogs-v1`](https://github.com/iterative/dataset-registry/tree/cats-dogs-v1/use-cases)
-Git tag. In a local DVC project, we can obtain this dataset with the following
-command (note the usage of `--rev`):
+In a local DVC project, we could have obtained this dataset at this point with
+the following command:
```dvc
-$ dvc import --rev cats-dogs-v1 \
- git@github.com:iterative/dataset-registry.git \
+$ dvc import git@github.com:iterative/dataset-registry.git \
use-cases/cats-dogs
```
@@ -92,18 +91,37 @@ $ dvc import --rev cats-dogs-v1 \
> always needs to run from an [initialized](/doc/command-reference/init) DVC
> project.
+
+
+### Expand for actionable command (optional)
+
+The command above is meant for informational purposes only. If you actually run
+it in a DVC project, it should work, but it will import the latest version of
+`use-cases/cats-dogs` from `dataset-registry`. The following command would
+bring in the version in question instead:
+
+```dvc
+$ dvc import --rev cats-dogs-v1 \
+             git@github.com:iterative/dataset-registry.git \
+             use-cases/cats-dogs
+```
+
+See the `dvc import` command reference for more details on the `--rev`
+(revision) option.
+
+
+
Importing keeps the connection between the local project and data registry where
we are downloading the dataset from. This is achieved by creating a special
-DVC-file (a.k.a. an _import stage_) – which can be used for versioning the
-import with Git in the local project. This connection will come in handy when
-the source data changes, and we want to obtain these updates...
+DVC-file (a.k.a. an _import stage_) that can be used for versioning the import
+with Git. This connection will come in handy when the source data changes and
+we want to obtain these updates...
-Back in our **dataset-registry** repository, the second (and last) version of
-our dataset exists under the
-[`cats-dogs-v2`](https://github.com/iterative/dataset-registry/tree/cats-dogs-v2/use-cases)
-tag. It was created by extracting the second part of the dataset, with 1000
-additional images (500 cats, 500 dogs) in the same directory structure, and
-simply running `dvc add use-cases/cats-dogs` again.
+Back in our **dataset-registry** repository, a
+[second version](https://github.com/iterative/dataset-registry/tree/cats-dogs-v2/use-cases)
+of our dataset was created by extracting the second part, with 1000 additional
+images (500 cats, 500 dogs), into the same directory structure. Then, we simply
+ran `dvc add use-cases/cats-dogs` again.
In our local project, all we have to do in order to obtain this latest version
of the dataset is to run:
@@ -112,6 +130,25 @@ of the dataset is to run:
$ dvc update cats-dogs.dvc
```
+
+
+### Expand for actionable command (optional)
+
+As with the previous hidden note, actually trying the commands above should
+produce the expected results, but not for the obvious reasons. Specifically, the
+initial `dvc import` command would have already obtained the latest version of
+the dataset (as noted before), so this `dvc update` is unnecessary and won't
+have an effect.
+
+If you ran the `dvc import --rev cats-dogs-v1 ...` command instead, its import
+stage (DVC-file) would be fixed to that Git tag (`cats-dogs-v1`). In order to
+update it, do not use `dvc update`. Instead, re-import the data by using the
+original import command (without `--rev`). Refer to
+[this example](/doc/command-reference/import#example-fixed-revisions-re-importing)
+for more information.
+
+
+
This downloads new and changed files in `cats-dogs/` from the source project,
and updates the metadata in the import stage DVC-file.