From 5575a43d3cf5508560f3eab8e754f9c401e4ed07 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Fri, 28 Jun 2019 19:56:22 -0500 Subject: [PATCH 01/45] import-url: finish changing old references to `import`, now `import-url` --- static/docs/commands-reference/import-url.md | 14 +++++++------- static/docs/commands-reference/remote_add.md | 2 +- static/docs/user-guide/dvc-file-format.md | 2 +- static/docs/user-guide/external-dependencies.md | 6 +++++- 4 files changed, 14 insertions(+), 10 deletions(-) diff --git a/static/docs/commands-reference/import-url.md b/static/docs/commands-reference/import-url.md index 0e99208b6d..2a732e3c2e 100644 --- a/static/docs/commands-reference/import-url.md +++ b/static/docs/commands-reference/import-url.md @@ -27,12 +27,12 @@ Examples: DVC supports [DVC-files](/doc/user-guide/dvc-file-format) which refer to an external data location, see -[External Dependencies](/doc/user-guide/external-dependencies). In such a DVC -file, the `deps` section specifies a remote URL, and the `outs` section lists -the corresponding local path in the workspace. It records enough data from the -remote file or directory to enable DVC to efficiently check it to determine if -the local copy is out of date. DVC uses this remote URL to download the data to -the workspace initially, and to re-download it upon changes. +[External Dependencies](/doc/user-guide/external-dependencies). In such a +DVC-file, the `deps` section specifies a remote URL, and the `outs` section +lists the corresponding local path in the workspace. It records enough data from +the remote file or directory to enable DVC to efficiently check it to determine +if the local copy is out of date. DVC uses this remote URL to download the data +to the workspace initially, and to re-download it upon changes. The `dvc import-url` command helps the user create such an external data dependency. 
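For the `http` type, this change check depends on the server's ETag header. The URL tables for these commands list `http` as "HTTP to file with _strong ETag_". The Python below is a minimal sketch of that check. It assumes the ETag string is the value recorded at import time. `is_strong_etag` and `needs_redownload` are hypothetical helpers written for illustration, not part of DVC. Per RFC 7232, a weak ETag carries a `W/` prefix and cannot prove that the bytes are unchanged.

```python
from typing import Optional


def is_strong_etag(etag: str) -> bool:
    """Return True if `etag` is a strong entity tag such as '"abc123"'.

    RFC 7232 marks weak validators with the case-sensitive prefix 'W/';
    only strong validators promise byte-for-byte identical content.
    """
    etag = etag.strip()
    return (
        not etag.startswith("W/")
        and len(etag) >= 2
        and etag.startswith('"')
        and etag.endswith('"')
    )


def needs_redownload(recorded: Optional[str], current: str) -> bool:
    """Hypothetical helper: has the remote file changed since it was imported?

    `recorded` is the ETag saved at import time (None if nothing was saved);
    `current` is the ETag the server reports now (e.g. from a HEAD request).
    """
    if not is_strong_etag(current):
        # A weak ETag cannot prove the bytes are unchanged, so refuse to rely on it.
        raise ValueError(f"weak or malformed ETag: {current!r}")
    return recorded != current
```

This design fails loudly on weak ETags rather than guessing. That matches why the table asks for a strong one.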
The `url` argument should provide the location of the data to be @@ -87,7 +87,7 @@ from having to install CLI tools for each service. When DVC inspects a DVC-file, its dependencies will be checked to see if any have changed. A changed dependency will appear in the `dvc status` report, -indicating the need to reproduce this import stage. When DVC inspects an +indicating the need to reproduce this imported stage. When DVC inspects an external dependency, it uses a method appropriate to that dependency to test its current status. diff --git a/static/docs/commands-reference/remote_add.md b/static/docs/commands-reference/remote_add.md index feda3b9e55..441aca2c84 100644 --- a/static/docs/commands-reference/remote_add.md +++ b/static/docs/commands-reference/remote_add.md @@ -260,7 +260,7 @@ $ dvc remote add myremote hdfs://user@example.com/path/to/dir > > - `pull` > - `fetch` -> - `import` +> - `import-url` > - As an [external dependency](/doc/user-guide/external-dependencies) ```dvc diff --git a/static/docs/user-guide/dvc-file-format.md b/static/docs/user-guide/dvc-file-format.md index dff021b3f6..6fcb09c550 100644 --- a/static/docs/user-guide/dvc-file-format.md +++ b/static/docs/user-guide/dvc-file-format.md @@ -34,7 +34,7 @@ outs: locked: True # Comments like this line persist through multiple executions of -# dvc repro/commit but not through dvc run/add/import commands. +# dvc repro/commit but not through dvc run/add/import-url commands. meta: # Special key to contain arbitary user data name: John diff --git a/static/docs/user-guide/external-dependencies.md b/static/docs/user-guide/external-dependencies.md index 045a9e27f0..85d4c96215 100644 --- a/static/docs/user-guide/external-dependencies.md +++ b/static/docs/user-guide/external-dependencies.md @@ -28,6 +28,10 @@ $ dvc run -d /home/shared/data.txt \ cp /home/shared/data.txt data.txt ``` +> ```yml +> TODO: What does the DVC-file look like?
+> ``` + ### Amazon S3 ```dvc @@ -90,7 +94,7 @@ $ dvc run -d remote://example/data.txt \ Please refer to `dvc remote add` for more details like setting up access credentials for certain remotes. -## Using import +## Using import-url In the previous command examples, downloading commands were used: `aws s3 cp`, `scp`, `wget`, etc. `dvc import-url` simplifies the downloading part for all the From c8b37399c5d4aeab16ab19c0daf3fd9620541a29 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Fri, 28 Jun 2019 22:55:58 -0500 Subject: [PATCH 02/45] term: review and increment usage of "protocol" (in the context of DVC remotes and remote locations) --- static/docs/commands-reference/import-url.md | 8 ++++---- static/docs/commands-reference/pull.md | 2 +- static/docs/commands-reference/push.md | 2 +- static/docs/commands-reference/remote_add.md | 4 +++- static/docs/commands-reference/remote_modify.md | 2 ++ static/docs/get-started/configure.md | 6 +++--- static/docs/user-guide/external-dependencies.md | 4 ++-- static/docs/user-guide/external-outputs.md | 4 ++-- 8 files changed, 18 insertions(+), 14 deletions(-) diff --git a/static/docs/commands-reference/import-url.md b/static/docs/commands-reference/import-url.md index 2a732e3c2e..27151ed777 100644 --- a/static/docs/commands-reference/import-url.md +++ b/static/docs/commands-reference/import-url.md @@ -1,8 +1,8 @@ # import-url -Import file from any supported URL (it could be `http://`, as well as `s3://`, -`ssh://`, and other supported external storage URLs) or local directory to local -workspace and track changes in remote file or directory. +Import file from any supported URL (for example `http://`, `s3://`, `ssh://`, or +other supported protocols) or local directory to local workspace and track +changes in remote file or directory. ## Synopsis @@ -39,7 +39,7 @@ dependency. 
The `url` argument should provide the location of the data to be imported, while `out` is used to specify the (path and) name of the imported data file or directory in the workspace. -DVC supports several types of (local or) remote locations: +DVC supports several types of (local or) remote locations (protocols): | Type | Discussion | URL format | | -------- | ------------------------------------------------------- | ------------------------------------------ | diff --git a/static/docs/commands-reference/pull.md b/static/docs/commands-reference/pull.md index bd74a7aea5..8fccd9bb06 100644 --- a/static/docs/commands-reference/pull.md +++ b/static/docs/commands-reference/pull.md @@ -116,7 +116,7 @@ $ dvc remote list r1 ssh://_username_@_host_/path/to/dvc/cache/directory ``` -> DVC supports several protocols for remote storage. For details, see the +> DVC supports several remote types. For details, see the > [`remote add`](/doc/commands-reference/remote-add) documentation. With a remote cache containing some images and other files, we can pull all diff --git a/static/docs/commands-reference/push.md b/static/docs/commands-reference/push.md index e503d332f5..62c68beee7 100644 --- a/static/docs/commands-reference/push.md +++ b/static/docs/commands-reference/push.md @@ -122,7 +122,7 @@ the example, let's define an SSH remote with the `dvc remote add` command: r1 ssh://_username_@_host_/path/to/dvc/cache/directory ``` -> DVC supports several protocols for remote storage. For details, see the +> DVC supports several remote types. For details, see the > [`remote add`](/doc/commands-reference/remote-add) documentation. 
Push all data file caches from the current Git branch to the default remote: diff --git a/static/docs/commands-reference/remote_add.md b/static/docs/commands-reference/remote_add.md index 441aca2c84..c97727311d 100644 --- a/static/docs/commands-reference/remote_add.md +++ b/static/docs/commands-reference/remote_add.md @@ -18,7 +18,7 @@ usage: dvc remote add [-h] [--global] [--system] [--local] [-q | -v] positional arguments: name Name of the remote. - url URL. (See supported URLs below.) + url URL. (See supported URLs in the examples below.) ``` ## Description @@ -79,6 +79,8 @@ Use `dvc config` to unset/change the default remote as so: ## Examples +The following are the types of remotes (protocols) supported: +
### Click for a local remote example diff --git a/static/docs/commands-reference/remote_modify.md b/static/docs/commands-reference/remote_modify.md index 8f6b1263ae..4541a81a5e 100644 --- a/static/docs/commands-reference/remote_modify.md +++ b/static/docs/commands-reference/remote_modify.md @@ -54,6 +54,8 @@ This command modifies a `remote` section in the DVC ## Examples +The following are the types of remotes (protocols) supported: +
### Click for AWS S3 available options diff --git a/static/docs/get-started/configure.md b/static/docs/get-started/configure.md index 43e9afad31..6eb84ce704 100644 --- a/static/docs/get-started/configure.md +++ b/static/docs/get-started/configure.md @@ -32,8 +32,8 @@ $ git commit .dvc/config -m "initialize DVC local remote" > [use cases](/doc/use-cases), other "more remote" types of remotes will be > required. -Adding a remote should be specified by both its type prefix and its path. DVC -currently supports seven types of remotes: +Adding a remote should be specified by both its type prefix (protocol) and its +path. DVC currently supports seven types of remotes: - `local` - Local directory - `s3` - Amazon Simple Storage Service @@ -41,7 +41,7 @@ currently supports seven types of remotes: - `azure` - Azure Blob Storage - `ssh` - Secure Shell - `hdfs` - The Hadoop Distributed File System -- `http` - Support for HTTP and HTTPS protocol +- `http` - HTTP and HTTPS protocols > Depending on the [remote storage](/doc/commands-reference/remote) type you > plan to use to keep and share your data you might need to specify one of the diff --git a/static/docs/user-guide/external-dependencies.md b/static/docs/user-guide/external-dependencies.md index 85d4c96215..0bcefb67ed 100644 --- a/static/docs/user-guide/external-dependencies.md +++ b/static/docs/user-guide/external-dependencies.md @@ -2,8 +2,8 @@ With DVC you can specify external files as dependencies for your pipeline stages. DVC will track changes in those files and will reflect that in your -pipeline state. Currently DVC supports the following types of external -dependencies: +pipeline state. Currently, the following types of external dependencies +(protocols) are supported: 1. Local files and directories outside of your dvc repository; 2. 
Amazon S3; diff --git a/static/docs/user-guide/external-outputs.md b/static/docs/user-guide/external-outputs.md index be60d51713..fb85251891 100644 --- a/static/docs/user-guide/external-outputs.md +++ b/static/docs/user-guide/external-outputs.md @@ -3,8 +3,8 @@ You can specify external files as outputs for [DVC-files](/doc/user-guide/dvc-file-format) created by `dvc run` (stage files). DVC will track changes in those files and will reflect so in your pipeline -[status](/doc/commands-reference/status). Currently DVC supports these types of -external outputs: +[status](/doc/commands-reference/status). Currently, the following types of +external outputs (protocols) are supported: 1. Local files and directories outside of your dvc repository; 2. Amazon S3; From a25620d1707f0e5171eb136a33c6b3b7c22a62bb Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Sat, 29 Jun 2019 12:19:37 -0500 Subject: [PATCH 03/45] cmd ref: full name for subcommands (I know I did this before...) /shrug --- static/docs/commands-reference/cache_dir.md | 2 +- static/docs/commands-reference/metrics_add.md | 2 +- static/docs/commands-reference/metrics_modify.md | 2 +- static/docs/commands-reference/metrics_remove.md | 2 +- static/docs/commands-reference/metrics_show.md | 2 +- static/docs/commands-reference/pipeline_list.md | 2 +- static/docs/commands-reference/pipeline_show.md | 2 +- 7 files changed, 7 insertions(+), 7 deletions(-) diff --git a/static/docs/commands-reference/cache_dir.md b/static/docs/commands-reference/cache_dir.md index 0890694f08..59bed40d37 100644 --- a/static/docs/commands-reference/cache_dir.md +++ b/static/docs/commands-reference/cache_dir.md @@ -1,4 +1,4 @@ -# dir +# cache dir Set/unset the cache directory location intuitively (compared to using `dvc config cache`). 
diff --git a/static/docs/commands-reference/metrics_add.md b/static/docs/commands-reference/metrics_add.md index 73365e6dfc..79ee788b86 100644 --- a/static/docs/commands-reference/metrics_add.md +++ b/static/docs/commands-reference/metrics_add.md @@ -1,4 +1,4 @@ -# add +# metrics add Tag the file located at `path` as a metric file. diff --git a/static/docs/commands-reference/metrics_modify.md b/static/docs/commands-reference/metrics_modify.md index 6fef1ef0da..78c966a2dd 100644 --- a/static/docs/commands-reference/metrics_modify.md +++ b/static/docs/commands-reference/metrics_modify.md @@ -1,4 +1,4 @@ -# modify +# metrics modify Modify metric settings (like type, path expression that is used to parse it, etc). diff --git a/static/docs/commands-reference/metrics_remove.md b/static/docs/commands-reference/metrics_remove.md index 19b6c55764..9df94a4a44 100644 --- a/static/docs/commands-reference/metrics_remove.md +++ b/static/docs/commands-reference/metrics_remove.md @@ -1,4 +1,4 @@ -# remove +# metrics remove Keep file as an output, remove metric flag and stop tracking as a metric file. diff --git a/static/docs/commands-reference/metrics_show.md b/static/docs/commands-reference/metrics_show.md index 5e77913891..abf78694de 100644 --- a/static/docs/commands-reference/metrics_show.md +++ b/static/docs/commands-reference/metrics_show.md @@ -1,4 +1,4 @@ -# show +# metrics show Find and print project metrics. diff --git a/static/docs/commands-reference/pipeline_list.md b/static/docs/commands-reference/pipeline_list.md index 2b213c4842..71a8bdedf9 100644 --- a/static/docs/commands-reference/pipeline_list.md +++ b/static/docs/commands-reference/pipeline_list.md @@ -1,4 +1,4 @@ -# list +# pipeline list Show connected groups (pipelines) of [stage](/doc/commands-reference/run) that are independent of each other. 
diff --git a/static/docs/commands-reference/pipeline_show.md b/static/docs/commands-reference/pipeline_show.md index 0c1f6d632c..f3e171e5c3 100644 --- a/static/docs/commands-reference/pipeline_show.md +++ b/static/docs/commands-reference/pipeline_show.md @@ -1,4 +1,4 @@ -# show +# pipeline show Show [stages](/doc/commands-reference/run) in a pipeline that lead to the specified stage. By default it lists From 618584aa00bde68750b9583e535b52625bd381c7 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Sun, 30 Jun 2019 21:27:46 -0500 Subject: [PATCH 04/45] term: S3 buckets have "keys" (not paths) --- static/docs/commands-reference/config.md | 2 +- static/docs/commands-reference/remote_add.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/static/docs/commands-reference/config.md b/static/docs/commands-reference/config.md index 0dcea52aea..38060c0207 100644 --- a/static/docs/commands-reference/config.md +++ b/static/docs/commands-reference/config.md @@ -190,7 +190,7 @@ Add an S3 remote and set it as the project default: > to create your bucket. ```dvc -$ dvc remote add myremote s3://bucket/path +$ dvc remote add myremote s3://bucket/key $ dvc config core.remote myremote ``` diff --git a/static/docs/commands-reference/remote_add.md b/static/docs/commands-reference/remote_add.md index c97727311d..1fa32f376c 100644 --- a/static/docs/commands-reference/remote_add.md +++ b/static/docs/commands-reference/remote_add.md @@ -129,7 +129,7 @@ $ cat .dvc/config > to create your bucket. 
```dvc -$ dvc remote add myremote s3://bucket/path +$ dvc remote add myremote s3://bucket/key ``` By default DVC expects your AWS CLI is already From 48f91210d9351cc20f4c026de2b853f372a68a4d Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Sun, 30 Jun 2019 21:28:43 -0500 Subject: [PATCH 05/45] get-url: adds first cmd ref doc For #385 --- src/Documentation/sidebar.json | 2 + static/docs/commands-reference/get-url.md | 176 +++++++++++++++++++ static/docs/commands-reference/import-url.md | 16 +- 3 files changed, 188 insertions(+), 6 deletions(-) create mode 100644 static/docs/commands-reference/get-url.md diff --git a/src/Documentation/sidebar.json b/src/Documentation/sidebar.json index 21e1634083..27dd74fd5a 100644 --- a/src/Documentation/sidebar.json +++ b/src/Documentation/sidebar.json @@ -92,6 +92,7 @@ "destroy.md", "diff.md", "fetch.md", + "get-url.md", "gc.md", "import-url.md", "init.md", @@ -135,6 +136,7 @@ "destroy.md": "destroy", "diff.md": "diff", "fetch.md": "fetch", + "get-url.md": "get-url", "gc.md": "gc", "import-url.md": "import-url", "init.md": "init", diff --git a/static/docs/commands-reference/get-url.md b/static/docs/commands-reference/get-url.md new file mode 100644 index 0000000000..17ff13709b --- /dev/null +++ b/static/docs/commands-reference/get-url.md @@ -0,0 +1,176 @@ +# get-url + +Download or copy file or directory from any supported URL (for example +`http://`, `s3://`, `ssh://`, and other protocols) or local directory to the +workspace. + +> Unlike `dvc import-url`, this command does not track the downloaded file nor +> does it create a DVC-file. + +## Synopsis + +```usage +usage: dvc get-url [-h] [-q | -v] url [out] + +positional arguments: + url (See supported URLs in the description.) + out Destination path to put data to. +``` + +## Description + +In some cases it is convenient to get a data file or directory from a remote +location and into the workspace. The `dvc get-url` command helps the user do so.
+The `url` argument should provide the location of the data to be downloaded, while +`out` is used to specify the (path and) name of the file or directory in the +workspace. + +DVC supports several types of (local or) remote locations (protocols): + +| Type | Discussion | URL format | +| -------- | ------------------------------------------------------- | ------------------------------------------ | +| `local` | Local path | `/path/to/local/file` | +| `s3` | Amazon S3 | `s3://mybucket/data.csv` | +| `gs` | Google Storage | `gs://mybucket/data.csv` | +| `ssh` | SSH server | `ssh://user@example.com:/path/to/data.csv` | +| `hdfs` | HDFS | `hdfs://user@example.com/path/to/data.csv` | +| `http` | HTTP to file with _strong ETag_ (see explanation below) | `https://example.com/path/to/data.csv` | +| `remote` | Remote path (see explanation below) | `remote://myremote/path/to/file` | + +> `remote://myremote/path/to/file` notation just means that a DVC +> [remote](/doc/commands-reference/remote) `myremote` is defined, and when DVC +> is running it internally expands this URL into a regular S3, SSH, GS, etc URL +> by appending `/path/to/file` to the `myremote`'s configured base path. + +Another way to understand the `dvc get-url` command is as a tool for downloading +data files. + +On GNU/Linux systems, for example, instead of `dvc get-url` with HTTP(S) it's +possible to use: + +```dvc +$ wget https://example.com/path/to/data.csv +``` + +## Options + +- `-h`, `--help` - prints the usage/help message, and exit. + +- `-q`, `--quiet` - do not write anything to standard output. Exit with 0 if no + problems arise, otherwise 1. + +- `-v`, `--verbose` - displays detailed tracing information. + +## Examples +
+ +### Click and expand for a local example + +```dvc +$ dvc get-url /local/path/to/data +``` + +The above command will copy the `/local/path/to/data` file or directory into +`./data`. +
+ +### Click for AWS S3 example + +This command will copy an S3 bucket key into the local workspace with the same +file name: + +```dvc +$ dvc get-url s3://bucket/key +``` + +By default, DVC expects that your AWS CLI is already +[configured](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html). +DVC will use the default AWS credentials file to access S3. To override some of +these settings, you could use the options described in `dvc remote modify`. + +We use the `boto3` library to set up a client and communicate with AWS S3. The +following API methods are called: + +- `list_objects_v2`, `list_objects` +- `head_object` +- `download_file` +- `upload_file` +- `delete_object` +- `copy` + +So, make sure you have the following permissions enabled: + +- s3:ListBucket +- s3:GetObject +- s3:PutObject +- s3:DeleteObject +
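If access is managed through IAM, the Python sketch below builds a policy document that grants the S3 permissions listed above. It assumes a single bucket, and the name `mybucket` is a placeholder. It follows the usual AWS convention that `s3:ListBucket` is a bucket-level action that targets the bucket ARN. The object-level actions target `bucket/*` instead. `dvc_s3_policy` is a hypothetical helper, not something DVC provides.

```python
import json


def dvc_s3_policy(bucket: str) -> dict:
    """Build an IAM policy granting the S3 permissions listed in the docs.

    s3:ListBucket is a bucket-level action, so it targets the bucket ARN;
    Get/Put/DeleteObject are object-level and target every key in the bucket.
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # "mybucket" is a placeholder; print the policy as JSON for the AWS console/CLI.
    print(json.dumps(dvc_s3_policy("mybucket"), indent=2))
```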
+ +
+ +### Click for Google Cloud Storage example + +```dvc +$ dvc get-url gs://bucket/path file +``` + +The above command downloads the `/path` file (or directory) into `./file`. + +
+ +
+ +### Click for SSH example + +```dvc +$ dvc get-url ssh://user@example.com/path/to/data +``` + +Using default SSH credentials, the above command gets the `data` file (or +directory). + +
+ +
+ +### Click for HDFS example + +```dvc +$ dvc get-url hdfs://user@example.com/path/to/data +``` + +
+ +
+ +### Click for HTTP example + +> Both HTTP and HTTPS protocols are supported. + +```dvc +$ dvc get-url https://example.com/path/to/data +``` + +
+ +
+ +### Click for DVC remote example + +First, register a new remote, in this case with the SSH protocol: + +```dvc +$ dvc remote add myremote ssh://user@example.com/path/to/dir +``` + +Then use the `remote://` prefix to refer to the remote in order to download +`data` from that location: + +```dvc +$ dvc get-url remote://myremote/data +``` +
diff --git a/static/docs/commands-reference/import-url.md b/static/docs/commands-reference/import-url.md index 27151ed777..56e81a29f3 100644 --- a/static/docs/commands-reference/import-url.md +++ b/static/docs/commands-reference/import-url.md @@ -1,8 +1,12 @@ # import-url -Import file from any supported URL (for example `http://`, `s3://`, `ssh://`, or -other supported protocols) or local directory to local workspace and track -changes in remote file or directory. +Download or copy file or directory from any supported URL (for example +`http://`, `s3://`, `ssh://`, and other protocols) or local directory to the +workspace, and track changes in the remote source with DVC. Creates +a DVC-file. + +> See also `dvc get-url`, which corresponds to the first step this command +> performs (just downloading the data). ## Synopsis @@ -16,7 +20,7 @@ positional arguments: ## Description -In some cases it is convenient to add a data file or a directory to a workspace +In some cases it is convenient to add a data file or directory to the workspace such that it will be automatically updated when the data source is updated. Examples: @@ -56,9 +60,9 @@ DVC supports several types of (local or) remote locations (protocols): > is necessary to track if the specified remote file (URL) changed to download > it again. -> `remote://myremote/path/to/file` notation just means that there is a DVC +> `remote://myremote/path/to/file` notation just means that a DVC > [remote](/doc/commands-reference/remote) `myremote` is defined and when DVC is -> running it internally expands this URL into a regular S3, SSH, GS, etc URL by +> running, it internally expands this URL into a regular S3, SSH, GS, etc URL by > appending `/path/to/file` to the `myremote`'s configured base path.
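On this reading, the `remote://` expansion described above joins the remote's configured base URL with the given path. The Python below is a minimal sketch of that behavior. `expand_remote_url` and the `remotes` dictionary are hypothetical stand-ins for DVC's internal config handling, not its actual code.

```python
from urllib.parse import urlparse


def expand_remote_url(url: str, remotes: dict) -> str:
    """Expand 'remote://<name>/<path>' into the remote's own URL.

    `remotes` maps remote names to their configured base URLs; it stands in
    for the remote sections of .dvc/config. Other URLs pass through unchanged.
    """
    parsed = urlparse(url)
    if parsed.scheme != "remote":
        return url
    if parsed.netloc not in remotes:
        raise ValueError(f"unknown DVC remote: {parsed.netloc!r}")
    base = remotes[parsed.netloc]
    # Append the path to the base URL with exactly one '/' between them.
    return base.rstrip("/") + "/" + parsed.path.lstrip("/")
```

Take the `dvc get-url` remote example, where `myremote` is set to `ssh://user@example.com/path/to/dir`. Under this sketch, `remote://myremote/data` would expand to `ssh://user@example.com/path/to/dir/data`.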
Another way to understand the `dvc import-url` command is as a short-cut for a From 2be096a5c0867381fcb9b1b6d7c5e77d7b52c3f5 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Sun, 30 Jun 2019 22:27:19 -0500 Subject: [PATCH 06/45] remote: small update to remote small duplicity --- static/docs/commands-reference/remote.md | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/static/docs/commands-reference/remote.md b/static/docs/commands-reference/remote.md index 377f305447..8e61aa1eb7 100644 --- a/static/docs/commands-reference/remote.md +++ b/static/docs/commands-reference/remote.md @@ -43,11 +43,7 @@ Using DVC with a remote data storage is optional. By default, DVC is configured to use a local data storage only (usually `.dvc/cache` directory inside your repository), which enables basic DVC usage scenarios out of the box. -[Add](/doc/commands-reference/remote-add), -[default](/doc/commands-reference/remote-default), -[list](/doc/commands-reference/remote-list), -[modify](/doc/commands-reference/remote-modify), and -[remove](/doc/commands-reference/remote-remove) commands read or modify DVC +`dvc remote` commands read or modify DVC [config files](/doc/user-guide/dvc-files-and-directories). Alternatively, `dvc config` can be used or these files could be edited manually. 
From 661705d13fbcf88a27516ab64330d040f9ed61b1 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Mon, 1 Jul 2019 13:44:49 -0500 Subject: [PATCH 07/45] term: std way to use "current" and "present" (working directory) --- static/docs/commands-reference/cache.md | 5 ++-- static/docs/commands-reference/config.md | 4 ++-- static/docs/commands-reference/diff.md | 2 +- static/docs/commands-reference/gc.md | 2 +- static/docs/commands-reference/metrics.md | 2 +- static/docs/commands-reference/pull.md | 8 +++---- static/docs/commands-reference/remote_add.md | 8 +++---- static/docs/commands-reference/repro.md | 4 ++-- static/docs/commands-reference/root.md | 2 +- static/docs/commands-reference/status.md | 23 +++++++++---------- static/docs/commands-reference/version.md | 2 +- static/docs/get-started/add-files.md | 4 ++-- static/docs/get-started/retrieve-data.md | 4 ++-- static/docs/user-guide/update-tracked-file.md | 2 +- 14 files changed, 35 insertions(+), 37 deletions(-) diff --git a/static/docs/commands-reference/cache.md b/static/docs/commands-reference/cache.md index 429f417439..0d799b68cb 100644 --- a/static/docs/commands-reference/cache.md +++ b/static/docs/commands-reference/cache.md @@ -21,9 +21,8 @@ default `cache` directory. The DVC cache is where your data files, models, etc (anything you want to version with DVC) are actually stored. The corresponding files you see in the -working directory or "workspace" simply link to the ones in cache. (See -`dvc config cache` `type` setting for more information on file links on -different platforms.) +workspace simply link to the ones in cache. (See `dvc config cache` `type` +setting for more information on file links on different platforms.) > For more cache-related configuration options refer to `dvc config cache`. 
diff --git a/static/docs/commands-reference/config.md b/static/docs/commands-reference/config.md index 38060c0207..14bdd85e49 100644 --- a/static/docs/commands-reference/config.md +++ b/static/docs/commands-reference/config.md @@ -103,8 +103,8 @@ details.) effect. (It affects only files that are under DVC control.) Due to the way DVC handles linking between the data files in the cache and - their counterparts in the working directory, it's easy to accidentally corrupt - the cached version of a file by editing or overwriting it. Turning this config + their counterparts in the workspace, it's easy to accidentally corrupt the + cached version of a file by editing or overwriting it. Turning this config option on forces you to run `dvc unprotect` before updating a file, providing an additional layer of security to your data. diff --git a/static/docs/commands-reference/diff.md b/static/docs/commands-reference/diff.md index acf5010b22..4df02d7789 100644 --- a/static/docs/commands-reference/diff.md +++ b/static/docs/commands-reference/diff.md @@ -37,7 +37,7 @@ by the Git SCM, for example when `dvc init` was used with the `--no-scm` option. - `-t TARGET`, `--target TARGET` - Source path to a data file or directory. If not specified, compares all files and directories that are under DVC control - in the current workspace. + in the workspace. - `-h`, `--help` - prints the usage/help message, and exit. 
diff --git a/static/docs/commands-reference/gc.md b/static/docs/commands-reference/gc.md index ff857111b0..a17e86ae5a 100644 --- a/static/docs/commands-reference/gc.md +++ b/static/docs/commands-reference/gc.md @@ -69,7 +69,7 @@ $ du -sh .dvc/cache/ ``` When you run `dvc gc` it removes all objects from cache that are not referenced -in the current workspace (by collecting hash sums from the DVC-files): +in the workspace (by collecting hash sums from the DVC-files): ```dvc $ dvc gc diff --git a/static/docs/commands-reference/metrics.md b/static/docs/commands-reference/metrics.md index f84266b368..b52db7e2f8 100644 --- a/static/docs/commands-reference/metrics.md +++ b/static/docs/commands-reference/metrics.md @@ -56,7 +56,7 @@ $ dvc run -d code/evaluate.py -M data/eval.json \ > running `dvc metrics add data/eval.json` to explicitly mark `data/eval.json` > as a metric file. -Now let's print metric values that we are tracking in the current project: +Now let's print metric values that we are tracking in this DVC project: ```dvc $ dvc metrics show -a diff --git a/static/docs/commands-reference/pull.md b/static/docs/commands-reference/pull.md index 8fccd9bb06..d53b76c630 100644 --- a/static/docs/commands-reference/pull.md +++ b/static/docs/commands-reference/pull.md @@ -85,10 +85,10 @@ reflinks or hardlinks to put it in the workspace without copying. See path for this option to have effect. Determines the files to pull by searching each target directory and its subdirectories for DVC-files to inspect. -- `-f`, `--force` - does not prompt when removing working directory files, which - occurs during the process of updating the workspace. This option surfaces - behavior from the `dvc checkout` command because `dvc pull` in effect performs - a _checkout_ after downloading files. +- `-f`, `--force` - does not prompt when removing workspace files, which occurs + during the process of updating the workspace. 
This option surfaces behavior + from the `dvc checkout` command because `dvc pull` in effect performs a + _checkout_ after downloading files. - `-j JOBS`, `--jobs JOBS` - specifies number of jobs to run simultaneously while downloading files from the remote cache. The effect is to control the diff --git a/static/docs/commands-reference/remote_add.md b/static/docs/commands-reference/remote_add.md index 1fa32f376c..4f71118c62 100644 --- a/static/docs/commands-reference/remote_add.md +++ b/static/docs/commands-reference/remote_add.md @@ -26,10 +26,10 @@ positional arguments: `name` and `url` are required. `url` specifies a location to store your data. It could be S3 path, SSH path, Azure, Google cloud, Aliyun OSS local directory, etc. (See more examples below.) If `url` is a local relative path, it will be -resolved relative to the current directory but saved **relative to the config -file location** (see LOCAL example below). Whenever possible DVC will create a -remote directory if it doesn't exists yet. It won't create an S3 bucket though -and will rely on default access settings. +resolved relative to the present working directory but saved **relative to the +config file location** (see LOCAL example below). Whenever possible DVC will +create a remote directory if it doesn't exist yet. It won't create an S3 bucket +though and will rely on default access settings. > If you installed DVC via `pip`, and depending on the remote type you plan to > use you might need to install optional dependencies: `s3`, `gs`, `azure`, diff --git a/static/docs/commands-reference/repro.md b/static/docs/commands-reference/repro.md index 7d3cefb224..bfdc0e5cfb 100644 --- a/static/docs/commands-reference/repro.md +++ b/static/docs/commands-reference/repro.md @@ -20,8 +20,8 @@ positional arguments: `dvc repro` provides an interface to run the commands in a computational graph (a.k.a. pipeline) again, as defined in the stage files (DVC-files) found in the -current workspace.
(A pipeline is typically defined using the `dvc run` command, -while data input nodes are defined by the `dvc add` command.) +workspace. (A pipeline is typically defined using the `dvc run` command, while +data input nodes are defined by the `dvc add` command.) There's a few ways to restrict the stages that will be run again by this command: by specifying stage file(s) as `targets`, or by using the diff --git a/static/docs/commands-reference/root.md b/static/docs/commands-reference/root.md index c9bb0bcd75..dea7ae263f 100644 --- a/static/docs/commands-reference/root.md +++ b/static/docs/commands-reference/root.md @@ -12,7 +12,7 @@ usage: dvc root [-h] [-q | -v] While in project's sub-directory, sometimes developers may want to refer some file belonging to another directory. This command returns relative path to the -DVC project's root directory from the current working directory. So, this +DVC project's root directory from the present working directory. So, this command can be used to build a path to a dependency file, command, or output. ## Options diff --git a/static/docs/commands-reference/status.md b/static/docs/commands-reference/status.md index 83f7290c11..bb579edd62 100644 --- a/static/docs/commands-reference/status.md +++ b/static/docs/commands-reference/status.md @@ -32,12 +32,12 @@ synchronize them). The two modes, _local_ and _cloud_ are triggered by using the | remote | `--cloud` | Comparisons are made between the local cache, and the default remote, defined with `dvc remote --default` command. | DVC determines data and code files to compare by analyzing all -[DVC-files](/doc/user-guide/dvc-file-format) in the current workspace -(`--all-branches` and `--all-tags` in the `cloud` mode compare multiple -workspaces - across all branches or tags). The comparison can be limited to -specific DVC-files by listing them as `targets`. Changes are reported only -against the given `targets`. 
When combined with the `--with-deps` option, a -search is made for changes in other stages that affect the target. +[DVC-files](/doc/user-guide/dvc-file-format) in the workspace (`--all-branches` +and `--all-tags` in the `cloud` mode compare multiple workspaces - across all +branches or tags). The comparison can be limited to specific DVC-files by +listing them as `targets`. Changes are reported only against the given +`targets`. When combined with the `--with-deps` option, a search is made for +changes in other stages that affect the target. In the `local` mode, changes are detected through the checksum of every file listed in every DVC-file in question against the corresponding file in the file @@ -91,15 +91,14 @@ cache. For the typical process to update workspaces, see name defined using the `dvc remote` command. Implies `--cloud`. - `-a`, `--all-branches` - compares cache content against all Git branches. - Instead of checking just the currently checked out workspace, it checks - against all other branches of this workspace. The corresponding branches are - shown in the status output. Applies only if `--cloud` or a remote is - specified. + Instead of checking just the workspace, it checks against all other branches + of this workspace. The corresponding branches are shown in the status output. + Applies only if `--cloud` or a remote is specified. - `-T`, `--all-tags` - compares cache content against all Git tags. Both the `--all-branches` and `--all-tags` options cause DVC to check more than just - the currently checked out workspace. The corresponding tags are shown in the - status output. Applies only if `--cloud` or a remote is specified. + the workspace. The corresponding tags are shown in the status output. Applies + only if `--cloud` or a remote is specified. - `--show-checksums` - shows the DVC checksum for the file, rather than the file name. Applies only if `--cloud` is specified. 
diff --git a/static/docs/commands-reference/version.md b/static/docs/commands-reference/version.md index c6fb98b18c..d8b9efa2e2 100644 --- a/static/docs/commands-reference/version.md +++ b/static/docs/commands-reference/version.md @@ -24,7 +24,7 @@ system/environment: | `Filesystem type` | Shows the filesystem type (eg. ext4, FAT, etc.) and mount point of workspace and the cache directory | > If `dvc version` is executed outside a DVC workspace, the command outputs the -> filesystem type of the current working directory. +> filesystem type of the present working directory. #### Components of DVC version diff --git a/static/docs/get-started/add-files.md b/static/docs/get-started/add-files.md index 92d4387a3c..ae00f56ba8 100644 --- a/static/docs/get-started/add-files.md +++ b/static/docs/get-started/add-files.md @@ -42,8 +42,8 @@ $ git commit -m "add source data to DVC" ### Expand to learn about DVC internals You can see that actual data file has been moved to the `.dvc/cache` directory, -while the entries in the working directory may be links to the actual files in -the DVC cache. (See +while the entries in the workspace may be links to the actual files in the DVC +cache. (See [File link types](/docs/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache) to learn about the supported file linking options, their tradeoffs, and how to enable them). diff --git a/static/docs/get-started/retrieve-data.md b/static/docs/get-started/retrieve-data.md index 64f3faf8f0..7288f84d20 100644 --- a/static/docs/get-started/retrieve-data.md +++ b/static/docs/get-started/retrieve-data.md @@ -12,8 +12,8 @@ $ dvc pull ``` This command retrieves data files that are referenced in _all_ -[DVC-files](/doc/user-guide/dvc-file-format) in the current workspace. So, you -usually run it after `git clone`, `git pull`, or `git checkout`. +[DVC-files](/doc/user-guide/dvc-file-format) in the workspace. So, you usually +run it after `git clone`, `git pull`, or `git checkout`. 
As an easy way to test it: diff --git a/static/docs/user-guide/update-tracked-file.md b/static/docs/user-guide/update-tracked-file.md index 7b822a225e..f69e063124 100644 --- a/static/docs/user-guide/update-tracked-file.md +++ b/static/docs/user-guide/update-tracked-file.md @@ -1,7 +1,7 @@ # Update a Tracked File Due to the way DVC handles linking between the data files in the cache and their -counterparts in the working directory (refer to +counterparts in the workspace (refer to [Large Dataset Optimization](/docs/user-guide/large-dataset-optimization)), updating tracked files has to be carried out with caution to avoid data corruption when the DVC config option `cache.type` is set to `hardlink` or/and From f887c50c763393a66b3a288ccec118572453cd2e Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Mon, 1 Jul 2019 13:57:12 -0500 Subject: [PATCH 08/45] term: fix links to "stage" --- static/docs/get-started/connect-code-and-data.md | 6 +++--- static/docs/get-started/example-pipeline.md | 8 ++++---- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/static/docs/get-started/connect-code-and-data.md b/static/docs/get-started/connect-code-and-data.md index fe59822e33..8867b0417c 100644 --- a/static/docs/get-started/connect-code-and-data.md +++ b/static/docs/get-started/connect-code-and-data.md @@ -61,9 +61,9 @@ $ git commit -m "add code"
Having installed the `src/prepare.py` script in your repo, the following command -transforms it into a reproducible -[stage](/doc/user-guide/dvc-files-and-directories) for the ML pipeline we're -building (described in the [next chapter](/doc/get-started/example-pipeline)). +transforms it into a reproducible [stage](/doc/commands-reference/run) for the +ML pipeline we're building (described in the +[next chapter](/doc/get-started/example-pipeline)). ```dvc $ dvc run -f prepare.dvc \ diff --git a/static/docs/get-started/example-pipeline.md b/static/docs/get-started/example-pipeline.md index ca1bdb1e78..88a6a062d3 100644 --- a/static/docs/get-started/example-pipeline.md +++ b/static/docs/get-started/example-pipeline.md @@ -129,10 +129,10 @@ $ git commit -m "add dataset" ## Define stages -Each [stage](/doc/user-guide/dvc-files-and-directories) – the parts of a -pipeline – is described by providing a command to run, input data it takes and a -list of output files. DVC is not Python or any other language specific and can -wrap any command runnable via CLI. +Each [stage](/doc/commands-reference/run) – the parts of a pipeline – is +described by providing a command to run, input data it takes and a list of +output files. DVC is not Python or any other language specific and can wrap any +command runnable via CLI. - The first stage is to extract XML from the archive. Note that we don't need to run `dvc add` on `Posts.xml` below, `dvc run` saves the data automatically From 900b8fce5d3cde16cd4f3602aaae1db9c13fbd75 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Mon, 1 Jul 2019 14:09:55 -0500 Subject: [PATCH 09/45] term: review usage of "setting", favoring "config option" or "configuration", when in the context of DVC configuration options. 
--- static/docs/commands-reference/cache.md | 4 ++-- static/docs/commands-reference/index.md | 5 +++-- static/docs/commands-reference/remote_list.md | 4 ++-- static/docs/commands-reference/remote_modify.md | 6 +++--- static/docs/commands-reference/remote_remove.md | 4 ++-- static/docs/tutorial/sharing-data.md | 4 ++-- 6 files changed, 14 insertions(+), 13 deletions(-) diff --git a/static/docs/commands-reference/cache.md b/static/docs/commands-reference/cache.md index 0d799b68cb..9a8a26be76 100644 --- a/static/docs/commands-reference/cache.md +++ b/static/docs/commands-reference/cache.md @@ -21,8 +21,8 @@ default `cache` directory. The DVC cache is where your data files, models, etc (anything you want to version with DVC) are actually stored. The corresponding files you see in the -workspace simply link to the ones in cache. (See `dvc config cache` `type` -setting for more information on file links on different platforms.) +workspace simply link to the ones in cache. (See `dvc config cache`, `type` +config option, for more information on file links on different platforms.) > For more cache-related configuration options refer to `dvc config cache`. diff --git a/static/docs/commands-reference/index.md b/static/docs/commands-reference/index.md index d95f92ca0f..5d64dc2853 100644 --- a/static/docs/commands-reference/index.md +++ b/static/docs/commands-reference/index.md @@ -10,7 +10,8 @@ DVC is a command-line tool. The typical use case for DVC goes as follows - Use `--outs` option to specify `dvc run` command outputs which will be converted to DVC data files after the code runs. - Clone a git repo with the code of your ML application pipeline. However, this - will not copy your DVC cache. Use cloud storage settings and `dvc push` to - share the cache (data). + will not copy your DVC cache. Use + [data remotes](/doc/commands-reference/remote) and `dvc push` to share the + cache (data). 
- Use `dvc repro` to quickly reproduce your pipeline on a new iteration, after your data item files or source code of your ML application are modified. diff --git a/static/docs/commands-reference/remote_list.md b/static/docs/commands-reference/remote_list.md index 20482407a0..e20d1d4592 100644 --- a/static/docs/commands-reference/remote_list.md +++ b/static/docs/commands-reference/remote_list.md @@ -28,8 +28,8 @@ Including names and URLs. - `--local` - list remotes specified in the [local](/doc/user-guide/dvc-files-and-directories) configuration file - (`.dvc/config.local`). Local configuration files stores private settings that - should not be tracked by Git. + (`.dvc/config.local`). Local config files stores private configuration that + should not be tracked by SCM (Git). ## Examples diff --git a/static/docs/commands-reference/remote_modify.md b/static/docs/commands-reference/remote_modify.md index 4541a81a5e..caff5a6d70 100644 --- a/static/docs/commands-reference/remote_modify.md +++ b/static/docs/commands-reference/remote_modify.md @@ -1,6 +1,6 @@ # remote modify -Modify remote settings. +Modify configuration of remotes. > This command is commonly needed after `dvc remote add` or > [default](/doc/commands-reference/remote-default) to setup credentials or @@ -30,9 +30,9 @@ Remote `name` and `option` name are required. Option names are remote type specific. See below examples and a list of per remote type - AWS S3, Google cloud, Azure, SSH, ALiyun OSS, and others. -This command modifies a `remote` section in the DVC +This command modifies a `remote` section in the DVC project's [config file](/doc/user-guide/dvc-files-and-directories). Alternatively, -`dvc config` or manual editing could be used to change settings. +`dvc config` or manual editing could be used to change the configuration. 
## Options diff --git a/static/docs/commands-reference/remote_remove.md b/static/docs/commands-reference/remote_remove.md index 361ad73c29..443e0a4644 100644 --- a/static/docs/commands-reference/remote_remove.md +++ b/static/docs/commands-reference/remote_remove.md @@ -36,8 +36,8 @@ possible to edit config files manually. - `--local` - remove remote specified in the [local](/doc/user-guide/dvc-files-and-directories) configuration file - (`.dvc/config.local`). Local configuration files stores private settings or - local environment specific settings that should not be tracked by Git. + (`.dvc/config.local`). Local config files stores private configuration that + should not be tracked by SCM (Git). ## Examples diff --git a/static/docs/tutorial/sharing-data.md b/static/docs/tutorial/sharing-data.md index 32d3343fc9..be5f1a478c 100644 --- a/static/docs/tutorial/sharing-data.md +++ b/static/docs/tutorial/sharing-data.md @@ -12,8 +12,8 @@ DVC is able to push the cache to a cloud. > Using your shared cache a colleague can reuse ML models that were trained on > your machine. -First, you need to modify the cloud settings in the DVC config file. This can be -done using the CLI as shown below. +First, you need to set a data remote which will be stored in the project's +config file. This can be done using the CLI as shown below. 
> Note that we are using `dvc-share` s3 bucket as an example and you don't have > write access to it, so in order to follow the tutorial you will need to either From 1583de80b15b903e75c8478cd87587f3f2dbf42a Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 3 Jul 2019 11:08:33 -0600 Subject: [PATCH 10/45] term: revert usage of "key" vs "path" for S3 remote URLS --- static/docs/commands-reference/config.md | 2 +- static/docs/commands-reference/get-url.md | 2 +- static/docs/commands-reference/remote_add.md | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/static/docs/commands-reference/config.md b/static/docs/commands-reference/config.md index 14bdd85e49..65970932b9 100644 --- a/static/docs/commands-reference/config.md +++ b/static/docs/commands-reference/config.md @@ -190,7 +190,7 @@ Add an S3 remote and set it as the project default: > to create your bucket. ```dvc -$ dvc remote add myremote s3://bucket/key +$ dvc remote add myremote s3://bucket/path $ dvc config core.remote myremote ``` diff --git a/static/docs/commands-reference/get-url.md b/static/docs/commands-reference/get-url.md index 17ff13709b..c0a499690d 100644 --- a/static/docs/commands-reference/get-url.md +++ b/static/docs/commands-reference/get-url.md @@ -82,7 +82,7 @@ This command will copy an S3 bucket key into the local workspace with the same file name: ```dvc -$ dvc get-url s3://bucket/key +$ dvc get-url s3://bucket/path ``` By default DVC expects your AWS CLI is already diff --git a/static/docs/commands-reference/remote_add.md b/static/docs/commands-reference/remote_add.md index 4f71118c62..454216c879 100644 --- a/static/docs/commands-reference/remote_add.md +++ b/static/docs/commands-reference/remote_add.md @@ -129,7 +129,7 @@ $ cat .dvc/config > to create your bucket. 
```dvc -$ dvc remote add myremote s3://bucket/key +$ dvc remote add myremote s3://bucket/path ``` By default DVC expects your AWS CLI is already From fa4ed4f76633a295d2a5a71ef1e2337e56d0aaef Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 3 Jul 2019 12:58:48 -0600 Subject: [PATCH 11/45] cmd ref: update import-url/get-rul and related commands with relevant notes about their usage; and better use of "data artifact" term --- static/docs/commands-reference/get-url.md | 66 +++++++++----------- static/docs/commands-reference/import-url.md | 20 +++--- static/docs/commands-reference/init.md | 6 +- static/docs/commands-reference/version.md | 3 + static/docs/tutorial/reproducibility.md | 2 +- 5 files changed, 46 insertions(+), 51 deletions(-) diff --git a/static/docs/commands-reference/get-url.md b/static/docs/commands-reference/get-url.md index c0a499690d..1c92c6b852 100644 --- a/static/docs/commands-reference/get-url.md +++ b/static/docs/commands-reference/get-url.md @@ -1,11 +1,15 @@ # get-url -Download or copy file or directory from any supported URL (for example -`http://`, `s3://`, `ssh://`, and other protocols) or local directory to the -workspace. +Download or copy file or directory from any supported URL (for example `s3://`, +`ssh://`, and other protocols) or local directory to the local file system. -> Unlike `dvc import-url`, this command does not track the downloaded file nor -> does it creates a DVC-file. +> Like `dvc init`, this is one of the few commands that doesn't require an +> existing DVC project to run. + +> See `dvc get` to download data from other DVC repositories (e.g. GitHub URLs). + +> Unlike `dvc import-url`, this command does not track the downloaded data +> file(s) (does not create a DVC-file). ## Synopsis @@ -19,23 +23,26 @@ positional arguments: ## Description -In some cases it is convenient to get a data file or directory from a remote -location and into the workspace. The `dvc get-url` command helps the user do so. 
-The `url` argument should provide the location of the data to be imported, while -`out` is used to specify the (path and) name of the file or directory in the -workspace. +In some cases it's convenient to get a data file or directory from a remote +location. The `dvc get-url` command helps the user do so. The `url` argument +should provide the location of the data to be imported, while `out` can be used +to specify the (path and) file name desired for the imported data file or +directory. + +It's important to note that this command does not require an initialized +repository to work in. It's a single-purpose command that can be used out of the +box after installing DVC. DVC supports several types of (local or) remote locations (protocols): -| Type | Discussion | URL format | -| -------- | ------------------------------------------------------- | ------------------------------------------ | -| `local` | Local path | `/path/to/local/file` | -| `s3` | Amazon S3 | `s3://mybucket/data.csv` | -| `gs` | Google Storage | `gs://mybucket/data.csv` | -| `ssh` | SSH server | `ssh://user@example.com:/path/to/data.csv` | -| `hdfs` | HDFS | `hdfs://user@example.com/path/to/data.csv` | -| `http` | HTTP to file with _strong ETag_ (see explanation below) | `https://example.com/path/to/data.csv` | -| `remote` | Remote path (see explanation below) | `remote://myremote/path/to/file` | +| Type | Discussion | URL format | +| ------- | ------------------------------------------------------- | ------------------------------------------ | +| `local` | Local path | `/path/to/local/file` | +| `s3` | Amazon S3 | `s3://mybucket/data.csv` | +| `gs` | Google Storage | `gs://mybucket/data.csv` | +| `ssh` | SSH server | `ssh://user@example.com:/path/to/data.csv` | +| `hdfs` | HDFS | `hdfs://user@example.com/path/to/data.csv` | +| `http` | HTTP to file with _strong ETag_ (see explanation below) | `https://example.com/path/to/data.csv` | > `remote://myremote/path/to/file` notation just means that a 
DVC > [remote](/doc/commands-reference/remote) `myremote` is defined, and when DVC @@ -78,8 +85,8 @@ The above command will copy the `/local/path/to/data` file or directory into ### Click for AWS S3 example -This command will copy an S3 bucket key into the local workspace with the same -file name: +This command will copy an S3 object into the present working directory with the +same file name: ```dvc $ dvc get-url s3://bucket/path @@ -157,20 +164,3 @@ $ dvc get-url https://example.com/path/to/data
- -### Click for DVC remote example - -First, register a new remote, in this case with the S3 protocol: - -```dvc -$ dvc remote add myremote ssh://user@example.com/path/to/dir -``` - -Then use the `remote://` prefix to refer to the remote in order to download -`data` from that location: - -```dvc -$ dvc get-url remote://myremote/data -``` - -
diff --git a/static/docs/commands-reference/import-url.md b/static/docs/commands-reference/import-url.md index 56e81a29f3..4855b0f4fb 100644 --- a/static/docs/commands-reference/import-url.md +++ b/static/docs/commands-reference/import-url.md @@ -1,12 +1,14 @@ # import-url -Download or copy file or directory from any supported URL (for example -`http://`, `s3://`, `ssh://`, and other protocols) or local directory to the -workspace, and track changes in the remote source with DVC. Creates -a DVC-file. +Download or copy file or directory from any supported URL (for example `s3://`, +`ssh://`, and other protocols) or local directory to the workspace, +and track changes in the remote source with DVC. Creates a DVC-file. + +> See `dvc import` to download and tack data from other DVC repositories (e.g. +> GitHub URLs). > See also `dvc get-url` which corresponds to the first step this command -> performs (just download the ). +> performs (just download the data). ## Synopsis @@ -20,7 +22,7 @@ positional arguments: ## Description -In some cases it is convenient to add a data file or directory to the workspace +In some cases it's convenient to add a data file or directory to the workspace such that it will be automatically updated when the data source is updated. Examples: @@ -40,8 +42,8 @@ to the workspace initially, and to re-download it upon changes. The `dvc import-url` command helps the user create such an external data dependency. The `url` argument should provide the location of the data to be -imported, while `out` is used to specify the (path and) name of the imported -data file or directory in the workspace. +imported, while `out` can be used to specify the (path and) file name desired +for the imported data file or directory in the workspace. DVC supports several types of (local or) remote locations (protocols): @@ -219,7 +221,7 @@ file has changed. ## Example: Detecting remote file changes What if that remote file is one which will be updated regularly? 
The project -goal might include regenerating some artifact based on the updated data. A +goal might include regenerating a data artifact based on the updated data. A pipeline can be triggered to re-execute based on a changed external dependency. Let us again use the [Getting Started](/doc/get-started) example, in a way which diff --git a/static/docs/commands-reference/init.md b/static/docs/commands-reference/init.md index 7432978587..3d8a4ccb15 100644 --- a/static/docs/commands-reference/init.md +++ b/static/docs/commands-reference/init.md @@ -1,6 +1,6 @@ # init -This command initializes a DVC environment in a current Git repository. +This command initializes a DVC environment in a present working directory. ## Synopsis @@ -24,7 +24,7 @@ usage: dvc init [-h] [-q | -v] [--no-scm] [-f] - `-v`, `--verbose` - displays detailed tracing information. -## Details +## Description After DVC initialization, a new directory `.dvc/` will be created with `config` and `.gitignore` files and `cache` directory. These files and directories are @@ -38,7 +38,7 @@ this is your local directory and you cannot push it to any Git remote. ## Examples -- Creating a new DVC repository: +- Creating a new DVC repository on top of a Git repository: ```dvc $ mkdir tag_classifier diff --git a/static/docs/commands-reference/version.md b/static/docs/commands-reference/version.md index d8b9efa2e2..590ae47703 100644 --- a/static/docs/commands-reference/version.md +++ b/static/docs/commands-reference/version.md @@ -3,6 +3,9 @@ This command shows the system/environment information along with the DVC version. +> Like `dvc init`, this is one of the few commands that doesn't require an +> existing DVC project to run. 
+ ## Synopsis ```usage diff --git a/static/docs/tutorial/reproducibility.md b/static/docs/tutorial/reproducibility.md index afdd189b6f..7cb2d4caab 100644 --- a/static/docs/tutorial/reproducibility.md +++ b/static/docs/tutorial/reproducibility.md @@ -110,7 +110,7 @@ master: data/eval.txt: AUC: 0.624652 ``` -> It is convenient to keep track of information even for failed experiments. +> It's convenient to keep track of information even for failed experiments. > Sometimes a failed hypothesis gives more information than a successful one. Let’s keep the result in the repository. Later we can find out why bigram does From 92c62b542c984a15c6bb55285e20a86e84ce534d Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 3 Jul 2019 13:29:46 -0600 Subject: [PATCH 12/45] term: revert "present" for "current" (working directory) Per https://github.com/iterative/dvc.org/pull/464#pullrequestreview-256650173 --- static/docs/commands-reference/cache_dir.md | 2 +- static/docs/commands-reference/get-url.md | 2 +- static/docs/commands-reference/init.md | 2 +- static/docs/commands-reference/remote_add.md | 7 +++---- static/docs/commands-reference/root.md | 2 +- static/docs/commands-reference/version.md | 2 +- 6 files changed, 8 insertions(+), 9 deletions(-) diff --git a/static/docs/commands-reference/cache_dir.md b/static/docs/commands-reference/cache_dir.md index 59bed40d37..ab70d31ddf 100644 --- a/static/docs/commands-reference/cache_dir.md +++ b/static/docs/commands-reference/cache_dir.md @@ -18,7 +18,7 @@ positional arguments: Helper to set the `cache.dir` configuration option. Unlike doing so with `dvc config cache`, this command transform paths (`value`) that are provided -relative to the present working directory into paths **relative to the config +relative to the current working directory into paths **relative to the config file location**. They are required in the latter form for the config file. 
## Options diff --git a/static/docs/commands-reference/get-url.md b/static/docs/commands-reference/get-url.md index 1c92c6b852..72bb7cb749 100644 --- a/static/docs/commands-reference/get-url.md +++ b/static/docs/commands-reference/get-url.md @@ -85,7 +85,7 @@ The above command will copy the `/local/path/to/data` file or directory into ### Click for AWS S3 example -This command will copy an S3 object into the present working directory with the +This command will copy an S3 object into the current working directory with the same file name: ```dvc diff --git a/static/docs/commands-reference/init.md b/static/docs/commands-reference/init.md index 3d8a4ccb15..12b6758442 100644 --- a/static/docs/commands-reference/init.md +++ b/static/docs/commands-reference/init.md @@ -1,6 +1,6 @@ # init -This command initializes a DVC environment in a present working directory. +This command initializes a DVC environment in a current working directory. ## Synopsis diff --git a/static/docs/commands-reference/remote_add.md b/static/docs/commands-reference/remote_add.md index 454216c879..3b17b557c3 100644 --- a/static/docs/commands-reference/remote_add.md +++ b/static/docs/commands-reference/remote_add.md @@ -26,7 +26,7 @@ positional arguments: `name` and `url` are required. `url` specifies a location to store your data. It could be S3 path, SSH path, Azure, Google cloud, Aliyun OSS local directory, etc. (See more examples below.) If `url` is a local relative path, it will be -resolved relative to the present working directory but saved **relative to the +resolved relative to the current working directory but saved **relative to the config file location** (see LOCAL example below). Whenever possible DVC will create a remote directory if it doesn't exists yet. It won't create an S3 bucket though and will rely on default access settings. 
@@ -260,9 +260,8 @@ $ dvc remote add myremote hdfs://user@example.com/path/to/dir > **Note!** Currently HTTP remotes only support downloads operations: > -> - `pull` -> - `fetch` -> - `import-url` +> - `pull` and `fetch` +> - `import-url` and `get-url` > - As an [external dependency](/doc/user-guide/external-dependencies) ```dvc diff --git a/static/docs/commands-reference/root.md b/static/docs/commands-reference/root.md index dea7ae263f..c9bb0bcd75 100644 --- a/static/docs/commands-reference/root.md +++ b/static/docs/commands-reference/root.md @@ -12,7 +12,7 @@ usage: dvc root [-h] [-q | -v] While in project's sub-directory, sometimes developers may want to refer some file belonging to another directory. This command returns relative path to the -DVC project's root directory from the present working directory. So, this +DVC project's root directory from the current working directory. So, this command can be used to build a path to a dependency file, command, or output. ## Options diff --git a/static/docs/commands-reference/version.md b/static/docs/commands-reference/version.md index 590ae47703..2446c86c5e 100644 --- a/static/docs/commands-reference/version.md +++ b/static/docs/commands-reference/version.md @@ -27,7 +27,7 @@ system/environment: | `Filesystem type` | Shows the filesystem type (eg. ext4, FAT, etc.) and mount point of workspace and the cache directory | > If `dvc version` is executed outside a DVC workspace, the command outputs the -> filesystem type of the present working directory. +> filesystem type of the current working directory. #### Components of DVC version From d865ff8555187f4e6795ab63281128167026cdf2 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 3 Jul 2019 13:32:11 -0600 Subject: [PATCH 13/45] status: update `-a` option desc. 
Per https://github.com/iterative/dvc.org/pull/464#pullrequestreview-256650381 --- static/docs/commands-reference/status.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/static/docs/commands-reference/status.md b/static/docs/commands-reference/status.md index bb579edd62..907d2ff982 100644 --- a/static/docs/commands-reference/status.md +++ b/static/docs/commands-reference/status.md @@ -91,9 +91,9 @@ cache. For the typical process to update workspaces, see name defined using the `dvc remote` command. Implies `--cloud`. - `-a`, `--all-branches` - compares cache content against all Git branches. - Instead of checking just the workspace, it checks against all other branches - of this workspace. The corresponding branches are shown in the status output. - Applies only if `--cloud` or a remote is specified. + Instead of checking just the workspace, it runs the same status command in all + the branches of this repo. The corresponding branches are shown in the status + output. Applies only if `--cloud` or a remote is specified. - `-T`, `--all-tags` - compares cache content against all Git tags. Both the `--all-branches` and `--all-tags` options cause DVC to check more than just From 2b0ef40e7135a3c0077bfd45c514db4da880eceb Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Fri, 5 Jul 2019 16:38:05 -0600 Subject: [PATCH 14/45] revert a couple recent errors --- static/docs/commands-reference/init.md | 4 ++-- static/docs/user-guide/external-dependencies.md | 4 ---- 2 files changed, 2 insertions(+), 6 deletions(-) diff --git a/static/docs/commands-reference/init.md b/static/docs/commands-reference/init.md index 12b6758442..8f3adaaa92 100644 --- a/static/docs/commands-reference/init.md +++ b/static/docs/commands-reference/init.md @@ -1,6 +1,6 @@ # init -This command initializes a DVC environment in a current working directory. +This command initializes a DVC environment in a local Git repository. 
## Synopsis @@ -38,7 +38,7 @@ this is your local directory and you cannot push it to any Git remote. ## Examples -- Creating a new DVC repository on top of a Git repository: +- Creating a new DVC repository (requires a Git repository): ```dvc $ mkdir tag_classifier diff --git a/static/docs/user-guide/external-dependencies.md b/static/docs/user-guide/external-dependencies.md index 0bcefb67ed..4f0a0532a6 100644 --- a/static/docs/user-guide/external-dependencies.md +++ b/static/docs/user-guide/external-dependencies.md @@ -28,10 +28,6 @@ $ dvc run -d /home/shared/data.txt \ cp /home/shared/data.txt data.txt ``` -> ```yml -> TODO: What does the DVC-file looks like? -> ``` - ### Amazon S3 ```dvc From 37eae0f1e1fcab486d7bccd54d40d4186a198a42 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Fri, 5 Jul 2019 16:50:28 -0600 Subject: [PATCH 15/45] term: review "data artifact" related terms and add glossary tag --- static/docs/commands-reference/add.md | 12 ++++++------ static/docs/commands-reference/commit.md | 12 ++++++------ static/docs/commands-reference/import-url.md | 5 +++-- static/docs/commands-reference/run.md | 2 +- static/docs/get-started/agenda.md | 9 +++++---- static/docs/get-started/example-pipeline.md | 2 +- .../use-cases/data-and-model-files-versioning.md | 8 ++++---- 7 files changed, 26 insertions(+), 24 deletions(-) diff --git a/static/docs/commands-reference/add.md b/static/docs/commands-reference/add.md index b5e78342bd..7c1c5b03aa 100644 --- a/static/docs/commands-reference/add.md +++ b/static/docs/commands-reference/add.md @@ -69,12 +69,12 @@ to work with directory hierarchies with `dvc add`. the single DVC-file points to a file in the DVC cache that contains references to the files in the added hierarchy. -In a DVC project `dvc add` can be used to version control any data artifacts - -input, intermediate, output files and directories, as well as model files. It is -useful by itself to go back and forth between different versions of datasets or -models. 
Usually though, it is recommended to use `dvc run` and `dvc repro` -mechanism to version control intermediate and output artifacts (like models). -This way you bring data provenance and make your project reproducible. +In a DVC project `dvc add` can be used to version control any data +artifact (input, intermediate, or output files and directories, and model +files). It is useful by itself to go back and forth between different versions +of datasets or models. Usually though, it is recommended to use `dvc run` and +`dvc repro` mechanism to version control intermediate and final results (like +models). This way you bring data provenance and make your project reproducible. ## Options diff --git a/static/docs/commands-reference/commit.md b/static/docs/commands-reference/commit.md index 925719f064..5576b2b778 100644 --- a/static/docs/commands-reference/commit.md +++ b/static/docs/commands-reference/commit.md @@ -55,12 +55,12 @@ to the DVC cache as the last step. What _commit_ means is that DVC: - Adds the file/directory or to the DVC cache. There are many cases where the last step is not desirable (usually, rapid -iteration on some experiment). For the DVC commands where it is appropriate the -`--no-commit` option prevents the last step from occurring - thus, we are saving -some time and space, by not storing all the data artifacts for all the attempts -we do. The checksum is still computed and added to the DVC-file, but the file is -not added to the cache. That's where the `dvc commit` command comes into play. -It handles that last step of adding the file to the DVC cache. +iteration on some experiment). For the DVC commands where available, the +`--no-commit` option prevents the last step from occurring, thus we are saving +time and space by not storing all the data artifacts for every +command attempt. The checksum is still computed and added to the DVC-file, but +the file is not added to the cache. That's where the `dvc commit` command comes +into play. 
It handles that last step of adding the file to the DVC cache. ## Options diff --git a/static/docs/commands-reference/import-url.md b/static/docs/commands-reference/import-url.md index 4855b0f4fb..81d033b9a2 100644 --- a/static/docs/commands-reference/import-url.md +++ b/static/docs/commands-reference/import-url.md @@ -221,8 +221,9 @@ file has changed. ## Example: Detecting remote file changes What if that remote file is one which will be updated regularly? The project -goal might include regenerating a data artifact based on the updated data. A -pipeline can be triggered to re-execute based on a changed external dependency. +goal might include regenerating a data artifact based on the +updated source. A pipeline can be triggered to re-execute based on a changed +external dependency. Let us again use the [Getting Started](/doc/get-started) example, in a way which will mimic an updated external data source. diff --git a/static/docs/commands-reference/run.md b/static/docs/commands-reference/run.md index db6bd84e7a..28e6c800b9 100644 --- a/static/docs/commands-reference/run.md +++ b/static/docs/commands-reference/run.md @@ -45,7 +45,7 @@ be no cycles, etc. Note that `dvc repro` provides an interface to check state and reproduce this graph later. This concept is similar to the one of the `Makefile` but DVC -captures data and caches data artifacts along the way. Check this +captures data and caches data artifacts along the way. Check this [example](/doc/get-started/example-pipeline) to learn more and try to build a pipeline. diff --git a/static/docs/get-started/agenda.md b/static/docs/get-started/agenda.md index 5a8e04919b..ea411e5157 100644 --- a/static/docs/get-started/agenda.md +++ b/static/docs/get-started/agenda.md @@ -26,10 +26,11 @@ contrary, DVC is designed to be pretty agnostic of frameworks, languages, etc. 
If you have data files or data sets and/or you produce other data files, models, data sets and you want to: -- capture and save those data artifacts the same way we capture code, -- track and switch between different versions of these artifacts easily, -- being able to answer the question of how those artifacts or models were built - in the first place, +- capture and save those data artifacts the same way we capture + code, +- track and switch between different versions of the data easily, +- being able to answer the question of how data artifacts (e.g. ML models) were + built in the first place, - being able to compare them, - bring best practices to your team and get everyone on the same page. diff --git a/static/docs/get-started/example-pipeline.md b/static/docs/get-started/example-pipeline.md index 88a6a062d3..eadc2b2a20 100644 --- a/static/docs/get-started/example-pipeline.md +++ b/static/docs/get-started/example-pipeline.md @@ -16,7 +16,7 @@ itself is a sequence of transformation we apply to the data file: ![](/static/img/example-flow-2x.png) DVC helps to describe these transformations and capture actual data involved - -input data set we are processing, intermediate artifacts (useful if some +input data set we are processing, intermediate results (useful if some transformations take a lot of time to run), output models. This way we can capture what data and code were used to produce a specific model in a sharable and reproducible way. diff --git a/static/docs/use-cases/data-and-model-files-versioning.md b/static/docs/use-cases/data-and-model-files-versioning.md index cd0eb5e961..9a886d1212 100644 --- a/static/docs/use-cases/data-and-model-files-versioning.md +++ b/static/docs/use-cases/data-and-model-files-versioning.md @@ -18,10 +18,10 @@ store and share your data alongside your code. 
In this very basic scenario, DVC is a better replacement for `git-lfs` (check the [Related Technologies](/doc/understanding-dvc/related-technologies) to get a better sense why) and ad-hoc scripts on top of Amazon S3 (or name-it cloud) that -are usually used to manage ML artifacts like model files, data files, etc. -Unlike `git-lfs`, DVC doesn't require installing a server; it can be used -on-premises (NAS, SSH, for example) or with any major cloud provider (S3, Google -Cloud, Azure). +are usually used to manage ML data artifacts like data files, +models, etc. Unlike `git-lfs`, DVC doesn't require installing a server; it can +be used on-premises (NAS, SSH, for example) or with any major cloud provider +(S3, Google Cloud, Azure). Let's say you already have a project that uses a bunch of images that are stored in `images` directory and has a `model.pkl` file - your model file that is From 2ff746bc2fb5321bfc72c62a0ad61fdeef2874ac Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Fri, 5 Jul 2019 16:52:43 -0600 Subject: [PATCH 16/45] add: revert shortenned command list in desc. Per https://github.com/iterative/dvc.org/pull/464#pullrequestreview-256649625 --- static/docs/commands-reference/remote.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/static/docs/commands-reference/remote.md b/static/docs/commands-reference/remote.md index 8e61aa1eb7..377f305447 100644 --- a/static/docs/commands-reference/remote.md +++ b/static/docs/commands-reference/remote.md @@ -43,7 +43,11 @@ Using DVC with a remote data storage is optional. By default, DVC is configured to use a local data storage only (usually `.dvc/cache` directory inside your repository), which enables basic DVC usage scenarios out of the box. 
-`dvc remote` commands read or modify DVC +[Add](/doc/commands-reference/remote-add), +[default](/doc/commands-reference/remote-default), +[list](/doc/commands-reference/remote-list), +[modify](/doc/commands-reference/remote-modify), and +[remove](/doc/commands-reference/remote-remove) commands read or modify DVC [config files](/doc/user-guide/dvc-files-and-directories). Alternatively, `dvc config` can be used or these files could be edited manually. From 6e467ddbbf62465f4debec2779f44cb84dbd4dda Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Mon, 8 Jul 2019 13:43:23 -0600 Subject: [PATCH 17/45] term: review usage of "check" and "checkout" --- static/docs/commands-reference/checkout.md | 2 +- static/docs/commands-reference/config.md | 2 +- static/docs/commands-reference/install.md | 2 +- static/docs/commands-reference/metrics_add.md | 4 +-- .../docs/commands-reference/metrics_modify.md | 4 +-- .../docs/commands-reference/metrics_show.md | 15 ++++++----- static/docs/commands-reference/remove.md | 4 +-- static/docs/commands-reference/repro.md | 2 +- static/docs/commands-reference/run.md | 4 +-- .../docs/get-started/compare-experiments.md | 2 +- static/docs/get-started/example-pipeline.md | 4 +-- static/docs/get-started/example-versioning.md | 6 ++--- static/docs/get-started/index.md | 8 +++--- static/docs/tutorial/define-ml-pipeline.md | 2 +- static/docs/tutorial/sharing-data.md | 2 +- .../data-and-model-files-versioning.md | 25 +++++++++---------- static/docs/user-guide/autocomplete.md | 6 ++--- .../user-guide/contributing-documentation.md | 4 +-- static/docs/user-guide/contributing.md | 2 +- static/docs/user-guide/dvc-file-format.md | 4 +-- 20 files changed, 50 insertions(+), 54 deletions(-) diff --git a/static/docs/commands-reference/checkout.md b/static/docs/commands-reference/checkout.md index ef73a0b025..71abd16840 100644 --- a/static/docs/commands-reference/checkout.md +++ b/static/docs/commands-reference/checkout.md @@ -179,7 +179,7 @@ MD5 (model.pkl) = 
3863d0e317dee0a55c4e59d2ec0eef33 ``` What if we want to rewind history, so to speak? The `git checkout` command lets -us checkout at any point in the commit history, or even check out other tags. It +us checkout at any point in the commit history, or even checkout other tags. It automatically adjusts the files, by replacing file content and adding or deleting files as necessary. diff --git a/static/docs/commands-reference/config.md b/static/docs/commands-reference/config.md index 65970932b9..f4ed1bc804 100644 --- a/static/docs/commands-reference/config.md +++ b/static/docs/commands-reference/config.md @@ -158,7 +158,7 @@ details.) ### state -State config options. Check the +State config options. See [DVC Files and Directories](/doc/user-guide/dvc-files-and-directories) to learn more about the state file that is used for optimization. diff --git a/static/docs/commands-reference/install.md b/static/docs/commands-reference/install.md index 9ebce721cf..0e99c80c7f 100644 --- a/static/docs/commands-reference/install.md +++ b/static/docs/commands-reference/install.md @@ -154,7 +154,7 @@ bigrams-experiment These tags are used to mark points in the development of this workspace, and to document specific experiments conducted in the workspace. To take a look at one -we check-out the workspace using the SCM (in this case Git): +we checkout the workspace using the SCM (in this case Git): ```dvc $ git checkout 6-featurization diff --git a/static/docs/commands-reference/metrics_add.md b/static/docs/commands-reference/metrics_add.md index 79ee788b86..92ca0ebf31 100644 --- a/static/docs/commands-reference/metrics_add.md +++ b/static/docs/commands-reference/metrics_add.md @@ -41,8 +41,8 @@ contains multiple metrics. `dvc metrics show`. Accepted value depends on the metric file type (`-t` option): - - `json` - check [JSONPath spec](https://goessner.net/articles/JsonPath/) to - see available options. 
For example, `"AUC"` extracts the value from the + - `json` - see [JSONPath spec](https://goessner.net/articles/JsonPath/) for + available options. For example, `"AUC"` extracts the value from the following json-formatted metric file: `{"AUC": "0.624652"}`. - `tsv`/`csv` - `row,column`, e.g. `1,2`. Indices are 0-based. - `htsv`/`hcsv` - `row,column name`. Row index is 0-based. First row is used diff --git a/static/docs/commands-reference/metrics_modify.md b/static/docs/commands-reference/metrics_modify.md index 78c966a2dd..1f94394380 100644 --- a/static/docs/commands-reference/metrics_modify.md +++ b/static/docs/commands-reference/metrics_modify.md @@ -46,8 +46,8 @@ ERROR: failed to modify metric file settings - `dvc metrics show`. Accepted value depends on the metric file type (`-t` option): - - `json` - check [JSONPath spec](https://goessner.net/articles/JsonPath/) to - see available options. For example, `"AUC"` extracts the value from the + - `json` - see [JSONPath spec](https://goessner.net/articles/JsonPath/) for + available options. For example, `"AUC"` extracts the value from the following json-formatted metric file: `{"AUC": "0.624652"}`. - `tsv`/`csv` - `row,column`, e.g. `1,2`. Indices are 0-based. - `htsv`/`hcsv` - `row,column name`. Row index is 0-based. First row is used diff --git a/static/docs/commands-reference/metrics_show.md b/static/docs/commands-reference/metrics_show.md index abf78694de..b1353e6335 100644 --- a/static/docs/commands-reference/metrics_show.md +++ b/static/docs/commands-reference/metrics_show.md @@ -51,14 +51,13 @@ supported. corresponding format in this case. Accepted value depends on the metric file type (`-t` option): - - `json` - check [JSONPath spec](https://goessner.net/articles/JsonPath/) or - [jsonpath-ng](https://github.com/h2non/jsonpath-ng) to see available - options. For example, `"AUC"` extracts the value from the following - json-formatted metric file: `{"AUC": "0.624652"}`. You can also filter on - certain values. 
For example, - `"$.metrics[?(@.deviation_mse<0.30) & (@.value_mse>0.4)]"` extracts only the - values for model versions if they meet the given condition(s) from the - metric file: + - `json` - see [JSONPath spec](https://goessner.net/articles/JsonPath/) or + [jsonpath-ng](https://github.com/h2non/jsonpath-ng) for available options. + For example, `"AUC"` extracts the value from the following json-formatted + metric file: `{"AUC": "0.624652"}`. You can also filter on certain values. + For example, `"$.metrics[?(@.deviation_mse<0.30) & (@.value_mse>0.4)]"` + extracts only the values for model versions if they meet the given + condition(s) from the metric file: `{"metrics": [{"dataset": "train", "deviation_mse": 0.173461, "value_mse": 0.421601}]}` - `tsv`/`csv` - `row,column`, e.g. `1,2`. Indices are 0-based. - `htsv`/`hcsv` - `row,column name`. Row index is 0-based. First row is used diff --git a/static/docs/commands-reference/remove.md b/static/docs/commands-reference/remove.md index 6c3459df16..e1879b6c78 100644 --- a/static/docs/commands-reference/remove.md +++ b/static/docs/commands-reference/remove.md @@ -19,8 +19,8 @@ positional arguments: DVC-files in the workspace by default.) ``` -Check also [Update Tracked Files](/doc/user-guide/update-tracked-file) to see -how it can be used to replace or modify files that are under DVC control. +Refer to [Update Tracked Files](/doc/user-guide/update-tracked-file) to see how +it can be used to replace or modify files that are under DVC control. ## Options diff --git a/static/docs/commands-reference/repro.md b/static/docs/commands-reference/repro.md index bfdc0e5cfb..19a2524c88 100644 --- a/static/docs/commands-reference/repro.md +++ b/static/docs/commands-reference/repro.md @@ -112,7 +112,7 @@ specified), and updates stage files with the new checksum information. 
## Examples For simplicity, let's build a pipeline defined below (if you want get your hands -on something more real, check this +on something more real, see this [mini-tutorial](/doc/get-started/example-pipeline)). It takes this `text.txt` file: diff --git a/static/docs/commands-reference/run.md b/static/docs/commands-reference/run.md index 28e6c800b9..99e895fa37 100644 --- a/static/docs/commands-reference/run.md +++ b/static/docs/commands-reference/run.md @@ -45,7 +45,7 @@ be no cycles, etc. Note that `dvc repro` provides an interface to check state and reproduce this graph later. This concept is similar to the one of the `Makefile` but DVC -captures data and caches data artifacts along the way. Check this +captures data and caches data artifacts along the way. See this [example](/doc/get-started/example-pipeline) to learn more and try to build a pipeline. @@ -84,7 +84,7 @@ pipeline. - `-m`, `--metrics` - another kind of output files. It is usually a small human readable file (JSON, CSV, text, whatnot) with some numbers or other - information that describes a model or other outputs. Check `dvc metrics` to + information that describes a model or other outputs. See `dvc metrics` to learn more about tracking metrics and comparing them across different model or experiment versions. diff --git a/static/docs/get-started/compare-experiments.md b/static/docs/get-started/compare-experiments.md index dec676c8a4..e373e7a2b4 100644 --- a/static/docs/get-started/compare-experiments.md +++ b/static/docs/get-started/compare-experiments.md @@ -38,5 +38,5 @@ bigram-experiment: ``` DVC provides built-in support to track and navigate `JSON`, `TSV` or `CSV` -metric files if you want to track additional information. Check `dvc metrics` to +metric files if you want to track additional information. See `dvc metrics` to learn more. 
diff --git a/static/docs/get-started/example-pipeline.md b/static/docs/get-started/example-pipeline.md index eadc2b2a20..cdf6d027e8 100644 --- a/static/docs/get-started/example-pipeline.md +++ b/static/docs/get-started/example-pipeline.md @@ -9,7 +9,7 @@ it `python`. This is a short version of the [Tutorial](/doc/tutorial). In this example, we will focus on building a simple ML pipeline that takes an archive with StackOverflow posts and trains the prediction model and saves it as -an output. Check [get started](/doc/get-started) to see links to other examples, +an output. See [get started](/doc/get-started) for links to other examples, tutorials, use cases if you want to cover other aspects of the DVC. The pipeline itself is a sequence of transformation we apply to the data file: @@ -94,7 +94,7 @@ When we run `dvc add` `Posts.xml.zip`, DVC creates a `dvc init` created a new directory `example/.dvc/` with `config`, `.gitignore` files and the `cache` directory. These files and directories are hidden from -users in general. Users don't interact with these files directly. Check +users in general. Users don't interact with these files directly. See [DVC Files and Directories](/doc/user-guide/dvc-files-and-directories) to learn more. diff --git a/static/docs/get-started/example-versioning.md b/static/docs/get-started/example-versioning.md index 3c323cf455..8718a094d5 100644 --- a/static/docs/get-started/example-versioning.md +++ b/static/docs/get-started/example-versioning.md @@ -70,7 +70,7 @@ $ pip install -r requirements.txt The repository you cloned is already DVC-initialized. There should be a `.dvc/` directory with `config`, `.gitignore` files and the `cache` directory. These files and directories are hidden from users in general. Users don't interact -with these files directly. Check +with these files directly. See [DVC Files and Directories](/doc/user-guide/dvc-files-and-directories) to learn more. @@ -341,14 +341,14 @@ changed.
Here where DVC pipelines feature comes very handy and was designed for. We touched it briefly when we described `dvc run` and `dvc repro` at the very end. The next step here would be splitting the script into two steps and utilizing -DVC pipelines. Check this [example](/doc/get-started/example-pipeline) to get a +DVC pipelines. See this [example](/doc/get-started/example-pipeline) to get a hands-on experience with them and try to apply it here. Don't hesitate to join our [community](/chat) to ask any questions! Another thing, you should have noticed, is the metrics file - `metrics.json` and the way we captured it with `-M metrics.json` option. Metric file is a special type of output DVC provides an interface on top to compare across tags or -branches. Check `dvc metrics` command and +branches. See `dvc metrics` command and [Compare Experiments](/doc/get-started/compare-experiments) to learn more about managing metrics. Next step you should try on your own is converting both iterations we had into `dvc run` and then utilize `dvc metrics show` to compare diff --git a/static/docs/get-started/index.md b/static/docs/get-started/index.md index 9db6fbd937..5782ca6771 100644 --- a/static/docs/get-started/index.md +++ b/static/docs/get-started/index.md @@ -8,11 +8,11 @@ hands-on experience with real-life scenarios - first is about model and data set [versioning](/doc/get-started/example-versioning), and the second one is focused on [pipelines and reproducibility](/doc/get-started/example-pipeline). -✅ Please, join our [community](/chat) or check these [support](/support) -options if you have any questions or need any help. We are very responsive ⚡. +✅ Please, join our [community](/chat) or see these [support](/support) options +if you have any questions or need any help. We are very responsive ⚡. -✅ Check out the [Github](https://github.com/iterative/dvc) page and give us a -⭐ if you like the project! 
+✅ Check out the [Github](https://github.com/iterative/dvc) repository and give +us a ⭐ if you like the project! ✅ Contribute either on [Github](https://github.com/iterative/dvc) or [Patreon](https://www.patreon.com/DVCorg/overview) to support the Project. diff --git a/static/docs/tutorial/define-ml-pipeline.md b/static/docs/tutorial/define-ml-pipeline.md index 5f944e1f94..b4ac31df58 100644 --- a/static/docs/tutorial/define-ml-pipeline.md +++ b/static/docs/tutorial/define-ml-pipeline.md @@ -54,7 +54,7 @@ Refer to files with DVC. Note that to modify or replace a data file that is under DVC control you may -need to run `dvc unprotect` or `dvc remove` first (check the +need to run `dvc unprotect` or `dvc remove` first (see the [Update Tracked File](/doc/user-guide/update-tracked-file) guide). Use `dvc move` to rename or move a data file that is under DVC control. diff --git a/static/docs/tutorial/sharing-data.md b/static/docs/tutorial/sharing-data.md index be5f1a478c..f828990b30 100644 --- a/static/docs/tutorial/sharing-data.md +++ b/static/docs/tutorial/sharing-data.md @@ -54,7 +54,7 @@ $ dvc pull ``` After executing this command, all the data files will be in the right place. You -can check that by trying to reproduce the default goal: +can confirm this by trying to reproduce the default goal: ```dvc # Nothing to reproduce: diff --git a/static/docs/use-cases/data-and-model-files-versioning.md b/static/docs/use-cases/data-and-model-files-versioning.md index 9a886d1212..47cfb0b5d7 100644 --- a/static/docs/use-cases/data-and-model-files-versioning.md +++ b/static/docs/use-cases/data-and-model-files-versioning.md @@ -15,13 +15,12 @@ store and share your data alongside your code. 
![](/static/img/model-versioning-diagram.png) -In this very basic scenario, DVC is a better replacement for `git-lfs` (check -the [Related Technologies](/doc/understanding-dvc/related-technologies) to get a -better sense why) and ad-hoc scripts on top of Amazon S3 (or name-it cloud) that -are usually used to manage ML data artifacts like data files, -models, etc. Unlike `git-lfs`, DVC doesn't require installing a server; it can -be used on-premises (NAS, SSH, for example) or with any major cloud provider -(S3, Google Cloud, Azure). +In this very basic scenario, DVC is a better replacement for `git-lfs` (see +[Related Technologies](/doc/understanding-dvc/related-technologies)) and ad-hoc +scripts on top of Amazon S3 (or any other cloud) that are usually used to manage +ML data artifacts like data files, models, etc. Unlike `git-lfs`, +DVC doesn't require installing a server; it can be used on-premises (NAS, SSH, +for example) or with any major cloud provider (S3, Google Cloud, Azure). Let's say you already have a project that uses a bunch of images that are stored in `images` directory and has a `model.pkl` file - your model file that is @@ -109,9 +108,9 @@ points to the `v1.0` of the data set. While code and model files are from the ![](/static/img/versioning.png) -To share your data with others you need to setup a remote repository. Check the -[Share Data And Model Files] use case to get a high level overview on how to -setup it and use `dvc pull` and `dvc push` commands to collaborate. Please, -don't forget to check the [versioning](/doc/get-started/example-versioning) get -started example to get a hands-on experience with datasets and models -versioning. +To share your data with others you need to set up a remote repository. See the +[Share Data And Model Files](/doc/use-cases/share-data-and-model-files) use case +to get a high-level overview on how to set it up and use `dvc pull` and +`dvc push` commands to collaborate.
Please, don't forget to see the +[versioning](/doc/get-started/example-versioning) example to get a hands-on +experience with datasets and models versioning. diff --git a/static/docs/user-guide/autocomplete.md b/static/docs/user-guide/autocomplete.md index 593b9b0ff5..a374f1da62 100644 --- a/static/docs/user-guide/autocomplete.md +++ b/static/docs/user-guide/autocomplete.md @@ -31,10 +31,8 @@ Depending on what you typed on the command line so far, it completes: Depending upon your preference and the availability of both Bash and Zsh on your system, follow the steps given below to Configure Bash and/or Zsh. -If you are new to working with shell or uncertain about your active shell, use -`$0` to check your active shell. - -For example: +If you are new to working with shell or uncertain about your active shell, print +`$0` to check your active shell. For example: ```dvc $ echo $0 diff --git a/static/docs/user-guide/contributing-documentation.md b/static/docs/user-guide/contributing-documentation.md index f49529e10a..fdb5110453 100644 --- a/static/docs/user-guide/contributing-documentation.md +++ b/static/docs/user-guide/contributing-documentation.md @@ -28,8 +28,8 @@ to update the docs and redeploy the website. ## Submitting changes In case of a minor change, you can use the **Edit on Github** button (found to -the right of each page) to fork the project, edit it in place (check the right -top corner for an Edit button on Github), and create a pull request (PR). +the right of each page) to fork the project, edit it in place (with the source +file **Edit** button in Github), and create a pull request (PR). 
Otherwise, please refer to the following procedure: diff --git a/static/docs/user-guide/contributing.md b/static/docs/user-guide/contributing.md index 6f1f92e370..f52fc4ecca 100644 --- a/static/docs/user-guide/contributing.md +++ b/static/docs/user-guide/contributing.md @@ -1,7 +1,7 @@ # Contributing We welcome contributions to [DVC](https://github.com/iterative/dvc) by the -community. Check the +community. See the [Contributing to the Documentation](/doc/user-guide/contributing-documentation) guide if you want to fix or update the documentation or this website. diff --git a/static/docs/user-guide/dvc-file-format.md b/static/docs/user-guide/dvc-file-format.md index 6fcb09c550..e7e85eed00 100644 --- a/static/docs/user-guide/dvc-file-format.md +++ b/static/docs/user-guide/dvc-file-format.md @@ -7,8 +7,8 @@ the `.dvc` file extension (e.g. `process.dvc`), or with the default name to track your data and reproduce pipeline stages. The file itself contains a simple YAML format that could be easily written or altered manually. -Check the [Syntax Highlighting](/doc/user-guide/plugins) to enable the -highlighting for your editor. +See the [Syntax Highlighting](/doc/user-guide/plugins) to learn how to enable +the highlighting for your editor. 
Here is an example of a DVC-file: From e635432e80cdae12f452a1ac189ea9c11a262076 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Mon, 8 Jul 2019 15:18:04 -0600 Subject: [PATCH 18/45] download: reduce summary notes in import-url and get-url Per https://github.com/iterative/dvc.org/pull/464#pullrequestreview-257715778 --- static/docs/commands-reference/get-url.md | 14 +++++++------- static/docs/commands-reference/import-url.md | 6 +++--- static/docs/commands-reference/version.md | 6 +++--- 3 files changed, 13 insertions(+), 13 deletions(-) diff --git a/static/docs/commands-reference/get-url.md b/static/docs/commands-reference/get-url.md index 72bb7cb749..5f1ce6ff74 100644 --- a/static/docs/commands-reference/get-url.md +++ b/static/docs/commands-reference/get-url.md @@ -3,11 +3,6 @@ Download or copy file or directory from any supported URL (for example `s3://`, `ssh://`, and other protocols) or local directory to the local file system. -> Like `dvc init`, this is one of the few commands that doesn't require an -> existing DVC project to run. - -> See `dvc get` to download data from other DVC repositories (e.g. GitHub URLs). - > Unlike `dvc import-url`, this command does not track the downloaded data > file(s) (does not create a DVC-file). @@ -23,10 +18,13 @@ positional arguments: ## Description +> Like `dvc init`, this is one of the few commands that doesn't require an +> existing DVC project to run. + In some cases it's convenient to get a data file or directory from a remote location. The `dvc get-url` command helps the user do so. The `url` argument -should provide the location of the data to be imported, while `out` can be used -to specify the (path and) file name desired for the imported data file or +should provide the location of the data to be downloaded, while `out` can be +used to specify the (path and) file name desired for the downloaded data file or directory. 
It's important to note that this command does not require an initialized @@ -52,6 +50,8 @@ DVC supports several types of (local or) remote locations (protocols): Another way to understand the `dvc get-url` command is as a tool for downloading data files. +> See `dvc get` to download data from other DVC repositories (e.g. GitHub URLs). + On GNU/Linux systems for example, instead of `dvc get-url` with HTTP(S) it's possible to instead use: diff --git a/static/docs/commands-reference/import-url.md b/static/docs/commands-reference/import-url.md index 81d033b9a2..b33063dec3 100644 --- a/static/docs/commands-reference/import-url.md +++ b/static/docs/commands-reference/import-url.md @@ -4,9 +4,6 @@ Download or copy file or directory from any supported URL (for example `s3://`, `ssh://`, and other protocols) or local directory to the workspace, and track changes in the remote source with DVC. Creates a DVC-file. -> See `dvc import` to download and tack data from other DVC repositories (e.g. -> GitHub URLs). - > See also `dvc get-url` which corresponds to the first step this command > performs (just download the data). @@ -40,6 +37,9 @@ the remote file or directory to enable DVC to efficiently check it to determine if the local copy is out of date. DVC uses this remote URL to download the data to the workspace initially, and to re-download it upon changes. +> See `dvc import` to download and track data from other DVC repositories (e.g. +> GitHub URLs). + The `dvc import-url` command helps the user create such an external data dependency.
The `url` argument should provide the location of the data to be imported, while `out` can be used to specify the (path and) file name desired diff --git a/static/docs/commands-reference/version.md b/static/docs/commands-reference/version.md index 2446c86c5e..b8d4e35662 100644 --- a/static/docs/commands-reference/version.md +++ b/static/docs/commands-reference/version.md @@ -3,9 +3,6 @@ This command shows the system/environment information along with the DVC version. -> Like `dvc init`, this is one of the few commands that doesn't require an -> existing DVC project to run. - ## Synopsis ```usage @@ -14,6 +11,9 @@ usage: dvc version [-h] [-q | -v] ## Description +> Like `dvc init`, this is one of the few commands that doesn't require an +> existing DVC project to run. + Running the command `dvc version` outputs the following information about the system/environment: From 836fa17eab82d5a7ab82290af58cab1c1fad6511 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Mon, 8 Jul 2019 15:26:11 -0600 Subject: [PATCH 19/45] guides: add `get-url` to comment spec in DVC-file format doc Per https://github.com/iterative/dvc.org/pull/464#pullrequestreview-257720370 --- static/docs/user-guide/dvc-file-format.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/static/docs/user-guide/dvc-file-format.md b/static/docs/user-guide/dvc-file-format.md index e7e85eed00..89f3b81d84 100644 --- a/static/docs/user-guide/dvc-file-format.md +++ b/static/docs/user-guide/dvc-file-format.md @@ -34,7 +34,7 @@ outs: locked: True # Comments like this line persist through multiple executions of -# dvc repro/commit but not through dvc run/add/import-url commands. +# dvc repro/commit but not through dvc run/add/import-url/get-url commands. meta: # Special key to contain arbitary user data name: John @@ -80,4 +80,5 @@ meta values are preserved between multiple executions of `dvc repro` and `dvc commit` commands. 
> Note that comments and meta values are not preserved when a DVC-file is -> overwritten with the `dvc run`,`dvc add`,`dvc import-url` commands. +> overwritten with the `dvc run`, `dvc add`, `dvc import-url`, and `dvc get-url` +> commands. From 44e9c95ceecbe6c49399b3d5709efeaad6f0511d Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Mon, 8 Jul 2019 17:33:27 -0600 Subject: [PATCH 20/45] get-url: remove S3 write ops and permisions from ref. Per https://github.com/iterative/dvc.org/pull/464#pullrequestreview-257716717 --- static/docs/commands-reference/get-url.md | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-) diff --git a/static/docs/commands-reference/get-url.md b/static/docs/commands-reference/get-url.md index 5f1ce6ff74..c18ebd1a88 100644 --- a/static/docs/commands-reference/get-url.md +++ b/static/docs/commands-reference/get-url.md @@ -103,16 +103,11 @@ following API methods are performed: - `list_objects_v2`, `list_objects` - `head_object` - `download_file` -- `upload_file` -- `delete_object` -- `copy` So, make sure you have the following permissions enabled: -- s3:ListBucket -- s3:GetObject -- s3:PutObject -- s3:DeleteObject +- `s3:ListBucket` +- `s3:GetObject` From 10a52f3cccc98b998e87eac228f741e4165679cd Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Mon, 8 Jul 2019 17:41:34 -0600 Subject: [PATCH 21/45] term: revive "import stage" Per https://github.com/iterative/dvc.org/pull/464#pullrequestreview-257717782 --- static/docs/commands-reference/import-url.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/static/docs/commands-reference/import-url.md b/static/docs/commands-reference/import-url.md index b33063dec3..5469156464 100644 --- a/static/docs/commands-reference/import-url.md +++ b/static/docs/commands-reference/import-url.md @@ -78,7 +78,7 @@ Instead of `dvc import-url`: $ dvc import-url https://example.com/path/to/data.csv data.csv ``` -It is possible to instead use
`dvc run`, for example (HTTP URL): ```dvc $ dvc run -d https://example.com/path/to/data.csv \ @@ -86,14 +86,14 @@ $ dvc run -d https://example.com/path/to/data.csv \ wget https://example.com/path/to/data.csv -O data.csv ``` -Both methods generate a stage file (DVC-file) with an external dependency, and -they produce equivalent results. The `dvc import-url` command saves the user -from having to manually copy files from each of the remote storage schemes, and -from having to install CLI tools for each service. +Both methods generate an equivalent stage file (DVC-file) with an external +dependency. The `dvc import-url` command saves the user from having to manually +copy files from each of the remote storage schemes, and from having to install +CLI tools for each service. When DVC inspects a DVC-file, its dependencies will be checked to see if any have changed. A changed dependency will appear in the `dvc status` report, -indicating the need to reproduce this imported stage. When DVC inspects an +indicating the need to reproduce this import stage. When DVC inspects an external dependency, it uses a method appropriate to that dependency to test its current status. From 23ca5db906fb7fba3c59cf2004a174a0b54653c7 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Tue, 9 Jul 2019 03:22:31 -0600 Subject: [PATCH 22/45] guide: remove unnecessary sentence from share-data Per https://github.com/iterative/dvc.org/pull/472/files#r301482051 --- static/docs/get-started/share-data.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/static/docs/get-started/share-data.md b/static/docs/get-started/share-data.md index 88f864ba74..1e9d5bcc48 100644 --- a/static/docs/get-started/share-data.md +++ b/static/docs/get-started/share-data.md @@ -16,11 +16,9 @@ Usually, you run it along with `git commit` and `git push` to save changed [DVC-files](/doc/user-guide/dvc-file-format) to Git. The `dvc push` command allows one to upload data to remote storage. 
It doesn't -save any changes in the code or DVC-files. Those should be saved by using +save any changes in the code or DVC-files. Those should be saved by using `git commit` and `git push`. -See `dvc push` for more details and options for this command. - > \*As noted in the DVC [configuration](/doc/get-started/configure) chapter, we > are using a **local remote** in this guide for educational purposes. From 880736d1b81bfab9b0e56d35b1d4f5be6907a7b7 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Tue, 9 Jul 2019 11:53:34 -0600 Subject: [PATCH 23/45] status: rewrap usage code block Per https://github.com/iterative/dvc.org/pull/416#pullrequestreview-259210248 --- static/docs/commands-reference/status.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/static/docs/commands-reference/status.md b/static/docs/commands-reference/status.md index 907d2ff982..75adcadd1e 100644 --- a/static/docs/commands-reference/status.md +++ b/static/docs/commands-reference/status.md @@ -8,8 +8,7 @@ cache and remote cache. ```usage usage: dvc status [-h] [-v] [-j JOBS] [--show-checksums] [-q] [-c] - [-r REMOTE] [-a] [-T] [-d] - [targets [targets ...]] + [-r REMOTE] [-a] [-T] [-d] [targets [targets ...]] positional arguments: targets Limit command scope to these DVC-files. 
Using -R, From 3fb8da11818eda54d49cdb2b4e2971510b18f0ab Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Tue, 9 Jul 2019 12:01:51 -0600 Subject: [PATCH 24/45] cases: mention directories and `dvc run` in data-and-model-files-versioning Related to https://github.com/iterative/dvc.org/pull/431#issuecomment-509740379 --- .../data-and-model-files-versioning.md | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-) diff --git a/static/docs/use-cases/data-and-model-files-versioning.md b/static/docs/use-cases/data-and-model-files-versioning.md index 47cfb0b5d7..f820b78a95 100644 --- a/static/docs/use-cases/data-and-model-files-versioning.md +++ b/static/docs/use-cases/data-and-model-files-versioning.md @@ -5,13 +5,14 @@ > along the [versioning](/doc/get-started/example-versioning) get started > example. -DVC allows storing and versioning source data files, ML models, intermediate -results with Git, without checking the file contents into Git. It is useful when -dealing with files that are too large for Git to handle. DVC stores information -about your data file in a special [DVC-file](/doc/user-guide/dvc-file-format), -that has a description of a file that can be used for versioning. DVC supports -various types of remote locations for your data files and allows you to easily -store and share your data alongside your code. +DVC allows storing and versioning source data files and directories, ML models, +intermediate results with Git, without checking the file contents into Git. It +is useful when dealing with files that are too large for Git to handle. DVC +stores information about your data file in a special +[DVC-file](/doc/user-guide/dvc-file-format), that has a description of a file +that can be used for versioning. DVC supports various types of remote locations +for your data files and allows you to easily store and share your data alongside +your code. 
![](/static/img/model-versioning-diagram.png) @@ -54,13 +55,15 @@ $ git status $ git commit -m "Initialize dvc" ``` -Start tracking images and models with DVC: +Start tracking images and models with `dvc add`: ```dvc $ dvc add images $ dvc add model.pkl ``` +> Refer also to `dvc run` for more advanced ways to version data. + Commit your changes: ```dvc From b61b26db3815f75ee2e5bb55beb1439f62abb422 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Tue, 9 Jul 2019 13:15:12 -0600 Subject: [PATCH 25/45] cmd ref: First version of `import` and `get`, with updated `-url` counterparts For #385 --- src/Documentation/sidebar.json | 4 ++ static/docs/commands-reference/get-url.md | 21 ++++--- static/docs/commands-reference/get.md | 45 ++++++++++++++ static/docs/commands-reference/import-url.md | 21 +++---- static/docs/commands-reference/import.md | 62 ++++++++++++++++++++ 5 files changed, 134 insertions(+), 19 deletions(-) create mode 100644 static/docs/commands-reference/get.md create mode 100644 static/docs/commands-reference/import.md diff --git a/src/Documentation/sidebar.json b/src/Documentation/sidebar.json index 27dd74fd5a..fc2d96f866 100644 --- a/src/Documentation/sidebar.json +++ b/src/Documentation/sidebar.json @@ -93,8 +93,10 @@ "diff.md", "fetch.md", "get-url.md", + "get.md", "gc.md", "import-url.md", + "import.md", "init.md", "install.md", "lock.md", @@ -137,8 +139,10 @@ "diff.md": "diff", "fetch.md": "fetch", "get-url.md": "get-url", + "get.md": "get", "gc.md": "gc", "import-url.md": "import-url", + "import.md": "import", "init.md": "init", "install.md": "install", "lock.md": "lock", diff --git a/static/docs/commands-reference/get-url.md b/static/docs/commands-reference/get-url.md index c18ebd1a88..5a7fd6959d 100644 --- a/static/docs/commands-reference/get-url.md +++ b/static/docs/commands-reference/get-url.md @@ -18,18 +18,15 @@ positional arguments: ## Description -> Like `dvc init`, this is one of the few commands that doesn't require an -> existing DVC 
project to run. - In some cases it's convenient to get a data file or directory from a remote location. The `dvc get-url` command helps the user do so. The `url` argument should provide the location of the data to be downloaded, while `out` can be used to specify the (path and) file name desired for the downloaded data file or directory. -It's important to note that this command does not require an initialized -repository to work in. It's a single-purpose command that can be used out of the -box after installing DVC. +> Like `dvc init`, this is one of the few commands that doesn't require an +> existing DVC project to run. It's a single-purpose command that can be used +> out of the box after installing DVC. DVC supports several types of (local or) remote locations (protocols): @@ -50,7 +47,7 @@ DVC supports several types of (local or) remote locations (protocols): Another way to understand the `dvc get-url` command is as a tool for downloading data files. -> See `dvc get` to download data from other DVC repositories (e.g. GitHub URLs). +> See `dvc get` to download data from other DVC repositories (e.g. Github URLs). On GNU/Linux systems for example, instead of `dvc get-url` with HTTP(S) it's possible to instead use: @@ -98,16 +95,22 @@ DVC will be using default AWS credentials file to access S3. To override some of these settings, you could the options described in `dvc remote modify`. We use `boto3` library to set up a client and communicate with AWS S3. 
The -following API methods are performed: +following API methods may be performed: - `list_objects_v2`, `list_objects` - `head_object` - `download_file` +- `upload_file` +- `delete_object` +- `copy` -So, make sure you have the following permissions enabled: +So make sure you have the following permissions enabled to enable all the above +operations: - `s3:ListBucket` - `s3:GetObject` +- `s3:PutObject` +- `s3:DeleteObject` diff --git a/static/docs/commands-reference/get.md b/static/docs/commands-reference/get.md new file mode 100644 index 0000000000..a44b3d321c --- /dev/null +++ b/static/docs/commands-reference/get.md @@ -0,0 +1,45 @@ +# get + +Download or copy file or directory from another DVC repository into the local +file system. + +> Unlike `dvc import`, this command does not track the downloaded data file(s) +> (does not create a DVC-file). + +## Synopsis + +```usage +usage: dvc get [-h] [-q | -v] [-o [OUT]] [--rev [REV]] url path + +positional arguments: + url DVC repository URL to download data from. + path Path to data within DVC repository. +``` + +## Description + +In some cases it's convenient to get a data artifact from another +DVC repository. The `dvc get` command helps the user do so. The `url` argument +should provide the external DVC project's Git repository URL, while `path` is +used to specify the path to the data to be downloaded within the repo. + + + +> Like `dvc init`, this is one of the few commands that doesn't require an +> existing DVC project to run. It's a single-purpose command that can be used +> out of the box after installing DVC. + + + + + +> See `dvc get-url` to download data from other supported URLs. + +## Options + +- `-h`, `--help` - prints the usage/help message, and exit. + +- `-q`, `--quiet` - do not write anything to standard output. Exit with 0 if no + problems arise, otherwise 1. + +- `-v`, `--verbose` - displays detailed tracing information. 
diff --git a/static/docs/commands-reference/import-url.md b/static/docs/commands-reference/import-url.md index 5469156464..caddfe6e10 100644 --- a/static/docs/commands-reference/import-url.md +++ b/static/docs/commands-reference/import-url.md @@ -19,26 +19,27 @@ positional arguments: ## Description -In some cases it's convenient to add a data file or directory to the workspace -such that it will be automatically updated when the data source is updated. -Examples: +In some cases it's convenient to add a data file or directory from a remote +location into the workspace, such that it will be automatically updated when the +data source is updated. Examples: - A remote system may produce occasional data files that are used in other projects. - A batch process running regularly updates a data file to import. - A shared dataset on a remote storage that is managed and updated outside DVC. -DVC supports [DVC-files](/doc/user-guide/dvc-file-format) which refer to an -external data location, see +DVC supports [DVC-files](/doc/user-guide/dvc-file-format) which refer to data in +an external location, see [External Dependencies](/doc/user-guide/external-dependencies). In such a DVC-file, the `deps` section specifies a remote URL, and the `outs` section -lists the corresponding local path in the workspace. It records enough data from -the remote file or directory to enable DVC to efficiently check it to determine -if the local copy is out of date. DVC uses this remote URL to download the data -to the workspace initially, and to re-download it upon changes. +contains the corresponding local path in the workspace. It records enough data +from the external file or directory to enable DVC to efficiently check it to +determine whether the local copy is out of date. DVC uses the remote URL to +download the data to the workspace initially, and to re-download it when +changed. > See `dvc import` to download and tack data from other DVC repositories (e.g. -> GitHub URLs). +> Github URLs). 
The `dvc import-url` command helps the user create such an external data dependency. The `url` argument should provide the location of the data to be diff --git a/static/docs/commands-reference/import.md b/static/docs/commands-reference/import.md new file mode 100644 index 0000000000..0433c6231b --- /dev/null +++ b/static/docs/commands-reference/import.md @@ -0,0 +1,62 @@ +# import + +Download or copy file or directory from another DVC repository into the +workspace, and track changes in the remote source with DVC. Creates +a DVC-file. + +> See also `dvc get` which corresponds to the first step this command performs +> (just download the data). + +## Synopsis + +```usage +usage: dvc import [-h] [-q | -v] [-o [OUT]] [--rev [REV]] url path + +positional arguments: + url DVC repository URL. + path Path to data within DVC repository. +``` + +## Description + +In some cases it's convenient to add a data artifact from another +DVC repository into the workspace, such that it will be automatically updated +when the data source is updated. + +DVC supports [DVC-files](/doc/user-guide/dvc-file-format) which refer to data in +an external DVC repository (hosted on a Git server). In such a DVC-file, the +`deps` section specifies the DVC repo and data path, and the `outs` section +contains the corresponding local path in the workspace. It records enough data +from the external file or directory to enable DVC to efficiently check it to +determine whether the local copy is out of date. DVC uses the DVC repo and data +path to download the data to the workspace initially, and to re-download it when +changed. + +> See `dvc import-url` to download and tack data from other supported URLs. + +The `dvc import` command helps the user create such an external data dependency. +The `url` argument should provide the external DVC project's Git repository URL, +while `path` is used to specify the path to the data to be imported within the +repo. 
An import stage (DVC-file) is then created with the name of the data +artifact, similar to having used `dvc run` to generate the same output as done +in the external DVC project. + + + + + +## Options + +- `-o`, `--out` - specify a location in the workspace to place the imported data + in, as a path to the desired directory. The default value (when this option + isn't used) is the current working directory (`.`). + +- `--rev` - specific Git revision of the DVC repository to import the data from. + `HEAD` by default. + +- `-h`, `--help` - prints the usage/help message, and exit. + +- `-q`, `--quiet` - do not write anything to standard output. Exit with 0 if no + problems arise, otherwise 1. + +- `-v`, `--verbose` - displays detailed tracing information. From b94c79919fcd714017698e0885ad36b3cfce9e48 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 10 Jul 2019 22:01:37 -0600 Subject: [PATCH 26/45] Simplify notes about single-use commands. Per https://github.com/iterative/dvc.org/pull/464#pullrequestreview-259774230 --- static/docs/commands-reference/get-url.md | 5 ++--- static/docs/commands-reference/get.md | 5 ++--- static/docs/commands-reference/version.md | 3 +-- 3 files changed, 5 insertions(+), 8 deletions(-) diff --git a/static/docs/commands-reference/get-url.md b/static/docs/commands-reference/get-url.md index 5a7fd6959d..050942208d 100644 --- a/static/docs/commands-reference/get-url.md +++ b/static/docs/commands-reference/get-url.md @@ -24,9 +24,8 @@ should provide the location of the data to be downloaded, while `out` can be used to specify the (path and) file name desired for the downloaded data file or directory. -> Like `dvc init`, this is one of the few commands that doesn't require an -> existing DVC project to run. It's a single-purpose command that can be used -> out of the box after installing DVC. +> This command doesn't require an existing DVC project to run in. 
It's a +> single-purpose command that can be used out of the box after installing DVC. DVC supports several types of (local or) remote locations (protocols): diff --git a/static/docs/commands-reference/get.md b/static/docs/commands-reference/get.md index a44b3d321c..d8e1e6d6de 100644 --- a/static/docs/commands-reference/get.md +++ b/static/docs/commands-reference/get.md @@ -25,9 +25,8 @@ used to specify the path to the data to be downloaded within the repo. -> Like `dvc init`, this is one of the few commands that doesn't require an -> existing DVC project to run. It's a single-purpose command that can be used -> out of the box after installing DVC. +> This command doesn't require an existing DVC project to run in. It's a +> single-purpose command that can be used out of the box after installing DVC. diff --git a/static/docs/commands-reference/version.md b/static/docs/commands-reference/version.md index b8d4e35662..0e858fa3b8 100644 --- a/static/docs/commands-reference/version.md +++ b/static/docs/commands-reference/version.md @@ -11,8 +11,7 @@ usage: dvc version [-h] [-q | -v] ## Description -> Like `dvc init`, this is one of the few commands that doesn't require an -> existing DVC project to run. +> This command doesn't require an existing DVC project to run in. Running the command `dvc version` outputs the following information about the system/environment: From 88dddafba1fa7a63448c026111ff337ed3c9c033 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 10 Jul 2019 22:13:52 -0600 Subject: [PATCH 27/45] cmd ref: Add "Git server e.g. 
Github" note to `import` and `get` summaries Per https://github.com/iterative/dvc.org/pull/464#pullrequestreview-259776278 --- static/docs/commands-reference/get.md | 6 ++++-- static/docs/commands-reference/import.md | 8 +++++--- 2 files changed, 9 insertions(+), 5 deletions(-) diff --git a/static/docs/commands-reference/get.md b/static/docs/commands-reference/get.md index d8e1e6d6de..69ef87e993 100644 --- a/static/docs/commands-reference/get.md +++ b/static/docs/commands-reference/get.md @@ -1,7 +1,7 @@ # get -Download or copy file or directory from another DVC repository into the local -file system. +Download or copy file or directory from another DVC repository (on a git server +such as Github) into the local file system. > Unlike `dvc import`, this command does not track the downloaded data file(s) > (does not create a DVC-file). @@ -42,3 +42,5 @@ used to specify the path to the data to be downloaded within the repo. problems arise, otherwise 1. - `-v`, `--verbose` - displays detailed tracing information. + + diff --git a/static/docs/commands-reference/import.md b/static/docs/commands-reference/import.md index 0433c6231b..8ed7fec57a 100644 --- a/static/docs/commands-reference/import.md +++ b/static/docs/commands-reference/import.md @@ -1,8 +1,8 @@ # import -Download or copy file or directory from another DVC repository into the -workspace, and track changes in the remote source with DVC. Creates -a DVC-file. +Download or copy file or directory from another DVC repository (on a git server +such as Github) into the workspace, and track changes in the remote +source with DVC. Creates a DVC-file. > See also `dvc get` which corresponds to the first step this command performs > (just download the data). @@ -60,3 +60,5 @@ in the external DVC project. problems arise, otherwise 1. - `-v`, `--verbose` - displays detailed tracing information. 
+ + From ae0ed5c99e677235320cb00b572f44019e96da0f Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 10 Jul 2019 22:23:09 -0600 Subject: [PATCH 28/45] cmd ref: updated `url` arg desc in `import` and `get` Per https://github.com/iterative/dvc.org/pull/464#pullrequestreview-259776737 --- static/docs/commands-reference/get.md | 6 +++--- static/docs/commands-reference/import.md | 6 +++--- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/static/docs/commands-reference/get.md b/static/docs/commands-reference/get.md index 69ef87e993..e01fbbba1e 100644 --- a/static/docs/commands-reference/get.md +++ b/static/docs/commands-reference/get.md @@ -12,7 +12,7 @@ such as Github) into the local file system. usage: dvc get [-h] [-q | -v] [-o [OUT]] [--rev [REV]] url path positional arguments: - url DVC repository URL to download data from. + url DVC repository URL (Git server link). path Path to data within DVC repository. ``` @@ -28,9 +28,9 @@ used to specify the path to the data to be downloaded within the repo. > This command doesn't require an existing DVC project to run in. It's a > single-purpose command that can be used out of the box after installing DVC. - + - + > See `dvc get-url` to download data from other supported URLs. diff --git a/static/docs/commands-reference/import.md b/static/docs/commands-reference/import.md index 8ed7fec57a..3e63d9ff62 100644 --- a/static/docs/commands-reference/import.md +++ b/static/docs/commands-reference/import.md @@ -13,7 +13,7 @@ source with DVC. Creates a DVC-file. usage: dvc import [-h] [-q | -v] [-o [OUT]] [--rev [REV]] url path positional arguments: - url DVC repository URL. + url DVC repository URL (Git server link). path Path to data within DVC repository. ``` @@ -41,9 +41,9 @@ repo. An import stage (DVC-file) is then created with the name of the data artifact, similar to having used `dvc run` to generate the same output as done in the external DVC project. 
- + - + ## Options From 65980d9592a21d382476a0f3bafdba7b3f1dd04e Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 10 Jul 2019 22:29:27 -0600 Subject: [PATCH 29/45] cmd ref: add note about http and ssh protocols to `get` and `import` Per https://github.com/iterative/dvc.org/pull/464#pullrequestreview-259777015 --- static/docs/commands-reference/get.md | 7 +++---- static/docs/commands-reference/import.md | 13 ++++++------- 2 files changed, 9 insertions(+), 11 deletions(-) diff --git a/static/docs/commands-reference/get.md b/static/docs/commands-reference/get.md index e01fbbba1e..7ef7f56391 100644 --- a/static/docs/commands-reference/get.md +++ b/static/docs/commands-reference/get.md @@ -20,16 +20,15 @@ positional arguments: In some cases it's convenient to get a data artifact from another DVC repository. The `dvc get` command helps the user do so. The `url` argument -should provide the external DVC project's Git repository URL, while `path` is -used to specify the path to the data to be downloaded within the repo. +should provide the external DVC project's Git repository URL (both HTTP and SSH +protocols supported, e.g. `[user@]server:project.git`), while `path` is used to +specify the path to the data to be downloaded within the repo. > This command doesn't require an existing DVC project to run in. It's a > single-purpose command that can be used out of the box after installing DVC. - - > See `dvc get-url` to download data from other supported URLs. diff --git a/static/docs/commands-reference/import.md b/static/docs/commands-reference/import.md index 3e63d9ff62..fce2f787f6 100644 --- a/static/docs/commands-reference/import.md +++ b/static/docs/commands-reference/import.md @@ -35,13 +35,12 @@ changed. > See `dvc import-url` to download and tack data from other supported URLs. The `dvc import` command helps the user create such an external data dependency.
-The `url` argument should provide the external DVC project's Git repository URL, -while `path` is used to specify the path to the data to be imported within the -repo. An import stage (DVC-file) is then created with the name of the data -artifact, similar to having used `dvc run` to generate the same output as done -in the external DVC project. - - +The `url` argument should provide the external DVC project's Git repository URL +(both HTTP and SSH protocols supported, e.g. `[user@]server:project.git`), while +`path` is used to specify the path to the data to be imported within the repo. +An import stage (DVC-file) is then created with the name of the data artifact, +similar to having used `dvc run` to generate the same output as done in the +external DVC project. From aa4067b7c010b783bb069033021d7c3d58571223 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 10 Jul 2019 23:01:21 -0600 Subject: [PATCH 30/45] init: clarify "local" (repo) term Per https://github.com/iterative/dvc.org/pull/464#pullrequestreview-259822616 --- static/docs/commands-reference/init.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/static/docs/commands-reference/init.md b/static/docs/commands-reference/init.md index 8f3adaaa92..fea009df9a 100644 --- a/static/docs/commands-reference/init.md +++ b/static/docs/commands-reference/init.md @@ -1,6 +1,7 @@ # init -This command initializes a DVC environment in a local Git repository. +This command initializes a DVC project on a Git repository where you're working +from. 
## Synopsis From 67807540fab3c7c27299f271fd0b70998cbfba90 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 10 Jul 2019 23:08:26 -0600 Subject: [PATCH 31/45] remote: fix grammar in `--local` option of `modify` and `remove` Per https://github.com/iterative/dvc.org/pull/464#pullrequestreview-259824548 --- static/docs/commands-reference/remote_list.md | 2 +- static/docs/commands-reference/remote_remove.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/static/docs/commands-reference/remote_list.md b/static/docs/commands-reference/remote_list.md index e20d1d4592..96f0d491b0 100644 --- a/static/docs/commands-reference/remote_list.md +++ b/static/docs/commands-reference/remote_list.md @@ -28,7 +28,7 @@ Including names and URLs. - `--local` - list remotes specified in the [local](/doc/user-guide/dvc-files-and-directories) configuration file - (`.dvc/config.local`). Local config files stores private configuration that + (`.dvc/config.local`). Local config files store private configuration that should not be tracked by SCM (Git). ## Examples diff --git a/static/docs/commands-reference/remote_remove.md b/static/docs/commands-reference/remote_remove.md index 443e0a4644..a36d6a584f 100644 --- a/static/docs/commands-reference/remote_remove.md +++ b/static/docs/commands-reference/remote_remove.md @@ -36,7 +36,7 @@ possible to edit config files manually. - `--local` - remove remote specified in the [local](/doc/user-guide/dvc-files-and-directories) configuration file - (`.dvc/config.local`). Local config files stores private configuration that + (`.dvc/config.local`). Local config files store private configuration that should not be tracked by SCM (Git). 
## Examples From 7f33a3a5e19605c9d1cb53e3d5b7233bf3fd14b5 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 10 Jul 2019 23:25:04 -0600 Subject: [PATCH 32/45] term: review use of "config(uration) file" and link to /doc/commands-reference/config Related to https://github.com/iterative/dvc.org/pull/464#pullrequestreview-259824548 --- static/docs/commands-reference/cache_dir.md | 11 +++++------ static/docs/commands-reference/config.md | 2 +- static/docs/commands-reference/remote.md | 4 ++-- static/docs/commands-reference/remote_add.md | 15 +++++++-------- .../docs/commands-reference/remote_default.md | 19 ++++++++----------- static/docs/commands-reference/remote_list.md | 8 ++++---- .../docs/commands-reference/remote_modify.md | 15 +++++++-------- .../docs/commands-reference/remote_remove.md | 12 ++++++------ 8 files changed, 40 insertions(+), 46 deletions(-) diff --git a/static/docs/commands-reference/cache_dir.md b/static/docs/commands-reference/cache_dir.md index ab70d31ddf..95275daee7 100644 --- a/static/docs/commands-reference/cache_dir.md +++ b/static/docs/commands-reference/cache_dir.md @@ -29,12 +29,11 @@ file location**. They are required in the latter form for the config file. - `--system` - modify a system config file (e.g. `/etc/dvc.config`) instead of `.dvc/config`. -- `--local` - modify a local - [config file](/doc/user-guide/dvc-files-and-directories) instead of - `.dvc/config`. It is located in `.dvc/config.local` and is Git-ignored. This - is useful when you need to specify private config options in your config that - you don't want to track and share through Git (credentials, private locations, - etc). +- `--local` - modify a local [config file](/doc/commands-reference/config) + instead of `.dvc/config`. It is located in `.dvc/config.local` and is + Git-ignored. This is useful when you need to specify private config options in + your config that you don't want to track and share through Git (credentials, + private locations, etc). 
- `-u`, `--unset` - remove the `cache.dir` config option from the config file. Don't provide a `value` when using this flag. diff --git a/static/docs/commands-reference/config.md b/static/docs/commands-reference/config.md index f4ed1bc804..a6e00b2017 100644 --- a/static/docs/commands-reference/config.md +++ b/static/docs/commands-reference/config.md @@ -19,7 +19,7 @@ You can query/set/replace/unset DVC configuration options with this command. It takes a config option `name` (a section and a key, separated by a dot) and its `value` (any valid alpha-numeric string generally). -This command reads and overwrites the DVC config file `.dvc/config`. If +This command reads and overwrites the DVC configuration file `.dvc/config`. If `--local` option is specified, `.dvc/config.local` is modified instead. If the config option `value` is not provided and `--unset` option is not used, diff --git a/static/docs/commands-reference/remote.md b/static/docs/commands-reference/remote.md index 377f305447..1e2ec14392 100644 --- a/static/docs/commands-reference/remote.md +++ b/static/docs/commands-reference/remote.md @@ -48,8 +48,8 @@ repository), which enables basic DVC usage scenarios out of the box. [list](/doc/commands-reference/remote-list), [modify](/doc/commands-reference/remote-modify), and [remove](/doc/commands-reference/remote-remove) commands read or modify DVC -[config files](/doc/user-guide/dvc-files-and-directories). Alternatively, -`dvc config` can be used or these files could be edited manually. +[config files](/doc/commands-reference/config). Alternatively, `dvc config` can +be used or these files could be edited manually. For the typical process to share the project via remote, see [Share Data And Model Files](/doc/use-cases/share-data-and-model-files). 
diff --git a/static/docs/commands-reference/remote_add.md b/static/docs/commands-reference/remote_add.md index 3b17b557c3..f87efa448d 100644 --- a/static/docs/commands-reference/remote_add.md +++ b/static/docs/commands-reference/remote_add.md @@ -38,8 +38,8 @@ though and will rely on default access settings. > to support AWS S3 storage. This command creates a section in the DVC -[config file](/doc/user-guide/dvc-files-and-directories) and optionally assigns -a default remote in the core section if the `--default` option is used: +[config file](/doc/commands-reference/config) and optionally assigns a default +remote in the core section if the `--default` option is used: ```ini ['remote "myremote"'] @@ -64,12 +64,11 @@ Use `dvc config` to unset/change the default remote as so: - `--system` - save remote configuration to the system config (e.g. `/etc/dvc.config`) instead of `.dvc/config`. -- `--local` - modify a local - [config file](/doc/user-guide/dvc-files-and-directories) instead of - `.dvc/config`. It is located in `.dvc/config.local` and is Git-ignored. This - is useful when you need to specify private config options in your config that - you don't want to track and share through Git (credentials, private locations, - etc). +- `--local` - modify a local [config file](/doc/commands-reference/config) + instead of `.dvc/config`. It is located in `.dvc/config.local` and is + Git-ignored. This is useful when you need to specify private config options in + your config that you don't want to track and share through Git (credentials, + private locations, etc). 
- `-d`, `-default` - commands like `dvc pull`, `dvc push`, `dvc fetch` will be using this remote by default to save or retrieve data files unless `-r` option diff --git a/static/docs/commands-reference/remote_default.md b/static/docs/commands-reference/remote_default.md index 2dd2054741..0d9f0c4dc4 100644 --- a/static/docs/commands-reference/remote_default.md +++ b/static/docs/commands-reference/remote_default.md @@ -31,7 +31,7 @@ $ dvc remote default myremote ``` This command assigns the default remote in the core section of the DVC -[config file](/doc/user-guide/dvc-files-and-directories). +[config file](/doc/commands-reference/config). ```ini [core] @@ -42,10 +42,8 @@ For the commands which take a `--remote` option (`dvc pull`, `dvc push`, `dvc status`, `dvc gc`, `dvc fetch`), default remote is used if that option is not specified. -You can also use [`dvc config`](/doc/user-guide/dvc-files-and-directories), -[`dvc remote add`](/doc/commands-reference/remote-add) and -[`dvc remote modify`](/doc/commands-reference/remote-modify) commands to -set/unset/change the default remote configurations. +You can also use `dvc config`, `dvc remote add` and `dvc remote modify` commands +to set/unset/change the default remote configurations. ## Options @@ -57,12 +55,11 @@ set/unset/change the default remote configurations. - `--system` - save remote configuration to the system config (e.g. `/etc/dvc.config`) instead of `.dvc/config`. -- `--local` - modify a local - [config file](/doc/user-guide/dvc-files-and-directories) instead of - `.dvc/config`. It is located in `.dvc/config.local` and is Git-ignored. This - is useful when you need to specify private config options in your config that - you don't want to track and share through Git (credentials, private locations, - etc). +- `--local` - modify a local [config file](/doc/commands-reference/config) + instead of `.dvc/config`. It is located in `.dvc/config.local` and is + Git-ignored. 
This is useful when you need to specify private config options in + your config that you don't want to track and share through Git (credentials, + private locations, etc). - `-h`, `--help` - prints the usage/help message and exit. diff --git a/static/docs/commands-reference/remote_list.md b/static/docs/commands-reference/remote_list.md index 96f0d491b0..2d863bacee 100644 --- a/static/docs/commands-reference/remote_list.md +++ b/static/docs/commands-reference/remote_list.md @@ -26,10 +26,10 @@ Including names and URLs. - `--system` - save remote configuration to the system config (e.g. `/etc/dvc.config`) instead of `.dvc/config`. -- `--local` - list remotes specified in the - [local](/doc/user-guide/dvc-files-and-directories) configuration file - (`.dvc/config.local`). Local config files store private configuration that - should not be tracked by SCM (Git). +- `--local` - list remotes specified in the local + [config file](/doc/commands-reference/config) (`.dvc/config.local`). Local + config files store private configuration that should not be tracked by SCM + (Git). ## Examples diff --git a/static/docs/commands-reference/remote_modify.md b/static/docs/commands-reference/remote_modify.md index caff5a6d70..027c1c51c4 100644 --- a/static/docs/commands-reference/remote_modify.md +++ b/static/docs/commands-reference/remote_modify.md @@ -31,8 +31,8 @@ specific. See below examples and a list of per remote type - AWS S3, Google cloud, Azure, SSH, ALiyun OSS, and others. This command modifies a `remote` section in the DVC project's -[config file](/doc/user-guide/dvc-files-and-directories). Alternatively, -`dvc config` or manual editing could be used to change the configuration. +[config file](/doc/commands-reference/config). Alternatively, `dvc config` or +manual editing could be used to change the configuration. ## Options @@ -45,12 +45,11 @@ This command modifies a `remote` section in the DVC project's - `--system` - save remote configuration to the system config (e.g. 
`/etc/dvc.config`) instead of `.dvc/config`. -- `--local` - modify a local - [config file](/doc/user-guide/dvc-files-and-directories) instead of - `.dvc/config`. It is located in `.dvc/config.local` and is Git-ignored. This - is useful when you need to specify private config options in your config that - you don't want to track and share through Git (credentials, private locations, - etc). +- `--local` - modify a local [config file](/doc/commands-reference/config) + instead of `.dvc/config`. It is located in `.dvc/config.local` and is + Git-ignored. This is useful when you need to specify private config options in + your config that you don't want to track and share through Git (credentials, + private locations, etc). ## Examples diff --git a/static/docs/commands-reference/remote_remove.md b/static/docs/commands-reference/remote_remove.md index a36d6a584f..75b1adcf31 100644 --- a/static/docs/commands-reference/remote_remove.md +++ b/static/docs/commands-reference/remote_remove.md @@ -23,8 +23,8 @@ positional arguments: Remote `name` is required. This command removes a section in the DVC -[config file](/doc/user-guide/dvc-files-and-directories). Alternatively, it is -possible to edit config files manually. +[config file](/doc/commands-reference/config). Alternatively, it is possible to +edit config files manually. ## Options @@ -34,10 +34,10 @@ possible to edit config files manually. - `--system` - save remote configuration to the system config (e.g. `/etc/dvc.config`) instead of `.dvc/config`. -- `--local` - remove remote specified in the - [local](/doc/user-guide/dvc-files-and-directories) configuration file - (`.dvc/config.local`). Local config files store private configuration that - should not be tracked by SCM (Git). +- `--local` - remove remote specified in the local + [config file](/doc/commands-reference/config) (`.dvc/config.local`). Local + config files store private configuration that should not be tracked by SCM + (Git). 
## Examples From d5b38f4f3b01b24526a29ba666c52a49a87ecaf2 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 10 Jul 2019 23:28:49 -0600 Subject: [PATCH 33/45] remote: std `--local` opt desc Per https://github.com/iterative/dvc.org/pull/464#pullrequestreview-259824548 --- static/docs/commands-reference/remote_list.md | 6 ++---- static/docs/commands-reference/remote_remove.md | 7 +++---- 2 files changed, 5 insertions(+), 8 deletions(-) diff --git a/static/docs/commands-reference/remote_list.md b/static/docs/commands-reference/remote_list.md index 2d863bacee..87e890e823 100644 --- a/static/docs/commands-reference/remote_list.md +++ b/static/docs/commands-reference/remote_list.md @@ -26,10 +26,8 @@ Including names and URLs. - `--system` - save remote configuration to the system config (e.g. `/etc/dvc.config`) instead of `.dvc/config`. -- `--local` - list remotes specified in the local - [config file](/doc/commands-reference/config) (`.dvc/config.local`). Local - config files store private configuration that should not be tracked by SCM - (Git). +- `--local` - read a local [config file](/doc/commands-reference/config) instead + of `.dvc/config`. It is located in `.dvc/config.local` and is Git-ignored. ## Examples diff --git a/static/docs/commands-reference/remote_remove.md b/static/docs/commands-reference/remote_remove.md index 75b1adcf31..2f759bd48a 100644 --- a/static/docs/commands-reference/remote_remove.md +++ b/static/docs/commands-reference/remote_remove.md @@ -34,10 +34,9 @@ edit config files manually. - `--system` - save remote configuration to the system config (e.g. `/etc/dvc.config`) instead of `.dvc/config`. -- `--local` - remove remote specified in the local - [config file](/doc/commands-reference/config) (`.dvc/config.local`). Local - config files store private configuration that should not be tracked by SCM - (Git). +- `--local` - modify a local [config file](/doc/commands-reference/config) + instead of `.dvc/config`. 
It is located in `.dvc/config.local` and is + Git-ignored. ## Examples From ce2dc0c6bd3510913367ccdfbecbab77d0abbe6f Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Thu, 11 Jul 2019 18:22:36 -0400 Subject: [PATCH 34/45] cmd ref: remove outdated comment from `get` and `import` --- static/docs/commands-reference/get.md | 2 -- static/docs/commands-reference/import.md | 2 -- 2 files changed, 4 deletions(-) diff --git a/static/docs/commands-reference/get.md b/static/docs/commands-reference/get.md index 7ef7f56391..f7e0cb6aa6 100644 --- a/static/docs/commands-reference/get.md +++ b/static/docs/commands-reference/get.md @@ -29,8 +29,6 @@ specify the path to the data to be downloaded within the repo. > This command doesn't require an existing DVC project to run in. It's a > single-purpose command that can be used out of the box after installing DVC. - - > See `dvc get-url` to download data from other supported URLs. ## Options diff --git a/static/docs/commands-reference/import.md b/static/docs/commands-reference/import.md index fce2f787f6..2e0fd93aca 100644 --- a/static/docs/commands-reference/import.md +++ b/static/docs/commands-reference/import.md @@ -42,8 +42,6 @@ An import stage (DVC-file) is then created with the name of the data artifact, similar to having used `dvc run` to generate the same output as done in the external DVC project. 
- - ## Options - `-o`, `--out` - specify a location in the workspace to place the imported data From dab6a3602f9d3c6d28742bd268f148bec8a45eb1 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Thu, 11 Jul 2019 18:24:59 -0400 Subject: [PATCH 35/45] cmd ref: make note about single-use commands into regular paragraphs Per https://github.com/iterative/dvc.org/pull/464#pullrequestreview-259774230 --- static/docs/commands-reference/get-url.md | 4 ++-- static/docs/commands-reference/get.md | 4 ++-- static/docs/commands-reference/version.md | 2 +- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/static/docs/commands-reference/get-url.md b/static/docs/commands-reference/get-url.md index 050942208d..7940263691 100644 --- a/static/docs/commands-reference/get-url.md +++ b/static/docs/commands-reference/get-url.md @@ -24,8 +24,8 @@ should provide the location of the data to be downloaded, while `out` can be used to specify the (path and) file name desired for the downloaded data file or directory. -> This command doesn't require an existing DVC project to run in. It's a -> single-purpose command that can be used out of the box after installing DVC. +Note that this command doesn't require an existing DVC project to run in. It's a +single-purpose command that can be used out of the box after installing DVC. DVC supports several types of (local or) remote locations (protocols): diff --git a/static/docs/commands-reference/get.md b/static/docs/commands-reference/get.md index f7e0cb6aa6..45a813a9ad 100644 --- a/static/docs/commands-reference/get.md +++ b/static/docs/commands-reference/get.md @@ -26,8 +26,8 @@ specify the path to the data to be downloaded within the repo. -> This command doesn't require an existing DVC project to run in. It's a -> single-purpose command that can be used out of the box after installing DVC. +Note that this command doesn't require an existing DVC project to run in. 
It's a +single-purpose command that can be used out of the box after installing DVC. > See `dvc get-url` to download data from other supported URLs. diff --git a/static/docs/commands-reference/version.md b/static/docs/commands-reference/version.md index 0e858fa3b8..ab1e822937 100644 --- a/static/docs/commands-reference/version.md +++ b/static/docs/commands-reference/version.md @@ -11,7 +11,7 @@ usage: dvc version [-h] [-q | -v] ## Description -> This command doesn't require an existing DVC project to run in. +Note that this command doesn't require an existing DVC project to run in. Running the command `dvc version` outputs the following information about the system/environment: From 4f9308f0852b49caeea54a7d354da6d201a72a2e Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Thu, 11 Jul 2019 18:29:26 -0400 Subject: [PATCH 36/45] cmd ref: update `url` arg desc in `get` and `import` (again) Per https://github.com/iterative/dvc.org/pull/464#pullrequestreview-259776737 --- static/docs/commands-reference/get.md | 4 ++-- static/docs/commands-reference/import.md | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/static/docs/commands-reference/get.md b/static/docs/commands-reference/get.md index 45a813a9ad..c6df596f90 100644 --- a/static/docs/commands-reference/get.md +++ b/static/docs/commands-reference/get.md @@ -12,8 +12,8 @@ such as Github) into the local file system. usage: dvc get [-h] [-q | -v] [-o [OUT]] [--rev [REV]] url path positional arguments: - url DVC repository URL (Git server link). - path Path to data within DVC repository. + url URL of Git repository with DVC project to download from. + path Path to data within DVC repository. ``` ## Description diff --git a/static/docs/commands-reference/import.md b/static/docs/commands-reference/import.md index 2e0fd93aca..291f8f0504 100644 --- a/static/docs/commands-reference/import.md +++ b/static/docs/commands-reference/import.md @@ -13,8 +13,8 @@ source with DVC. Creates a DVC-file.
usage: dvc import [-h] [-q | -v] [-o [OUT]] [--rev [REV]] url path positional arguments: - url DVC repository URL (Git server link). - path Path to data within DVC repository. + url URL of Git repository with DVC project to download from. + path Path to data within DVC repository. ``` ## Description From 92a52380ed3badd7a8fd460f78e75d1e6eb53813 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Thu, 11 Jul 2019 19:11:13 -0400 Subject: [PATCH 37/45] version: remove unnecessary note Per https://github.com/iterative/dvc.org/pull/464#pullrequestreview-259825371 --- static/docs/commands-reference/version.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/static/docs/commands-reference/version.md b/static/docs/commands-reference/version.md index ab1e822937..c6fb98b18c 100644 --- a/static/docs/commands-reference/version.md +++ b/static/docs/commands-reference/version.md @@ -11,8 +11,6 @@ usage: dvc version [-h] [-q | -v] ## Description -Note that this command doesn't require an existing DVC project to run in. - Running the command `dvc version` outputs the following information about the system/environment: From fd7f42818bfe4c5e0ccd605b70a742fa91ff9baf Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Fri, 12 Jul 2019 12:53:39 -0400 Subject: [PATCH 38/45] s3: update info on boto3 methods and permissions required... in `import-url`, `get-url` as well as `remote` and `remote add` command refs. Updates also related guides (install and config).
--- static/docs/commands-reference/get-url.md | 37 +++++++++----------- static/docs/commands-reference/import-url.md | 32 +++++++++++++++-- static/docs/commands-reference/remote.md | 13 ++++--- static/docs/commands-reference/remote_add.md | 12 +++---- static/docs/get-started/configure.md | 10 +++--- static/docs/get-started/install.md | 12 +++---- 6 files changed, 72 insertions(+), 44 deletions(-) diff --git a/static/docs/commands-reference/get-url.md b/static/docs/commands-reference/get-url.md index 7940263691..ddb0a63200 100644 --- a/static/docs/commands-reference/get-url.md +++ b/static/docs/commands-reference/get-url.md @@ -38,10 +38,15 @@ DVC supports several types of (local or) remote locations (protocols): | `hdfs` | HDFS | `hdfs://user@example.com/path/to/data.csv` | | `http` | HTTP to file with _strong ETag_ (see explanation below) | `https://example.com/path/to/data.csv` | +> Depending on the remote location type you plan to download data from, you +> might need to specify one of the optional dependencies: `s3`, `gs`, `ssh` (or +> `all_remotes` to include them all) when +> [installing DVC](/doc/get-started/install) with `pip`. + > `remote://myremote/path/to/file` notation just means that a DVC -> [remote](/doc/commands-reference/remote) `myremote` is defined, and when DVC -> is running it internally expands this URL into a regular S3, SSH, GS, etc URL -> by appending `/path/to/file` to the `myremote`'s configured base path. +> [remote](/doc/commands-reference/remote) `myremote` is defined and when DVC is +> running. DVC automatically expands this URL into a regular S3, SSH, GS, etc +> URL by appending `/path/to/file` to the `myremote`'s configured base path. Another way to understand the `dvc get-url` command is as a tool for downloading data files. @@ -79,6 +84,8 @@ The above command will copy the `/local/path/to/data` file or directory into +
### Click for AWS S3 example This command will copy an S3 object into the current working directory with the @@ -93,23 +100,13 @@ By default DVC expects your AWS CLI is already DVC will be using default AWS credentials file to access S3. To override some of these settings, you could the options described in `dvc remote modify`. -We use `boto3` library to set up a client and communicate with AWS S3. The -following API methods may be performed: - -- `list_objects_v2`, `list_objects` -- `head_object` -- `download_file` -- `upload_file` -- `delete_object` -- `copy` - -So make sure you have the following permissions enabled to enable all the above -operations: - -- `s3:ListBucket` -- `s3:GetObject` -- `s3:PutObject` -- `s3:DeleteObject` +> We use the `boto3` library to communicate with AWS S3. The following API +> methods may be performed: +> +> - `head_object` +> - `download_file` +> +> So make sure you have the `s3:GetObject` permission enabled.
diff --git a/static/docs/commands-reference/import-url.md b/static/docs/commands-reference/import-url.md index caddfe6e10..36e7db80f6 100644 --- a/static/docs/commands-reference/import-url.md +++ b/static/docs/commands-reference/import-url.md @@ -58,6 +58,11 @@ DVC supports several types of (local or) remote locations (protocols): | `http` | HTTP to file with _strong ETag_ (see explanation below) | `https://example.com/path/to/data.csv` | | `remote` | Remote path (see explanation below) | `remote://myremote/path/to/file` | +> Depending on the remote location type you plan to download data from, you +> might need to specify one of the optional dependencies: `s3`, `gs`, `ssh` (or +> `all_remotes` to include them all) when +> [installing DVC](/doc/get-started/install) with `pip`. + > In case of HTTP, > [strong ETag](https://en.wikipedia.org/wiki/HTTP_ETag#Strong_and_weak_validation) > is necessary to track if the specified remote file (URL) changed to download @@ -65,8 +70,8 @@ DVC supports several types of (local or) remote locations (protocols): > `remote://myremote/path/to/file` notation just means that a DVC > [remote](/doc/commands-reference/remote) `myremote` is defined and when DVC is -> running, it internally expands this URL into a regular S3, SSH, GS, etc URL by -> appending `/path/to/file` to the `myremote`'s configured base path. +> running, it automatically expands this URL into a regular S3, SSH, GS, etc +> URL by appending `/path/to/file` to the `myremote`'s configured base path. Another way to understand the `dvc import-url` command is as a short-cut for a more verbose `dvc run` command. This is discussed in the @@ -149,6 +154,29 @@ Now, we can install requirements for the project: $ pip install -r requirements.txt ``` +
### Click for AWS S3 example + +This command will copy an S3 object into the current working directory with the +same file name: + +```dvc +$ dvc import-url s3://bucket/path +``` + +Note that the examples use + +> We use the `boto3` library to communicate with AWS S3. The following API +> methods may be performed: +> +> - `head_object` +> - `download_file` +> +> So make sure you have the `s3:GetObject` permission enabled. + +
+ ## Example: Tracking a remote file diff --git a/static/docs/commands-reference/remote.md b/static/docs/commands-reference/remote.md index 1e2ec14392..25871af80e 100644 --- a/static/docs/commands-reference/remote.md +++ b/static/docs/commands-reference/remote.md @@ -33,11 +33,14 @@ models and re-process data files. It also saves space on your local environment - DVC can [fetch](/doc/commands-reference/fetch) into the local cache only the data you need for a specific branch/commit. -> If you installed DVC via `pip`, and depending on the remote type you plan to -> use you might need to install optional dependencies: `s3`, `gs`, `azure`, -> `ssh`. Or `all_remotes` to include them all. The command should look like -> this: `pip install -U "dvc[s3]"` - it installs `boto3` library along with DVC -> to support AWS S3 storage. +> Depending on the [remote storage](/doc/commands-reference/remote) type you +> plan to use to keep and share your data you might need to specify one of the +> optional dependencies: `s3`, `gs`, `azure`, `ssh`. (Use `all_remotes` to +> include them all.) The command should look like this: `pip install "dvc[s3]"`. +> That particular example will include the `boto3` library along with the DVC +> installation in order to support AWS S3 storage. This is valid for the `pip` +> installation method only. Other ways to install DVC already include support +> for all remotes. Using DVC with a remote data storage is optional. By default, DVC is configured to use a local data storage only (usually `.dvc/cache` directory inside your diff --git a/static/docs/commands-reference/remote_add.md b/static/docs/commands-reference/remote_add.md index f87efa448d..7700757fa4 100644 --- a/static/docs/commands-reference/remote_add.md +++ b/static/docs/commands-reference/remote_add.md @@ -136,8 +136,8 @@ By default DVC expects your AWS CLI is already DVC will be using default AWS credentials file to access S3. 
To override some of these settings, you could the options described in `dvc remote modify`. -We use `boto3` library to set up a client and communicate with AWS S3. The -following API methods are performed: +We use the `boto3` library to communicate with AWS S3. The following API methods +are performed: - `list_objects_v2`, `list_objects` - `head_object` @@ -148,10 +148,10 @@ following API methods are performed: So, make sure you have the following permissions enabled: -- s3:ListBucket -- s3:GetObject -- s3:PutObject -- s3:DeleteObject +- `s3:ListBucket` +- `s3:GetObject` +- `s3:PutObject` +- `s3:DeleteObject` diff --git a/static/docs/get-started/configure.md b/static/docs/get-started/configure.md index 6eb84ce704..7b4878c7f6 100644 --- a/static/docs/get-started/configure.md +++ b/static/docs/get-started/configure.md @@ -45,11 +45,11 @@ path. DVC currently supports seven types of remotes: > Depending on the [remote storage](/doc/commands-reference/remote) type you > plan to use to keep and share your data you might need to specify one of the -> optional dependencies: `s3`, `gs`, `azure`, `ssh`. Or `all_remotes` to include -> them all. The command should look like this: `pip install "dvc[s3]"` - it will -> install `boto3` library along with DVC to support AWS S3 storage. This is -> valid for `pip install` option only. Other ways to install DVC already include -> support for all remotes. +> optional dependencies: `s3`, `gs`, `azure`, `ssh` (or `all_remotes` to include +> them all) when installing DVC with `pip`. The command should look like this: +> `pip install "dvc[s3]"`. That particular example will include the `boto3` +> library along with the DVC installation in order to support AWS S3 storage. +> Other methods to install DVC already include support for all remotes. 
For example, to setup an S3 remote we would use something like (make sure that `mybucket` exists): diff --git a/static/docs/get-started/install.md b/static/docs/get-started/install.md index efe3d6c479..a19e488a6d 100644 --- a/static/docs/get-started/install.md +++ b/static/docs/get-started/install.md @@ -10,12 +10,12 @@ $ pip install dvc ``` > Depending on the [remote storage](/doc/commands-reference/remote) type you -> plan to use to keep and share your data, you might need to specify one of the -> optional dependencies: `s3`, `gs`, `azure`, `ssh`. Or `all_remotes` to include -> them all. The command should look like this: `pip install "dvc[s3]"` - it -> installs the `boto3` library along with DVC to support the AWS S3 storage. -> This is valid for `pip install` option only. Other ways to install DVC already -> include support for all remotes. +> plan to use to keep and share your data you might need to specify one of the +> optional dependencies: `s3`, `gs`, `azure`, `ssh` (or `all_remotes` to include +> them all) when installing DVC with `pip`. The command should look like this: +> `pip install "dvc[s3]"`. That particular example will include the `boto3` +> library along with the DVC installation in order to support AWS S3 storage. +> Other methods to install DVC already include support for all remotes. The easiest option, self-contained binary packages (or Windows installer), are available by using the big "Download" button in the [home page](/). 
You may also From 13bde23854c15f915a3ed0b85b68b297455ff894 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Fri, 12 Jul 2019 13:08:54 -0400 Subject: [PATCH 39/45] init: update with details about using or not a Git repo for the DVC project Per https://github.com/iterative/dvc.org/pull/464#pullrequestreview-259822616 --- static/docs/commands-reference/init.md | 34 ++++++++++++++------------ 1 file changed, 18 insertions(+), 16 deletions(-) diff --git a/static/docs/commands-reference/init.md b/static/docs/commands-reference/init.md index fea009df9a..e6571e5729 100644 --- a/static/docs/commands-reference/init.md +++ b/static/docs/commands-reference/init.md @@ -1,7 +1,9 @@ # init -This command initializes a DVC project on a Git repository where you're working -from. +This command initializes a DVC project in a directory. + +Note that by default the current working directory is expected to contain a Git +repository, unless the `--no-scm` option is used. ## Synopsis @@ -9,10 +11,22 @@ from. usage: dvc init [-h] [-q | -v] [--no-scm] [-f] ``` +## Description + +After DVC initialization, a new directory `.dvc/` will be created with `config` +and `.gitignore` files and `cache` directory. These files and directories are +hidden from the user generally and are not meant to be manipulated directly. + +`.dvc/cache` is one of the most important +[DVC directories](/doc/user-guide/dvc-files-and-directories). It will hold all +the contents of tracked data files. Note that `.dvc/.gitignore` lists this +directory, which means that the cache directory is not under Git control. This +is your local cache and you cannot push it to any Git remote. + ## Options -- `--no-scm` - skip Git specific initializations, `.dvc/.gitignore` will not be - populated and added to Git. +- `--no-scm` - skip Git specific initialization, `.dvc/.gitignore` will not be + written. - `-f`, `--force` - remove `.dvc/` if it exists before initialization. Will remove all local cache.
Useful when first `dvc init` got corrupted for some @@ -25,18 +39,6 @@ usage: dvc init [-h] [-q | -v] [--no-scm] [-f] - `-v`, `--verbose` - displays detailed tracing information. -## Description - -After DVC initialization, a new directory `.dvc/` will be created with `config` -and `.gitignore` files and `cache` directory. These files and directories are -hidden from the user generally and are not meant to be manipulated directly. - -`.dvc/cache directory` is one of the most important parts of any DVC -repositories. The directory contains all content of data files. The most -important part about this directory is that `.dvc/.gitignore` file is containing -this directory which means that the cache directory is not under Git control — -this is your local directory and you cannot push it to any Git remote. - ## Examples - Creating a new DVC repository (requires a Git repository): From c38cc8e698582b59196ea63c6e421a6472167593 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Sat, 13 Jul 2019 21:41:31 -0400 Subject: [PATCH 40/45] cmd ref: improve desc of `import` and `get` commands, et al Per https://github.com/iterative/dvc.org/pull/464#pullrequestreview-259871798 Includes other misc changes. --- static/docs/commands-reference/config.md | 2 +- static/docs/commands-reference/get-url.md | 9 +---- static/docs/commands-reference/get.md | 16 +++++--- static/docs/commands-reference/import-url.md | 24 ++++++------ static/docs/commands-reference/import.md | 39 +++++++++++--------- 5 files changed, 46 insertions(+), 44 deletions(-) diff --git a/static/docs/commands-reference/config.md b/static/docs/commands-reference/config.md index a6e00b2017..c25b227fd0 100644 --- a/static/docs/commands-reference/config.md +++ b/static/docs/commands-reference/config.md @@ -95,7 +95,7 @@ details.) config location results in `.dvc/cache`. 
> See also helper command `dvc cache dir` to intuitively set this config - > option, properly transforming paths relative to the present working + > option, properly transforming paths relative to the current working > directory into paths relative to the config file location. - `cache.protected` - makes files in the workspace read-only. Possible values diff --git a/static/docs/commands-reference/get-url.md b/static/docs/commands-reference/get-url.md index ddb0a63200..5ab13cba59 100644 --- a/static/docs/commands-reference/get-url.md +++ b/static/docs/commands-reference/get-url.md @@ -27,6 +27,8 @@ directory. Note that this command doesn't require an existing DVC project to run in. It's a single-purpose command that can be used out of the box after installing DVC. +> See `dvc get` to download data from other DVC repositories (e.g. Github URLs). + DVC supports several types of (local or) remote locations (protocols): | Type | Discussion | URL format | @@ -43,16 +45,9 @@ DVC supports several types of (local or) remote locations (protocols): > `all_remotes` to include them all) when > [installing DVC](/doc/get-started/install) with `pip`. -> `remote://myremote/path/to/file` notation just means that a DVC -> [remote](/doc/commands-reference/remote) `myremote` is defined and when DVC is -> running. DVC automatically expands this URL into a regular S3, SSH, GS, etc -> URL by appending `/path/to/file` to the `myremote`'s configured base path. - Another way to understand the `dvc get-url` command is as a tool for downloading data files. -> See `dvc get` to download data from other DVC repositories (e.g. Github URLs). 
- On GNU/Linux systems for example, instead of `dvc get-url` with HTTP(S) it's possible to instead use: diff --git a/static/docs/commands-reference/get.md b/static/docs/commands-reference/get.md index c6df596f90..8e46803cfb 100644 --- a/static/docs/commands-reference/get.md +++ b/static/docs/commands-reference/get.md @@ -18,19 +18,23 @@ positional arguments: ## Description -In some cases it's convenient to get a data artifact from another -DVC repository. The `dvc get` command helps the user do so. The `url` argument -should provide the external DVC project's Git repository URL (both HTTP and SSH -protocols supported, e.g. `[user@]server:project.git`), while `path` is used to -specify the path to the data to be downloaded within the repo. +DVC provides an easy way to reuse datasets, intermediate results, ML models, or +other files and directories tracked in another DVC repository into the present +workspace. The `dvc get` command downloads such a data +artifact. - +The `url` argument specifies the external DVC project's Git repository URL (both +HTTP and SSH protocols supported, e.g. `[user@]server:project.git`), while +`path` is used to specify the path to the data to be downloaded within the repo. Note that this command doesn't require an existing DVC project to run in. It's a single-purpose command that can be used out of the box after installing DVC. > See `dvc get-url` to download data from other supported URLs. +After running this command successfully, the data found in the `url` `path` is +created in the current working directory with its original file name. + ## Options - `-h`, `--help` - prints the usage/help message, and exit. 
diff --git a/static/docs/commands-reference/import-url.md b/static/docs/commands-reference/import-url.md index 36e7db80f6..a608e218d4 100644 --- a/static/docs/commands-reference/import-url.md +++ b/static/docs/commands-reference/import-url.md @@ -21,30 +21,28 @@ positional arguments: In some cases it's convenient to add a data file or directory from a remote location into the workspace, such that it will be automatically updated when the -data source is updated. Examples: +external data source changes. Examples: - A remote system may produce occasional data files that are used in other projects. - A batch process running regularly updates a data file to import. - A shared dataset on a remote storage that is managed and updated outside DVC. +The `dvc import-url` command helps the user create such an external data +dependency. The `url` argument specifies the external location of the data to be +imported, while `out` can be used to specify the (path and) file name desired +for the imported data file or directory in the workspace. + +> See `dvc import` to download and track data from other DVC repositories (e.g. +> Github URLs). + DVC supports [DVC-files](/doc/user-guide/dvc-file-format) which refer to data in an external location, see [External Dependencies](/doc/user-guide/external-dependencies). In such a -DVC-file, the `deps` section specifies a remote URL, and the `outs` section +DVC-file, the `deps` section stores the remote URL, and the `outs` section contains the corresponding local path in the workspace. It records enough data from the external file or directory to enable DVC to efficiently check it to -determine whether the local copy is out of date. DVC uses the remote URL to -download the data to the workspace initially, and to re-download it when -changed. - -> See `dvc import` to download and tack data from other DVC repositories (e.g. -> Github URLs). - -The `dvc import-url` command helps the user create such an external data -dependency.
The `url` argument should provide the location of the data to be -imported, while `out` can be used to specify the (path and) file name desired -for the imported data file or directory in the workspace. +determine whether the local copy is out of date. DVC supports several types of (local or) remote locations (protocols): diff --git a/static/docs/commands-reference/import.md b/static/docs/commands-reference/import.md index 291f8f0504..19fb26955d 100644 --- a/static/docs/commands-reference/import.md +++ b/static/docs/commands-reference/import.md @@ -19,28 +19,33 @@ positional arguments: ## Description -In some cases it's convenient to add a data artifact from another -DVC repository into the workspace, such that it will be automatically updated -when the data source is updated. +DVC provides an easy way to reuse datasets, intermediate results, ML models, or +other files and directories tracked in another DVC repository into the present +workspace. The `dvc import` command downloads such a data +artifact in a way that it can be tracked with DVC, resulting in automatic +updates when the external data source changes. + +The `url` argument specifies the external DVC project's Git repository URL (both +HTTP and SSH protocols supported, e.g. `[user@]server:project.git`), while +`path` is used to specify the path to the data to be downloaded within the repo. + +> See `dvc import-url` to download and track data from other supported URLs. + +After running this command successfully, the data found in the `url` `path` is +created in the current working directory with its original file name, e.g. +`data.txt`. An import stage (DVC-file) is then created (similar to having used +`dvc run` to generate the same output), named by appending `.dvc` to the full +file or directory name of the imported data, e.g. `data.txt.dvc`. DVC supports [DVC-files](/doc/user-guide/dvc-file-format) which refer to data in an external DVC repository (hosted on a Git server).
In such a DVC-file, the -`deps` section specifies the DVC repo and data path, and the `outs` section +`deps` section specifies the `repo` URL and data `path`, and the `outs` section contains the corresponding local path in the workspace. It records enough data from the external file or directory to enable DVC to efficiently check it to -determine whether the local copy is out of date. DVC uses the DVC repo and data -path to download the data to the workspace initially, and to re-download it when -changed. - -> See `dvc import-url` to download and tack data from other supported URLs. +determine whether the local copy is out of date. -The `dvc import` command helps the user create such an external data dependency. -The `url` argument should provide the external DVC project's Git repository URL -(both HTTP and SSH protocols supported, e.g. `[user@]server:project.git`), while -`path` is used to specify the path to the data to be imported within the repo. -An import stage (DVC-file) is then created with the name of the data artifact, -similar to having used `dvc run` to generate the same output as done in the -external DVC project. +To actually [track the data](https://dvc.org/doc/get-started/add-files), +`git add` (and `git commit`) the import stage (DVC-file). ## Options @@ -58,4 +63,4 @@ external DVC project. - `-v`, `--verbose` - displays detailed tracing information. 
- + From 377410cbd1c45e5f172ab5ec148559155b144b71 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Sat, 13 Jul 2019 22:28:15 -0400 Subject: [PATCH 41/45] cmd ref: fix command to install DVC with pip inc all remotes Per https://github.com/iterative/dvc.org/commit/9d0819dab8aed3956178192e765633771acf697d#r34289598 --- static/docs/commands-reference/get-url.md | 4 ++-- static/docs/commands-reference/import-url.md | 4 ++-- static/docs/commands-reference/remote.md | 13 +++++-------- static/docs/commands-reference/remote_add.md | 10 +++++----- static/docs/get-started/configure.md | 12 +++++------- static/docs/get-started/install.md | 13 ++++++------- 6 files changed, 25 insertions(+), 31 deletions(-) diff --git a/static/docs/commands-reference/get-url.md b/static/docs/commands-reference/get-url.md index 5ab13cba59..079235a3d5 100644 --- a/static/docs/commands-reference/get-url.md +++ b/static/docs/commands-reference/get-url.md @@ -41,8 +41,8 @@ DVC supports several types of (local or) remote locations (protocols): | `http` | HTTP to file with _strong ETag_ (see explanation below) | `https://example.com/path/to/data.csv` | > Depending on the remote locations type you plan to download data from you -> might need to specify one of the optional dependencies: `s3`, `gs`, `ssh` (or -> `all_remotes` to include them all) when +> might need to specify one of the optional dependencies: `[s3]`, `[gs]`, +> `[ssh]` (or `[all]` to include them all) when > [installing DVC](/doc/get-started/install) with `pip`. 
Another way to understand the `dvc get-url` command is as a tool for downloading diff --git a/static/docs/commands-reference/import-url.md b/static/docs/commands-reference/import-url.md index a608e218d4..fcb9f3866a 100644 --- a/static/docs/commands-reference/import-url.md +++ b/static/docs/commands-reference/import-url.md @@ -57,8 +57,8 @@ DVC supports several types of (local or) remote locations (protocols): | `remote` | Remote path (see explanation below) | `remote://myremote/path/to/file` | > Depending on the remote locations type you plan to download data from you -> might need to specify one of the optional dependencies: `s3`, `gs`, `ssh` (or -> `all_remotes` to include them all) when +> might need to specify one of the optional dependencies: `[s3]`, `[gs]`, +> `[ssh]` (or `[all]` to include them all) when > [installing DVC](/doc/get-started/install) with `pip`. > In case of HTTP, diff --git a/static/docs/commands-reference/remote.md b/static/docs/commands-reference/remote.md index 25871af80e..3ebc6d9355 100644 --- a/static/docs/commands-reference/remote.md +++ b/static/docs/commands-reference/remote.md @@ -33,14 +33,11 @@ models and re-process data files. It also saves space on your local environment - DVC can [fetch](/doc/commands-reference/fetch) into the local cache only the data you need for a specific branch/commit. -> Depending on the [remote storage](/doc/commands-reference/remote) type you -> plan to use to keep and share your data you might need to specify one of the -> optional dependencies: `s3`, `gs`, `azure`, `ssh`. (Use `all_remotes` to -> include them all.) The command should look like this: `pip install "dvc[s3]"`. -> That particular example will include the `boto3` library along with the DVC -> installation in order to support AWS S3 storage. This is valid for the `pip` -> installation method only. Other ways to install DVC already include support -> for all remotes. 
+> If you installed DVC via `pip`, depending on the remote type you plan to use +> you might need to install optional dependencies: `[s3]`, `[gs]`, `[azure]`, +> `[ssh]`; or `[all]` to include them all. The command should look like this: +> `pip install "dvc[s3]"` - it installs `boto3` library along with DVC to +> support AWS S3 storage. Using DVC with a remote data storage is optional. By default, DVC is configured to use a local data storage only (usually `.dvc/cache` directory inside your diff --git a/static/docs/commands-reference/remote_add.md b/static/docs/commands-reference/remote_add.md index 7700757fa4..257ce0d5f1 100644 --- a/static/docs/commands-reference/remote_add.md +++ b/static/docs/commands-reference/remote_add.md @@ -31,11 +31,11 @@ config file location** (see LOCAL example below). Whenever possible DVC will create a remote directory if it doesn't exists yet. It won't create an S3 bucket though and will rely on default access settings. -> If you installed DVC via `pip`, and depending on the remote type you plan to -> use you might need to install optional dependencies: `s3`, `gs`, `azure`, -> `ssh`. Or `all_remotes` to include them all. The command should look like -> this: `pip install -U "dvc[s3]"` - it installs `boto3` library along with DVC -> to support AWS S3 storage. +> If you installed DVC via `pip`, depending on the remote type you plan to use +> you might need to install optional dependencies: `[s3]`, `[gs]`, `[azure]`, +> `[ssh]`; or `[all]` to include them all. The command should look like this: +> `pip install "dvc[s3]"` - it installs `boto3` library along with DVC to +> support AWS S3 storage. 
This command creates a section in the DVC [config file](/doc/commands-reference/config) and optionally assigns a default diff --git a/static/docs/get-started/configure.md b/static/docs/get-started/configure.md index 7b4878c7f6..4bee5af5f2 100644 --- a/static/docs/get-started/configure.md +++ b/static/docs/get-started/configure.md @@ -43,13 +43,11 @@ path. DVC currently supports seven types of remotes: - `hdfs` - The Hadoop Distributed File System - `http` - HTTP and HTTPS protocols -> Depending on the [remote storage](/doc/commands-reference/remote) type you -> plan to use to keep and share your data you might need to specify one of the -> optional dependencies: `s3`, `gs`, `azure`, `ssh` (or `all_remotes` to include -> them all) when installing DVC with `pip`. The command should look like this: -> `pip install "dvc[s3]"`. That particular example will include the `boto3` -> library along with the DVC installation in order to support AWS S3 storage. -> Other methods to install DVC already include support for all remotes. +> If you installed DVC via `pip`, depending on the remote type you plan to use +> you might need to install optional dependencies: `[s3]`, `[gs]`, `[azure]`, +> `[ssh]`; or `[all]` to include them all. The command should look like this: +> `pip install "dvc[s3]"` - it installs `boto3` library along with DVC to +> support AWS S3 storage. 
For example, to setup an S3 remote we would use something like (make sure that `mybucket` exists): diff --git a/static/docs/get-started/install.md b/static/docs/get-started/install.md index a19e488a6d..dbb43b32e8 100644 --- a/static/docs/get-started/install.md +++ b/static/docs/get-started/install.md @@ -9,13 +9,12 @@ To install DVC from terminal, run: $ pip install dvc ``` -> Depending on the [remote storage](/doc/commands-reference/remote) type you -> plan to use to keep and share your data you might need to specify one of the -> optional dependencies: `s3`, `gs`, `azure`, `ssh` (or `all_remotes` to include -> them all) when installing DVC with `pip`. The command should look like this: -> `pip install "dvc[s3]"`. That particular example will include the `boto3` -> library along with the DVC installation in order to support AWS S3 storage. -> Other methods to install DVC already include support for all remotes. +> If you installed DVC via `pip`, depending on the +> [remote](/doc/commands-reference/remote) type you plan to use you might need +> to install optional dependencies: `[s3]`, `[gs]`, `[azure]`, `[ssh]`; or +> `[all]` to include them all. The command should look like this: +> `pip install "dvc[s3]"` - it installs `boto3` library along with DVC to +> support AWS S3 storage. The easiest option, self-contained binary packages (or Windows installer), are available by using the big "Download" button in the [home page](/). 
You may also From 6d0d0e0ac40fcbded3466fdbe8d4b2725bb3a610 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Mon, 15 Jul 2019 11:40:35 -0400 Subject: [PATCH 42/45] install: add [oss] to list of optional deps when installing via pip Per https://github.com/iterative/dvc.org/pull/464#pullrequestreview-261565825 --- static/docs/commands-reference/get-url.md | 4 ++-- static/docs/commands-reference/import-url.md | 4 ++-- static/docs/commands-reference/remote.md | 8 ++++---- static/docs/commands-reference/remote_add.md | 8 ++++---- static/docs/get-started/configure.md | 8 ++++---- static/docs/get-started/install.md | 4 ++-- 6 files changed, 18 insertions(+), 18 deletions(-) diff --git a/static/docs/commands-reference/get-url.md b/static/docs/commands-reference/get-url.md index 079235a3d5..ced6c1a826 100644 --- a/static/docs/commands-reference/get-url.md +++ b/static/docs/commands-reference/get-url.md @@ -41,8 +41,8 @@ DVC supports several types of (local or) remote locations (protocols): | `http` | HTTP to file with _strong ETag_ (see explanation below) | `https://example.com/path/to/data.csv` | > Depending on the remote locations type you plan to download data from you -> might need to specify one of the optional dependencies: `[s3]`, `[gs]`, -> `[ssh]` (or `[all]` to include them all) when +> might need to specify one of the optional dependencies: `[s3]`, `[ssh]`, +> `[gs]`, `[azure]`, or `[oss]` (or `[all]` to include them all) when > [installing DVC](/doc/get-started/install) with `pip`.
Another way to understand the `dvc get-url` command is as a tool for downloading diff --git a/static/docs/commands-reference/import-url.md b/static/docs/commands-reference/import-url.md index fcb9f3866a..eadcef5542 100644 --- a/static/docs/commands-reference/import-url.md +++ b/static/docs/commands-reference/import-url.md @@ -57,8 +57,8 @@ DVC supports several types of (local or) remote locations (protocols): | `remote` | Remote path (see explanation below) | `remote://myremote/path/to/file` | > Depending on the remote locations type you plan to download data from you -> might need to specify one of the optional dependencies: `[s3]`, `[gs]`, -> `[ssh]` (or `[all]` to include them all) when +> might need to specify one of the optional dependencies: `[s3]`, `[ssh]`, +> `[gs]`, `[azure]`, or `[oss]` (or `[all]` to include them all) when > [installing DVC](/doc/get-started/install) with `pip`. > In case of HTTP, diff --git a/static/docs/commands-reference/remote.md b/static/docs/commands-reference/remote.md index 3ebc6d9355..213e550371 100644 --- a/static/docs/commands-reference/remote.md +++ b/static/docs/commands-reference/remote.md @@ -34,10 +34,10 @@ environment - DVC can [fetch](/doc/commands-reference/fetch) into the local cache only the data you need for a specific branch/commit. > If you installed DVC via `pip`, depending on the remote type you plan to use -> you might need to install optional dependencies: `[s3]`, `[gs]`, `[azure]`, -> `[ssh]`; or `[all]` to include them all. The command should look like this: -> `pip install "dvc[s3]"` - it installs `boto3` library along with DVC to -> support AWS S3 storage. +> you might need to install optional dependencies: `[s3]`, `[ssh]`, `[gs]`, +> `[azure]`, and `[oss]`; or `[all]` to include them all. The command should +> look like this: `pip install "dvc[s3]"` - it installs `boto3` library along +> with DVC to support AWS S3 storage. Using DVC with a remote data storage is optional.
By default, DVC is configured to use a local data storage only (usually `.dvc/cache` directory inside your diff --git a/static/docs/commands-reference/remote_add.md b/static/docs/commands-reference/remote_add.md index 257ce0d5f1..491b489351 100644 --- a/static/docs/commands-reference/remote_add.md +++ b/static/docs/commands-reference/remote_add.md @@ -32,10 +32,10 @@ create a remote directory if it doesn't exists yet. It won't create an S3 bucket though and will rely on default access settings. > If you installed DVC via `pip`, depending on the remote type you plan to use -> you might need to install optional dependencies: `[s3]`, `[gs]`, `[azure]`, -> `[ssh]`; or `[all]` to include them all. The command should look like this: -> `pip install "dvc[s3]"` - it installs `boto3` library along with DVC to -> support AWS S3 storage. +> you might need to install optional dependencies: `[s3]`, `[ssh]`, `[gs]`, +> `[azure]`, and `[oss]`; or `[all]` to include them all. The command should +> look like this: `pip install "dvc[s3]"` - it installs `boto3` library along +> with DVC to support AWS S3 storage. This command creates a section in the DVC [config file](/doc/commands-reference/config) and optionally assigns a default diff --git a/static/docs/get-started/configure.md b/static/docs/get-started/configure.md index 4bee5af5f2..90acffc13c 100644 --- a/static/docs/get-started/configure.md +++ b/static/docs/get-started/configure.md @@ -44,10 +44,10 @@ path. DVC currently supports seven types of remotes: - `http` - HTTP and HTTPS protocols > If you installed DVC via `pip`, depending on the remote type you plan to use -> you might need to install optional dependencies: `[s3]`, `[gs]`, `[azure]`, -> `[ssh]`; or `[all]` to include them all. The command should look like this: -> `pip install "dvc[s3]"` - it installs `boto3` library along with DVC to -> support AWS S3 storage. 
+> you might need to install optional dependencies: `[s3]`, `[ssh]`, `[gs]`, +> `[azure]`, and `[oss]`; or `[all]` to include them all. The command should +> look like this: `pip install "dvc[s3]"` - it installs `boto3` library along +> with DVC to support AWS S3 storage. For example, to setup an S3 remote we would use something like (make sure that `mybucket` exists): diff --git a/static/docs/get-started/install.md b/static/docs/get-started/install.md index dbb43b32e8..13cc0506a6 100644 --- a/static/docs/get-started/install.md +++ b/static/docs/get-started/install.md @@ -11,8 +11,8 @@ $ pip install dvc > If you installed DVC via `pip`, depending on the > [remote](/doc/commands-reference/remote) type you plan to use you might need -> to install optional dependencies: `[s3]`, `[gs]`, `[azure]`, `[ssh]`; or -> `[all]` to include them all. The command should look like this: +> to install optional dependencies: `[s3]`, `[ssh]`, `[gs]`, `[azure]`, and +> `[oss]`; or `[all]` to include them all. The command should look like this: > `pip install "dvc[s3]"` - it installs `boto3` library along with DVC to > support AWS S3 storage. From 2b1008a151ef834f0e50c7e9187d13ae389d33ce Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Mon, 15 Jul 2019 11:44:03 -0400 Subject: [PATCH 43/45] cmd ref: be more specific about what import and get are for... in import-url and get-url notes about them Per https://github.com/iterative/dvc.org/pull/464 --- static/docs/commands-reference/get-url.md | 3 ++- static/docs/commands-reference/import-url.md | 4 ++-- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/static/docs/commands-reference/get-url.md b/static/docs/commands-reference/get-url.md index ced6c1a826..524354be60 100644 --- a/static/docs/commands-reference/get-url.md +++ b/static/docs/commands-reference/get-url.md @@ -27,7 +27,8 @@ directory. Note that this command doesn't require an existing DVC project to run in. 
It's a single-purpose command that can be used out of the box after installing DVC. -> See `dvc get` to download data from other DVC repositories (e.g. Github URLs). +> See `dvc get` to download data or model files or directories from other DVC +> repositories (e.g. GitHub URLs). DVC supports several types of (local or) remote locations (protocols): diff --git a/static/docs/commands-reference/import-url.md b/static/docs/commands-reference/import-url.md index eadcef5542..ad56d97afe 100644 --- a/static/docs/commands-reference/import-url.md +++ b/static/docs/commands-reference/import-url.md @@ -33,8 +33,8 @@ dependency. The `url` argument specifies the external location of the data to be imported, while `out` can be used to specify the (path and) file name desired for the imported data file or directory in the workspace. -> See `dvc import` to download and tack data from other DVC repositories (e.g. -> Github URLs). +> See `dvc import` to download and track data or model files or directories from +> other DVC repositories (e.g. GitHub URLs). DVC supports [DVC-files](/doc/user-guide/dvc-file-format) which refer to data in an external location, see From d53434df94d2575bbe92b0a6f18ffcfb9cfe69ae Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Mon, 15 Jul 2019 11:50:33 -0400 Subject: [PATCH 44/45] cmd ref: clarify that get and get-url download files anywhere... not just into DVC projects.
Per https://github.com/iterative/dvc.org/pull/464#pullrequestreview-261565970 --- static/docs/commands-reference/get-url.md | 10 ++++++---- static/docs/commands-reference/get.md | 6 +++--- 2 files changed, 9 insertions(+), 7 deletions(-) diff --git a/static/docs/commands-reference/get-url.md b/static/docs/commands-reference/get-url.md index 524354be60..a3816f0e34 100644 --- a/static/docs/commands-reference/get-url.md +++ b/static/docs/commands-reference/get-url.md @@ -19,10 +19,12 @@ positional arguments: ## Description In some cases it's convenient to get a data file or directory from a remote -location. The `dvc get-url` command helps the user do so. The `url` argument -should provide the location of the data to be downloaded, while `out` can be -used to specify the (path and) file name desired for the downloaded data file or -directory. +location into the current working directory, regardless of whether it's a DVC +project. The `dvc get-url` command helps the user do just that. + +The `url` argument should provide the location of the data to be downloaded, +while `out` can be used to specify the (path and) file name desired for the +downloaded data file or directory. Note that this command doesn't require an existing DVC project to run in. It's a single-purpose command that can be used out of the box after installing DVC. diff --git a/static/docs/commands-reference/get.md b/static/docs/commands-reference/get.md index 8e46803cfb..eb124fbfdf 100644 --- a/static/docs/commands-reference/get.md +++ b/static/docs/commands-reference/get.md @@ -19,9 +19,9 @@ positional arguments: ## Description DVC provides an easy way to reuse datasets, intermediate results, ML models, or -other files and directories tracked in another DVC repository into the present -workspace. The `dvc get` command downloads such a data -artifact. +other files and directories tracked in another DVC repository into the current +working directory, regardless of whether it's a DVC project. 
The `dvc get` +command downloads such a data artifact. The `url` argument specifies the external DVC project's Git repository URL (both HTTP and SSH protocols supported, e.g. `[user@]server:project.git`), while From 1bdd04f51f779c30f43810a4f275d3629d664a98 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Mon, 15 Jul 2019 12:03:15 -0400 Subject: [PATCH 45/45] import: add note that the original release is now import-url in cmd ref Per https://github.com/iterative/dvc.org/issues/385#issuecomment-510808263 --- static/docs/commands-reference/import.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/static/docs/commands-reference/import.md b/static/docs/commands-reference/import.md index 19fb26955d..aef93621de 100644 --- a/static/docs/commands-reference/import.md +++ b/static/docs/commands-reference/import.md @@ -1,5 +1,8 @@ # import +> **Note!** This command has been repurposed after its original release. The +> previous version is still available as the `dvc import-url` command. + Download or copy file or directory from another DVC repository (on a git server such as Github) into the workspace, and track changes in the remote source with DVC. Creates a DVC-file.
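Patches 41 and 42 settle on writing the optional pip dependencies in brackets: `[s3]`, `[ssh]`, `[gs]`, `[azure]`, `[oss]`, or `[all]`. Standard pip syntax also lets several extras share one bracket, e.g. `pip install "dvc[s3,gs]"`. The hypothetical helper `dvc_requirement` below (not part of DVC) builds such a requirement string from a list of remote types. It assumes the extras listed in these docs are complete; the authoritative list is whatever the installed DVC release declares in its package metadata.

```python
# Hypothetical helper for illustration; not part of DVC.
# Extras as listed in the docs patched above (assumed complete here).
KNOWN_EXTRAS = ("s3", "ssh", "gs", "azure", "oss", "all")


def dvc_requirement(remote_types):
    """Build a pip requirement string such as 'dvc[gs,s3]'."""
    extras = sorted(set(remote_types))  # de-duplicate, stable order
    unknown = [e for e in extras if e not in KNOWN_EXTRAS]
    if unknown:
        raise ValueError(f"unknown DVC extras: {unknown}")
    if not extras:
        return "dvc"  # local cache only, no optional remote dependencies
    if "all" in extras:
        return "dvc[all]"  # `[all]` already includes every other extra
    return "dvc[" + ",".join(extras) + "]"
```

For instance, `dvc_requirement(["s3", "gs", "s3"])` gives `dvc[gs,s3]`. Pass the result to pip in quotes, as the docs do with `pip install "dvc[s3]"`, so that shells such as zsh do not try to expand the brackets as a glob pattern.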