s3: update info on boto3 methods and permissions required...
in `import-url`, `get-url` as well as `remote` and `remote add` command refs.
Updates also related guides (install and config).
jorgeorpinel committed Jul 12, 2019
1 parent a7ee6aa commit 9d0819d
Showing 6 changed files with 72 additions and 44 deletions.
37 changes: 17 additions & 20 deletions static/docs/commands-reference/get-url.md
@@ -38,10 +38,15 @@ DVC supports several types of (local or) remote locations (protocols):
| `hdfs` | HDFS | `hdfs://[email protected]/path/to/data.csv` |
| `http` | HTTP to file with _strong ETag_ (see explanation below) | `https://example.com/path/to/data.csv` |

> Depending on the type of remote location you plan to download data from, you
> might need to specify one of the optional dependencies: `s3`, `gs`, `ssh` (or
> `all_remotes` to include them all) when
> [installing DVC](/doc/get-started/install) with `pip`.
> `remote://myremote/path/to/file` notation just means that a DVC
> [remote](/doc/commands-reference/remote) `myremote` is defined, and when DVC
> is running it internally expands this URL into a regular S3, SSH, GS, etc URL
> by appending `/path/to/file` to the `myremote`'s configured base path.
> [remote](/doc/commands-reference/remote) `myremote` is defined and, when DVC is
> running, it automatically expands this URL into a regular S3, SSH, GS, etc.
> URL by appending `/path/to/file` to the `myremote`'s configured base path.
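
The `remote://` expansion described in the note above can be pictured with a short Python sketch. This is an illustration only, not DVC's actual implementation: the `REMOTES` table, its `s3://mybucket/base` value, and the function name `expand_remote_url` are assumptions made for the example.

```python
from urllib.parse import urlparse

# Hypothetical table of configured remotes (name -> base URL).
# In a real project these values come from the DVC config.
REMOTES = {"myremote": "s3://mybucket/base"}


def expand_remote_url(url, remotes):
    """Expand remote://<name>/<path> by appending <path> to the remote's base URL."""
    parsed = urlparse(url)
    if parsed.scheme != "remote":
        # Regular URLs (s3://, ssh://, ...) are returned unchanged.
        return url
    base = remotes[parsed.netloc]  # raises KeyError if the remote is not defined
    return base.rstrip("/") + "/" + parsed.path.lstrip("/")


print(expand_remote_url("remote://myremote/path/to/file", REMOTES))
# prints: s3://mybucket/base/path/to/file
```
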
Another way to understand the `dvc get-url` command is as a tool for downloading
data files.
@@ -79,6 +84,8 @@ The above command will copy the `/local/path/to/data` file or directory into

</details>

<details>

### Click for AWS S3 example

This command will copy an S3 object into the current working directory with the
@@ -93,23 +100,13 @@ By default DVC expects your AWS CLI is already
DVC will be using default AWS credentials file to access S3. To override some of
these settings, you could use the options described in `dvc remote modify`.

We use `boto3` library to set up a client and communicate with AWS S3. The
following API methods may be performed:

- `list_objects_v2`, `list_objects`
- `head_object`
- `download_file`
- `upload_file`
- `delete_object`
- `copy`

So make sure you have the following permissions enabled to enable all the above
operations:

- `s3:ListBucket`
- `s3:GetObject`
- `s3:PutObject`
- `s3:DeleteObject`
> We use the `boto3` library to communicate with AWS S3. The following API
> methods may be performed:
>
> - `head_object`
> - `download_file`
>
> So make sure you have the `s3:GetObject` permission enabled.
</details>
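
To make the two read-only calls named in the note above concrete, here is a minimal sketch of how they might be combined. It is not DVC's code: the helper names `split_s3_url` and `fetch_s3_object` are hypothetical, and the client is passed in as a parameter so the logic can be exercised without AWS access. Only the `head_object` and `download_file` calls themselves are real `boto3` S3 client methods.

```python
from urllib.parse import urlparse


def split_s3_url(url):
    """Split s3://bucket/key into (bucket, key)."""
    parsed = urlparse(url)
    if parsed.scheme != "s3":
        raise ValueError("not an S3 URL: " + url)
    return parsed.netloc, parsed.path.lstrip("/")


def fetch_s3_object(client, url, dest):
    """Check that the object exists, then download it to dest.

    `client` is expected to behave like boto3.client("s3").
    Both calls are covered by the s3:GetObject permission.
    """
    bucket, key = split_s3_url(url)
    # head_object raises an error if the object is missing or not readable.
    meta = client.head_object(Bucket=bucket, Key=key)
    client.download_file(bucket, key, dest)
    return meta["ETag"]


# With real credentials configured, usage would look like:
#   import boto3
#   fetch_s3_object(boto3.client("s3"), "s3://bucket/path", "path")
```
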

32 changes: 30 additions & 2 deletions static/docs/commands-reference/import-url.md
@@ -58,15 +58,20 @@ DVC supports several types of (local or) remote locations (protocols):
| `http` | HTTP to file with _strong ETag_ (see explanation below) | `https://example.com/path/to/data.csv` |
| `remote` | Remote path (see explanation below) | `remote://myremote/path/to/file` |

> Depending on the type of remote location you plan to download data from, you
> might need to specify one of the optional dependencies: `s3`, `gs`, `ssh` (or
> `all_remotes` to include them all) when
> [installing DVC](/doc/get-started/install) with `pip`.
> In case of HTTP,
> [strong ETag](https://en.wikipedia.org/wiki/HTTP_ETag#Strong_and_weak_validation)
> is necessary to track if the specified remote file (URL) changed to download
> it again.
> `remote://myremote/path/to/file` notation just means that a DVC
> [remote](/doc/commands-reference/remote) `myremote` is defined and when DVC is
> running, it internally expands this URL into a regular S3, SSH, GS, etc URL by
> appending `/path/to/file` to the `myremote`'s configured base path.
> running, DVC automatically expands this URL into a regular S3, SSH, GS, etc.
> URL by appending `/path/to/file` to the `myremote`'s configured base path.
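
For the strong ETag requirement mentioned in the notes above, the distinction can be sketched in a few lines of Python. This follows the HTTP convention that weak validators carry a `W/` prefix; the function name `is_strong_etag` is hypothetical, and this is not necessarily how DVC performs the check.

```python
def is_strong_etag(etag):
    """Return True if an ETag header value is a strong validator."""
    etag = etag.strip()
    # Weak validators are prefixed with "W/" (case-sensitive); anything else
    # that is non-empty is treated as strong here.
    return bool(etag) and not etag.startswith("W/")


print(is_strong_etag('"33a64df551425fcc55e4d42a148795d9f25f89d4"'))  # prints: True
print(is_strong_etag('W/"0815"'))  # prints: False
```
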
Another way to understand the `dvc import-url` command is as a short-cut for a
more verbose `dvc run` command. This is discussed in the
@@ -149,6 +154,29 @@ Now, we can install requirements for the project:
$ pip install -r requirements.txt
```

<details>

### Click for AWS S3 example

This command will copy an S3 object into the current working directory with the
same file name:

```dvc
$ dvc get-url s3://bucket/path
```

Note that the examples use

> We use the `boto3` library to communicate with AWS S3. The following API
> methods may be performed:
>
> - `head_object`
> - `download_file`
>
> So make sure you have the `s3:GetObject` permission enabled.
</details>

</details>

## Example: Tracking a remote file
13 changes: 8 additions & 5 deletions static/docs/commands-reference/remote.md
@@ -33,11 +33,14 @@ models and re-process data files. It also saves space on your local
environment - DVC can [fetch](/doc/commands-reference/fetch) into the local
cache only the data you need for a specific branch/commit.

> If you installed DVC via `pip`, and depending on the remote type you plan to
> use you might need to install optional dependencies: `s3`, `gs`, `azure`,
> `ssh`. Or `all_remotes` to include them all. The command should look like
> this: `pip install -U "dvc[s3]"` - it installs `boto3` library along with DVC
> to support AWS S3 storage.
> Depending on the [remote storage](/doc/commands-reference/remote) type you
> plan to use to keep and share your data you might need to specify one of the
> optional dependencies: `s3`, `gs`, `azure`, `ssh`. (Use `all_remotes` to
> include them all.) The command should look like this: `pip install "dvc[s3]"`.
> That particular example will include the `boto3` library along with the DVC
> installation in order to support AWS S3 storage. This is valid for the `pip`
> installation method only. Other ways to install DVC already include support
> for all remotes.
Using DVC with a remote data storage is optional. By default, DVC is configured
to use a local data storage only (usually `.dvc/cache` directory inside your
12 changes: 6 additions & 6 deletions static/docs/commands-reference/remote_add.md
@@ -136,8 +136,8 @@ By default DVC expects your AWS CLI is already
DVC will be using default AWS credentials file to access S3. To override some of
these settings, you could use the options described in `dvc remote modify`.

We use `boto3` library to set up a client and communicate with AWS S3. The
following API methods are performed:
We use the `boto3` library to communicate with AWS S3. The following API methods
are performed:

- `list_objects_v2`, `list_objects`
- `head_object`
@@ -148,10 +148,10 @@ following API methods are performed:

So, make sure you have the following permissions enabled:

- s3:ListBucket
- s3:GetObject
- s3:PutObject
- s3:DeleteObject
- `s3:ListBucket`
- `s3:GetObject`
- `s3:PutObject`
- `s3:DeleteObject`

</details>
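
The four permissions listed above could be granted through an IAM policy along the following lines. This is a minimal sketch, not something the commit itself specifies: the function name `build_s3_policy` and the bucket name `mybucket` are assumptions. The split between the bucket ARN (for `s3:ListBucket`) and the `/*` object ARN (for the object-level actions) follows the usual IAM resource conventions.

```python
import json


def build_s3_policy(bucket):
    """Build an IAM policy document granting the permissions a DVC S3 remote needs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": ["arn:aws:s3:::" + bucket],
            },
            {
                # Read, write and delete apply to the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": ["arn:aws:s3:::" + bucket + "/*"],
            },
        ],
    }


# "mybucket" is a placeholder bucket name.
print(json.dumps(build_s3_policy("mybucket"), indent=2))
```
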

10 changes: 5 additions & 5 deletions static/docs/get-started/configure.md
@@ -45,11 +45,11 @@ path. DVC currently supports seven types of remotes:

> Depending on the [remote storage](/doc/commands-reference/remote) type you
> plan to use to keep and share your data you might need to specify one of the
> optional dependencies: `s3`, `gs`, `azure`, `ssh`. Or `all_remotes` to include
> them all. The command should look like this: `pip install "dvc[s3]"` - it will
> install `boto3` library along with DVC to support AWS S3 storage. This is
> valid for `pip install` option only. Other ways to install DVC already include
> support for all remotes.
> optional dependencies: `s3`, `gs`, `azure`, `ssh` (or `all_remotes` to include

@shcheklein (Member), Jul 13, 2019: is it all_remotes or just all?

@jorgeorpinel (Author, Contributor), Jul 14, 2019: It is `[all]`. Idk why that was there already before (in the remote * command refs). Just fixed in e9805c3.

> them all) when installing DVC with `pip`. The command should look like this:
> `pip install "dvc[s3]"`. That particular example will include the `boto3`
> library along with the DVC installation in order to support AWS S3 storage.
> Other methods to install DVC already include support for all remotes.
For example, to setup an S3 remote we would use something like (make sure that
`mybucket` exists):
12 changes: 6 additions & 6 deletions static/docs/get-started/install.md
@@ -10,12 +10,12 @@ $ pip install dvc
```

> Depending on the [remote storage](/doc/commands-reference/remote) type you
> plan to use to keep and share your data, you might need to specify one of the
> optional dependencies: `s3`, `gs`, `azure`, `ssh`. Or `all_remotes` to include
> them all. The command should look like this: `pip install "dvc[s3]"` - it
> installs the `boto3` library along with DVC to support the AWS S3 storage.
> This is valid for `pip install` option only. Other ways to install DVC already
> include support for all remotes.
> plan to use to keep and share your data you might need to specify one of the
> optional dependencies: `s3`, `gs`, `azure`, `ssh` (or `all_remotes` to include
> them all) when installing DVC with `pip`. The command should look like this:
> `pip install "dvc[s3]"`. That particular example will include the `boto3`
> library along with the DVC installation in order to support AWS S3 storage.
> Other methods to install DVC already include support for all remotes.
The easiest option, self-contained binary packages (or Windows installer), are
available by using the big "Download" button in the [home page](/). You may also