Add readme to explain the deploy procedure
MicheleTrya committed Jul 9, 2024
1 parent 8ef5136 commit 1f661f3
294 changes: 294 additions & 0 deletions README.trya.md
# NOTE FOR TRYA DEVELOPERS

Here we describe:

- the typical scenario of a Python package hosted on a git provider:
  - the content of the git repository
  - how to deploy a new version of the package

and then:

- how this package (i.e. *selenium*) is configured in an atypical way, again covering:
  - the content of the git repository
  - how to deploy a new version of the package.



In what follows we assume the git provider is GitHub.

All git providers share the same mechanics, but some details may differ, e.g. the procedure to upload a tarball: in that case, check the provider's official documentation.



# TYPICAL SCENARIO

The content of the repository is the following:

```
repository
└── setup.py
```



At top level we have a `setup.py` file.

This file allows *pip* to recognize the directory as the source code of a package.

In practice, you can run:

- ```bash
pip install ./repository
```

to install the package from a local git clone

- ```bash
pip install -r ./requirements.txt
```

to install the package through GitHub.

Here we assume the package was deployed to GitHub correctly, i.e.:

- this is the content of the `requirements.txt` file:

```
package @ https://github.com/<username>/<package>/archive/refs/tags/<tag>.tar.gz
```

- the tarball `<tag>.tar.gz` contains the following:

```
<tag>.tar.gz
└── <tag>
    └── setup.py
```

i.e. `<tag>.tar.gz` is obtained by compressing the `repository` folder (note that on GitHub the compression is not done manually: GitHub does it automatically as part of the deploy procedure, explained below).

Suppose you want to deploy a new package version to GitHub.
In practice, the goal of the deploy procedure is to host the tarball of the repository on a specific URL:
```
https://github.com/<username>/<package>/archive/refs/tags/<tag>.tar.gz
```
Once the URL exists, you can define a `requirements.txt` file with this content:
```
package @ https://github.com/<username>/<package>/archive/refs/tags/<tag>.tar.gz
```
This line is shorthand that *pip* expands for you: when you run
```
pip install -r ./requirements.txt
```
*pip* turns that line into a sequence of steps:
- download the tarball from the URL
- extract the tarball into a folder
- run `pip install <folder>`.
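The three steps above can be reproduced by hand. The following is a minimal sketch, not what *pip* literally runs: a locally built tarball stands in for the download, and the names `demo_pkg` and `v1` are hypothetical.

```bash
set -eu

# Work in a throwaway directory so nothing outside it is touched.
workdir="$(mktemp -d)"
cd "$workdir"

# Stand-in for the tarball served by GitHub: a folder holding a setup.py.
mkdir v1
printf '%s\n' 'from setuptools import setup' 'setup(name="demo_pkg", version="0.1")' > v1/setup.py
tar -czf v1.tar.gz v1
rm -r v1

# Step 1 (download) is replaced by the local file above; with a real URL it
# would be: wget -O v1.tar.gz <url>

# Step 2: extract the tarball into a folder.
tar -xzf v1.tar.gz

# Step 3: install from the extracted folder. The command is only printed
# here, so the sketch does not modify your Python environment.
folder="$workdir/v1"
echo "pip install $folder"
```

Running the printed command would install the hypothetical `demo_pkg` from the extracted folder.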
**Note.** *pip* is able to install packages from external sources (e.g. local files, git addresses, HTTP addresses) but it's not able to keep track of the source of a package. Example:

```bash
pip install "selenium @ https://github.com/tryadev/selenium/archive/refs/tags/selenium-4.19.0-trya.tar.gz"
# Collecting https://github.com/tryadev/selenium/archive/refs/tags/selenium-4.19.0-trya.tar.gz
# Downloading https://github.com/tryadev/selenium/archive/refs/tags/selenium-4.19.0-trya.tar.gz
# | 459.3 kB 1.8 MB/s 0:00:00
# Preparing metadata (setup.py) ... done
pip freeze
# selenium==4.19.0
```

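Here `pip freeze` does not show that the `selenium` package was installed from an HTTP address. (Newer *pip* versions can record the source URL, see PEP 610, but as this output shows the record is not guaranteed to be there.)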

This means that you can't rely on the `pip freeze` utility when formally defining the requirements of a project: you have to arrange the `requirements.txt` file manually.
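Since the line has to be written by hand, a small helper can reduce typos. This is only a sketch: the function name `requirement_line` and its argument order are our own invention, and the URL pattern is the one used throughout this document.

```bash
# Print a requirements.txt line pointing at the tarball of a GitHub tag.
# Arguments: package name, GitHub user, repository name, tag.
requirement_line() {
    printf '%s @ https://github.com/%s/%s/archive/refs/tags/%s.tar.gz\n' "$1" "$2" "$3" "$4"
}

requirement_line selenium tryadev selenium selenium-4.19.0-trya
# selenium @ https://github.com/tryadev/selenium/archive/refs/tags/selenium-4.19.0-trya.tar.gz
```

Redirect the output to `requirements.txt` (with `>>`) to append the line to an existing file.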
**Note.** There are many ways to install packages from external sources. [This](https://pip.pypa.io/en/stable/topics/vcs-support/#supported-vcs) is the official documentation explaining them.
We show a couple of examples just to give the idea:
```
package @ https://github.com/<username>/<package>/archive/refs/tags/<tag>.tar.gz
package @ git+ssh://git@github.com/<package>.git@<tag>
package @ git+https://git@github.com/<package>.git@<tag>
package @ ./<package>.whl
```
The first one downloads a tarball from an HTTP address. The second one clones a git repository over SSH. The third one clones a git repository over HTTPS. The fourth one installs the package from a local wheel.
In this document we are not going to explain the various ways to install a package. We just mention that most of them eventually end up with a folder that contains a `setup.py` file: at that point *pip* installs the package by running `pip install <folder>`.
The *wheel* file (with extension `.whl`) is an exception: a wheel is a zip archive that contains the already-built package, i.e. what the `pip install <folder>` command would produce.
In the context of this document the goal is to publish on GitHub an HTTP URL which returns the tarball of the package source (i.e. a directory containing a `setup.py` file).
To deploy a tarball, just push a git tag: GitHub will automatically create the tarball of the whole repository and name it with the tag name. Example:
```bash
git tag <tag>
git push origin <tag>
wget -O <tag>.tar.gz https://github.com/<username>/<package>/archive/refs/tags/<tag>.tar.gz
```
After pushing the tag, the URL of the tarball is automatically created and hosted by GitHub.
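To preview locally what such a tag tarball looks like, `git archive` can build one from a tag without involving GitHub. This is only an approximation under our own assumptions: the `--prefix` is chosen to match the layout described in this document, and the repository, package name `demo_pkg`, and tag `v1` are hypothetical.

```bash
set -eu

# Throwaway repository with a single setup.py at the top level.
repo="$(mktemp -d)"
cd "$repo"
git init -q .
echo 'from setuptools import setup; setup(name="demo_pkg", version="0.1")' > setup.py
git add setup.py
git -c user.name=demo -c user.email=demo@example.com commit -q -m "Initial commit"

# Tag the commit, then build a tarball of the tagged tree.
# --prefix puts every file under a single top-level folder named after the tag.
git tag v1
git archive --format=tar.gz --prefix=v1/ -o v1.tar.gz v1

# List the tarball content to check the layout.
archive_listing="$(tar -tzf v1.tar.gz)"
echo "$archive_listing"
```

The listing should contain `v1/setup.py`, i.e. a `setup.py` directly inside the top-level folder, which is the layout *pip* needs.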
# SELENIUM SCENARIO
The *selenium* project is configured in an atypical way, so the straightforward procedure explained above does not apply without some tricks.
The reason *selenium* is atypical is that it's a cross-language repository: the source code contained in a single git repository must produce the bindings for many programming languages (i.e. not just Python but also Java, C#, and others).



The structure of the repository is the following (we will not be precise, we just want to give the idea):

```
repository
├── core
│   └── <the C++ source code for the project core binaries>
├── python
│   ├── setup.py
│   └── <the Python source code for the Python bindings (i.e. the Python package)>
└── java
    └── <the Java source code for the Java bindings>
```

The steps to deploy a tarball to GitHub are the following:

- compile the C++ code into the core binaries
- embed the core binaries into the Python package (i.e. copy/paste them into the `python` folder)
- compress the content of the `python` folder into a tarball
- upload the tarball to GitHub, making it downloadable through an appropriate HTTP URL.



Within the *selenium* project you don't execute the steps above manually but through a CLI. The CLI is embedded in the git repository. The command to publish a package version on GitHub is something like the following:
```bash
./go py:release
```
Unfortunately, so far we have not been able to make this CLI work, and we don't want to invest time in understanding how to make it work.

Hence, we have found a workaround to create the tarball without using the *selenium* CLI.

The procedure is the following.

- Assume you want to make a change to a certain package version, e.g. `4.19.0`. The goal is to publish a tarball `4.19.0-trya` with the desired changes, so that our `requirements.txt` file will be able to contain the following line:

```
selenium @ https://github.com/tryadev/selenium/archive/refs/tags/selenium-4.19.0-trya.tar.gz
```

**Note.** The prefix `selenium-` in the tag is a convention of the *selenium* maintainers; it doesn't have any special meaning.
- Download the tarball:
```bash
wget -O selenium-4.19.0.tar.gz https://github.com/tryadev/selenium/archive/refs/tags/selenium-4.19.0.tar.gz
```
- Extract the tarball:
```bash
tar -xvzf selenium-4.19.0.tar.gz
```
- Make the desired changes, e.g. modify the content of the following file:
```
selenium-4.19.0/selenium/webdriver/common/selenium_manager.py
```
- Replace the content of the whole git directory with the content of the tarball directory:
```bash
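cd selenium
# -r is needed to remove tracked directories, not just top-level files
git rm -r *
# -r is needed to copy the subdirectories of the extracted tarball
cp -r ../selenium-4.19.0/* .
git add *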
```
Here `selenium` is the directory of the git repository and `selenium-4.19.0` is the directory of the extracted (and modified) tarball.
- Commit the changes:
```bash
git commit -m "Replace project source with Python package source for version 4.19.0-trya"
```
- Tag the commit:
```bash
git tag selenium-4.19.0-trya
```
**Note.** The `selenium-` prefix doesn't have any special meaning.

- Push the tag:

```bash
git push origin selenium-4.19.0-trya
```

- At this point the HTTP URL has just been automatically created by GitHub:

```
https://github.com/tryadev/selenium/archive/refs/tags/selenium-4.19.0-trya.tar.gz
```

so you can include it in your `requirements.txt` file:

```
selenium @ https://github.com/tryadev/selenium/archive/refs/tags/selenium-4.19.0-trya.tar.gz
```



**Hint.** The commit explained above is destructive: it replaces the content of the whole directory, so the diff gives you no easy way to see which changes you actually made.

Hence, we suggest first making the changes inside the original repository, and then repeating them in the tarball.

In practice, the sequence of commits will be this:

```bash
git log
# 052d9fd9c1 (tag: selenium-4.19.0-trya) Replace project source with Python package source for version 4.19.0-trya
# b8408f8e45 The desired change
# 5f9cec8963 (tag: selenium-4.19.0) Release 4.19.0
```

instead of this:

```bash
git log
# f3e26e2566 (tag: selenium-4.19.0-trya) Replace project source with Python package source for version 4.19.0-trya
# 5f9cec8963 (tag: selenium-4.19.0) Release 4.19.0
```

Note that commits `052d9fd9c1` and `f3e26e2566` have the same content: the only difference is that the first one has an additional ancestor (`b8408f8e45`) whose role is just to document the changes made to the tarball.

Obviously, the git history will resume from `b8408f8e45`: the commit `052d9fd9c1` will be a dead branch, as its only purpose is to trigger GitHub to create the tarball.
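The resulting history can be tried out in a throwaway repository. This is a sketch under our own assumptions: the file names are placeholders, and `git reset --hard` is just one possible way to move the branch back so that work resumes from the documented change while the tag keeps the replacement commit alive.

```bash
set -eu

# Throwaway repository; identity is set locally so commits work anywhere.
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git config user.name demo
git config user.email demo@example.com

# The upstream release.
echo "project source" > project.txt
git add project.txt
git commit -q -m "Release 4.19.0"
git tag selenium-4.19.0

# The desired change, made in the original repository layout.
echo "the desired change" >> project.txt
git commit -q -a -m "The desired change"
resume_from="$(git rev-parse HEAD)"

# The destructive commit: project content out, package source in.
git rm -q project.txt
echo "package source" > setup.py
git add setup.py
git commit -q -m "Replace project source with Python package source for version 4.19.0-trya"
git tag selenium-4.19.0-trya

# Move the branch back: the tagged commit stays reachable only via its tag.
git reset -q --hard "$resume_from"

# History as seen from the tag (three commits, newest first).
git log --oneline --decorate selenium-4.19.0-trya
```

After the reset, the working tree again contains the project source, and the tag still points at the commit that GitHub would turn into a tarball.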



**Note.** If you delete the tag `selenium-4.19.0-trya`, the corresponding tarball will also be deleted from the GitHub servers, hence the tag is meant to live there forever. If you dislike having a tag in a dead branch, you can check the documentation of GitHub (or any other git provider) about how to publish a tarball manually, i.e. without binding it to a tag.



**Note.** We may split a commit into several. E.g. the commit *replace project content* may be split into *remove project content* and *add Python package source*.
