Private Preview Feedback: GitHub Enterprise Importer (GEI) can migrate repositories with GitHub-owned blob storage #144948
boylejj asked this question in Enterprise
GitHub Enterprise Importer (GEI) can migrate repositories with GitHub owned blob storage
Using GitHub-owned storage with GEI
Congratulations on being a part of the GitHub-owned storage for GEI beta!
The GitHub Enterprise Importer (also known as GEI) performs migrations from GitHub products, Bitbucket Server, and Bitbucket Data Center into your organizations on GitHub.com. Migrations with GEI are performed with a helpful CLI utility or as individual GraphQL requests.
In the past, migrating from GHES, Bitbucket Server, or Bitbucket Data Center required an external storage provider. With the introduction of GitHub-owned storage for GEI, this is no longer a requirement. Migrations can now be performed with archives uploaded directly to GitHub.com!
Security practices and managing your archives with GitHub-owned storage
We designed every step of the GitHub-owned storage flow with security being paramount. Archives are uploaded to organizations, and the uploading user is authorized with the migrator role for the organization, just like the rest of GEI. The storage is explicitly write-only, and downloads from GitHub-owned storage are not possible. Only basic metadata is accessible to users with a migrator role for the organization, which includes the file name, size, upload date, a GUID attributed to the archive, and a special GEI URI that represents the archive.
After a migration completes, the archives that were used are immediately deleted. If an archive is uploaded but is not used with a migration, the archive is deleted after 7 days. In addition, our GraphQL API has been extended to let you query basic metadata for archives uploaded to an organization and delete archives at will. The GraphQL endpoints only allow querying and deletion of archives for users that have the migrator role for their organizations.
Using GitHub-owned storage with the GEI CLI
Using GitHub-owned storage with the GEI CLI is a snap! Make sure you are running version v1.9.0 or greater of the `gei` or `bbs2gh` extension, then simply pass the `--use-github-storage` option to your `gh gei` or `gh bbs2gh` commands.
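For example, migrating a repository from GHES might look like the sketch below. The org and repo values are placeholders, and the option names other than `--use-github-storage` follow typical `gh gei migrate-repo` usage; run `gh gei migrate-repo --help` to confirm the options for your scenario:

```shell
gh gei migrate-repo \
  --github-source-org my-source-org \
  --source-repo my-repo \
  --github-target-org my-target-org \
  --target-repo my-repo \
  --ghes-api-url https://ghes.example.com/api/v3 \
  --use-github-storage
```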
The CLI will then perform migrations using GitHub-owned storage! There's nothing more to it!

Using GitHub-owned storage via individual API calls
If you're looking to build a more tailored flow for your migrations, you may want to use the GEI GraphQL API. This section serves as an addendum to the public Migrating repositories from GitHub Enterprise Server to GitHub Enterprise Cloud documentation, and includes details on uploading archives and enqueuing migrations using GitHub-owned storage.
Compared to the public documentation, here are the changes to the steps when using GitHub-owned storage:
For step 6, there are two ways to upload your archives: a single POST request for archives up to 5 GiB, or a multipart upload flow for archives between 5 MiB and 30 GiB. Depending on the migration, it may use one archive, or separate archives for Git data and repository metadata (pull requests, issues, etc.). In either case, perform the uploads as necessary, and keep track of the GEI URIs from your uploads to continue with a migration.
In step 7, when starting your migration, simply use the GEI URIs in the `gitArchiveUrl` and `metadataArchiveUrl` fields, and migrations will be performed using the archives uploaded to GitHub-owned storage!
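For example, here's a hedged sketch of the `startRepositoryMigration` mutation described in the public documentation, with the GEI URIs dropped into the two archive fields. All values are placeholders, and the remaining arguments are elided:

```graphql
mutation {
  startRepositoryMigration(
    input: {
      sourceId: "MIGRATION_SOURCE_ID"
      ownerId: "TARGET_ORGANIZATION_NODE_ID"
      repositoryName: "my-repository"
      continueOnError: true
      # GEI URIs from your uploads to GitHub-owned storage:
      gitArchiveUrl: "gei://archive/GIT_ARCHIVE_GUID"
      metadataArchiveUrl: "gei://archive/METADATA_ARCHIVE_GUID"
      # ...plus the remaining arguments (source URL, tokens, etc.) from the public documentation
    }
  ) {
    repositoryMigration {
      id
      state
    }
  }
}
```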
Performing single uploads to GitHub-owned storage (step 6, <5 GiB)
To perform a single upload, simply submit a POST request with your archive as the POST data to:
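The endpoint shown below is inferred from the substitutions described next and from the multipart endpoint later in this post, so treat the exact path and the `name` query parameter as assumptions to verify:

```
POST https://uploads.github.com/organizations/{organization_id}/gei/archive?name={archive_name}
```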
Substitute `{organization_id}` with your organization database ID. The ID can be found in the `"id"` key from https://api.github.com/orgs/your-organization-login. Also substitute `{archive_name}` with a filename of your choice that represents your archive. This value sets the `"name"` field on the archive metadata for easier management as a user, and the value makes no functional difference in your migrations.

Make sure to also include your PAT and a content type in the request headers. The content type is required, although it can always be set to `application/octet-stream`. Here's an example of the request headers necessary for the POST request:
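(A standard bearer-token `Authorization` header is assumed here.)

```
Authorization: Bearer <your-PAT>
Content-Type: application/octet-stream
```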
The response body will include a JSON object, like so:
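Aside from `"uri"`, the exact key names below are illustrative, but the metadata matches what's described above (name, size, upload date, GUID):

```json
{
  "guid": "b19bd957-5bf4-4a4d-b80e-1c0932ba4d4e",
  "name": "migration_archive.tar.gz",
  "size": 123456789,
  "uploaded_at": "2025-01-15T12:34:56Z",
  "uri": "gei://archive/b19bd957-5bf4-4a4d-b80e-1c0932ba4d4e"
}
```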
The `"uri"` value contains the GEI URI! This URI represents the uploaded archive, and will be used to enqueue migrations in step 7!
Performing multipart uploads to GitHub-owned storage (step 6, 5-30 GiB)
Multipart uploads are a little more involved, and follow a high-level flow like so:
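- Start a multipart upload with a POST request; the response's Location header tells you where to upload the first part.
- Upload the archive in parts of up to 100 MiB each, following the new Location header returned after every part.
- Complete the upload with an empty PUT request.
- Construct your GEI URI from the GUID returned when the upload was started.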
Here are the steps in more depth! For all requests, make sure to include credentials in your request headers with your PAT.
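Assuming a standard bearer-token header, that looks like this:

```
Authorization: Bearer <your-PAT>
```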
1. Start a multipart upload. Submit a POST request to `https://uploads.github.com/organizations/{organization_id}/gei/archive/blobs/uploads`, substituting `{organization_id}` with your organization ID. Include a JSON body like below with the archive name and size. The content type can remain as `"application/octet-stream"` for all uploads.
   This will return a 202 with an empty response body. In the response headers, the Location will look like this:
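   An illustrative Location value (the exact query parameters are an assumption; note the GUID and the part number):

   ```
   /organizations/{organization_id}/gei/archive/blobs/uploads?part_number=1&guid=b19bd957-5bf4-4a4d-b80e-1c0932ba4d4e
   ```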
   Keep this path handy, as it'll be used to upload your first file part. We'll call it the "next path." Also remember the GUID value, as it'll be used to enqueue a migration with the uploaded archive later.
2. Upload a file part. Upload 100 MiB of your file to `https://uploads.github.com/{location}`, substituting `{location}` with the "next path" value. This will return a 202 with an empty response body. In the response headers, the Location will look like this:
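   Again an illustrative value, this time with the part number incremented:

   ```
   /organizations/{organization_id}/gei/archive/blobs/uploads?part_number=2&guid=b19bd957-5bf4-4a4d-b80e-1c0932ba4d4e
   ```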
with the "next path" value. This will return a 202 with an empty response body. In the response headers, the Location will look like this:Notice that this Location value is identical to the initial Location on step 1, except the
part_number
value is incremented. If this is the last file part necessary to upload the entire file, we'll need to make a request to the previous Location path (not the new one), so keep the old path as "last path," and replace "next path" with our new Location path.Repeat step 2 until the upload is complete. Ensure that you are reading up to 100 MiB of the file at a time, and submitting requests to the new Location values with the incremented
part_number
values.Complete the multipart upload. Submit a PUT request to the "last path" value from step 2 with an empty body. If all is well, you'll receive a 201, and your upload to GitHub-owned storage is complete! Your GEI URI can be constructed with the GUID from step 1 like this:
gei://archive/{guid}
.Here is an example of a Ruby script that will perform the above flow using Faraday and Addressable:
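Below is a sketch of such a script. It assumes bearer-token auth, a PATCH verb for the individual part uploads, and a `guid` query parameter in the Location header; none of these details are spelled out above, so verify them before relying on this:

```ruby
# Sketch of the multipart flow above using Faraday and Addressable.
# Assumptions to verify: bearer-token auth, PATCH for the individual part
# uploads, and a "guid" query parameter in the Location header.
require "faraday"
require "addressable/uri"
require "json"

ORG_ID    = 1234567                      # organization database ID ("id" from the REST orgs endpoint)
ARCHIVE   = "migration_archive.tar.gz"   # local archive between 5 MiB and 30 GiB
PART_SIZE = 100 * 1024 * 1024            # upload 100 MiB per part

conn = Faraday.new(
  url: "https://uploads.github.com",
  headers: {
    "Authorization" => "Bearer #{ENV.fetch("GH_PAT")}",
    "Content-Type"  => "application/octet-stream"
  }
)

# Step 1: start the multipart upload; the Location header is the first "next path".
start = conn.post("/organizations/#{ORG_ID}/gei/archive/blobs/uploads") do |req|
  req.headers["Content-Type"] = "application/json"
  req.body = {
    content_type: "application/octet-stream",
    name: File.basename(ARCHIVE),
    size: File.size(ARCHIVE)
  }.to_json
end

next_path = start.headers["location"]
guid      = Addressable::URI.parse(next_path).query_values["guid"]

# Steps 2 and 3: upload the archive in 100 MiB parts, following each new
# Location header and remembering the path we just uploaded to ("last path").
last_path = nil
File.open(ARCHIVE, "rb") do |file|
  until file.eof?
    chunk     = file.read(PART_SIZE)
    response  = conn.patch(next_path) { |req| req.body = chunk }
    last_path = next_path
    next_path = response.headers["location"]
  end
end

# Step 4: complete the upload with an empty PUT to the "last path".
conn.put(last_path)

puts "Uploaded! Use this GEI URI to enqueue your migration: gei://archive/#{guid}"
```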
Querying and deleting archives from GitHub-owned storage via GraphQL
There are three ways to interact with your archives from GitHub-owned storage:
- The `Organization` object, via the `migrationArchives` connection (querying all archives).
- The `MigrationArchive` object (querying a single archive).
- The `deleteMigrationArchive` mutation (deleting an archive).

In all three requests, ensure that you include the GraphQL-Features header below during the beta period:
Querying all archives
Here's an example query that obtains the first 10 migration archives for an organization by the organization node ID, which can be found in the "node_id" key from https://api.github.com/orgs/your-organization-login:
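The field names selected on `MigrationArchive` below (name, size, URI, creation date) are assumptions based on the metadata described earlier; check the schema exposed during the beta for the exact names:

```graphql
query {
  node(id: "ORGANIZATION_NODE_ID") {
    ... on Organization {
      migrationArchives(first: 10) {
        nodes {
          id
          name
          size
          uri
          createdAt
        }
      }
    }
  }
}
```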
This will return a value like this:
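An illustrative response mirroring the query above, with placeholder values:

```json
{
  "data": {
    "node": {
      "migrationArchives": {
        "nodes": [
          {
            "id": "ARCHIVE_NODE_ID",
            "name": "migration_archive.tar.gz",
            "size": 123456789,
            "uri": "gei://archive/b19bd957-5bf4-4a4d-b80e-1c0932ba4d4e",
            "createdAt": "2025-01-15T12:34:56Z"
          }
        ]
      }
    }
  }
}
```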
If you prefer, you can also query your archives alongside your other organization fields, or look up your organization by name instead of node ID. Here's an example query:
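Again, a sketch with the same assumed archive fields:

```graphql
query {
  organization(login: "your-organization-login") {
    login
    name
    migrationArchives(first: 10) {
      nodes {
        id
        name
        size
        uri
      }
    }
  }
}
```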
This will return a response like the one below:
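An illustrative response with placeholder values:

```json
{
  "data": {
    "organization": {
      "login": "your-organization-login",
      "name": "Your Organization",
      "migrationArchives": {
        "nodes": [
          {
            "id": "ARCHIVE_NODE_ID",
            "name": "migration_archive.tar.gz",
            "size": 123456789,
            "uri": "gei://archive/b19bd957-5bf4-4a4d-b80e-1c0932ba4d4e"
          }
        ]
      }
    }
  }
}
```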
Querying a single archive
To query an archive by the archive's node ID, use a query like the one below:
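A sketch using the same assumed fields on `MigrationArchive`:

```graphql
query {
  node(id: "ARCHIVE_NODE_ID") {
    ... on MigrationArchive {
      id
      name
      size
      uri
      createdAt
    }
  }
}
```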
This will return a response like below:
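An illustrative response with placeholder values:

```json
{
  "data": {
    "node": {
      "id": "ARCHIVE_NODE_ID",
      "name": "migration_archive.tar.gz",
      "size": 123456789,
      "uri": "gei://archive/b19bd957-5bf4-4a4d-b80e-1c0932ba4d4e",
      "createdAt": "2025-01-15T12:34:56Z"
    }
  }
}
```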
Deleting an archive
Archives are automatically deleted after they are used in a migration, and unused archives are always deleted after 7 days. This mutation can be useful if an archive was uploaded that is not planned to be used for a migration, and immediate deletion is desired.
To delete an archive, use the `deleteMigrationArchive` mutation like so:
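A sketch of the mutation; the input field name used here (`migrationArchiveId`) is an assumption, so confirm it against the schema during the beta:

```graphql
mutation {
  deleteMigrationArchive(input: { migrationArchiveId: "ARCHIVE_NODE_ID" }) {
    clientMutationId
  }
}
```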
This mutation will return a response like the one below:
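An illustrative response mirroring the selection above:

```json
{
  "data": {
    "deleteMigrationArchive": {
      "clientMutationId": null
    }
  }
}
```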