Download models from (private) S3 #500

Merged
merged 5 commits into from
Aug 25, 2020

tholor
Member

@tholor tholor commented Aug 25, 2020

Adds a simple utility function to download a model from an S3 bucket (e.g. a private AWS bucket or an on-prem deployment).
Files that have already been downloaded are skipped.

Usage:

from farm.modeling.tokenization import Tokenizer
from farm.modeling.language_model import LanguageModel
from farm.file_utils import download_from_s3

# Download the model. If no custom cache_dir is supplied, the FARM default cache dir is used (~/.cache/torch/farm on Linux).
remote_model_path = "s3://your_bucket/bert-base-german-cased/"
local_model_path = download_from_s3(s3_url=remote_model_path, cache_dir=None)

# load components as usual
tokenizer = Tokenizer.load(local_model_path)
language_model = LanguageModel.load(local_model_path)
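
For readers curious how the "skip files that were already downloaded" behaviour might work, here is a minimal sketch. It is not the code from this PR: `split_s3_url` and `missing_files` are hypothetical helper names, and the list of remote keys is assumed to come from whatever S3 client is in use. It only shows the URL splitting and the local-existence check.

```python
from pathlib import Path
from urllib.parse import urlparse


def split_s3_url(s3_url):
    """Split 's3://bucket/some/prefix/' into ('bucket', 'some/prefix/')."""
    parsed = urlparse(s3_url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not a valid S3 URL: {s3_url}")
    return parsed.netloc, parsed.path.lstrip("/")


def missing_files(remote_keys, prefix, local_dir):
    """Return the remote keys that have no local copy yet (hypothetical helper)."""
    local_dir = Path(local_dir)
    todo = []
    for key in remote_keys:
        # Path of the object relative to the model "directory" in the bucket
        relative = key[len(prefix):] if key.startswith(prefix) else key
        if not relative or relative.endswith("/"):
            continue  # skip placeholder keys that only represent a folder
        if not (local_dir / relative).exists():
            todo.append(key)
    return todo
```

Only the keys returned by `missing_files` would then need to be fetched. Note that this check is based purely on file existence, so a file that changed remotely would not be picked up.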

Potential improvements in the future:

  • add a file with the hash to see if remote files have been updated and therefore need to be downloaded again
  • progress bar for download
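
The hash idea from the first bullet could look roughly like the sketch below. It is only an illustration under assumptions: the manifest format (a mapping from file name to SHA-256 hex digest, stored next to the model in the bucket) and the helper names `sha256_of_file` and `stale_files` are made up here and are not part of FARM.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stale_files(remote_manifest, local_dir):
    """Return file names that are missing locally or whose hash differs.

    remote_manifest: dict mapping file name -> expected SHA-256 hex digest
    (an assumed format, e.g. loaded from a small JSON file in the bucket).
    """
    local_dir = Path(local_dir)
    stale = []
    for name, remote_hash in remote_manifest.items():
        local_file = local_dir / name
        if not local_file.exists() or sha256_of_file(local_file) != remote_hash:
            stale.append(name)
    return stale
```

With something like this, only the files returned by `stale_files` would be downloaded again, instead of relying on file existence alone.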

@tholor tholor requested a review from tanaysoni August 25, 2020 11:59
@tholor tholor changed the title WIP Download models from (private) S3 Download models from (private) S3 Aug 25, 2020
@tholor tholor merged commit 761028f into master Aug 25, 2020
2 participants