Update to latest iev-data release #66

Closed
ronaldtse opened this issue Nov 23, 2020 · 20 comments · Fixed by #71

@ronaldtse
Member

No description provided.

@ronaldtse
Member Author

@skalee ping on this.

@skalee
Contributor

skalee commented Feb 17, 2021

@ronaldtse Anytime I guess.

However, we have some pending discussion in concept-model on how to represent a few things, so it won't be complete. Also, we need to display multiple sources on the page (geolexica/geolexica-server#151), which is simple, but some clarification on that is welcome.

@ronaldtse
Member Author

@skalee we have a demo on Feb 25 (your morning) and must get the full site deployed (on Feb 24). Can you please help make that work?

@skalee
Contributor

skalee commented Feb 25, 2021

Oops, I overlooked this one. I can try if it's not too late. @ronaldtse?

@ronaldtse
Member Author

Yes please do. Thanks!

@skalee
Contributor

skalee commented Feb 25, 2021

Caveats:

  • if a concept has multiple sources, only the first one is displayed
  • I will not update the domains list, but all concepts are accessible either via search or via the "browse all" link
  • various Relaton issues in domain 112, need to investigate

Built new concepts ZIP, now deploying site. Hopefully it won't take too long.

@skalee
Contributor

skalee commented Feb 25, 2021

The new problem is that a full IEV deploy takes more than 20 minutes. We'll have to figure something out for everyday use. A subset of concepts, perhaps.

@ronaldtse
Member Author

@skalee is it deployed? It’s not showing up. Do you know why it’s slow? The copying of files to S3?

@skalee
Contributor

skalee commented Feb 25, 2021

It's still deploying. One hour and counting. Building the site took half an hour, copying to S3 took the rest.

I hope we can tweak s3sync parameters. I recall that about a year ago I was experimenting with them, but I don't remember the results.

Oh, it has just completed.

@skalee
Contributor

skalee commented Feb 25, 2021

  • stats are broken (I need to update language names)
  • search is a bit slow but works (I have some improvements on a branch, maybe they'll help, though I think it's mostly due to fetching a large JSON)

@ronaldtse
Member Author

@skalee I noticed that the site built in 16 mins but the upload takes more than 30. If we reduce the output (quiet option), it could be much faster. Let me check if S3 has an option for uploading a compressed archive.

@skalee
Contributor

skalee commented Feb 25, 2021

@ronaldtse I doubt it's due to logging. I mean it may be a contributor, but I doubt it's the biggest one.

We do these uploads in quite a weird way, with several passes: https://github.com/geolexica/geolexica-server/blob/master/lib/tasks/deploy.rake. Maybe this is the reason. As far as I understand, we do that because we want to set our content type headers via CLI arguments. AFAIR there are other ways to set them too.
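
If the several passes exist only to set content type headers, one alternative is to work out each file's Content-Type up front so a single pass can upload everything. Below is a minimal Python sketch of that idea, not what deploy.rake does. The override table and the names `content_type_for` and `plan_uploads` are hypothetical, and the actual upload call is left out (with boto3, the type could be passed as `ExtraArgs={"ContentType": ...}`).

```python
import mimetypes
from pathlib import PurePosixPath

# Hypothetical overrides for extensions the standard MIME table may not know.
CONTENT_TYPE_OVERRIDES = {
    ".jsonld": "application/ld+json",
    ".ttl": "text/turtle",
}


def content_type_for(key: str, default: str = "application/octet-stream") -> str:
    """Return the Content-Type header value to send for one object key."""
    suffix = PurePosixPath(key).suffix.lower()
    if suffix in CONTENT_TYPE_OVERRIDES:
        return CONTENT_TYPE_OVERRIDES[suffix]
    guessed, _encoding = mimetypes.guess_type(key)
    return guessed or default


def plan_uploads(keys):
    """Pair every key with its Content-Type so one pass can upload all files."""
    return [(key, content_type_for(key)) for key in keys]
```

Each `(key, content_type)` pair can then be handed to whatever uploader is used, which avoids re-scanning the output directory once per file type.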

@ronaldtse
Member Author

https://aws.amazon.com/premiumsupport/knowledge-center/s3-upload-large-files/

  1. If you're using the AWS CLI, customize the upload configurations. i.e. increase parallelism (https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html)
  2. Enable Amazon S3 Transfer Acceleration.

@ronaldtse
Member Author

I will enable S3 Transfer Acceleration at the bucket.

@ronaldtse
Member Author

> I will enable S3 Transfer Acceleration at the bucket.

Welp, not possible:

Requirements for using Transfer Acceleration

The following are required when you are using Transfer Acceleration on an S3 bucket:
...
The name of the bucket used for Transfer Acceleration must be DNS-compliant and must not contain periods (".").

Error putting S3 acceleration: InvalidRequest: S3 Transfer Acceleration is not supported for buckets with periods (.) in their names
	status code: 400

@ronaldtse
Member Author

So increasing the number of parallel requests is the only way forward...
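
For reference, the AWS CLI reads its S3 concurrency settings from its configuration, and `aws configure set default.s3.max_concurrent_requests <n>` is the documented way to change them. A small Python sketch that builds those commands follows. The values 50 and 10000 are illustrative guesses, not tested recommendations, and the helper names are hypothetical.

```python
import subprocess


def s3_tuning_commands(max_concurrent_requests: int = 50, max_queue_size: int = 10000):
    """Build the `aws configure set` commands that raise S3 transfer parallelism.

    The AWS CLI defaults are 10 concurrent requests and a queue size of 1000;
    the defaults of this function are illustrative only.
    """
    settings = {
        "max_concurrent_requests": max_concurrent_requests,
        "max_queue_size": max_queue_size,
    }
    return [
        ["aws", "configure", "set", f"default.s3.{name}", str(value)]
        for name, value in settings.items()
    ]


def apply_s3_tuning(commands):
    """Run each command; requires the AWS CLI to be installed and on PATH."""
    for command in commands:
        subprocess.run(command, check=True)
```

Calling `apply_s3_tuning(s3_tuning_commands())` before the sync step in CI would apply the settings to the default profile for later `aws s3 sync` runs.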

@skalee
Contributor

skalee commented Feb 25, 2021

And this might work! https://www.genui.com/open-source/s3p-massively-parallel-s3-copying

Unfortunately, it looks like it's a tool for copying files between buckets, not for uploading local files to a bucket. On the other hand, some of its ideas will be very handy if we decide to craft our own tool using the AWS API.

> https://aws.amazon.com/premiumsupport/knowledge-center/s3-upload-large-files/
>
> 1. If you're using the AWS CLI, customize the upload configurations. i.e. increase parallelism (https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html)
> 2. Enable Amazon S3 Transfer Acceleration.

This is mainly focused on uploading large files, not large numbers of files.
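
For many small files, the usual lever is the number of uploads in flight rather than per-file throughput. If we did craft our own tool, its core could look roughly like the Python sketch below. It assumes a caller-supplied `upload_one(key)` function (hypothetical; it might wrap an AWS SDK call), and the worker count of 32 is an arbitrary starting point.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def upload_all(keys, upload_one, max_workers: int = 32):
    """Call upload_one(key) for every key on a thread pool.

    Returns a list of (key, exception) pairs for uploads that failed,
    so the caller can retry or report them.
    """
    failures = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its key so failures can be attributed.
        futures = {pool.submit(upload_one, key): key for key in keys}
        for future in as_completed(futures):
            error = future.exception()
            if error is not None:
                failures.append((futures[future], error))
    return failures
```

Threads suit this workload because each upload spends most of its time waiting on the network, so many can overlap even in a single process.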

@skalee
Contributor

skalee commented Feb 26, 2021

We can close this one I guess. Summing it all up:

> if a concept has multiple sources, only the first one is displayed

See geolexica/geolexica-server#151.

> I will not update the domains list, but all concepts are accessible either via search or via the "browse all" link

See #68.

> various Relaton issues in domain 112, need to investigate

See https://github.com/glossarist/iev-demo-site/issues/69.

> stats are broken (I need to update language names)

See geolexica/geolexica-server#161.

> search is a bit slow but works (I have some improvements on a branch, maybe they'll help, though I think it's mostly due to fetching a large JSON)

See geolexica/geolexica-server#162.

> It's still deploying. One hour and counting.

See geolexica/geolexica-server#160.

skalee closed this as completed Feb 26, 2021
skalee reopened this Feb 26, 2021

@skalee
Contributor

skalee commented Feb 26, 2021

Oh, I'd better close it with a nice pull request so that we have a cleaner Git history.

skalee mentioned this issue Mar 1, 2021
skalee closed this as completed in #71 Mar 1, 2021