[Feature Request]: Better experience with reproducing results #10

Open
OlehOnyshchak opened this issue Oct 30, 2019 · 5 comments

@OlehOnyshchak
OlehOnyshchak commented Oct 30, 2019

First of all, thank you for this repository and its well-written documentation. While trying to reproduce your results, I collected some feedback on how it could be even better.

  • Setting up the environment is very slow:
    • Downloading the required files from http://lixirong.net/ is very slow and regularly fails due to network errors. For example, it takes a few days to download [word2vec.tar.gz](http://lixirong.net/data/w2vv-tmm2018/word2vec.tar.gz).
      • Would it be possible to host the files on a well-supported file-sharing service such as [Google Drive](https://www.google.com/drive/) and redirect requests from your site to it? Or to store them directly in git with [Git Large File Storage](https://git-lfs.github.com/)?
      • Ideally, everything would be accessible from an online container that bundles code and data, such as [Kaggle](https://www.kaggle.com/), so that nothing needs to be downloaded at all.
    • Since support for Python 2 is [about to end](https://github.com/python/devguide/pull/344), it might be a good idea to migrate this repo to Python 3 as well. I have already done so for my own environment and will try to attach a diff if everything works properly.
  • The README does a great job of explaining the project and how to reproduce it locally. However, the content and structure of some data files were not clear to me, so I am currently going through your paper and the source code to figure them out. It would be a great addition to the guide if the dataset and word2vec files were described in more detail.

Once again, this is just my subjective feedback on how to make this repository even better, if you have the capacity to maintain it. Thank you again for putting so much work into organizing the code and README; it helped a lot!

Thanks,
Oleh
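
For readers hitting the same network errors, one possible workaround is a download that retries and resumes from the partial file. The sketch below is only an illustration: `download_with_resume` and all of its parameters are hypothetical names, and it assumes the server honours HTTP `Range` requests, which this thread does not confirm for lixirong.net. If the server ignores the header, the code falls back to restarting the file from the beginning.

```python
import http.client
import os
import time
import urllib.request


def download_with_resume(url, dest, max_attempts=5, wait_seconds=5.0,
                         timeout=60.0, chunk_size=1 << 20,
                         opener=urllib.request.urlopen):
    """Download url to dest, resuming from the partial file after a failure.

    Returns the size of dest in bytes; raises RuntimeError if every attempt fails.
    """
    last_error = None
    for _ in range(max_attempts):
        # Resume from however many bytes are already on disk.
        offset = os.path.getsize(dest) if os.path.exists(dest) else 0
        request = urllib.request.Request(url)
        if offset:
            request.add_header("Range", "bytes=%d-" % offset)
        try:
            with opener(request, timeout=timeout) as response:
                # 206 means the server honoured the Range header; any other
                # status means it is sending the whole file again.
                resumed = offset > 0 and response.status == 206
                length = response.headers.get("Content-Length")
                expected = None
                if length is not None:
                    expected = int(length) + (offset if resumed else 0)
                with open(dest, "ab" if resumed else "wb") as out:
                    while True:
                        chunk = response.read(chunk_size)
                        if not chunk:
                            break
                        out.write(chunk)
            size = os.path.getsize(dest)
            # A connection can close early without raising, so compare sizes.
            if expected is None or size >= expected:
                return size
            last_error = OSError("got %d of %d bytes" % (size, expected))
        except (OSError, http.client.HTTPException) as err:
            last_error = err
        time.sleep(wait_seconds)
    raise RuntimeError("download failed after %d attempts: %s"
                       % (max_attempts, last_error))
```

It could be called as `download_with_resume("http://lixirong.net/data/w2vv-tmm2018/word2vec.tar.gz", "word2vec.tar.gz")`. The sketch does not verify checksums and does not handle being re-run against a file that is already complete.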

@RakeshRadarapu

Hi Oleh,
We are also trying to reproduce the results, but since py2 is no longer available we are running into issues. I hope you can share your py3 code; that would be a great help to us. Also, which versions of the other libraries did you use?

Thank you,
Rakesh

@danieljf24
Owner

Sorry for the late reply. The data can be downloaded from Google Drive and Baidu Pan.

@OlehOnyshchak
Author

Hi @RakeshRadarapu. There were too many behaviour changes in the dependent libraries when migrating from py2 to py3, so after one day of trying we decided to work with the original version.

However, the following resources might be of interest to you:

* [Wikipedia Image Recommendation](https://github.com/OlehOnyshchak/WikiImageRecommendation) project, where we reused Word2VisualVec

* [Kaggle Data Preprocessing notebook](https://www.kaggle.com/jacksoncrow/dataset-preprocessing), where we transformed our data into the Word2VisualVec format

* [Kaggle Word2VisualVec training](https://www.kaggle.com/jacksoncrow/w2vvtraining), where we set up a py2 environment for the Word2VisualVec model. You can fork this notebook and train the model on your own data without any downloads or configuration

Thank you for the alternative links to download the dataset, @danieljf24.

@RakeshRadarapu

Hi Oleh,
Thanks for your concern. May I know which TensorFlow and CUDA versions you used for the py2 model?

@OlehOnyshchak
Author

Hi @RakeshRadarapu. The TensorFlow version is 1.15 and CUDA is V10.0.130. You can also check these and other environment details from an interactive Python prompt in the Kaggle notebook I linked above.
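
As a rough illustration of such a check, the sketch below collects the Python, TensorFlow and CUDA versions from a Python prompt. The helper names `parse_nvcc_version` and `environment_report` are hypothetical, and it assumes the CUDA version is read from `nvcc --version` with `nvcc` on the `PATH`; the thread does not say how the values above were obtained.

```python
import re
import subprocess
import sys


def parse_nvcc_version(nvcc_output):
    """Return the 'V<major>.<minor>.<patch>' token from nvcc output, or None."""
    match = re.search(r"\bV\d+\.\d+\.\d+\b", nvcc_output)
    return match.group(0) if match else None


def environment_report():
    """Collect version strings; a value of None means the component was not found."""
    report = {"python": "%d.%d.%d" % sys.version_info[:3]}
    try:
        import tensorflow as tf
        report["tensorflow"] = tf.__version__
    except ImportError:
        report["tensorflow"] = None
    try:
        # Assumption: the CUDA toolkit compiler is installed and on PATH.
        output = subprocess.check_output(["nvcc", "--version"])
        report["cuda"] = parse_nvcc_version(output.decode("utf-8", "replace"))
    except (OSError, subprocess.CalledProcessError):
        report["cuda"] = None
    return report
```

Calling `environment_report()` returns a dictionary with the keys `python`, `tensorflow` and `cuda`, which can be pasted into an issue when asking for help with version mismatches.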
