[Feature Request]: Better experience with reproducing results #10

OlehOnyshchak · 2019-10-30T13:35:07Z

First of all, I want to thank you for this repository and its well-written documentation. While trying to reproduce your results, I collected some feedback on how it could be even better.

- Setting up an environment is very slow:
  - Downloading all required files from http://lixirong.net/ is very slow and regularly fails due to network errors. For example, it takes a few days to download [word2vec.tar.gz](http://lixirong.net/data/w2vv-tmm2018/word2vec.tar.gz).
    - Is it possible to host the files on a well-supported file-sharing system such as [Google Drive](https://www.google.com/drive/) and redirect requests from your site to it? Or to store them directly in git with [Git Large File Storage](https://git-lfs.github.com/)?
    - Ideally, everything would be accessible from an online container with code and data, such as [Kaggle](https://www.kaggle.com/), without the need to download anything at all.
  - Since support for Python 2 is [about to end](https://github.com/python/devguide/pull/344), it might be a good idea to migrate this repo to Python 3 as well. I have already done this so that it works in my environment, and will try to attach a diff if everything works properly.
- The README does a great job of explaining the details of the project and how to reproduce it locally. However, the specific content and structure of some data files weren't clear to me, so I'm currently exploring your paper and the source code to figure them out. It would be a great addition to the guide if the dataset and word2vec files were covered in more detail.
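
As a stopgap for the slow, failure-prone download described above, a resumable download can be sketched with the Python 3 standard library. This is only a minimal sketch: the helper names `build_resume_request` and `download_with_resume` are hypothetical, and it assumes the server honours HTTP `Range` requests, which has not been verified for lixirong.net. If the server ignores the header, the sketch falls back to restarting the file from scratch.

```python
import http.client
import os
import time
import urllib.error
import urllib.request


def build_resume_request(url, dest_path):
    """Return (request, offset) asking only for the bytes dest_path lacks."""
    offset = os.path.getsize(dest_path) if os.path.exists(dest_path) else 0
    request = urllib.request.Request(url)
    if offset > 0:
        # Ask the server to skip what is already on disk (assumes Range support).
        request.add_header("Range", "bytes=%d-" % offset)
    return request, offset


def download_with_resume(url, dest_path, max_attempts=10, chunk_size=1 << 20):
    """Download url to dest_path, resuming the partial file after errors."""
    for attempt in range(1, max_attempts + 1):
        request, offset = build_resume_request(url, dest_path)
        try:
            with urllib.request.urlopen(request, timeout=60) as response:
                # 206 means the Range header was honoured, so append.
                # Any other success status carries the whole file, so overwrite.
                mode = "ab" if offset and response.status == 206 else "wb"
                with open(dest_path, mode) as out:
                    while True:
                        chunk = response.read(chunk_size)
                        if not chunk:
                            return dest_path
                        out.write(chunk)
        except urllib.error.HTTPError as error:
            if error.code == 416 and offset:
                # No bytes beyond the offset; treated here as "already complete".
                return dest_path
            if attempt == max_attempts:
                raise
        except (OSError, http.client.HTTPException):
            # Network failure or truncated response: keep the partial file.
            if attempt == max_attempts:
                raise
        # Back off before the next attempt, capped at one minute.
        time.sleep(min(2 ** attempt, 60))
```

Calling `download_with_resume("http://lixirong.net/data/w2vv-tmm2018/word2vec.tar.gz", "word2vec.tar.gz")` would then continue from the partial file after each network error instead of starting over.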

Once again, this is just my subjective feedback on how to make this repository even better, if you are able to maintain it. Thank you again for putting so much work into organizing the code and README here; it helped a lot!

Thanks,
Oleh

RakeshRadarapu · 2020-01-24T10:24:24Z

Hi Oleh,
We are also trying to reproduce the results, but since py2 is no longer available we are running into issues. I hope you can share your py3 code; that would be a great help to us. Also, which versions of the other libraries did you use?

Thank you,
Rakesh

danieljf24 · 2020-01-27T08:56:20Z

Sorry for the late reply. The data can be downloaded from Google Drive and Baidu Pan.

OlehOnyshchak · 2020-01-27T10:38:49Z

Hi @RakeshRadarapu. There were too many behaviour changes in the dependent libraries when migrating from py2 to py3, so after one day of trying we decided to stay with the original version.

However, the following resources might be of interest to you:

- [Wikipedia Image Recommendation](https://github.com/OlehOnyshchak/WikiImageRecommendation) project, where we reused Word2VisualVec
- [Kaggle Data Preprocessing notebook](https://www.kaggle.com/jacksoncrow/dataset-preprocessing), where we transformed our data into the Word2VisualVec format
- [Kaggle Word2VisualVec training](https://www.kaggle.com/jacksoncrow/w2vvtraining), where we set up a py2 environment for the Word2VisualVec model. You can just fork this notebook and train the model on your data without any downloads or configuration

Thank you for the alternative links to download the dataset @danieljf24

RakeshRadarapu · 2020-01-27T18:38:01Z

Hi Oleh,
Thanks for your reply. Could you tell me which TensorFlow and CUDA versions you used for the py2 model?

OlehOnyshchak · 2020-02-06T18:46:37Z

Hi @RakeshRadarapu. The TensorFlow version is 1.15 and CUDA is V10.0.130. You can also check these and other environment details in an interactive Python prompt via the Kaggle link I provided above.
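
For checking the same details from a Python prompt, a minimal sketch follows. The helper names `parse_nvcc_version` and `environment_report` are hypothetical, and the sketch assumes that `nvcc` is on the `PATH` and prints a line such as `release 10.0, V10.0.130`; where TensorFlow or `nvcc` is missing, the corresponding entry is simply left as `None`. It sticks to calls available in both Python 2.7 and Python 3, since the model itself runs under py2.

```python
from __future__ import print_function

import re
import subprocess
import sys


def parse_nvcc_version(text):
    """Extract a version such as '10.0.130' from `nvcc --version` output."""
    match = re.search(r"\bV(\d+\.\d+\.\d+)", text)
    return match.group(1) if match else None


def nvcc_output():
    """Return the text printed by `nvcc --version`, or '' if it cannot be run."""
    try:
        return subprocess.check_output(
            ["nvcc", "--version"],
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return ""


def environment_report():
    """Collect the Python, TensorFlow and CUDA toolkit versions in a dict."""
    report = {"python": sys.version.split()[0], "tensorflow": None, "cuda": None}
    try:
        import tensorflow as tf  # optional: stays None when not installed
        report["tensorflow"] = tf.__version__
    except ImportError:
        pass
    report["cuda"] = parse_nvcc_version(nvcc_output())
    return report


if __name__ == "__main__":
    for name, version in sorted(environment_report().items()):
        print(name, version)
```

Note that `nvcc` reports the installed CUDA toolkit, which may differ from the CUDA version a given TensorFlow build was compiled against.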
