Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change data_loader_engine to 'merlin' in examples #580

Merged
merged 1 commit into from
Dec 16, 2022

Conversation

edknv
Copy link
Collaborator

@edknv edknv commented Dec 15, 2022

A follow-up to #547

Goals ⚽

  • Change data_loader_engine to merlin (instead of nvtabular).

With the changes in PR #547, nvtabular is now simply an alias to merlin, i.e., data_loader_engine=nvtabular is equivalent to data_loader_engine=merlin and both will use Merlin dataloader (nvtabular was not removed for backward compatibility).

This PR changes the customer-facing examples to use data_loader_engine=merlin in order to promote Merlin Dataloader as the correct engine to use going forward.

Implementation Details 🚧

The CI scripts are also changed to use --data_loader_engine merlin.

Testing Details 🔍

For the CI script changes, manually ran ./ci/test_integration.sh in the merlin-pytorch:22.11 container (after upgrading/installing core and dataloader).

@edknv edknv added area/examples chore Maintenance for the repository ci labels Dec 15, 2022
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@nvidia-merlin-bot
Copy link

Click to view CI Results
GitHub pull request #580 of commit 880aae1385b5604226e984aedcbdc659dce0993d, no merge conflicts.
Running as SYSTEM
Setting status of 880aae1385b5604226e984aedcbdc659dce0993d to PENDING with url http://merlin-infra1.nvidia.com:8080/job/transformers4rec_tests/409/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on the built-in node in workspace /var/jenkins_home/jobs/transformers4rec_tests/workspace
using credential nvidia-merlin-bot
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Transformers4Rec.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Transformers4Rec.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Transformers4Rec.git +refs/pull/580/*:refs/remotes/origin/pr/580/* # timeout=10
 > git rev-parse 880aae1385b5604226e984aedcbdc659dce0993d^{commit} # timeout=10
Checking out Revision 880aae1385b5604226e984aedcbdc659dce0993d (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 880aae1385b5604226e984aedcbdc659dce0993d # timeout=10
Commit message: "Change data_loader_engine to  in examples"
 > git rev-list --no-walk 6e64490a3835814f6c465bbcdd1560386451a35f # timeout=10
[workspace] $ /bin/bash /tmp/jenkins7004963983918626290.sh
GLOB sdist-make: /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec/setup.py
py38-gpu recreate: /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec/.tox/py38-gpu
py38-gpu inst: /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec/.tox/.tmp/package/1/transformers4rec-0.1.14+34.g880aae13.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
py38-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,alembic==1.8.1,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,autopage==0.5.1,awscli==1.27.30,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,bleach==5.0.1,boto3==1.24.75,botocore==1.29.30,Brotli==1.0.9,cachetools==5.2.0,certifi==2022.12.7,cffi==1.15.1,charset-normalizer==2.1.1,click==8.1.3,cliff==4.1.0,cloudpickle==2.2.0,cmaes==0.9.0,cmake==3.24.1.1,cmd2==2.4.2,colorama==0.4.4,colorlog==6.7.0,contourpy==1.0.5,coverage==6.5.0,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.1.1,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,docker-pycreds==0.4.0,docutils==0.16,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,horovod==0.26.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==3.4,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0,ipython-genutils==0.2.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.8.0,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Mako==1.2.4,Markdown==3.4.1,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,merlin-core==0.6.0+1.g5926fcf,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbformat==5.5.0,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,nvtabular==1.4.0+8.g95e12d347,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,optuna==3.0.4,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pbr==5.11.0,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,plotly==5.11.0,pluggy==1.0.0,prettytable==3.5.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,pynvml==11.4.1,pyparsing==3.0.9,pyperclip==1.8.2,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-xdist==3.1.0,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.28.1,requests-oauthlib==1.3.1,rsa==4.7.2,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.8.1,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.16.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.3.0,sphinx-multiversion==0.2.4,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.45,stack-data==0.5.0,starlette==0.20.4,stevedore==4.1.1,stringcase==1.2.0,tabulate==0.8.10,tblib==1.7.0,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.14+34.g880aae13,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,typing-inspect==0.8.0,typing_extensions==4.3.0,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,wrapt==1.12.1,xgboost==1.6.2,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
py38-gpu run-test-pre: PYTHONHASHSEED='4291785767'
py38-gpu run-test: commands[0] | pip install --upgrade pip
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: pip in ./.tox/py38-gpu/lib/python3.8/site-packages (22.3.1)
py38-gpu run-test: commands[1] | pip install .
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Processing /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Requirement already satisfied: tqdm>=4.27 in ./.tox/py38-gpu/lib/python3.8/site-packages (from transformers4rec==0.1.14+34.g880aae13) (4.64.1)
Requirement already satisfied: tensorflow-metadata in ./.tox/py38-gpu/lib/python3.8/site-packages (from transformers4rec==0.1.14+34.g880aae13) (1.12.0)
Requirement already satisfied: transformers<4.19 in ./.tox/py38-gpu/lib/python3.8/site-packages (from transformers4rec==0.1.14+34.g880aae13) (4.18.0)
Requirement already satisfied: betterproto<2.0.0 in ./.tox/py38-gpu/lib/python3.8/site-packages (from transformers4rec==0.1.14+34.g880aae13) (1.2.5)
Requirement already satisfied: pyarrow>=1.0 in ./.tox/py38-gpu/lib/python3.8/site-packages (from transformers4rec==0.1.14+34.g880aae13) (10.0.1)
Requirement already satisfied: numpy>=1.17.0 in ./.tox/py38-gpu/lib/python3.8/site-packages (from transformers4rec==0.1.14+34.g880aae13) (1.23.5)
Requirement already satisfied: stringcase in ./.tox/py38-gpu/lib/python3.8/site-packages (from betterproto<2.0.0->transformers4rec==0.1.14+34.g880aae13) (1.2.0)
Requirement already satisfied: grpclib in ./.tox/py38-gpu/lib/python3.8/site-packages (from betterproto<2.0.0->transformers4rec==0.1.14+34.g880aae13) (0.4.3)
Requirement already satisfied: packaging>=20.0 in ./.tox/py38-gpu/lib/python3.8/site-packages (from transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (22.0)
Requirement already satisfied: regex!=2019.12.17 in ./.tox/py38-gpu/lib/python3.8/site-packages (from transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (2022.10.31)
Requirement already satisfied: huggingface-hub<1.0,>=0.1.0 in ./.tox/py38-gpu/lib/python3.8/site-packages (from transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (0.11.1)
Requirement already satisfied: sacremoses in ./.tox/py38-gpu/lib/python3.8/site-packages (from transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (0.0.53)
Requirement already satisfied: tokenizers!=0.11.3,<0.13,>=0.11.1 in ./.tox/py38-gpu/lib/python3.8/site-packages (from transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (0.12.1)
Requirement already satisfied: requests in ./.tox/py38-gpu/lib/python3.8/site-packages (from transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (2.28.1)
Requirement already satisfied: filelock in ./.tox/py38-gpu/lib/python3.8/site-packages (from transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (3.8.2)
Requirement already satisfied: pyyaml>=5.1 in ./.tox/py38-gpu/lib/python3.8/site-packages (from transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (6.0)
Requirement already satisfied: absl-py<2.0.0,>=0.9 in ./.tox/py38-gpu/lib/python3.8/site-packages (from tensorflow-metadata->transformers4rec==0.1.14+34.g880aae13) (1.3.0)
Requirement already satisfied: protobuf<4,>=3.13 in ./.tox/py38-gpu/lib/python3.8/site-packages (from tensorflow-metadata->transformers4rec==0.1.14+34.g880aae13) (3.20.3)
Requirement already satisfied: googleapis-common-protos<2,>=1.52.0 in ./.tox/py38-gpu/lib/python3.8/site-packages (from tensorflow-metadata->transformers4rec==0.1.14+34.g880aae13) (1.57.0)
Requirement already satisfied: typing-extensions>=3.7.4.3 in ./.tox/py38-gpu/lib/python3.8/site-packages (from huggingface-hub<1.0,>=0.1.0->transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (4.4.0)
Requirement already satisfied: h2<5,>=3.1.0 in ./.tox/py38-gpu/lib/python3.8/site-packages (from grpclib->betterproto<2.0.0->transformers4rec==0.1.14+34.g880aae13) (4.1.0)
Requirement already satisfied: multidict in ./.tox/py38-gpu/lib/python3.8/site-packages (from grpclib->betterproto<2.0.0->transformers4rec==0.1.14+34.g880aae13) (6.0.3)
Requirement already satisfied: idna<4,>=2.5 in ./.tox/py38-gpu/lib/python3.8/site-packages (from requests->transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (3.4)
Requirement already satisfied: certifi>=2017.4.17 in ./.tox/py38-gpu/lib/python3.8/site-packages (from requests->transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (2022.12.7)
Requirement already satisfied: charset-normalizer<3,>=2 in ./.tox/py38-gpu/lib/python3.8/site-packages (from requests->transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (2.1.1)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in ./.tox/py38-gpu/lib/python3.8/site-packages (from requests->transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (1.26.13)
Requirement already satisfied: click in ./.tox/py38-gpu/lib/python3.8/site-packages (from sacremoses->transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (8.1.3)
Requirement already satisfied: joblib in ./.tox/py38-gpu/lib/python3.8/site-packages (from sacremoses->transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (1.2.0)
Requirement already satisfied: six in ./.tox/py38-gpu/lib/python3.8/site-packages (from sacremoses->transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (1.16.0)
Requirement already satisfied: hyperframe<7,>=6.0 in ./.tox/py38-gpu/lib/python3.8/site-packages (from h2<5,>=3.1.0->grpclib->betterproto<2.0.0->transformers4rec==0.1.14+34.g880aae13) (6.0.1)
Requirement already satisfied: hpack<5,>=4.0 in ./.tox/py38-gpu/lib/python3.8/site-packages (from h2<5,>=3.1.0->grpclib->betterproto<2.0.0->transformers4rec==0.1.14+34.g880aae13) (4.0.0)
Building wheels for collected packages: transformers4rec
  Building wheel for transformers4rec (pyproject.toml): started
  Building wheel for transformers4rec (pyproject.toml): finished with status 'done'
  Created wheel for transformers4rec: filename=transformers4rec-0.1.14+34.g880aae13-py3-none-any.whl size=481720 sha256=18f9978328d7d05c5abc992765c9085a9c4c14518012f7a169eb9f3718deff1a
  Stored in directory: /tmp/pip-ephem-wheel-cache-rb8eubdj/wheels/cb/5d/b4/e081835ae498194a418e957657f998bdff0fa2bd103855a861
Successfully built transformers4rec
Installing collected packages: transformers4rec
  Attempting uninstall: transformers4rec
    Found existing installation: transformers4rec 0.1.14+34.g880aae13
    Uninstalling transformers4rec-0.1.14+34.g880aae13:
      Successfully uninstalled transformers4rec-0.1.14+34.g880aae13
Successfully installed transformers4rec-0.1.14+34.g880aae13
___________________________________ summary ____________________________________
  py38-gpu: commands succeeded
  congratulations :)
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=2 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/Transformers4Rec/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[workspace] $ /bin/bash /tmp/jenkins5679220209899127283.sh

@@ -65,7 +65,7 @@ Below is the updated command to reproduce the experiment [TRANSFORMERS WITH MULT
```bash
DATA_PATH=~/transformers4rec_paper_preproc_datasets/ecom_rees46/
FEATURE_SCHEMA_PATH=datasets_configs/ecom_rees46/rees46_schema.pbtxt
CUDA_VISIBLE_DEVICES=0 python3 -m t4r_paper_repro.transf_exp_main --output_dir ./tmp/ --overwrite_output_dir --do_train --do_eval --validate_every 10 --logging_steps 20 --save_steps 0 --data_path $DATA_PATH --features_schema_path $FEATURE_SCHEMA_PATH --fp16 --data_loader_engine nvtabular --start_time_window_index 1 --final_time_window_index 30 --time_window_folder_pad_digits 4 --model_type xlnet --loss_type cross_entropy --per_device_eval_batch_size 512 --similarity_type concat_mlp --tf_out_activation tanh --inp_merge mlp --learning_rate_warmup_steps 0 --learning_rate_schedule linear_with_warmup --hidden_act gelu --num_train_epochs 10 --dataloader_drop_last --compute_metrics_each_n_steps 1 --session_seq_length_max 20 --eval_on_last_item_seq_only --mf_constrained_embeddings --layer_norm_featurewise --attn_type bi --mlm --input_features_aggregation concat --per_device_train_batch_size 256 --learning_rate 0.00020171456712823088 --dropout 0.0 --input_dropout 0.0 --weight_decay 2.747484129693843e-05 --d_model 448 --item_embedding_dim 448 --n_layer 2 --n_head 8 --label_smoothing 0.5 --stochastic_shared_embeddings_replacement_prob 0.0 --item_id_embeddings_init_std 0.09 --other_embeddings_init_std 0.015 --mlm_probability 0.1 --embedding_dim_from_cardinality_multiplier 3.0 --eval_on_test_set --seed 100 --use_side_information_features
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hard to see the diff, but the only change is --data_loader_engine nvtabular -> --data_loader_engine merlin .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sararb do we still maintain tf4rec_paper_experiments ? Meaning if we change the dataloader_engine to merlin is that gonna break anything?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are still maintaining tf4rec_paper_experiments because the main script is used in the integration tests (here). Changing the dataloader_engine won't break anything because the merlin and nvtabular aliases are both referring to the sameMerlinDataLoader class.

@edknv edknv requested review from rnyak and sararb December 15, 2022 19:19
@github-actions
Copy link

@edknv edknv marked this pull request as ready for review December 15, 2022 19:28
@edknv edknv merged commit 6864db2 into NVIDIA-Merlin:main Dec 16, 2022
@edknv edknv deleted the examples_data_loader_engine branch December 16, 2022 21:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/examples chore Maintenance for the repository ci
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants