Change data_loader_engine to 'merlin' in examples #580
Click to view CI ResultsGitHub pull request #580 of commit 880aae1385b5604226e984aedcbdc659dce0993d, no merge conflicts. Running as SYSTEM Setting status of 880aae1385b5604226e984aedcbdc659dce0993d to PENDING with url http://merlin-infra1.nvidia.com:8080/job/transformers4rec_tests/409/ and message: 'Build started for merge commit.' Using context: Jenkins Unit Test Run Building on the built-in node in workspace /var/jenkins_home/jobs/transformers4rec_tests/workspace using credential nvidia-merlin-bot > git rev-parse --is-inside-work-tree # timeout=10 Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/NVIDIA-Merlin/Transformers4Rec.git # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/Transformers4Rec.git > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Transformers4Rec.git +refs/pull/580/*:refs/remotes/origin/pr/580/* # timeout=10 > git rev-parse 880aae1385b5604226e984aedcbdc659dce0993d^{commit} # timeout=10 Checking out Revision 880aae1385b5604226e984aedcbdc659dce0993d (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f 880aae1385b5604226e984aedcbdc659dce0993d # timeout=10 Commit message: "Change data_loader_engine to in examples" > git rev-list --no-walk 6e64490a3835814f6c465bbcdd1560386451a35f # timeout=10 [workspace] $ /bin/bash /tmp/jenkins7004963983918626290.sh GLOB sdist-make: /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec/setup.py py38-gpu recreate: /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec/.tox/py38-gpu py38-gpu inst: /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec/.tox/.tmp/package/1/transformers4rec-0.1.14+34.g880aae13.zip WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration. 
py38-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,alembic==1.8.1,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,autopage==0.5.1,awscli==1.27.30,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,bleach==5.0.1,boto3==1.24.75,botocore==1.29.30,Brotli==1.0.9,cachetools==5.2.0,certifi==2022.12.7,cffi==1.15.1,charset-normalizer==2.1.1,click==8.1.3,cliff==4.1.0,cloudpickle==2.2.0,cmaes==0.9.0,cmake==3.24.1.1,cmd2==2.4.2,colorama==0.4.4,colorlog==6.7.0,contourpy==1.0.5,coverage==6.5.0,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.1.1,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,docker-pycreds==0.4.0,docutils==0.16,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,horovod==0.26.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==3.4,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0,ipython-genutils==0.2.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-core==4.11.1,jup
yter-server==1.18.1,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.8.0,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Mako==1.2.4,Markdown==3.4.1,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,merlin-core==0.6.0+1.g5926fcf,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbformat==5.5.0,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,nvtabular==1.4.0+8.g95e12d347,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,optuna==3.0.4,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pbr==5.11.0,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,plotly==5.11.0,pluggy==1.0.0,prettytable==3.5.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,pynvml==11.4.1,pyparsing==3.0.9,pyperclip==1.8.2,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-xdist==3.1.0,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.28.1,requests-oauthlib==1.3.1,rsa==4.7.2,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.8.1,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.16.0,sklearn==0.0,smm
ap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.3.0,sphinx-multiversion==0.2.4,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.45,stack-data==0.5.0,starlette==0.20.4,stevedore==4.1.1,stringcase==1.2.0,tabulate==0.8.10,tblib==1.7.0,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.14+34.g880aae13,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,typing-inspect==0.8.0,typing_extensions==4.3.0,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,wrapt==1.12.1,xgboost==1.6.2,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0 py38-gpu run-test-pre: PYTHONHASHSEED='4291785767' py38-gpu run-test: commands[0] | pip install --upgrade pip Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com Requirement already satisfied: pip in ./.tox/py38-gpu/lib/python3.8/site-packages (22.3.1) py38-gpu run-test: commands[1] | pip install . 
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com Processing /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec Installing build dependencies: started Installing build dependencies: finished with status 'done' Getting requirements to build wheel: started Getting requirements to build wheel: finished with status 'done' Preparing metadata (pyproject.toml): started Preparing metadata (pyproject.toml): finished with status 'done' Requirement already satisfied: tqdm>=4.27 in ./.tox/py38-gpu/lib/python3.8/site-packages (from transformers4rec==0.1.14+34.g880aae13) (4.64.1) Requirement already satisfied: tensorflow-metadata in ./.tox/py38-gpu/lib/python3.8/site-packages (from transformers4rec==0.1.14+34.g880aae13) (1.12.0) Requirement already satisfied: transformers<4.19 in ./.tox/py38-gpu/lib/python3.8/site-packages (from transformers4rec==0.1.14+34.g880aae13) (4.18.0) Requirement already satisfied: betterproto<2.0.0 in ./.tox/py38-gpu/lib/python3.8/site-packages (from transformers4rec==0.1.14+34.g880aae13) (1.2.5) Requirement already satisfied: pyarrow>=1.0 in ./.tox/py38-gpu/lib/python3.8/site-packages (from transformers4rec==0.1.14+34.g880aae13) (10.0.1) Requirement already satisfied: numpy>=1.17.0 in ./.tox/py38-gpu/lib/python3.8/site-packages (from transformers4rec==0.1.14+34.g880aae13) (1.23.5) Requirement already satisfied: stringcase in ./.tox/py38-gpu/lib/python3.8/site-packages (from betterproto<2.0.0->transformers4rec==0.1.14+34.g880aae13) (1.2.0) Requirement already satisfied: grpclib in ./.tox/py38-gpu/lib/python3.8/site-packages (from betterproto<2.0.0->transformers4rec==0.1.14+34.g880aae13) (0.4.3) Requirement already satisfied: packaging>=20.0 in ./.tox/py38-gpu/lib/python3.8/site-packages (from transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (22.0) Requirement already satisfied: regex!=2019.12.17 in ./.tox/py38-gpu/lib/python3.8/site-packages (from transformers<4.19->transformers4rec==0.1.14+34.g880aae13) 
(2022.10.31) Requirement already satisfied: huggingface-hub<1.0,>=0.1.0 in ./.tox/py38-gpu/lib/python3.8/site-packages (from transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (0.11.1) Requirement already satisfied: sacremoses in ./.tox/py38-gpu/lib/python3.8/site-packages (from transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (0.0.53) Requirement already satisfied: tokenizers!=0.11.3,<0.13,>=0.11.1 in ./.tox/py38-gpu/lib/python3.8/site-packages (from transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (0.12.1) Requirement already satisfied: requests in ./.tox/py38-gpu/lib/python3.8/site-packages (from transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (2.28.1) Requirement already satisfied: filelock in ./.tox/py38-gpu/lib/python3.8/site-packages (from transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (3.8.2) Requirement already satisfied: pyyaml>=5.1 in ./.tox/py38-gpu/lib/python3.8/site-packages (from transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (6.0) Requirement already satisfied: absl-py<2.0.0,>=0.9 in ./.tox/py38-gpu/lib/python3.8/site-packages (from tensorflow-metadata->transformers4rec==0.1.14+34.g880aae13) (1.3.0) Requirement already satisfied: protobuf<4,>=3.13 in ./.tox/py38-gpu/lib/python3.8/site-packages (from tensorflow-metadata->transformers4rec==0.1.14+34.g880aae13) (3.20.3) Requirement already satisfied: googleapis-common-protos<2,>=1.52.0 in ./.tox/py38-gpu/lib/python3.8/site-packages (from tensorflow-metadata->transformers4rec==0.1.14+34.g880aae13) (1.57.0) Requirement already satisfied: typing-extensions>=3.7.4.3 in ./.tox/py38-gpu/lib/python3.8/site-packages (from huggingface-hub<1.0,>=0.1.0->transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (4.4.0) Requirement already satisfied: h2<5,>=3.1.0 in ./.tox/py38-gpu/lib/python3.8/site-packages (from grpclib->betterproto<2.0.0->transformers4rec==0.1.14+34.g880aae13) (4.1.0) Requirement already satisfied: multidict in 
./.tox/py38-gpu/lib/python3.8/site-packages (from grpclib->betterproto<2.0.0->transformers4rec==0.1.14+34.g880aae13) (6.0.3) Requirement already satisfied: idna<4,>=2.5 in ./.tox/py38-gpu/lib/python3.8/site-packages (from requests->transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (3.4) Requirement already satisfied: certifi>=2017.4.17 in ./.tox/py38-gpu/lib/python3.8/site-packages (from requests->transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (2022.12.7) Requirement already satisfied: charset-normalizer<3,>=2 in ./.tox/py38-gpu/lib/python3.8/site-packages (from requests->transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (2.1.1) Requirement already satisfied: urllib3<1.27,>=1.21.1 in ./.tox/py38-gpu/lib/python3.8/site-packages (from requests->transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (1.26.13) Requirement already satisfied: click in ./.tox/py38-gpu/lib/python3.8/site-packages (from sacremoses->transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (8.1.3) Requirement already satisfied: joblib in ./.tox/py38-gpu/lib/python3.8/site-packages (from sacremoses->transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (1.2.0) Requirement already satisfied: six in ./.tox/py38-gpu/lib/python3.8/site-packages (from sacremoses->transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (1.16.0) Requirement already satisfied: hyperframe<7,>=6.0 in ./.tox/py38-gpu/lib/python3.8/site-packages (from h2<5,>=3.1.0->grpclib->betterproto<2.0.0->transformers4rec==0.1.14+34.g880aae13) (6.0.1) Requirement already satisfied: hpack<5,>=4.0 in ./.tox/py38-gpu/lib/python3.8/site-packages (from h2<5,>=3.1.0->grpclib->betterproto<2.0.0->transformers4rec==0.1.14+34.g880aae13) (4.0.0) Building wheels for collected packages: transformers4rec Building wheel for transformers4rec (pyproject.toml): started Building wheel for transformers4rec (pyproject.toml): finished with status 'done' Created wheel for transformers4rec: 
filename=transformers4rec-0.1.14+34.g880aae13-py3-none-any.whl size=481720 sha256=18f9978328d7d05c5abc992765c9085a9c4c14518012f7a169eb9f3718deff1a Stored in directory: /tmp/pip-ephem-wheel-cache-rb8eubdj/wheels/cb/5d/b4/e081835ae498194a418e957657f998bdff0fa2bd103855a861 Successfully built transformers4rec Installing collected packages: transformers4rec Attempting uninstall: transformers4rec Found existing installation: transformers4rec 0.1.14+34.g880aae13 Uninstalling transformers4rec-0.1.14+34.g880aae13: Successfully uninstalled transformers4rec-0.1.14+34.g880aae13 Successfully installed transformers4rec-0.1.14+34.g880aae13 ___________________________________ summary ____________________________________ py38-gpu: commands succeeded congratulations :) Performing Post build task... Match found for : : True Logical operation result is TRUE Running script : #!/bin/bash cd /var/jenkins_home/ CUDA_VISIBLE_DEVICES=2 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/Transformers4Rec/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" [workspace] $ /bin/bash /tmp/jenkins5679220209899127283.sh |
@@ -65,7 +65,7 @@ Below is the updated command to reproduce the experiment [TRANSFORMERS WITH MULT
```bash
DATA_PATH=~/transformers4rec_paper_preproc_datasets/ecom_rees46/
FEATURE_SCHEMA_PATH=datasets_configs/ecom_rees46/rees46_schema.pbtxt
CUDA_VISIBLE_DEVICES=0 python3 -m t4r_paper_repro.transf_exp_main --output_dir ./tmp/ --overwrite_output_dir --do_train --do_eval --validate_every 10 --logging_steps 20 --save_steps 0 --data_path $DATA_PATH --features_schema_path $FEATURE_SCHEMA_PATH --fp16 --data_loader_engine nvtabular --start_time_window_index 1 --final_time_window_index 30 --time_window_folder_pad_digits 4 --model_type xlnet --loss_type cross_entropy --per_device_eval_batch_size 512 --similarity_type concat_mlp --tf_out_activation tanh --inp_merge mlp --learning_rate_warmup_steps 0 --learning_rate_schedule linear_with_warmup --hidden_act gelu --num_train_epochs 10 --dataloader_drop_last --compute_metrics_each_n_steps 1 --session_seq_length_max 20 --eval_on_last_item_seq_only --mf_constrained_embeddings --layer_norm_featurewise --attn_type bi --mlm --input_features_aggregation concat --per_device_train_batch_size 256 --learning_rate 0.00020171456712823088 --dropout 0.0 --input_dropout 0.0 --weight_decay 2.747484129693843e-05 --d_model 448 --item_embedding_dim 448 --n_layer 2 --n_head 8 --label_smoothing 0.5 --stochastic_shared_embeddings_replacement_prob 0.0 --item_id_embeddings_init_std 0.09 --other_embeddings_init_std 0.015 --mlm_probability 0.1 --embedding_dim_from_cardinality_multiplier 3.0 --eval_on_test_set --seed 100 --use_side_information_features
```
Hard to see the diff, but the only change is `--data_loader_engine nvtabular` -> `--data_loader_engine merlin`.
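For a command this long, a token-level comparison makes a one-flag change easier to confirm than reading the diff by eye. The following is a minimal sketch, not part of this PR: `changed_tokens` is a hypothetical helper, and the two commands are shortened stand-ins for the full command in the diff.

```python
import shlex


def changed_tokens(old_cmd: str, new_cmd: str):
    """Return (index, old, new) for every token that differs between two commands."""
    old, new = shlex.split(old_cmd), shlex.split(new_cmd)
    if len(old) != len(new):
        raise ValueError("commands have different token counts")
    return [(i, a, b) for i, (a, b) in enumerate(zip(old, new)) if a != b]


# Shortened stand-ins for the long command in the diff (assumption: same flag order).
old_cmd = (
    "python3 -m t4r_paper_repro.transf_exp_main --fp16 "
    "--data_loader_engine nvtabular --model_type xlnet"
)
new_cmd = old_cmd.replace(
    "--data_loader_engine nvtabular", "--data_loader_engine merlin"
)

print(changed_tokens(old_cmd, new_cmd))  # [(5, 'nvtabular', 'merlin')]
```

A single entry in the result means exactly one token changed, which is what the comment above claims for this hunk.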
@sararb do we still maintain `tf4rec_paper_experiments`? Meaning if we change the `dataloader_engine` to `merlin` is that gonna break anything?
We are still maintaining `tf4rec_paper_experiments` because the `main` script is used in the integration tests (here). Changing the `dataloader_engine` won't break anything because the `merlin` and `nvtabular` aliases are both referring to the same `MerlinDataLoader` class.
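To illustrate why two engine names can be interchangeable, here is a minimal sketch of a name registry in which several aliases resolve to one class. It is an assumption-laden illustration, not the Transformers4Rec implementation: `register`, `resolve`, `_LOADERS`, and `MerlinDataLoaderSketch` are hypothetical names.

```python
from typing import Callable, Dict

# Hypothetical registry mapping engine names to loader classes.
# Illustrative only; this is not the Transformers4Rec API.
_LOADERS: Dict[str, type] = {}


def register(*names: str) -> Callable[[type], type]:
    """Register one loader class under every name given."""

    def decorator(cls: type) -> type:
        for name in names:
            _LOADERS[name] = cls
        return cls

    return decorator


# "nvtabular" is kept only as a backward-compatible alias of "merlin".
@register("merlin", "nvtabular")
class MerlinDataLoaderSketch:
    """Stand-in for the real loader class."""


def resolve(engine: str) -> type:
    """Look up the loader class for an engine name, failing loudly if unknown."""
    try:
        return _LOADERS[engine]
    except KeyError:
        raise ValueError(f"unknown data_loader_engine: {engine!r}") from None


# Both aliases resolve to the very same class object.
same_class = resolve("merlin") is resolve("nvtabular")
```

Under this kind of design, switching an example from one alias to the other changes only the string passed in, not the class that ends up loading the data.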
Documentation preview: https://nvidia-merlin.github.io/Transformers4Rec/review/pr-580
A follow-up to #547
Goals ⚽
Change `data_loader_engine` to `merlin` (instead of `nvtabular`).
With the changes in PR #547, `nvtabular` is now simply an alias for `merlin`, i.e., `data_loader_engine=nvtabular` is equivalent to `data_loader_engine=merlin` and both will use the Merlin dataloader (`nvtabular` was kept for backward compatibility).
This PR changes the customer-facing examples to use `data_loader_engine=merlin` in order to promote Merlin Dataloader as the correct engine to use going forward.
Implementation Details 🚧
The CI scripts are also changed to use `--data_loader_engine merlin`.
Testing Details 🔍
For the CI script changes, manually ran `./ci/test_integration.sh` in the `merlin-pytorch:22.11` container (after upgrading/installing `core` and `dataloader`).