Our solution achieved 2nd place on both the public and private leaderboards.
We used a stacking pipeline for model validation. At the start, we set aside a holdout sample that is used only for validation. Next, we obtain out-of-fold predictions on 5 folds and train a meta-model on them; logistic regression is used as the meta-model. Finally, the base models are retrained on all the data for the final prediction (see the sketch below).
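A minimal sketch of this stacking scheme using scikit-learn. The base estimators here are placeholders for illustration only, not the actual models from the ensemble:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, train_test_split

# Toy data as a stand-in for the competition features.
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)

# Hold out a sample that is only used to validate the meta-model.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Placeholder base models (the real ensemble uses the neural networks below).
base_models = [GradientBoostingClassifier(random_state=i) for i in range(2)]

# Out-of-fold predictions on 5 folds become the meta-model's features.
oof = np.zeros((len(X_train), len(base_models)))
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for j, model in enumerate(base_models):
    for tr_idx, te_idx in skf.split(X_train, y_train):
        model.fit(X_train[tr_idx], y_train[tr_idx])
        oof[te_idx, j] = model.predict_proba(X_train[te_idx])[:, 1]

# Logistic regression is the meta-model trained on the fold predictions.
meta = LogisticRegression().fit(oof, y_train)

# Base models are retrained on all training data for the final prediction.
val_features = np.column_stack(
    [m.fit(X_train, y_train).predict_proba(X_val)[:, 1] for m in base_models]
)
print("holdout accuracy:", meta.score(val_features, y_val))
```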
This is a neural-network-based solution. 8 out of 10 models in the ensemble use embeddings to obtain predictions. We use the pytorch-lifestream library to simplify work with sequential data. The main models used in the solution are CoLES, WTTE-RNN, CoLES on a uniform grid, and a supervised NN.
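For illustration, here is a hedged sketch of how a CoLES encoder is assembled with pytorch-lifestream. The field names (`mcc_code`, `amount`) and hyperparameters are placeholders based on the library's demo notebooks, not the exact configuration used in our notebooks, and the API may differ between library versions:

```python
from functools import partial

import torch
from ptls.frames.coles import CoLESModule
from ptls.nn import RnnSeqEncoder, TrxEncoder

# Transaction-level encoder: categorical fields become embeddings,
# numeric fields are passed through as-is (field names are placeholders).
trx_encoder = TrxEncoder(
    embeddings={'mcc_code': {'in': 200, 'out': 16}},
    numeric_values={'amount': 'identity'},
)

# Sequence encoder: a GRU over the encoded transactions.
seq_encoder = RnnSeqEncoder(
    trx_encoder=trx_encoder,
    hidden_size=256,
    type='gru',
)

# CoLES pre-training module; after training, the sequence encoder yields the
# client embeddings that are fed to the downstream models and the meta-model.
model = CoLESModule(
    seq_encoder=seq_encoder,
    optimizer_partial=partial(torch.optim.Adam, lr=1e-3),
    lr_scheduler_partial=partial(torch.optim.lr_scheduler.StepLR, step_size=10, gamma=0.9),
)
```

The actual encoders, datasets, and training loops are defined in the notebooks listed below.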
pipenv sync --dev
pipenv shell
# Load data
sh get_data.sh
To get embeddings and models for fine-tuning, run the notebooks:
nn_train/coles-train_ensemble.ipynb
nn_train/wtte-coles_ensemble.ipynb
nn_train/wtte-rnn.ipynb
and all notebooks in the 2_lvl_train folder.
To get predictions for the meta-model, run the notebooks:
nn_train/supervised-coles.ipynb
nn_train/supervised-wtte-coles.ipynb
To get the final prediction, run metamodel.ipynb.