* feat: tests can be run from project root (#86)
* refactor: instead of juggling global random states, use instances of Random for datasets
* test: add test for interacting with custom queries. After refactoring, it is possible to easily add a list of query-response pairs for every model (config), which will be used to compare pretrained model output with the expected output. Initial lists added for error_model and ner. Also added the URL for downloading the pretrained ner_conll2003_model. IP-1344 #done
* Update docs from master (#96)
* fixed grammar and style
* Update README.md
* fix grammar & style
* fix grammar & style
* fix grammar & style in Intent classification README
* doc: add supported platform notes
* docs: correct paths to scripts and configs to be relative to repository root (#94)
* docs: correct paths to scripts and configs to be relative to repository root (fixes #93)
* docs: set paths in basic examples to be relative to the project root
* docs: run deep.py as a python module in examples
* doc: add notes for python 3.5
* test: change downloading to temp dir (#97)
* feat: assert python version is 3.6 or higher
* Rename dataset to dataset_iterator and other renames (#103)
* refactor: rename 'dataset' to 'dataset_iterator'
* refactor: rename dataset readers and iterators
* refactor: classification iterator and reader
* fix: dialog_iterator
* test: fix downloading procedure (#108)
* Feature/tf layers to core (#67)
* feat: layers moved to core
* feat: attention added
* fix: highway/skip connections for different dimensionality of units
* feat: NER now supports core layers
* fix: minor docstrings fixes
* feat: CuDNN GRU and LSTM added
* feat: Bidirectional CuDNN GRU and LSTM added
* feat: stacked bi-rnn refactored
* fix: fixed arguments order in rnn
* fix: remove duplicate mult_att
* chore: merge with dev
* fix: backward forward bug in cudnnrnn
* refactor: use single fasttext module, clean dependencies
* fix: add error when n_classes is zero
* feat: add fastText model usage instead of fasttext
* fix: emb_module default fastText
* chore: embedding fixed in configs
* chore: change new models names
* feat: change intent embeddings in gobot configs
* chore: fastText to fasttext, new model, change intents in gobot configs
* chore: new url on new fasttext embeddings
* fix: delete download all true
* fix: add url of old embedding file
* fix: delete comma
* fix: delete old embedding file from urls
* fix: delete pyfasttext from requirements, fasttext_embedder
* fix: change pyfasttext embeddings from gobot
* fix: delete from requirements
* fix: delete gensim from fasttext_embedder
* fix: simplify requirements
* fix: fix dim in gobot_all config
* refactor: remove redundant parameter 'emb_module'
* feat: use wiki.en.bin embeddings in gobot_all
* feat: check saved model params and fix lowercase for interact
* fix: lowercase text while interact
* feat: check saved model params
* fix: rm extra configs
* feat: add support for classification data in csv/json formats (#115)
* feat: add support for csv/json classification datasets
* feat: add tests for snips and samples
* fix: gobot_all config fix
* feat: add REST API for all models
* Moved telegram_utils -> utils; Refactored telegram_ui.py
* Moved telegram_utils -> utils: modified deeppavlov/deep.py
* Fixed getting model name with get_main_component() in telegram_ui.py
* chainer.py: minor fix in get_main_component()
* Added riseapi launch mode (see the REST sketch after this list)
* README.md: added riseapi mode reference
* Updated README.md and fixed requirements.txt
* minor fixes in README.md
* Fixes in utils/server
* refactor: change endpoint names
* feat: add StreamSpacyTokenizer
* refactor: remove duplication from script naming
* refactor: outline detokenize() method in utils, because it should be used by all tokenizers and doesn't depend on tokenize()
* feat: add streaming spaCy tokenizer (see the tokenizer sketch after this list)
* refactor: DELETE original spaCy tokenizer, rename stream_spacy to spacy
* refactor: rename tokenizer scripts back
* fix: wrong grammar
* feat: include spacy_tokenizer import
* feat: replace old SpacyTokenizer with new StreamSpacyTokenizer
* feat: ability to manage lowercasing from class constructor, typing improvements
* fix: update go-bot configs so they work with StreamSpacyTokenizer the same as with the old tokenizer
* feat: add optional logging to the spacy tokenizer
* docs: update docstrings
* refactor: replace custom logger with deeppavlov's, pep8 style update
* refactor: outline ngramize() because it is independent from tokenizer classes
* refactor: return original JSON formatting
* fix: add **kwargs to __init__()
* chore: update .gitignore
* refactor: more stable and consistent code
* feat: add TravisCI integration
* build: add TravisCI integration
* build: add TravisCI integration
* feat: add ranking model
* feat: add ranking model to deeppavlov
* feat: add download of dataset and embedding_model
* feat: adapt to new deeppavlov interfaces
* refactor: use pathlib where available in the ranking model
* feat: add saving and loading of responses with np.save
* feat: add saving and loading of response embeddings with np.save; use response embeddings to calculate predictions in the __call__ function
* feat: add interact regime
* feat: add interact_pred_num parameter
* refactor: change parameter default value, change check if the file with the embeddings model exists
* fix: fix non-string keys in EmbeddingDict class
* feat: add parameters dict for autotests
* feat: add tests support
* feat: add context embeddings vocabulary (it is used in interact regime to predict the most similar contexts)
* chore: change shuffle parameter default value to True in batch_generator
* refactor: change config to chainer representation
* fix: bug fix in urls.py file
* refactor: remove emb_vocab_file saving, move build_tok2int_vocab and make_ints funcs to InsuranceDict class, add set_embeddings and reset_embeddings funcs in RankingModel
* feat: add initial documentation
* refactor: remove idx2int vocabulary, add vocabularies saving
* change config parameters default values, remove examples in tests
* feat: add table in documentation
* fix: fix bug in urls.py
* refactor: remove paths from config
* feat: add documentation
* feat: add True in tests
* feat: add documentation
* refactor: move init/load in the load function.
* refactor: change parameters in config
* feat: add logging
* feat: add more logging
* feat: add documentation, change parameters values in config
* fix: add gensim for ranking model
* fix: requirements installation order that caused setup.py error
* refactor: train script
* feat: add documentation
* feat: models parameters check for ner
* feat: parameters check added to ner
* feat: parameters check added to slotfill
* chore: minor clean-up
* fix: fix conll-2003 model file names and archive names
* refactor: remove blank line
* feat: allow stopping training after n batches (#127)
* fix: many minor fixes
* fix: fix mark_done data_path
* refactor: rename ranking_dataset to ranking_iterator.py and move it to the dataset_iterators folder
* fix: fix embedding matrix construction, change epochs num default parameter value
* refactor: rename registered name and name of the class
* refactor: rename files and classes
* refactor: change dataset download
* feat: add insurance embeddings and datasets in urls.py
* refactor: change batch data representation (#131)
* feat: install tensorflow-gpu
* feat: add SQuAD model
* feat: add SQuAD dataset reader
* feat: add dataset, preprocessing, config
* feat: add VocabEmbedder for chars and tokens
* feat&fix: add model implementation
* feat: add training support, answer postprocessing
* fix: predicted answer extraction from context
* fix: dropout mask
* feat: true_answer is a list of answers now
* merge with dev
* docs: add some docstrings
* refactor: renaming variables
* docs: add README.md
* feat: add support of multiple inputs and outputs in interact mode
* docs: upd README.md
* fix: bugs after merge with dev
* fix: turn on training vocabs
* fix: remove keep_prob multiplier for dropout mask
* fix: add short contexts support
* docs: upd README.md
* feat: chainer returns batch of tuples instead of tuple of batches (see the batching sketch after this list)
* docs: upd squad README.md
* docs: upd squad README.md
* feat: add link to pretrained SQuAD model
* fix: SQuAD model url
* feat: add embeddings downloading and upd config
* feat: add variable scope for optimizer
* refactor: do not override __init__ method for squad_iterator
* fix: ensure that directory exists before saving SquadVocabEmbedder
* style: upd names in config and docs
* chore: remove main.py used for debugging
* docs: upd README.md
* fix: change batch_size to fix possible OOM
* test: add possibility to interact with several input queries
* chore: add max_batches to squad config
* docs: upd README.md
* fix(ranking_network): wrap y as np.array
* fix: fix training stop for pytest
* style: add license header
* fix: refactor training stop for pytest
* test: specify pytest_max_batches
* feat: use all pytest keys and not only max_batches (#134)
* fix: remove result stringification
* feat: add GPU_only and Slow marks for tests
* feat: add SQuAD dataset reader
* feat: add dataset, preprocessing, config
* feat: add VocabEmbedder for chars and tokens
* feat&fix: add model implementation
* feat: add training support, answer postprocessing
* fix: predicted answer extraction from context
* fix: dropout mask
* feat: true_answer is a list of answers now
* merge with dev
* docs: add some docstrings
* refactor: renaming variables
* docs: add README.md
* feat: add support of multiple inputs and outputs in interact mode
* docs: upd README.md
* fix: bugs after merge with dev
* fix: turn on training vocabs
* fix: remove keep_prob multiplier for dropout mask
* fix: add short contexts support
* docs: upd README.md
* feat: chainer returns batch of tuples instead of tuple of batches
* docs: upd squad README.md
* docs: upd squad README.md
* feat: add link to pretrained SQuAD model
* fix: SQuAD model url
* feat: add embeddings downloading and upd config
* feat: add variable scope for optimizer
* refactor: do not override __init__ method for squad_iterator
* fix: ensure that directory exists before saving SquadVocabEmbedder
* style: upd names in config and docs
* chore: remove main.py used for debugging
* docs: upd README.md
* fix: change batch_size to fix possible OOM
* test: add possibility to interact with several input queries
* chore: add max_batches to squad config
* docs: upd README.md
* fix(ranking_network): wrap y as np.array
* fix: fix training stop for pytest
* style: add license header
* fix: refactor training stop for pytest
* test: specify pytest_max_batches
* test: add a couple of marks for selecting tests
* test: make Travis run only fast tests without GPU
* fix: ranking config works in interactbot
* fix: add downloading nltk punkt for tokenization (#140)
* feat: bot start message for intents does not say anything about dstc2 (#142)
* feat: interactbot command works with pipes that require multiple inputs (#137)
* build: change TravisCI script (#143)
* feat: add GloVe embedder (#138)
* feat: glove embedder added
* feat: embeddings added to NER network
* feat: dataset and embeddings are added to urls.py for downloading
* fix: char embeddings added to pretrained embeddings
* feat: embedder returns list of embeddings instead of zero-padded np array
* feat: capitalization added
* feat: config modified according to new features
* feat: double dense added to input parameters
* feat: config parameters updated
* chore: fix urls for conll NER, ontonotes model url added
* feat: pytest_max_batches added for faster train check
* feat: ontonotes tests added
* feat: test conll max batches added
* Update README.md
* feat: add seq2seq go bot
* fix: lowercase text while interact
* feat: check saved model params
* fix: rm extra configs
* feat: add kvret dataset_reader
* feat: add kvret_dataset_iterator
* fix: add ConfigError
* fix: dirty fix for dialog data to be lowercased
* feat: check np.int and int in Vocabulary
* feat: seq2seqbot works for train and infer
* feat: add BLEU metric
* feat: add simple seq2seq_go_bot config
* fix: fix inference and load()
* feat: add variable scope for optimizer
* feat: add support of multiple inputs and outputs in interact mode
* fix: fix padding
* feat: tokenizer argument in Vocabulary
* feat: chainer returns batch of tuples instead of tuple of batches
* fix: spacy_tokenizer returns [['']] for batch with empty string and add alpha_only argument
* feat: add per_item_bleu (see the BLEU sketch after this list)
* feat: train seq2seq_go_bot on utterance batches
* feat: tokenize y_true
* feat: fit kb_entries knowledge base
* feat: add split tokenizer
* feat: standardize tokenizers' output
* feat: normalize kb entities
* feat: db_columns, db_items in each sample
* fix: go_bot configs (for new vocab) and loading of network
* style: minor restyling
* feat: add config for infer
* feat: add config for infer
* feat: add seq2seq_go_bot pretrained model
* feat: update telegram start and help messages
* style: minor styling
* docs: add simple readme
* doc: remove red ... blocks
* doc: change Dataset to DatasetIterator
* doc: update list of configs
* doc: update package structure
* doc: add notes about dataset element in config
* feat: add squad model description to README.md
* doc: add config specification for seq2seq_go_bot
* fix: lowercase text while interact
* feat: check saved model params
* fix: rm extra configs
* feat: add kvret dataset_reader
* feat: add kvret_dataset_iterator
* fix: add ConfigError
* fix: dirty fix for dialog data to be lowercased
* feat: check np.int and int in Vocabulary
* feat: seq2seqbot works for train and infer
* feat: add BLEU metric
* feat: add simple seq2seq_go_bot config
* fix: fix inference and load()
* feat: add variable scope for optimizer
* feat: add support of multiple inputs and outputs in interact mode
* fix: fix padding
* feat: tokenizer argument in Vocabulary
* feat: chainer returns batch of tuples instead of tuple of batches
* fix: spacy_tokenizer returns [['']] for batch with empty string and add alpha_only argument
* feat: add per_item_bleu
* feat: train seq2seq_go_bot on utterance batches
* feat: tokenize y_true
* feat: fit kb_entries knowledge base
* feat: add split tokenizer
* feat: standardize tokenizers' output
* feat: normalize kb entities
* feat: db_columns, db_items in each sample
* fix: go_bot configs (for new vocab) and loading of network
* style: minor restyling
* feat: add config for infer
* feat: add config for infer
* feat: add seq2seq_go_bot pretrained model
* feat: update telegram start and help messages
* style: minor styling
* docs: add simple readme
* docs: add seq2seq_go_bot in main readme
* docs: small fix
* docs: add config specification for seq2seq_go_bot
* chore: remove install.py (#151)
* feat: add support for batches in go-bot
* feat: batching v1
* feat: bow_encoder is optional
* fix: probs calculation for use_action_mask=true
* refactor: do not feed initial_state during train
* feat: feed sequence lengths in dynamic_rnn
* refactor: rename go_bot.py -> bot.py
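Several entries above note that the chainer now returns a batch of tuples instead of a tuple of batches. Below is a minimal Python sketch of that data-layout change only; the data and variable names are illustrative and do not come from the DeepPavlov codebase.

```python
# Generic illustration of the "tuple of batches" -> "batch of tuples" change.
contexts = ["where is the moon", "who wrote hamlet"]
answers = ["in the sky", "shakespeare"]

# Old layout: one tuple holding parallel batches, e.g. (contexts, answers).
tuple_of_batches = (contexts, answers)

# New layout: one batch whose items are aligned (context, answer) tuples.
batch_of_tuples = list(zip(*tuple_of_batches))
assert batch_of_tuples == [("where is the moon", "in the sky"),
                           ("who wrote hamlet", "shakespeare")]

# Converting back is the same zip trick in reverse.
restored = tuple(list(column) for column in zip(*batch_of_tuples))
assert restored == tuple_of_batches
```

The batch-of-tuples layout keeps each sample's inputs and outputs aligned as they flow between pipeline components, which is what the interact mode with multiple inputs and outputs relies on.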
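The tokenizer entries mention a StreamSpacyTokenizer and an ngramize() helper that was outlined as a free function because it does not depend on any tokenizer class. The sketch below is a hypothetical reconstruction of that layout using spaCy's nlp.pipe streaming; the class name, arguments, and defaults are assumptions, not the project's actual code.

```python
from typing import Iterator, List

import spacy


def ngramize(tokens: List[str], ngram_range=(1, 1)) -> List[str]:
    """Build n-grams from a token list; kept independent of any tokenizer class."""
    low, high = ngram_range
    ngrams = []
    for n in range(low, high + 1):
        for i in range(len(tokens) - n + 1):
            ngrams.append(" ".join(tokens[i:i + n]))
    return ngrams


class StreamSpacyTokenizerSketch:
    """Illustrative stand-in for the StreamSpacyTokenizer mentioned in the log."""

    def __init__(self, lowercase: bool = True, batch_size: int = 1000, **kwargs):
        self.nlp = spacy.blank("en")  # tokenization-only pipeline
        self.lowercase = lowercase
        self.batch_size = batch_size

    def __call__(self, batch: List[str]) -> Iterator[List[str]]:
        # nlp.pipe streams documents instead of processing them one at a time.
        for doc in self.nlp.pipe(batch, batch_size=self.batch_size):
            tokens = [t.text for t in doc]
            yield [t.lower() for t in tokens] if self.lowercase else tokens


tokenizer = StreamSpacyTokenizerSketch()
for tokens in tokenizer(["Add the streaming spaCy tokenizer"]):
    print(ngramize(tokens, ngram_range=(1, 2)))
```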
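The seq2seq_go_bot entries add a BLEU metric and a per_item_bleu variant. The following is a hedged sketch of what such metrics could look like using NLTK's bleu_score module; the repository's own implementation may compute them differently.

```python
from nltk.translate.bleu_score import SmoothingFunction, corpus_bleu, sentence_bleu


def bleu(y_true, y_predicted):
    """Corpus-level BLEU over whitespace-tokenized utterances."""
    references = [[utt.split()] for utt in y_true]   # each reference wrapped in a list
    hypotheses = [utt.split() for utt in y_predicted]
    return corpus_bleu(references, hypotheses,
                       smoothing_function=SmoothingFunction().method1)


def per_item_bleu(y_true, y_predicted):
    """Average sentence-level BLEU, one score per dialog utterance."""
    smooth = SmoothingFunction().method1
    scores = [
        sentence_bleu([true.split()], pred.split(), smoothing_function=smooth)
        for true, pred in zip(y_true, y_predicted)
    ]
    return sum(scores) / max(len(scores), 1)


print(per_item_bleu(["set an alarm for 7 am"], ["set an alarm for 7 am"]))  # 1.0
```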
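The riseapi launch mode mentioned above exposes a model over REST. The snippet below is a minimal Flask illustration of the idea only; the endpoint name, payload shape, and model stub are hypothetical and are not the project's server code.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def model(utterances):
    """Stand-in for a loaded pipeline; echoes upper-cased input for illustration."""
    return [u.upper() for u in utterances]


@app.route("/answer", methods=["POST"])
def answer():
    # Expects a JSON body like {"context": ["utterance 1", "utterance 2"]}.
    data = request.get_json(force=True)
    return jsonify({"responses": model(data.get("context", []))})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```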