Simplify processors - add Fasttokenizers #649

Merged 77 commits on Dec 23, 2020

Commits on Nov 13, 2020

  1. 6b5e0a1
  2. use correct version

    Timoeller committed Nov 13, 2020
    a96aca7
  3. 1ea854d
  4. Remove unused imports

    Timoeller committed Nov 13, 2020
    f5c77bc
  5. Adjust tests

    Timoeller committed Nov 13, 2020
    2f93100

Commits on Nov 16, 2020

  1. Remove test

    Timoeller committed Nov 16, 2020
    6663847

Commits on Nov 17, 2020

  1. 6a9c723

Commits on Nov 22, 2020

  1. 374e362

Commits on Nov 25, 2020

  1. b3cb744
  2. 8d2152b
  3. Make code more readable

    bogdankostic committed Nov 25, 2020
    53d533a

Commits on Dec 3, 2020

  1. a40d0d1
  2. 07847aa
  3. 10ecdb6
  4. 38a79d6
  5. 2917335
  6. a6171ec

Commits on Dec 4, 2020

  1. 270996c
  2. a23daab
  3. Add inference flag

    brandenchan committed Dec 4, 2020
    4779077

Commits on Dec 7, 2020

  1. 49d5a7c
  2. Fix dataset duplication

    brandenchan committed Dec 7, 2020
    801108b
  3. Remove inference flag

    brandenchan committed Dec 7, 2020
    7d10f74
  4. Trigger CI for PR

    brandenchan committed Dec 7, 2020
    a0bd77a
  5. 17eeb9e
  6. c2a0448
  7. Fix test

    brandenchan committed Dec 7, 2020
    49afce7
  8. Small rename

    Timoeller committed Dec 7, 2020
    7ccbd40
  9. 253f8f0
  10. 7090f13
  11. Change CI

    Timoeller committed Dec 7, 2020
    26b06b5

Commits on Dec 10, 2020

  1. 6eaa5bf

Commits on Dec 11, 2020

  1. 024c899
  2. WIP: Refactor NER

    brandenchan committed Dec 11, 2020
    c78d663

Commits on Dec 12, 2020

  1. 4a2eb37
  2. Merge branch 'refactor_processor_qa' of github.com:deepset-ai/FARM into refactor_processor_qa

    Timoeller committed Dec 12, 2020
    abfd335
  3. Bugfix label creation

    Timoeller committed Dec 12, 2020
    9e602c0
  4. 3f69bea
  5. 53c5e0b
  6. 0c3c9d6
  7. Bugfix tokenization - special tokens should only be added when combining question and passage

    Timoeller committed Dec 12, 2020
    4c8021e

Commits on Dec 13, 2020

  1. 330c8db
  2. db38ced
  3. 83d5244

Commits on Dec 14, 2020

  1. 1d40623
  2. Revert "WIP: Refactor NER"

    This reverts commit c78d663.
    brandenchan committed Dec 14, 2020
    dd89a3c
  3. dc54e38
  4. 4ec59ed

Commits on Dec 15, 2020

  1. Neaten logging statement

    brandenchan committed Dec 15, 2020
    05ebdb0
  2. fa94474
  3. Turn off slow tokenizers

    brandenchan committed Dec 15, 2020
    ee60467
  4. Fix tokenization tests

    brandenchan committed Dec 15, 2020
    6cf6fb1
  5. Refactor NER for fast tokenizers (#656)

    * WIP: Refactor NER
    * Rewrite featurize function
    * Featurize fn moved into processor.py
    * Remove return problematic flag

    brandenchan authored Dec 15, 2020
    0b57863
  6. 5d53ce0

Commits on Dec 16, 2020

  1. a3a5f3e
  2. WIP: Fix NER inference

    brandenchan committed Dec 16, 2020
    491058a
  3. 6d6c5ce
  4. 7c65126
  5. c790b98
  6. d063df1
  7. Fix tokenization tests

    brandenchan committed Dec 16, 2020
    becd4e2
  8. 4309719

Commits on Dec 17, 2020

  1. d0a6f36
  2. dbd973a
  3. d77d89c
  4. Fix tokenizer test

    brandenchan committed Dec 17, 2020
    ab4e40c
  5. ddf02b3
  6. 3095b42
  7. fceaf65
  8. Merge branch 'refactor_processor_qa' of github.com:deepset-ai/FARM into refactor_processor_qa

    Timoeller committed Dec 17, 2020
    ecec503
  9. Disable NQ tests

    Timoeller committed Dec 17, 2020
    ca7372a
  10. Fix onnx conversion test

    Timoeller committed Dec 17, 2020
    909ea9d

Commits on Dec 18, 2020

  1. Add assert for parameter checks in data validation, change num cpus, adjust qa benchmark to new values

    Timoeller committed Dec 18, 2020
    691dddb

Commits on Dec 21, 2020

  1. Refactoring Processor for LM Finetuning (FastTokenizers) (#659)

    * WIP lm finetuning refactoring
    * WIP refactoring bert style lm
    * first working version of bert_style_lm
    * optimize speed of mask_random_words
    * move get_start_of_words to tokenization module
    * Update docstrings. fix estimation
    * add multithreading_rust arg
    * fix import. fix vocab index out of range
    * fix empty sequence b
    * make bert-style to new default for lm finetuning. disable eval_report
    * change evaluate_every to 1000

    tholor authored Dec 21, 2020
    803b41b
  2. 91bea62

Commits on Dec 22, 2020

  1. Fix doc format

    Timoeller committed Dec 22, 2020
    bd12771
  2. Merge branch 'refactor_processor_qa' of github.com:deepset-ai/FARM into refactor_processor_qa

    Timoeller committed Dec 22, 2020
    04cabf6