[chore] Add pytest-cov and add test coverage command to the Makefile #2794

Merged (5 commits) on Jul 9, 2024

@fpgmaas (Contributor) commented Jun 28, 2024

I think it would be a good idea to increase the test coverage of some of the modules in the project. To identify which modules could benefit from additional unit tests, I propose adding pytest-cov to the project, along with a Makefile command that generates the coverage report locally.

Name                                                                                Stmts   Miss  Cover
-------------------------------------------------------------------------------------------------------
sentence_transformers/LoggingHandler.py                                                30     24    20%
sentence_transformers/SentenceTransformer.py                                          527     93    82%
sentence_transformers/__init__.py                                                      17      1    94%
sentence_transformers/cross_encoder/CrossEncoder.py                                   204     51    75%
sentence_transformers/cross_encoder/__init__.py                                         2      0   100%
sentence_transformers/cross_encoder/evaluation/CEBinaryAccuracyEvaluator.py            45     33    27%
sentence_transformers/cross_encoder/evaluation/CEBinaryClassificationEvaluator.py      56     42    25%
sentence_transformers/cross_encoder/evaluation/CECorrelationEvaluator.py               43     10    77%
sentence_transformers/cross_encoder/evaluation/CEF1Evaluator.py                        58     44    24%
sentence_transformers/cross_encoder/evaluation/CERerankingEvaluator.py                 67     57    15%
sentence_transformers/cross_encoder/evaluation/CESoftmaxAccuracyEvaluator.py           44     32    27%
sentence_transformers/cross_encoder/evaluation/__init__.py                              7      0   100%
sentence_transformers/data_collator.py                                                 23      0   100%
sentence_transformers/datasets/DenoisingAutoEncoderDataset.py                          29     18    38%
sentence_transformers/datasets/NoDuplicatesDataLoader.py                               31     25    19%
sentence_transformers/datasets/ParallelSentencesDataset.py                             92     75    18%
sentence_transformers/datasets/SentenceLabelDataset.py                                 52     42    19%
sentence_transformers/datasets/SentencesDataset.py                                     11      0   100%
sentence_transformers/datasets/__init__.py                                              6      0   100%
sentence_transformers/evaluation/BinaryClassificationEvaluator.py                     154     89    42%
sentence_transformers/evaluation/EmbeddingSimilarityEvaluator.py                       94     18    81%
sentence_transformers/evaluation/InformationRetrievalEvaluator.py                     210    188    10%
sentence_transformers/evaluation/LabelAccuracyEvaluator.py                             59     14    76%
sentence_transformers/evaluation/MSEEvaluator.py                                       52     39    25%
sentence_transformers/evaluation/MSEEvaluatorFromDataFrame.py                          68     54    21%
sentence_transformers/evaluation/ParaphraseMiningEvaluator.py                         120     35    71%
sentence_transformers/evaluation/RerankingEvaluator.py                                139    120    14%
sentence_transformers/evaluation/SentenceEvaluator.py                                  28      9    68%
sentence_transformers/evaluation/SequentialEvaluator.py                                26     20    23%
sentence_transformers/evaluation/SimilarityFunction.py                                  2      0   100%
sentence_transformers/evaluation/TranslationEvaluator.py                               78     64    18%
sentence_transformers/evaluation/TripletEvaluator.py                                   97     80    18%
sentence_transformers/evaluation/__init__.py                                           14      0   100%
sentence_transformers/fit_mixin.py                                                    279    170    39%
sentence_transformers/losses/AdaptiveLayerLoss.py                                     102     75    26%
sentence_transformers/losses/AnglELoss.py                                               7      2    71%
sentence_transformers/losses/BatchAllTripletLoss.py                                    28     18    36%
sentence_transformers/losses/BatchHardSoftMarginTripletLoss.py                         28     17    39%
sentence_transformers/losses/BatchHardTripletLoss.py                                   67     45    33%
sentence_transformers/losses/BatchSemiHardTripletLoss.py                               52     37    29%
sentence_transformers/losses/CachedGISTEmbedLoss.py                                   181    157    13%
sentence_transformers/losses/CachedMultipleNegativesRankingLoss.py                    105     15    86%
sentence_transformers/losses/CoSENTLoss.py                                             27     10    63%
sentence_transformers/losses/ContrastiveLoss.py                                        33     18    45%
sentence_transformers/losses/ContrastiveTensionLoss.py                                 70     47    33%
sentence_transformers/losses/CosineSimilarityLoss.py                                   17      2    88%
sentence_transformers/losses/DenoisingAutoEncoderLoss.py                               68     56    18%
sentence_transformers/losses/GISTEmbedLoss.py                                          60     48    20%
sentence_transformers/losses/MSELoss.py                                                18      9    50%
sentence_transformers/losses/MarginMSELoss.py                                          21     13    38%
sentence_transformers/losses/Matryoshka2dLoss.py                                       13      4    69%
sentence_transformers/losses/MatryoshkaLoss.py                                         70     51    27%
sentence_transformers/losses/MegaBatchMarginLoss.py                                    57     46    19%
sentence_transformers/losses/MultipleNegativesRankingLoss.py                           24      2    92%
sentence_transformers/losses/MultipleNegativesSymmetricRankingLoss.py                  24     15    38%
sentence_transformers/losses/OnlineContrastiveLoss.py                                  22     14    36%
sentence_transformers/losses/SoftmaxLoss.py                                            44      4    91%
sentence_transformers/losses/TripletLoss.py                                            32     17    47%
sentence_transformers/losses/__init__.py                                               25      0   100%
sentence_transformers/model_card.py                                                   460    150    67%
sentence_transformers/model_card_templates.py                                          33     21    36%
sentence_transformers/models/Asym.py                                                   73     59    19%
sentence_transformers/models/BoW.py                                                    57     39    32%
sentence_transformers/models/CLIPModel.py                                              59      4    93%
sentence_transformers/models/CNN.py                                                    51      4    92%
sentence_transformers/models/Dense.py                                                  45      4    91%
sentence_transformers/models/Dropout.py                                                21     11    48%
sentence_transformers/models/LSTM.py                                                   51      4    92%
sentence_transformers/models/LayerNorm.py                                              32      2    94%
sentence_transformers/models/Normalize.py                                              14      0   100%
sentence_transformers/models/Pooling.py                                               111     40    64%
sentence_transformers/models/Transformer.py                                           104     25    76%
sentence_transformers/models/WeightedLayerPooling.py                                   42      3    93%
sentence_transformers/models/WordEmbeddings.py                                        103     36    65%
sentence_transformers/models/WordWeights.py                                            47     33    30%
sentence_transformers/models/__init__.py                                               15      0   100%
sentence_transformers/models/tokenizer/PhraseTokenizer.py                              74     57    23%
sentence_transformers/models/tokenizer/WhitespaceTokenizer.py                          48      2    96%
sentence_transformers/models/tokenizer/WordTokenizer.py                                20      5    75%
sentence_transformers/models/tokenizer/__init__.py                                      4      0   100%
sentence_transformers/quantization.py                                                 131     98    25%
sentence_transformers/readers/InputExample.py                                           8      1    88%
sentence_transformers/readers/LabelSentenceReader.py                                   25     20    20%
sentence_transformers/readers/NLIDataReader.py                                         26     16    38%
sentence_transformers/readers/PairedFilesReader.py                                     26     26     0%
sentence_transformers/readers/STSDataReader.py                                         33     24    27%
sentence_transformers/readers/TripletReader.py                                         25     19    24%
sentence_transformers/readers/__init__.py                                               6      0   100%
sentence_transformers/sampler.py                                                      125     78    38%
sentence_transformers/similarity_functions.py                                          38      2    95%
sentence_transformers/trainer.py                                                      276    138    50%
sentence_transformers/training_args.py                                                 31      5    84%
sentence_transformers/util.py                                                         310    110    65%
-------------------------------------------------------------------------------------------------------
TOTAL                                                                                6483   3205    51%
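
To pick out the modules that would benefit most from new tests, a terminal report like the one above can be ranked with a few lines of Python. This is only a sketch, not part of the PR: the `parse_report` and `least_covered` helpers and the `min_statements` threshold are hypothetical, and the sample rows are copied from the table above.

```python
import re
from typing import NamedTuple


class ModuleCoverage(NamedTuple):
    path: str
    statements: int
    missed: int
    percent: int


# Matches report rows such as "pkg/module.py   30   24   20%".
# The header, separator, and TOTAL rows do not match and are skipped.
ROW_PATTERN = re.compile(r"^(\S+\.py)\s+(\d+)\s+(\d+)\s+(\d+)%\s*$")


def parse_report(report: str) -> list[ModuleCoverage]:
    """Parse the per-module rows of a terminal coverage report."""
    rows = []
    for line in report.splitlines():
        match = ROW_PATTERN.match(line.strip())
        if match:
            path, stmts, miss, cover = match.groups()
            rows.append(ModuleCoverage(path, int(stmts), int(miss), int(cover)))
    return rows


def least_covered(rows, min_statements=50, limit=5):
    """Return up to `limit` modules with the lowest coverage.

    Modules with fewer than `min_statements` statements are ignored so that
    tiny files do not dominate the ranking (an arbitrary, hypothetical cutoff).
    """
    candidates = [r for r in rows if r.statements >= min_statements]
    return sorted(candidates, key=lambda r: (r.percent, -r.missed))[:limit]


SAMPLE = """\
sentence_transformers/evaluation/InformationRetrievalEvaluator.py   210  188  10%
sentence_transformers/losses/CachedGISTEmbedLoss.py                 181  157  13%
sentence_transformers/evaluation/RerankingEvaluator.py              139  120  14%
sentence_transformers/data_collator.py                               23    0 100%
TOTAL                                                              6483 3205  51%
"""

for row in least_covered(parse_report(SAMPLE), limit=3):
    print(f"{row.percent:>3}%  {row.missed:>4} missed  {row.path}")
```

Sorting by percentage first and by missed statements second puts large, poorly covered modules ahead of small ones with the same percentage.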

@tomaarsen (Collaborator)

Do you generally use the XML-based coverage outputs? I've only used the HTML ones, e.g. `pytest --cov-report term --cov-report html --cov=sentence_transformers`.

We can add `make test` and `make test-cov` perhaps? Ideally, such that extra args are preserved, e.g. `make test-cov --last-failed`. I'm not sure if that's reasonable via Makefile, but I assume that it is?

  • Tom Aarsen
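
On the question of preserving extra arguments: `make` would normally try to interpret `--last-failed` as one of its own options, so the flag cannot simply be appended to `make test-cov`. One possible workaround, sketched here under assumptions and not taken from the PR, is a small Python helper that assembles the pytest command and appends whatever extra arguments it receives. The name `build_pytest_command` is hypothetical; the `--cov` and `--cov-report` options are the pytest-cov flags quoted above.

```python
import shlex


def build_pytest_command(extra_args, package="sentence_transformers", coverage=True):
    """Assemble a pytest command line, appending extra arguments unchanged."""
    command = ["pytest"]
    if coverage:
        # pytest-cov options: terminal summary plus an HTML report.
        command += [f"--cov={package}", "--cov-report=term", "--cov-report=html"]
    return command + list(extra_args)


# Extra flags such as --last-failed are passed straight through.
print(shlex.join(build_pytest_command(["--last-failed"])))
# prints: pytest --cov=sentence_transformers --cov-report=term --cov-report=html --last-failed
```

The same list, minus the leading `pytest`, could be handed to `pytest.main(...)` from a wrapper script, so the extra flags never have to pass through `make` at all.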

@fpgmaas (Contributor, Author) commented Jul 9, 2024

> Do you generally use the XML-based coverage outputs? I've only used the HTML ones, e.g. `pytest --cov-report term --cov-report html --cov=sentence_transformers`.
>
> We can add `make test` and `make test-cov` perhaps? Ideally, such that extra args are preserved, e.g. `make test-cov --last-failed`. I'm not sure if that's reasonable via Makefile, but I assume that it is?
>
> * Tom Aarsen

I tend to use the XML reports, since they are commonly accepted by tools like Codecov. But since sentence-transformers does not use that, I agree with you that the HTML output format is more useful here. I will change the PR accordingly :)

With regards to the Makefile + arguments, I am not sure if they play nice together... I'll look into it.
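
For context on why the XML format suits tooling: the report written by `--cov-report=xml` is machine-readable, so per-file line rates can be read with the standard library alone. The sketch below assumes the Cobertura-style layout that coverage.py emits (`class` elements carrying `filename` and `line-rate` attributes). The sample document is hand-written and heavily abbreviated, and `line_rates` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hand-written, abbreviated sample; a real report has many more attributes.
SAMPLE_XML = """<?xml version="1.0" ?>
<coverage line-rate="0.5" version="7.0">
  <packages>
    <package name="sentence_transformers">
      <classes>
        <class name="sampler.py" filename="sentence_transformers/sampler.py" line-rate="0.38"/>
        <class name="util.py" filename="sentence_transformers/util.py" line-rate="0.65"/>
      </classes>
    </package>
  </packages>
</coverage>
"""


def line_rates(xml_text):
    """Map each file in the report to its line coverage as a fraction."""
    root = ET.fromstring(xml_text)
    return {
        cls.get("filename"): float(cls.get("line-rate"))
        for cls in root.iter("class")
    }


for filename, rate in sorted(line_rates(SAMPLE_XML).items(), key=lambda item: item[1]):
    print(f"{rate:.0%}  {filename}")
```

The HTML report carries the same information but is meant for browsing locally, which matches the use case discussed here.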

@fpgmaas force-pushed the chore/coverage-start branch 3 times, most recently from 23dfa17 to 30064e2 (July 9, 2024 12:05)

@tomaarsen (Collaborator)

> With regards to the Makefile + arguments, I am not sure if they play nice together... I'll look into it.

Otherwise we can leave it as-is and just do `make test` and `make test-cov` without kwargs :)

  • Tom Aarsen

tomaarsen and others added 2 commits July 9, 2024 14:17
Allow inheriting the Transformer class (UKPLab#2810)

[`feat`] Add hard negatives mining utility (UKPLab#2768)

* Add hard negatives mining utility

* Add example datasets/models for hard negative mining tip

* Update phrasing in dataset overview

[chore] add test for NoDuplicatesBatchSampler (UKPLab#2795)

* add test for NoDuplicatesBatchSampler

* formatting

* simplify tests

[chore] Add test for RoundrobinBatchSampler (UKPLab#2798)

* Add test for RoundrobinBatchSampler

* fix test

* improve RoundRobinBatchSampler and add additional test

* Make datasets in ConcatDataset different sizes

As the real "use case" of the RoundRobin sampler is to avoid sampling from one dataset more than from another. This is best tested when the datasets have different sizes.

---------

Co-authored-by: Tom Aarsen <[email protected]>

[feat] Improve GroupByLabelBatchSampler (UKPLab#2788)

* Improve GroupByLabelBatchSampler

* small fix

* improve test

* Update sentence_transformers/sampler.py

Co-authored-by: Tom Aarsen <[email protected]>

* fix sampler and add unit test

* fix comment

* remove .DS_Store

* rm DS_Store

* change self.groups statement

* move to damplers dir

* Update sentence_transformers/sampler.py

Co-authored-by: Tom Aarsen <[email protected]>

* Add typing

---------

Co-authored-by: Tom Aarsen <[email protected]>
Co-authored-by: Tom Aarsen <[email protected]>

[`chore`] Clean-up `.gitignore` (UKPLab#2799)

add test coverage command

add to workflow

fix cicd

fix cicd

fix

leave cicd untouched

fix gitignore

fix gitignore

update gitignore

update gitignore

fix gitignore

fix gitignor

@fpgmaas force-pushed the chore/coverage-start branch from 5b64364 to 12fa14a (July 9, 2024 12:18)

@tomaarsen merged commit 5a71df8 into UKPLab:master on Jul 9, 2024
11 checks passed