Unable to achieve high recall and high precision for the bigger datasets #15

Open
Murray1991 opened this issue Sep 17, 2018 · 2 comments

@Murray1991
Contributor

Hello,

In the entity matching step, I'm trying to combine different bag models with similarity measures for the dirty dataset "movies" in the data folder.

Unfortunately, I'm unable to achieve both high recall and high precision. Could you share a good "recipe" for getting good results on that dataset?
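For reference, the F-Measure reported by the evaluation is the harmonic mean of precision and recall, so it stays low unless both are high at the same time. A minimal Java sketch, assuming precision and recall are already available as fractions in [0, 1]; `FMeasure` and `fMeasure` are illustrative names, not part of the JedAI API:

```java
public final class FMeasure {

    /** Harmonic mean of precision and recall; returns 0 when both are 0. */
    static double fMeasure(double precision, double recall) {
        if (precision + recall == 0.0) {
            return 0.0; // avoid division by zero when no correct match was found
        }
        return 2.0 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        // High recall (0.95) with low precision (0.10) still gives a low F-Measure.
        System.out.println(fMeasure(0.10, 0.95));
        // Balanced precision and recall give a high F-Measure.
        System.out.println(fMeasure(0.90, 0.90));
    }
}
```

Because the harmonic mean is dominated by the smaller of the two values, raising recall alone (for example, with a looser similarity threshold) does little for the F-Measure if precision drops as a result.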

Thank you

@gpapadis
Collaborator

Hi,

You can find an example of how to optimize the performance on the Dirty ER movies dataset here: https://github.com/scify/JedAIToolkit/blob/mavenizedVersion/jedai-core/src/test/java/org/scify/jedai/configuration/OptimizeDirtyMoviesDataset.java Just make sure you unzip the profiles in the data folder first.
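The linked OptimizeDirtyMoviesDataset class is not reproduced here. As a rough illustration of the general idea behind this kind of tuning (trying many parameter configurations and keeping the one with the highest F-Measure), here is a minimal random-search sketch in Java. The `evaluate` function is a hypothetical stand-in for running a workflow with a given threshold and measuring its F-Measure; neither it nor `RandomSearch` is part of the toolkit:

```java
import java.util.Random;
import java.util.function.DoubleUnaryOperator;

public final class RandomSearch {

    /**
     * Samples {@code iterations} random thresholds in [0, 1), scores each one with
     * {@code evaluate}, and returns the threshold with the highest score.
     */
    static double bestThreshold(DoubleUnaryOperator evaluate, int iterations, long seed) {
        Random random = new Random(seed); // fixed seed makes the search reproducible
        double bestThreshold = Double.NaN;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < iterations; i++) {
            double threshold = random.nextDouble();
            double score = evaluate.applyAsDouble(threshold);
            if (score > bestScore) { // keep only strict improvements
                bestScore = score;
                bestThreshold = threshold;
            }
        }
        return bestThreshold;
    }

    public static void main(String[] args) {
        // Hypothetical objective: a toy F-Measure curve that peaks at threshold 0.7.
        DoubleUnaryOperator toyFMeasure = t -> 1.0 - Math.abs(t - 0.7);
        double best = bestThreshold(toyFMeasure, 500, 42L);
        System.out.println("Best threshold: " + best);
    }
}
```

A real workflow has several interacting parameters (block building, comparison cleaning, similarity threshold), so each sample would be a full configuration rather than a single number; the keep-the-best loop stays the same.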

Kind regards,
George

@GabrielePisciotta
Contributor

GabrielePisciotta commented Sep 27, 2018

Hi George,
something isn't working as it should. When I run the program and filter the output for the "F-Measure" string, I get the following:

gabriele@aroxy:/media/gabriele/DATA/Universita/Tesi/tool/Jedai/JedAIToolkit$ java -jar jedai-core/target/jedai-core-1.3.jar | grep F-Measure
F-Measure	:	0.0
F-Measure	:	0.06148590947907771
F-Measure	:	0.021220159151193636
F-Measure	:	0.0
F-Measure	:	0.05328596802841918
F-Measure	:	0.011299435028248586
F-Measure	:	0.0
F-Measure	:	0.01474201474201474
F-Measure	:	0.0
F-Measure	:	0.02127659574468085
F-Measure	:	0.011834319526627219
F-Measure	:	0.0350109409190372
F-Measure	:	0.005602240896358543
F-Measure	:	0.02952029520295203
F-Measure	:	0.05687203791469195
F-Measure	:	0.07124681933842239
F-Measure	:	0.18016378525932666
F-Measure	:	0.005586592178770949
F-Measure	:	0.031413612565445025
F-Measure	:	0.0
F-Measure	:	0.026373626373626374
F-Measure	:	0.005633802816901408
F-Measure	:	0.017897091722595078
F-Measure	:	0.0
F-Measure	:	0.01550387596899225
F-Measure	:	0.042194092827004225
F-Measure	:	0.04979253112033195
F-Measure	:	0.0
F-Measure	:	0.24346405228758167
F-Measure	:	0.013856812933025405
F-Measure	:	0.1213235294117647
F-Measure	:	0.00396039603960396
F-Measure	:	0.09782608695652174
F-Measure	:	0.0056179775280898875
F-Measure	:	0.005221932114882507
F-Measure	:	0.015228426395939085
F-Measure	:	0.022172949002217297
F-Measure	:	0.1769087523277467
F-Measure	:	0.0056179775280898875
F-Measure	:	0.05758157389635317
F-Measure	:	0.037578288100208766
F-Measure	:	0.10510948905109489
F-Measure	:	0.021220159151193636
F-Measure	:	0.010075566750629721
F-Measure	:	0.0
F-Measure	:	0.12126537785588755
F-Measure	:	0.030769230769230767
F-Measure	:	0.0
F-Measure	:	0.09671848013816925
F-Measure	:	0.026246719160104987
F-Measure	:	0.005333333333333333
F-Measure	:	0.026373626373626374
F-Measure	:	0.02727272727272727
F-Measure	:	0.005747126436781609
F-Measure	:	0.04519774011299435
F-Measure	:	0.07692307692307693
F-Measure	:	0.029268292682926828
**Best F-Measure**	:	0.24346405228758167
F-Measure	:	0.06531881804043545
F-Measure	:	0.33240997229916897
F-Measure	:	0.7068723702664796
F-Measure	:	0.23611111111111108
F-Measure	:	0.16641813301521025
F-Measure	:	0.07221431344635693
F-Measure	:	0.5612343297974927
F-Measure	:	0.10637480798771122
F-Measure	:	0.7549407114624507
F-Measure	:	0.051971127151582454
F-Measure	:	0.5468904244817374
F-Measure	:	0.060599502218374644
F-Measure	:	0.12273120138288679
F-Measure	:	0.1254868022501082
F-Measure	:	0.07572383073496658
F-Measure	:	0.23976608187134502
F-Measure	:	0.29317507418397626
F-Measure	:	0.3779527559055118
F-Measure	:	0.8632326820603907
F-Measure	:	0.21333333333333332
F-Measure	:	0.5058087578194816
F-Measure	:	0.21828908554572274
F-Measure	:	0.5579999999999999
F-Measure	:	0.6247544204322201
F-Measure	:	0.1894150417827298
F-Measure	:	0.045255720053835796
F-Measure	:	0.02837542874961023
F-Measure	:	0.6962233169129721
F-Measure	:	0.5479082321187584
F-Measure	:	0.04672669749330738
F-Measure	:	0.026598271112377694
F-Measure	:	0.8484848484848486
F-Measure	:	0.016411253430924064
F-Measure	:	0.0505996673378272
F-Measure	:	0.7314578005115091
F-Measure	:	0.6907775768535263
F-Measure	:	0.21588749524895479
F-Measure	:	0.7608069164265131
F-Measure	:	0.028737358566135976
F-Measure	:	0.5949895615866388
F-Measure	:	0.8810289389067525
F-Measure	:	0.7853403141361257
F-Measure	:	0.6308724832214766
F-Measure	:	0.473063973063973
F-Measure	:	0.3970223325062035
F-Measure	:	0.07230422817112833
F-Measure	:	0.036293683873036234
F-Measure	:	0.5448979591836733
F-Measure	:	0.596949891067538
F-Measure	:	0.581986143187067
F-Measure	:	0.4731543624161074
F-Measure	:	0.5520361990950227
F-Measure	:	0.3736842105263158
F-Measure	:	0.05519230769230769
F-Measure	:	0.1483679525222552
F-Measure	:	0.5922077922077922
F-Measure	:	0.4793152639087018
F-Measure	:	0.14285714285714288
F-Measure	:	0.4455066921606119
F-Measure	:	0.6817248459958931
F-Measure	:	0.46865671641791046
**Best F-Measure**	:	0.8810289389067525
F-Measure	:	0.8810289389067525
F-Measure	:	0.13047732956398825
F-Measure	:	0.11595155898953366
F-Measure	:	0.042819724404965266
F-Measure	:	0.22642479058533327
F-Measure	:	0.23221586263287
F-Measure	:	0.10016565433462175
F-Measure	:	0.23370924121038936
F-Measure	:	0.05917226582349951
F-Measure	:	0.07644096250699497
F-Measure	:	0.0655110310670869
F-Measure	:	0.114667836000877
F-Measure	:	0.004642256136482331
F-Measure	:	0.13254834179539807
F-Measure	:	0.03248684511553421
F-Measure	:	0.016043397968605724
F-Measure	:	0.2066214185793482
F-Measure	:	0.19158976510067113
F-Measure	:	0.21924444673504098
F-Measure	:	0.03669933895600638
F-Measure	:	0.18115597783056211
F-Measure	:	0.1830387580636702
F-Measure	:	0.007420289855072463
F-Measure	:	0.22951154710811364
F-Measure	:	0.11181342632955536
F-Measure	:	0.0701813486047948
F-Measure	:	0.20964230171073095
F-Measure	:	0.18857053061652088
F-Measure	:	0.21968997022892928
F-Measure	:	0.25012647981382174
F-Measure	:	0.17085427135678394
F-Measure	:	0.15634139856421164
F-Measure	:	0.2163971572767535
F-Measure	:	0.2221075978748646
F-Measure	:	0.22456320657759507
F-Measure	:	0.23606590724165988
F-Measure	:	0.2014095536413469
F-Measure	:	0.10430664170062783
F-Measure	:	0.20771574652688118
F-Measure	:	0.2076253626191463
F-Measure	:	0.029182879377431907
F-Measure	:	0.21359323432343233
F-Measure	:	0.0025572474718121587
F-Measure	:	0.08629893238434164
F-Measure	:	0.11281268733990953
F-Measure	:	0.17873733108108109
F-Measure	:	0.17257546225570247
F-Measure	:	0.19978046103183317
F-Measure	:	0.20251193689018063
F-Measure	:	0.012711619575894147
F-Measure	:	0.02595620604882511
F-Measure	:	0.010996006713351467
F-Measure	:	0.026391279403327594
F-Measure	:	0.028059325430911074
F-Measure	:	0.10585969738651993
F-Measure	:	0.23020550402295906
F-Measure	:	0.20560844909213882
F-Measure	:	0.2730715567071956
F-Measure	:	0.21276153886091065
**Best F-Measure**	:	0.2730715567071956

Exception in thread "main" java.lang.NullPointerException
	at org.scify.jedai.blockprocessing.comparisoncleaning.WeightedEdgePruning.setNumberedRandomConfiguration(WeightedEdgePruning.java:150)
	at org.scify.jedai.workflowbuilder.Main.main(Main.java:170)

It reports three different best F-Measure values and then terminates with an exception.
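To check each reported best value against the individual runs, the filtered output can be split into segments at every `Best F-Measure` line. A minimal Java sketch, assuming the line format shown above (a label, a colon, then the value, separated by tabs) and that the filtered output is piped to standard input; `FMeasureLog` is an illustrative name, not part of JedAI:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public final class FMeasureLog {

    /**
     * Splits the log at every "Best F-Measure" line and returns, for each segment,
     * the maximum of the plain "F-Measure" values that precede that line.
     */
    static List<Double> segmentMaxima(List<String> lines) {
        List<Double> maxima = new ArrayList<>();
        double currentMax = Double.NEGATIVE_INFINITY;
        for (String line : lines) {
            int colon = line.lastIndexOf(':');
            if (colon < 0) {
                continue; // not a "label : value" line
            }
            String label = line.substring(0, colon).trim(); // trim() also strips tabs
            if (label.contains("Best F-Measure")) {
                maxima.add(currentMax); // close the current segment
                currentMax = Double.NEGATIVE_INFINITY;
            } else if (label.equals("F-Measure")) {
                double value = Double.parseDouble(line.substring(colon + 1).trim());
                currentMax = Math.max(currentMax, value);
            }
        }
        return maxima;
    }

    public static void main(String[] args) throws IOException {
        // Reads the filtered log from standard input and prints one maximum per segment.
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        List<String> lines = new ArrayList<>();
        for (String line = in.readLine(); line != null; line = in.readLine()) {
            lines.add(line);
        }
        System.out.println(segmentMaxima(lines));
    }
}
```

After compiling with `javac FMeasureLog.java`, the output can be piped through it with `java -jar jedai-core/target/jedai-core-1.3.jar | grep F-Measure | java FMeasureLog`, and each printed maximum can be compared with the corresponding `Best F-Measure` line.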
