Dirty ER examples input .csv #60

vromaniello · 2022-06-16T16:11:56Z

Hi, is it possible to have sample files in .csv format for the following?

entity profile D1
ground truth

I ask because .csv files do not work, whatever formatting I use.
The error from JedAI-gui is the following:

Thank you for the support.

gpapadis · 2022-06-20T16:19:20Z

Hi Vito,

thanks for your interest in JedAI!

The error you get is caused by an incompatibility between JedAI-gui and JedAI-core. The former is practically deprecated. It's better to use JedAI's web app through Docker. See Table 4 here for instructions: https://helios2.mi.parisdescartes.fr/~themisp/publications/is21-jedaiRepro.pdf .

Still, our custom CSV reader gets confused when the separator is not a comma. The best approach is to format your dataset like the Leipzig benchmarks (https://dbs.uni-leipzig.de/research/projects/object_matching/benchmark_datasets_for_entity_resolution), using a comma as the separator and quotes around the values. Below you can find screenshots showing that JedAI can successfully read the csv files of the Abt-Buy dataset.
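For anyone whose data uses a different separator, the formatting advice above (comma separator, quoted values) can be applied with a small script. This is only a sketch using Python's standard `csv` module, not part of JedAI; the function name `normalize_csv` and the example delimiter are assumptions for illustration.

```python
import csv
import io


def normalize_csv(text, delimiter):
    """Re-serialize CSV text with comma separators and every value quoted.

    `delimiter` is the separator used by the input (e.g. ";" or "\t").
    Hypothetical helper, not a JedAI function.
    """
    rows = csv.reader(io.StringIO(text), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.writer(
        out,
        delimiter=",",           # comma as the separator
        quoting=csv.QUOTE_ALL,   # quotes around every value
        lineterminator="\n",
    )
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    sample = "id;name\n1;Sony TV, black\n"
    print(normalize_csv(sample, ";"))
```

Quoting every value keeps commas that appear inside a field (as in product names) from being read as separators.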

By the way, some of the Leipzig datasets involve more than one duplicate per entity, even though they are Clean-Clean ER datasets. This is not supported by JedAI, which automatically removes equivalence clusters with more than two entities and prints the relevant messages in the command line.
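To check in advance which ground-truth pairs would be affected, something like the following could be used. It is a rough sketch of the behavior described above, not JedAI's actual implementation: it assumes the ground truth is a list of `(id_in_D1, id_in_D2)` pairs, and the function name `drop_large_clusters` is made up for this example.

```python
from collections import defaultdict


def drop_large_clusters(pairs):
    """Keep only pairs whose equivalence cluster has exactly two entities.

    `pairs` is a list of (id_in_D1, id_in_D2) tuples. Ids are tagged with
    their dataset so that equal ids in D1 and D2 are not confused.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the two entities of every ground-truth pair.
    for a, b in pairs:
        ra, rb = find((1, a)), find((2, b))
        if ra != rb:
            parent[ra] = rb

    # Collect the members of each cluster.
    members = defaultdict(set)
    for node in list(parent):
        members[find(node)].add(node)

    return [(a, b) for a, b in pairs if len(members[find((1, a))]) == 2]


if __name__ == "__main__":
    truth = [("a1", "b1"), ("a2", "b2"), ("a2", "b3")]
    print(drop_large_clusters(truth))
```

In the example, `a2` matches both `b2` and `b3`, so that cluster has three entities and both of its pairs are dropped.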

vromaniello · 2022-06-20T17:48:51Z

Thanks! My tests succeeded with JedAI's web app through Docker.

vromaniello closed this as completed Jun 20, 2022

gpapadis mentioned this issue Aug 9, 2022

Unable to Read csv or json files #61

Open
