Dirty ER examples input .csv #60

Closed
vromaniello opened this issue Jun 16, 2022 · 2 comments

Comments

@vromaniello

Hi, is it possible to get sample files in .csv format for

  • entity profile D1
  • ground truth

I am asking because none of my .csv files work, whatever formatting I use.
The error from JedAI-gui is the following:

[image: screenshot of the JedAI-gui error]

Thank you for the support.

@gpapadis
Collaborator

Hi Vito,

thanks for your interest in JedAI!

The error you get is caused by an incompatibility between JedAI-gui and JedAI-core. The former is practically deprecated. It's better to use JedAI's web app through Docker. See Table 4 here for instructions: https://helios2.mi.parisdescartes.fr/~themisp/publications/is21-jedaiRepro.pdf .

Still, our custom implementation of the CSV reader gets confused by the separator when it is not a comma. The best approach is to format your dataset like the Leipzig benchmarks (https://dbs.uni-leipzig.de/research/projects/object_matching/benchmark_datasets_for_entity_resolution), using a comma as the separator and quotes around the values. Below you can find screenshots showing that JedAI can successfully read the CSV files of the Abt-Buy dataset.
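For anyone whose files use a different separator, here is a minimal sketch of converting them to the comma-separated, fully quoted layout described above. It uses only Python's standard `csv` module. The function name `to_comma_quoted`, the semicolon input separator, and the sample columns are assumptions for illustration; they are not part of JedAI.

```python
import csv
import io


def to_comma_quoted(src, dst, src_delimiter=";"):
    """Copy CSV rows from src to dst, using commas and quoting every value."""
    reader = csv.reader(src, delimiter=src_delimiter)
    writer = csv.writer(
        dst,
        delimiter=",",
        quotechar='"',
        quoting=csv.QUOTE_ALL,  # quote every field, including numeric ones
        lineterminator="\n",
    )
    for row in reader:
        writer.writerow(row)


# In-memory example; the column names and values are made up.
source = io.StringIO("id;name;price\n1;Sony TV, 40 inch;499\n")
target = io.StringIO()
to_comma_quoted(source, target)
converted = target.getvalue()
print(converted)
```

With real files, open both with `newline=""` (as the `csv` module documentation recommends) and pass the file objects instead of the `StringIO` buffers.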

Btw, some of the Leipzig datasets involve more than one duplicate per entity, even though they are Clean-Clean ER datasets. This is not supported by JedAI, which automatically removes equivalence clusters with more than two entities and prints the relevant messages in the command line.
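To see in advance which ground-truth entries fall into such clusters, the duplicate pairs can be grouped with a small union-find. This is only a sketch for inspecting your own ground-truth file, not JedAI's implementation. The function name `oversized_clusters` and the example ids are hypothetical, and it assumes each ground-truth row is a pair (id in D1, id in D2).

```python
from collections import defaultdict


def oversized_clusters(pairs):
    """Group ids linked by duplicate pairs; return groups with more than two members."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for id1, id2 in pairs:
        # Tag each id with its dataset so equal ids in D1 and D2 do not collide.
        parent[find(("D1", id1))] = find(("D2", id2))

    clusters = defaultdict(set)
    for node in parent:
        clusters[find(node)].add(node)
    return [members for members in clusters.values() if len(members) > 2]


# Made-up ground truth: a2 is matched to two different entities in D2.
ground_truth = [("a1", "b1"), ("a2", "b2"), ("a2", "b3")]
flagged = oversized_clusters(ground_truth)
print(flagged)
```

Any cluster returned here contains more than two entities, so its pairs are candidates for the automatic removal described above.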

Screenshot from 2022-06-20 19-05-24

Screenshot from 2022-06-20 19-05-34

@vromaniello
Author

Thanks! My tests succeeded with JedAI's web app through Docker.
