Error on TestGtRDFReader #11
Hi, we are glad you are interested in JedAI! I haven't had time to reproduce the error you mention yet. It is probably caused by a sameAs statement that connects an entity to itself. I guess you have modified TestGtRDFReader.java so that it reads both datasets. Which of the two datasets do you use as input for the GtRDFReader? Kind regards,
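A minimal sketch of a workaround consistent with this diagnosis: when collecting duplicate pairs from owl:sameAs statements, skip any statement whose subject and object are the same resource, because a graph that forbids loops rejects such an edge. The `SameAsFilter` class, `Triple` record, and `collectDuplicatePairs` method are hypothetical illustrations (Java 16+ for records), not JedAI's GtRDFReader code or the Jena API; in the real reader the equivalent check would presumably go before the `addEdge` call that appears in the stack trace.

```java
import java.util.ArrayList;
import java.util.List;

public class SameAsFilter {
    // Hypothetical minimal triple type for illustration (not a JedAI or Jena class).
    record Triple(String subject, String predicate, String object) {}

    static final String OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs";

    // Returns the (subject, object) pairs of all owl:sameAs triples,
    // skipping self-links where subject and object are the same resource.
    static List<String[]> collectDuplicatePairs(List<Triple> triples) {
        List<String[]> pairs = new ArrayList<>();
        for (Triple t : triples) {
            if (!OWL_SAME_AS.equals(t.predicate())) {
                continue; // not a sameAs statement
            }
            if (t.subject().equals(t.object())) {
                continue; // self-link: a loop-free graph would reject this edge ("loops not allowed")
            }
            pairs.add(new String[] {t.subject(), t.object()});
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<Triple> triples = List.of(
                new Triple("http://example.org/a/1", OWL_SAME_AS, "http://example.org/b/1"),
                new Triple("http://example.org/a/2", OWL_SAME_AS, "http://example.org/a/2"),
                new Triple("http://example.org/a/3", "http://xmlns.com/foaf/0.1/name", "Alice"));
        for (String[] pair : collectDuplicatePairs(triples)) {
            System.out.println(pair[0] + " sameAs " + pair[1]);
        }
    }
}
```

On the sample input, only the a/1 to b/1 link is kept: the a/2 statement is a self-link and the a/3 statement is not a sameAs triple.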
Hi, these datasets come from silkframework.org. `final String sub = stmt.getSubject().toString();`
Thanks!
Hi, for some reason, I see lots of sameAs statements in the datasets you have uploaded. Kind regards,
Many thanks for the reply! Kind regards,
You are welcome, Frank! Let us know if we can assist you in any other way.
Hi George, attached you can find the Java class you provided, which I have modified to add block management and a similarity computation step. What do you think? Any suggestions? Thanks in advance,
Hi Frank, I am sorry for the late response. I updated the TestSilkData.java class (https://github.com/scify/JedAIToolkit/blob/mavenizedVersion/jedai-core/src/test/java/org/scify/jedai/datareader/TestSilkData.java) with a more complete version of the code. The code you sent me didn't perform Entity Clustering, which is necessary to produce the final results. The absolute values of the similarities might be low, but what matters is their relative values. In the Clean-Clean ER scenario you are considering, Unique Mapping Clustering should be applied at the end so that, for every entity, the best match is selected (i.e., the pair with the highest similarity), as long as this similarity exceeds a certain threshold. Note that the new code tests a large number of configurations in order to find the one with the highest performance, so it will take some time to complete. I ran it, but no meaningful results were produced, because the ground-truth reader cannot extract any pair of duplicates from the source.nt file that is used as the source of the ground truth. Kind regards,
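To make the Unique Mapping Clustering step more concrete, here is a minimal sketch of the greedy idea in plain Java: sort all compared pairs by similarity in descending order, then accept a pair only if its similarity reaches the threshold and neither of its two entities has already been matched. The `UniqueMappingSketch` class, `Comparison` record, `uniqueMapping` method, and the sample scores are hypothetical; this is not JedAI's own UniqueMappingClustering implementation, whose API may differ. Because only the ordering of the scores drives the matching, low absolute similarities are not a problem as long as the threshold is chosen on the same scale.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UniqueMappingSketch {
    // Hypothetical comparison result between source entity d1 and target entity d2.
    record Comparison(int d1, int d2, double similarity) {}

    // Greedy one-to-one matching for Clean-Clean ER: visit pairs from highest to
    // lowest similarity and keep a pair only if neither entity is matched yet.
    static List<Comparison> uniqueMapping(List<Comparison> comparisons, double threshold) {
        List<Comparison> sorted = new ArrayList<>(comparisons);
        sorted.sort((a, b) -> Double.compare(b.similarity(), a.similarity())); // descending
        Set<Integer> matchedSource = new HashSet<>();
        Set<Integer> matchedTarget = new HashSet<>();
        List<Comparison> matches = new ArrayList<>();
        for (Comparison c : sorted) {
            if (c.similarity() < threshold) {
                break; // every remaining pair is below the threshold
            }
            if (matchedSource.contains(c.d1()) || matchedTarget.contains(c.d2())) {
                continue; // one side already has a higher-scoring match
            }
            matchedSource.add(c.d1());
            matchedTarget.add(c.d2());
            matches.add(c);
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Comparison> comparisons = List.of(
                new Comparison(0, 0, 0.9),
                new Comparison(0, 1, 0.8),
                new Comparison(1, 0, 0.7),
                new Comparison(1, 1, 0.6),
                new Comparison(2, 2, 0.1));
        for (Comparison c : uniqueMapping(comparisons, 0.5)) {
            System.out.println(c.d1() + " <-> " + c.d2() + " (" + c.similarity() + ")");
        }
    }
}
```

With a threshold of 0.5, the sample keeps (0, 0) and (1, 1): (0, 1) is discarded because source entity 0 is already matched, (1, 0) because target entity 0 is already matched, and (2, 2) falls below the threshold.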
Thank you for your time! Kind regards,
Hi Frank, you can find many relevant datasets here. You can also check the following datasets along with the expected mappings: Best regards,
Many thanks for the pointers and suggestions! Kind regards,
Hi, I've tried some tests with the JedAI tool.
This tool is useful for my work, and I think it has great potential.
I've downloaded the attached files in N-Triples (.nt) format: source.nt and target.nt.
First, I successfully executed the TestRdfReader class in the test package for both datasets. Then I tried to execute the TestGtRDFReader class with the same datasets, but I got the following error:
Exception in thread "main" java.lang.IllegalArgumentException: loops not allowed
    at org.jgrapht.graph.AbstractBaseGraph.addEdge(AbstractBaseGraph.java:203)
    at org.scify.jedai.datareader.groundtruthreader.GtRDFReader.performReading(GtRDFReader.java:236)
    at org.scify.jedai.datareader.groundtruthreader.GtRDFReader.getDuplicatePairs(GtRDFReader.java:92)
    at org.scify.jedai.datareader.groundtruthreader.AbstractGtReader.getDuplicatePairs(AbstractGtReader.java:57)
    at org.scify.jedai.datareader.TestGtRDFReader.main(TestGtRDFReader.java:39)
datasets.zip
Thanks in advance!