Error on TestGtRDFReader #11

franklarryx · 2018-05-25T08:43:47Z

Hi, I have been running some tests with the JedAI tool.
This tool is useful for my job and I think it has great potential.
I have downloaded the attached files in nt format: source.nt and target.nt.
As a first step, I successfully executed the TestRdfReader class from the test package on both datasets. After that, I tried to execute the TestGtRDFReader class with the same datasets, but I got the following error:
Exception in thread "main" java.lang.IllegalArgumentException: loops not allowed
    at org.jgrapht.graph.AbstractBaseGraph.addEdge(AbstractBaseGraph.java:203)
    at org.scify.jedai.datareader.groundtruthreader.GtRDFReader.performReading(GtRDFReader.java:236)
    at org.scify.jedai.datareader.groundtruthreader.GtRDFReader.getDuplicatePairs(GtRDFReader.java:92)
    at org.scify.jedai.datareader.groundtruthreader.AbstractGtReader.getDuplicatePairs(AbstractGtReader.java:57)
    at org.scify.jedai.datareader.TestGtRDFReader.main(TestGtRDFReader.java:39)

datasets.zip

Thanks in advance!


gpapadis · 2018-05-25T11:56:49Z

Hi, we are glad you are interested in JedAI!

I didn't have the time to reproduce the error you mention. It is probably caused because there is a same-as statement that connects an entity to itself. I guess you have modified TestGtRDFReader.java so that it reads both datasets. Which of the two datasets do you use as input for the GtRDFReader?
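
For context, jgrapht's simple graphs reject any edge whose source and target are the same vertex, which is what "loops not allowed" reports. A minimal sketch of a guard against that case is shown here; `SelfLoopGuard` and `withoutSelfLoops` are hypothetical names used for illustration and are not part of JedAI.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of JedAI: drops ground-truth pairs whose two
// entity ids coincide before they reach a graph that forbids self-loops.
final class SelfLoopGuard {

    // Returns only the pairs {id1, id2} with id1 != id2, in their original order.
    static List<int[]> withoutSelfLoops(List<int[]> pairs) {
        List<int[]> kept = new ArrayList<>();
        for (int[] pair : pairs) {
            if (pair[0] != pair[1]) {
                kept.add(pair);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<int[]> pairs = new ArrayList<>();
        pairs.add(new int[] {0, 5});
        pairs.add(new int[] {3, 3}); // an entity linked to itself
        System.out.println(withoutSelfLoops(pairs).size()); // prints 1
    }
}
```

Filtering like this only hides the symptom; if many pairs collapse to the same id, the ground-truth file itself is worth checking.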

Kind regards,
George

franklarryx · 2018-05-25T15:40:54Z

Hi, these datasets come from silkframework.org.
I have already checked that neither dataset contains any "sameAs" property.
I have debugged the code, and the issue seems to come from the following part of the GtRDFReader class (lines 229-234):

`final String sub = stmt.getSubject().toString();
final String obj = stmt.getObject().toString();

// add a new edge for every pair of duplicate entities
int entityId1 = urlToEntityId1.get(sub);
int entityId2 = urlToEntityId1.get(obj) + datasetLimit;`

Thanks !

gpapadis · 2018-05-27T04:24:30Z

Hi,

for some reason, I see lots of sameAs statements in the datasets you have uploaded.
I created here a class that tries to reproduce the error you are mentioning:
https://github.com/scify/JedAIToolkit/blob/mavenizedVersion/jedai-core/src/test/java/org/scify/jedai/datareader/TestSilkData.java
So, my question is which file does gtFilePath point to in your case (Line 21)?
On my computer, I run TestSilkData.java without getting any exception.
The problem I see with setting
gtFilePath = mainDir + "source.nt";
is that I only get sameAs statements like the following:
[http://dbpedia.org/resource/Karma_%28film%29, http://www.w3.org/2002/07/owl#sameAs, http://data.linkedmdb.org/resource/film/7632]
where http://data.linkedmdb.org/resource/film/7632 is not included in any of the given datasets and causes problems.
I would be happy to help you if you clarified which dataset you use for groundtruth, provided of course that this groundtruth file contains correct links of the form
URL_from_Dataset_1 sameAs URL_from_Dataset_2.
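
A quick way to test a candidate ground-truth file against that requirement is to keep only the sameAs triples whose subject appears in the first dataset and whose object appears in the second. The sketch below assumes the triples have already been parsed into `{subject, predicate, object}` string arrays and that each dataset is represented by a URL-to-id map; `GroundTruthCheck`, `resolvableLinks` and the `example.org` URL are illustrative assumptions, not JedAI code or data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical check, not part of JedAI: keeps only the sameAs links whose
// subject is known in dataset 1 and whose object is known in dataset 2.
final class GroundTruthCheck {

    static final String SAME_AS = "http://www.w3.org/2002/07/owl#sameAs";

    // Each triple is {subject, predicate, object}.
    static List<String[]> resolvableLinks(List<String[]> triples,
                                          Map<String, Integer> urlToId1,
                                          Map<String, Integer> urlToId2) {
        List<String[]> kept = new ArrayList<>();
        for (String[] t : triples) {
            boolean isSameAs = SAME_AS.equals(t[1]);
            if (isSameAs && urlToId1.containsKey(t[0]) && urlToId2.containsKey(t[2])) {
                kept.add(t);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Integer> dataset1 = new HashMap<>();
        dataset1.put("http://dbpedia.org/resource/Karma_%28film%29", 0);
        Map<String, Integer> dataset2 = new HashMap<>();
        dataset2.put("http://example.org/target/film/1", 0); // made-up URL

        List<String[]> triples = new ArrayList<>();
        // Object is absent from both datasets, so this link cannot be resolved.
        triples.add(new String[] {"http://dbpedia.org/resource/Karma_%28film%29",
                SAME_AS, "http://data.linkedmdb.org/resource/film/7632"});
        // Subject is in dataset 1 and object is in dataset 2, so this link is usable.
        triples.add(new String[] {"http://dbpedia.org/resource/Karma_%28film%29",
                SAME_AS, "http://example.org/target/film/1"});

        System.out.println(resolvableLinks(triples, dataset1, dataset2).size()); // prints 1
    }
}
```

If the returned list is empty, the file is not usable as ground truth for that pair of datasets.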

Kind regards,
George

franklarryx · 2018-05-30T08:17:32Z

Many thanks for the reply!
I had written my code incorrectly.
My goal is to check how JedAI links the two datasets (source.nt and target.nt) in order to replace the silk tool!

kind regards,
Frank

gpapadis · 2018-05-30T08:31:59Z

You are welcome Frank! Let us know if we can assist you in any other way.

franklarryx · 2018-06-01T07:44:13Z

Hi George,

Attached you can find the Java class that you provided, which I have modified to add block management and the similarity process.
I do not fully understand the results produced by the class (result.txt in the attachment). The similarity percentages do not seem very high, but the links between the datasets are present!

What do you think? Any suggestions?

Thanks in advance,
Frank

classAndResult.zip

gpapadis · 2018-06-06T07:42:47Z

Hi Frank,

I am sorry for the late response.

I updated the TestSilkData.java class (https://github.com/scify/JedAIToolkit/blob/mavenizedVersion/jedai-core/src/test/java/org/scify/jedai/datareader/TestSilkData.java) with a more complete version of the code. The code you sent me didn't perform Entity Clustering, which is necessary for yielding the final results. The absolute values of the similarities might be low, but what matters is their relative values. In the Clean-Clean ER scenario you are considering, Unique Mapping Clustering should be applied in the end so that for every entity, the best match is selected (i.e., the pair with the highest similarity), as long as this similarity exceeds a certain threshold.
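
The selection step described above can be sketched as a greedy pass over the candidate pairs, sorted by descending similarity. This is a simplified illustration under stated assumptions, not JedAI's actual implementation; `UniqueMappingSketch`, `Pair` and the threshold value are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only, not JedAI's implementation.
final class UniqueMappingSketch {

    // A candidate match between entity id1 (dataset 1) and entity id2 (dataset 2).
    static final class Pair {
        final int id1;
        final int id2;
        final double similarity;

        Pair(int id1, int id2, double similarity) {
            this.id1 = id1;
            this.id2 = id2;
            this.similarity = similarity;
        }
    }

    // Visits pairs from most to least similar and accepts a pair only if its
    // similarity exceeds the threshold and neither entity is already matched.
    static List<Pair> cluster(List<Pair> candidates, double threshold) {
        List<Pair> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble((Pair p) -> p.similarity).reversed());

        Set<Integer> matched1 = new HashSet<>();
        Set<Integer> matched2 = new HashSet<>();
        List<Pair> matches = new ArrayList<>();
        for (Pair p : sorted) {
            if (p.similarity <= threshold) {
                break; // all remaining pairs are at or below the threshold
            }
            if (matched1.contains(p.id1) || matched2.contains(p.id2)) {
                continue; // one of the two entities already has a better match
            }
            matched1.add(p.id1);
            matched2.add(p.id2);
            matches.add(p);
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Pair> candidates = new ArrayList<>();
        candidates.add(new Pair(0, 0, 0.30));
        candidates.add(new Pair(0, 1, 0.10)); // loses to (0, 0)
        candidates.add(new Pair(1, 1, 0.25));
        candidates.add(new Pair(2, 2, 0.02)); // below the threshold
        System.out.println(cluster(candidates, 0.05).size()); // prints 2
    }
}
```

In this example every similarity is low in absolute terms, yet entities 0 and 1 still receive their best match because only the ranking and the threshold matter.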

Note that the new code tests a large number of configurations in order to find the one with the highest performance. As a result, it will take some time to complete. I ran it, but no meaningful results were produced, because the ground-truth reader cannot extract any pair of duplicates from the source.nt file that is used as the source of the groundtruth.

Kind regards,
George

franklarryx · 2018-06-11T07:35:25Z

Thank you for your time!
Where can I find a simple RDF example that would help me understand your tool better?

Kind regards,
Frank

mthanos · 2018-06-11T12:12:44Z

Hi Frank,

You can find many relevant datasets here:
http://oaei.ontologymatching.org/2009/
We have also taken many of our benchmarks from there.

You can also check the following datasets along with the expected mappings:
oaeiIMidentity.zip
They were used for OAEI instance matching track (http://oaei.ontologymatching.org/2014/im/index.html)

Best regards,
Manos.

franklarryx · 2018-06-14T09:47:21Z

Many thanks for the pointers and suggestions!

Kind regards,
Frank
