This fork features
Besides the regular Duke mode, which combines the relations between items' properties using Bayesian computation, this fork adds two more modes:
- Linear
In linear mode, a property comparison returns a value given by the following formula:
finalResult = result * (high - low) + low
where result is the value returned by the comparator you are using, and high and low are the values specified in your Duke configuration file. The final similarity of two items is the sum of the property comparison results, normalized to the range [0..1]. To use linear mode, specify it in the configuration file with:
<workingMode on="linear" />
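The linear formula above can be sketched in a few lines of Python. This is only an illustration, not the fork's Java code: the function names are hypothetical, and the normalization step is an assumption (the sum is rescaled between the sum of all low values and the sum of all high values), since the exact normalization is not described here.

```python
def linear_property_score(result, low, high):
    """Map a comparator result in [0, 1] onto the range [low, high]."""
    return result * (high - low) + low


def linear_similarity(properties):
    """Combine per-property scores into one similarity in [0, 1].

    properties: list of (result, low, high) tuples.
    ASSUMPTION: normalization is min-max over the achievable range of the sum.
    """
    total = sum(linear_property_score(r, lo, hi) for r, lo, hi in properties)
    min_total = sum(lo for _, lo, _ in properties)
    max_total = sum(hi for _, _, hi in properties)
    if max_total == min_total:
        return 0.0  # no property can contribute anything
    return (total - min_total) / (max_total - min_total)
```

For example, a perfect match on a property with high 0.9 and no match on a property with high 1.8 (both with low 0.0) gives 0.9 / 2.7 under this assumed normalization.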
- Regression
In regression mode, the result of a property comparison is given by the formula:
finalResult = result * high
where the parameters are the same as above. The final similarity is the sum of the property comparison results plus the intercept value.
This is useful when your coefficients come from a linear regression on training data. The intercept value is specified in the Duke configuration file, e.g.:
<property intercept="true">
  <name>interceptValue</name>
  <comparator>co.ray.recommpoc.similarity.Duke.ActorComparator</comparator>
  <low>0.0</low>
  <high>0.46</high>
</property>
Here the intercept value in the formula is given by the high node of the property configuration. The default intercept value is 0. To use regression mode, specify it in the configuration file with:
<workingMode on="regression" />
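As a rough illustration of regression mode, the sketch below applies the formula above in Python. The function name and the tuple layout are hypothetical; each high value plays the role of a regression coefficient, and the intercept is the high value of the property marked with intercept="true".

```python
def regression_similarity(properties, intercept=0.0):
    """Sum of result * high over all properties, plus the intercept.

    properties: list of (result, high) tuples, where result is the
    comparator output and high is the configured coefficient.
    """
    return sum(result * high for result, high in properties) + intercept
```

With comparator results 1.0 and 0.5, coefficients 0.9 and 0.8, and an intercept of 0.46, this gives 0.9 + 0.4 + 0.46 = 1.76. Unlike linear mode, the result is not normalized.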
By default, the working mode is "regular", which means that Duke works in its regular mode.
Reverse optimization can be turned on so that, once the relation of item A to item B has been calculated, the relation of item B to item A is not recalculated; the A->B result is reused instead. To enable this feature, add the following to the configuration file:
<reverseOptimization on="true">
  <ubacacheSize>10000000</ubacacheSize>
</reverseOptimization>
The ubacacheSize value is the maximum number of relations that can be cached. If its value is -1 (the default), all relations are cached. This is the fastest option but demands more memory.
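The idea behind the reverse optimization can be sketched as a cache keyed by the unordered pair of item IDs. The class below is a hypothetical Python illustration, not the fork's implementation. In particular, what happens once the cache is full is an assumption: here new results are simply no longer stored.

```python
class ReverseCache:
    """Cache similarity scores so that B->A reuses the A->B result."""

    def __init__(self, max_size=-1):
        self.max_size = max_size  # -1 means unbounded, as with ubacacheSize
        self._scores = {}
        self.computations = 0     # how many times compute() was really called

    def similarity(self, id_a, id_b, compute):
        # Order the IDs so that (A, B) and (B, A) share one cache key.
        key = (id_a, id_b) if id_a <= id_b else (id_b, id_a)
        if key in self._scores:
            return self._scores[key]
        score = compute(id_a, id_b)
        self.computations += 1
        # ASSUMPTION: when the cache is full, new results are not stored.
        if self.max_size == -1 or len(self._scores) < self.max_size:
            self._scores[key] = score
        return score
```

Calling `similarity("A", "B", f)` and then `similarity("B", "A", f)` invokes `f` only once. This is only valid when the comparison is symmetric.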
If you set
<treatRequiredPropertiesAsFilter on="true" />
then all required properties act as a filter: item A and item B are not compared at all if the values of their required properties differ.
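A minimal sketch of this filter in Python, assuming items are represented as plain dictionaries and that "same" means exact equality of the stored values (both are assumptions made for illustration; the function name is hypothetical):

```python
def passes_required_filter(item_a, item_b, required):
    """Return True only if every required property has equal values."""
    return all(item_a.get(name) == item_b.get(name) for name in required)
```

A pair that fails this check is skipped before any comparator runs, which is where the time is saved.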
Linkage mode (cross-comparison of two sets) also runs in parallel, using the number of threads you specified.
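To illustrate what a parallel cross-comparison of two sets looks like, here is a small Python sketch using a thread pool. It is not how the fork schedules its work internally; the function name, the `threads` parameter, and the returned dictionary are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def cross_compare(set_a, set_b, compare, threads=4):
    """Compare every item of set_a with every item of set_b in parallel."""
    pairs = list(product(set_a, set_b))
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() returns results in the same order as the input pairs.
        scores = list(pool.map(lambda pair: compare(*pair), pairs))
    return dict(zip(pairs, scores))
```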
Example configuration file for linear mode:
<duke>
  <reverseOptimization on="true">
    <ubacacheSize>-1</ubacacheSize>
  </reverseOptimization>
  <workingMode on="linear" />
  <treatRequiredPropertiesAsFilter on="true" />
  <schema>
    <threshold>0.12</threshold>
    <property type="id">
      <name>ID</name>
    </property>
    <property lookup="true">
      <name>summary</name>
      <comparator>com.foo.bar.KeywordComparator</comparator>
      <low>0.0</low>
      <high>0.9</high>
    </property>
    <property lookup="true">
      <name>country</name>
      <comparator>com.foo.bar.NameComparator</comparator>
      <low>0.0</low>
      <high>1.8</high>
    </property>
    <property lookup="true">
      <name>president</name>
      <comparator>com.foo.bar.NameComparator</comparator>
      <low>0.0</low>
      <high>0.8</high>
    </property>
    <property lookup="false">
      <name>year</name>
      <comparator>com.foo.bar.SimpleComparator</comparator>
      <low>0.0</low>
      <high>0.4</high>
    </property>
    <property>
      <name>keywords</name>
      <comparator>com.foo.bar.NameComparator</comparator>
      <low>0.0</low>
      <high>0.9</high>
    </property>
    <property lookup="false">
      <name>city</name>
      <comparator>com.foo.bar.NameComparator</comparator>
      <low>0.0</low>
      <high>0.3</high>
    </property>
    <property lookup="required">
      <name>language</name>
      <low>0.0</low>
      <high>0.0</high>
    </property>
  </schema>
</duke>