Marko edited this page Nov 1, 2015 · 35 revisions

Modes

Besides the regular Duke mode, which uses Bayesian computation of the relations between items' properties, there are two more modes:

  • Linear

In linear mode, each property comparison returns a value given by the following formula:

 finalResult = result * (high - low) + low

where result is the result of the comparator you are using, and high and low are the values specified for that property in your Duke configuration file. The final similarity of two items is the sum of the property comparison results, normalized to the range [0..1]. To use linear mode, specify it in the configuration file:

<workingMode on="linear" />
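
As a rough illustration of the linear formula, here is a minimal Python sketch. The function names are hypothetical and are not part of Duke. The page does not spell out how the sum is normalized to [0..1], so the min-max step below is only an assumption, and Duke's actual implementation may differ.

```python
def linear_property_score(result, low, high):
    """Map a comparator result onto [low, high]: result * (high - low) + low."""
    return result * (high - low) + low


def linear_similarity(comparisons):
    """Combine per-property scores into one similarity value.

    comparisons: list of (result, low, high) tuples, one per property.
    """
    total = sum(linear_property_score(r, lo, hi) for r, lo, hi in comparisons)
    # ASSUMPTION: min-max normalization between the smallest and largest
    # possible sums. The source only says "normalized to range [0..1]".
    min_total = sum(lo for _, lo, _ in comparisons)
    max_total = sum(hi for _, _, hi in comparisons)
    if max_total == min_total:
        return 0.0
    return (total - min_total) / (max_total - min_total)
```

For example, a property with low 0.0 and high 0.9 whose comparator returns 0.5 contributes 0.5 * (0.9 - 0.0) + 0.0 = 0.45 to the sum.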
  • Regression

In regression mode, each property comparison result is the value of the formula:

finalResult = result * high

where the parameters are the same as above. The final similarity is the sum of all property comparison results plus the intercept value.

This is useful when you obtain your coefficients by linear regression on some training data. The intercept value is specified in the Duke configuration file, e.g.:

<property intercept="true">
    <name>interceptValue</name>
    <comparator>co.ray.recommpoc.similarity.Duke.ActorComparator</comparator>
    <low>0.0</low>
    <high>0.46</high>
</property>

where the intercept value in the formula is given by the high node of the property configuration. The default intercept value is 0. To use regression mode, specify it in the configuration file:

 <workingMode on="regression" />

By default the working mode is "regular", which means that Duke works in its regular mode.
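
The regression formula above can be sketched in a few lines of Python. This is only an illustration with a hypothetical function name, not Duke's code. It treats each property's high value as a regression coefficient and adds the intercept, as described above.

```python
def regression_similarity(comparisons, intercept=0.0):
    """Sum of result * high over all properties, plus the intercept.

    comparisons: list of (result, high) pairs, one per property, where
    high plays the role of the regression coefficient.
    intercept: taken from the high node of the intercept property;
    defaults to 0, matching the documented default.
    """
    return sum(result * high for result, high in comparisons) + intercept
```

With coefficients 0.9 and 0.4, comparator results 1.0 and 0.5, and the intercept 0.46 from the configuration above, the sum is 0.9 + 0.2 + 0.46 = 1.56. Following the description above, the sketch does not normalize this sum.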

Optimization

It is possible to turn on an optimization so that once the relation of item A to item B has been calculated, the relation of item B to item A is not recalculated; the A->B result is reused. To enable this feature, put the following in the configuration file:

<reverseOptimization on="true">
    <ubacacheSize>10000000</ubacacheSize>
</reverseOptimization>

The ubacacheSize value is the maximum number of relations that can be cached. If its value is -1 (the default), all relations are cached. This is the fastest setting but demands more memory.
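
To make the idea concrete, here is a minimal Python sketch of a symmetric cache with a size limit. The class and method names are hypothetical. What happens once the cache is full is not described on this page, so the sketch assumes that new relations are simply not cached after that point.

```python
class ReverseCache:
    """Hypothetical symmetric cache: the A->B score is reused for B->A."""

    def __init__(self, max_size=-1):
        self.max_size = max_size  # -1 means "cache everything"
        self._scores = {}

    @staticmethod
    def _key(id_a, id_b):
        # Order-independent key so (A, B) and (B, A) share one entry.
        return (id_a, id_b) if id_a <= id_b else (id_b, id_a)

    def get_or_compute(self, id_a, id_b, compute):
        key = self._key(id_a, id_b)
        if key in self._scores:
            return self._scores[key]
        score = compute(id_a, id_b)
        # ASSUMPTION: once the limit is reached, new relations are not stored.
        if self.max_size < 0 or len(self._scores) < self.max_size:
            self._scores[key] = score
        return score
```

With an unlimited cache, asking for ("A", "B") and then ("B", "A") calls the compare function only once.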

Treat required properties as filter

If you set

<treatRequiredPropertiesAsFilter on="true" />

then all required properties act as a filter: item A and item B will not even be compared if their required properties' values are not the same.
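
The filter behaviour can be pictured with the following Python sketch. It assumes items are plain dictionaries of property name to value, and the function names are hypothetical; it shows only the idea of skipping the comparison, not how Duke implements it.

```python
def passes_required_filter(item_a, item_b, required_properties):
    """True only if both items agree on every required property."""
    return all(item_a.get(name) == item_b.get(name)
               for name in required_properties)


def compare_if_allowed(item_a, item_b, required_properties, compare):
    """Run compare(item_a, item_b) only when the filter passes.

    Returns None when the pair is filtered out, so the (possibly
    expensive) comparators are never invoked for it.
    """
    if not passes_required_filter(item_a, item_b, required_properties):
        return None
    return compare(item_a, item_b)
```

For instance, with "language" as the only required property, an item with language "en" is never compared against an item with language "de".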

Parallel linkage mode

Linkage mode (cross-comparison of two sets) also runs in parallel, using the number of threads you specified.
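
As a sketch of what a parallel cross-comparison of two sets can look like, the Python snippet below spreads the pairs over a thread pool. The function name and the default of 4 threads are assumptions made for this example and do not reflect Duke's internals.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def link_in_parallel(set_one, set_two, compare, threads=4):
    """Compare every item of set_one with every item of set_two.

    Returns a list of ((item_one, item_two), score) entries in the
    same order as the cross product of the two inputs.
    """
    pairs = list(product(set_one, set_two))
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map preserves input order, so scores line up with pairs.
        scores = list(pool.map(lambda pair: compare(*pair), pairs))
    return list(zip(pairs, scores))
```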

Example

Example configuration file for linear mode:

 <duke>    
   <reverseOptimization on="true">
      <ubacacheSize>-1</ubacacheSize>
   </reverseOptimization>
   <workingMode on="linear" />
   <treatRequiredPropertiesAsFilter on="true" />

   <schema>
       <threshold>0.12</threshold>
       
       <property type="id">
           <name>ID</name>
       </property>

       <property lookup="true">
           <name>summary</name>
           <comparator>com.foo.bar.KeywordComparator</comparator>
           <low>0.0</low>
           <high>0.9</high>
       </property>

       <property lookup="true">
           <name>country</name>
           <comparator>com.foo.bar.NameComparator</comparator>
           <low>0.0</low>
           <high>1.8</high>
       </property>

       <property lookup="true">
           <name>president</name>
           <comparator>com.foo.bar.NameComparator</comparator>
           <low>0.0</low>
           <high>0.8</high>
       </property>
       
       <property lookup="false">
           <name>year</name>
           <comparator>com.foo.bar.SimpleComparator</comparator>
           <low>0.0</low>
           <high>0.4</high>
       </property>

       <property>
           <name>keywords</name>
           <comparator>com.foo.bar.NameComparator</comparator>
           <low>0.0</low>
           <high>0.9</high>
       </property>

       <property lookup="false">
           <name>city</name>
           <comparator>com.foo.bar.NameComparator</comparator>
           <low>0.0</low>
           <high>0.3</high>
       </property>

       <property lookup="required">
           <name>language</name>
           <low>0.0</low>
           <high>0.0</high>
       </property>                                

  </schema>
</duke>