Commit

Merge pull request #49 from quarkslab/haussmann_dist
Change the name from jaccard-strong to haussmann. NFC
RobinDavid authored Jan 26, 2024
2 parents 4e2c995 + fff6b5a commit c25955b
Showing 6 changed files with 21 additions and 21 deletions.
6 changes: 3 additions & 3 deletions doc/source/api/metrics.rst
@@ -6,10 +6,10 @@ pairwise_distances

.. autofunction:: qbindiff.passes.metrics.pairwise_distances

- jaccard_strong
- --------------
+ haussmann
+ ---------

- .. autofunction:: qbindiff.passes.metrics.jaccard_strong
+ .. autofunction:: qbindiff.passes.metrics.haussmann

canberra_distances
------------------
4 changes: 2 additions & 2 deletions doc/source/basicex.rst
@@ -25,7 +25,7 @@ BinExport is more flexible since it can be used with IDA, Ghidra and BinaryNinja

* `-n`, `--normalize` Normalize the Call Graph by removing some of the edges/nodes that should worsen the diffing result. **WARNING:** it can potentially lead to a worse matching. To know the details of the normalization step look at {ref}`normalization`

- * `-d`, `--distance` Set the default distance that should be used by the features. The possible values are `canberra, correlation, cosine, euclidean, jaccard-strong`. The default one is `canberra`. To know the details of the jaccard-strong distance look here {ref}`jaccard-strong`
+ * `-d`, `--distance` Set the default distance that should be used by the features. The possible values are `canberra, correlation, cosine, euclidean, haussmann`. The default one is `canberra`. To know the details of the haussmann distance look here {ref}`haussmann`

* `-s`, `--sparsity-ratio` Set the density of the similarity matrix. This will loose some information (hence decrease accuracy) but it will also increase the performace. `0.999` means that the 99.9% of the matrix will be filled with zeros. The default value is `0.75`
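The `--sparsity-ratio` option trades accuracy for speed by zeroing most of the similarity matrix. The sketch below illustrates the idea in Python under loudly stated assumptions: the helper name `sparsify` is hypothetical, and it applies a global rule that keeps only the largest `1 - ratio` fraction of all entries. QBinDiff may sparsify differently (for example row by row).

```python
import numpy as np
from scipy.sparse import csr_matrix


def sparsify(sim, ratio: float) -> csr_matrix:
    """Hypothetical helper: zero out about `ratio` of the entries of `sim`.

    Assumes a global rule that keeps the largest (1 - ratio) fraction of all
    similarity values; QBinDiff's actual strategy may differ.
    """
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    flat = np.asarray(sim, dtype=float).ravel()
    keep = int(round(flat.size * (1.0 - ratio)))
    out = np.zeros(flat.size, dtype=float)
    if keep > 0:
        # argpartition moves the `keep` largest values into the last `keep` slots
        idx = np.argpartition(flat, flat.size - keep)[flat.size - keep:]
        out[idx] = flat[idx]
    # CSR storage only keeps the surviving non-zero entries
    return csr_matrix(out.reshape(np.shape(sim)))


sim = np.arange(1, 11, dtype=float).reshape(2, 5)
m = sparsify(sim, 0.8)  # keep the 2 largest of 10 entries
print(m.nnz)  # 2
```

With `ratio=0.999`, only about 0.1% of the entries survive, which matches the option's description. Storing the result in CSR form keeps memory proportional to the surviving entries.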

@@ -65,7 +65,7 @@ Some examples are displayed below :
-f dat \
-f cst \
-f addr:0.01 \
- -d jaccard-strong -s 0.999 -sr \
+ -d haussmann -s 0.999 -sr \
-ff bindiff -o ./result.BinDiff -vv
.. code-block:: bash
8 changes: 4 additions & 4 deletions doc/source/params.rst
@@ -25,16 +25,16 @@ Most of the distance functions that QBinDiff uses come from `Scipy <https://docs
* seuclidean
* sqeuclidean

- However, some distance are unique in QBinDiff, such as the jaccard-strong distance.
+ However, some distance are unique in QBinDiff, such as the haussmann distance.
This is a experimental new metric that combines the jaccard index and the canberra distance.

- Jaccard-strong
- ~~~~~~~~~~~~~~
+ Haussmann
+ ~~~~~~~~~

Formally it is defined as:

.. math::
- d(u, v) = \sum_{i=0}^n\frac{f(u_i, v_i)}{ | \{ i | u_i \neq 0 \lor v_i \neq 0 \} | }
+ d(u, v) = \sum_{i=0}^n\frac{f(u_i, v_i)}{ \lvert \{ j | u_j \neq 0 \lor v_j \neq 0 \} \rvert }
.. math::
with\ u, v \in \mathbb{R}^n
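The definition divides a sum of per-coordinate terms by the number of coordinates where at least one of the two vectors is non-zero, i.e. the size of the union of their supports, as in the Jaccard index. The hunks shown here do not include the definition of `f`, so the Python sketch below takes it as a parameter and, purely as an assumption, defaults it to the per-coordinate Canberra term |x - y| / (|x| + |y|). The optional weights `w` follow the weighted form in the `haussmann` docstring. `haussmann_pair` and `canberra_term` are hypothetical names, not QBinDiff API.

```python
from typing import Callable, Optional, Sequence

import numpy as np


def canberra_term(x: float, y: float) -> float:
    """Assumed choice of f: the Canberra summand, 0 when both values are 0."""
    denom = abs(x) + abs(y)
    return 0.0 if denom == 0 else abs(x - y) / denom


def haussmann_pair(
    u: Sequence[float],
    v: Sequence[float],
    w: Optional[Sequence[float]] = None,
    f: Callable[[float, float], float] = canberra_term,
) -> float:
    """Distance between two vectors following the formula above (sketch)."""
    u_arr = np.asarray(u, dtype=float)
    v_arr = np.asarray(v, dtype=float)
    if u_arr.shape != v_arr.shape:
        raise ValueError("u and v must have the same length")
    w_arr = np.ones_like(u_arr) if w is None else np.asarray(w, dtype=float)
    if w_arr.shape != u_arr.shape:
        raise ValueError("Weights size mismatch")
    # Denominator: |{ j | u_j != 0 or v_j != 0 }|
    support = np.count_nonzero((u_arr != 0) | (v_arr != 0))
    if support == 0:
        return 0.0  # assumption: two all-zero vectors are at distance 0
    total = sum(wi * f(ui, vi) for ui, vi, wi in zip(u_arr, v_arr, w_arr))
    return float(total / support)


print(haussmann_pair([1, 0, 2], [1, 3, 0]))  # (0 + 1 + 1) / 3
```

Because coordinates where both vectors are zero are left out of the denominator, two sparse feature vectors do not get a small distance merely because they share many empty features.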
2 changes: 1 addition & 1 deletion src/qbindiff/passes/fast_metrics.pyx
@@ -124,7 +124,7 @@ def sparse_canberra(floating[::1] X_data, int[:] X_indices, int[:] X_indptr,

D[px, py] = d

- def sparse_strong_jaccard(floating[::1] X_data, int[:] X_indices, int[:] X_indptr,
+ def sparse_haussmann(floating[::1] X_data, int[:] X_indices, int[:] X_indptr,
floating[::1] Y_data, int[:] Y_indices, int[:] Y_indptr,
double[:, ::1] D, double[:] w):
"""Pairwise canberra distances for CSR matrices"""
20 changes: 10 additions & 10 deletions src/qbindiff/passes/metrics.py
@@ -34,7 +34,7 @@
import sklearn.metrics
from scipy.spatial import distance
from scipy.sparse import issparse, csr_matrix
- from qbindiff.passes.fast_metrics import sparse_canberra, sparse_strong_jaccard
+ from qbindiff.passes.fast_metrics import sparse_canberra, sparse_haussmann
from qbindiff.types import Distance


@@ -106,16 +106,16 @@ def canberra_distances(X, Y, w=None):
ValueError("Cannot assign weights with non-sparse matrices")


- def jaccard_strong(X, Y, w=None):
+ def haussmann(X, Y, w=None):
r"""
- Compute a variation of the jaccard distances between the vectors in X and Y using
- the optional array of weights w.
+ Custom distance that takes inspiration from the jaccard index and the canberra distance.
+ If computes the distance between the vectors in X and Y using the optional array of weights w.
The distance function between two vector ``u`` and ``v`` is the following:
.. math::
- \sum_{i}\frac{f(u_i, v_i)}{ | \{ i | u_i \neq 0 \lor v_i \neq 0 \} | }
+ \sum_{i}\frac{f(u_i, v_i)}{ | \{ j | u_j \neq 0 \lor v_j \neq 0 \} | }
where the function ``f`` is defined like this:
@@ -127,7 +127,7 @@ def jaccard_strong(X, Y, w=None):
.. math::
- \sum_{i}\frac{w_i * f(u_i, v_i)}{ | \{ i | u_i \neq 0 \lor v_i \neq 0 \} | }
+ \sum_{i}\frac{w_i * f(u_i, v_i)}{ | \{ j | u_j \neq 0 \lor v_j \neq 0 \} | }
:param X: array-like of shape (n_samples_X, n_features)
An array where each row is a sample and each column is a feature.
@@ -139,7 +139,7 @@ def jaccard_strong(X, Y, w=None):
Default is None, which gives each value a weight of 1.0
:return D: ndarray of shape (n_samples_X, n_samples_Y)
- D contains the pairwise strong jaccard distances.
+ D contains the pairwise haussmann distances.
When X and/or Y are CSR sparse matrices and they are not already
in canonical format, this function modifies them in-place to
@@ -159,19 +159,19 @@ def jaccard_strong(X, Y, w=None):
w = _validate_weights(w)
if w.size != X.shape[1]:
ValueError("Weights size mismatch")
- sparse_strong_jaccard(X.data, X.indices, X.indptr, Y.data, Y.indices, Y.indptr, D, w)
+ sparse_haussmann(X.data, X.indices, X.indptr, Y.data, Y.indices, Y.indptr, D, w)
return D

if w is None:
- return sparse_strong_jaccard(
+ return sparse_haussmann(
X.data, X.indices, X.indptr, Y.data, Y.indices, Y.indptr, D, None
)
ValueError("Cannot assign weights with non-sparse matrices")


CUSTOM_DISTANCES = {
Distance.canberra: canberra_distances,
- Distance.jaccard_strong: jaccard_strong,
+ Distance.haussmann: haussmann,
}
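In the `haussmann` hunk, the weight checks are written as bare expressions such as `ValueError("Weights size mismatch")`. In Python such a statement builds an exception object and immediately discards it, so nothing is raised. A minimal sketch of a weight check that does raise; `check_weights` is a hypothetical helper, not the project's `_validate_weights`.

```python
import numpy as np


def check_weights(w, n_features: int) -> np.ndarray:
    """Hypothetical validation helper for the optional weight vector."""
    w = np.asarray(w, dtype=float)
    if w.ndim != 1:
        raise ValueError("Weights must be a 1-D array")
    if w.size != n_features:
        # `raise` is required; a bare ValueError(...) expression is a no-op
        raise ValueError("Weights size mismatch")
    return w


print(check_weights([1, 2, 3], 3))  # [1. 2. 3.]
```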


2 changes: 1 addition & 1 deletion src/qbindiff/types.py
@@ -207,4 +207,4 @@ class Distance(IntEnum):
canberra = 0 # doc: canberra distance
euclidean = 1 # doc: euclidean distance
cosine = 2 # doc: cosine distance
- jaccard_strong = 3 # doc: custom distance
+ haussmann = 3 # doc: haussmann distance
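The renamed member keeps the integer value 3, so code that stores or compares the numeric value of a `Distance` is unaffected, while a lookup by the old member name no longer resolves. A small sketch using a local copy of the members visible in this hunk (the real enum may define more), a hypothetical stand-in for `CUSTOM_DISTANCES`, and a hypothetical `resolve` helper:

```python
from enum import IntEnum


class Distance(IntEnum):
    """Local mirror of the members shown in types.py; the real enum may have more."""

    canberra = 0  # canberra distance
    euclidean = 1  # euclidean distance
    cosine = 2  # cosine distance
    haussmann = 3  # haussmann distance (formerly jaccard_strong)


# Hypothetical stand-in for CUSTOM_DISTANCES, mapping members to labels
CUSTOM = {Distance.canberra: "canberra_distances", Distance.haussmann: "haussmann"}


def resolve(key):
    """Hypothetical helper: accept a stored integer value or a member name."""
    return Distance(key) if isinstance(key, int) else Distance[key]


print(resolve(3).name)  # haussmann
print(CUSTOM[resolve(3)])  # haussmann
try:
    resolve("jaccard_strong")
except KeyError:
    print("old member name no longer resolves")
```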
