PanakoStrategy query logic Line 280 - printMap overwrites frequency and time values? #34

Open
lucaslawes opened this issue Aug 30, 2022 · 1 comment

@lucaslawes

Possible minor refactoring to improve the recognition rate.

Issue
Initial testing against a set of audio tracks shows that a complete fingerprint (hash, f1, t1) is sometimes repeated, but more often only the hash is repeated while f1/t1 differ.

In the current application logic, printMap is a HashMap keyed on the hash, so each put for an already-seen hash replaces the earlier print. Only the last f1/t1 for that hash survives, which results in slightly less accurate recognition.

//query
for(PanakoFingerprint print : prints) {
	long hash = print.hash();
	db.addToQueryQueue(hash);
	printMap.put(hash, print);
}
...
hit.queryTime = printMap.get(fingerprintHash).t1;
hit.queryF1 = printMap.get(fingerprintHash).f1;
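
To make the overwrite concrete, here is a minimal standalone sketch. `DemoPrint` is a hypothetical stand-in for PanakoFingerprint (only the fields named above, with assumed int types for f1 and t1), and `keepAll` is just one possible way to retain every print per hash, not code from the project.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for PanakoFingerprint: only hash, f1 and t1.
final class DemoPrint {
    final long hash;
    final int f1;
    final int t1;

    DemoPrint(long hash, int f1, int t1) {
        this.hash = hash;
        this.f1 = f1;
        this.t1 = t1;
    }

    long hash() {
        return hash;
    }
}

final class PrintMapOverwriteDemo {
    // Mirrors the query loop above: one entry per hash, the last put wins.
    static Map<Long, DemoPrint> lastWins(List<DemoPrint> prints) {
        Map<Long, DemoPrint> printMap = new HashMap<>();
        for (DemoPrint print : prints) {
            printMap.put(print.hash(), print);
        }
        return printMap;
    }

    // Assumed alternative: keep every print that shares a hash.
    static Map<Long, List<DemoPrint>> keepAll(List<DemoPrint> prints) {
        Map<Long, List<DemoPrint>> printMap = new HashMap<>();
        for (DemoPrint print : prints) {
            printMap.computeIfAbsent(print.hash(), h -> new ArrayList<>()).add(print);
        }
        return printMap;
    }

    public static void main(String[] args) {
        // Two prints with the same hash but different f1/t1.
        List<DemoPrint> prints = Arrays.asList(
                new DemoPrint(42L, 100, 10),
                new DemoPrint(42L, 180, 250));

        // The print with t1 = 10 is no longer reachable through the map.
        System.out.println("lastWins t1: " + lastWins(prints).get(42L).t1);
        System.out.println("keepAll size: " + keepAll(prints).get(42L).size());
    }
}
```

With the single-value map, any database hit for hash 42 is paired with the second print's t1/f1, even when the first print was the one that actually lined up with the reference.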

Suggestion
Pass the entire fingerprint to the db queue, extend the PanakoHit class to support queryTime and queryF1, set them when processing the db queue, and do away with the printMap.

//query
for(PanakoFingerprint print : prints) {
	db.addToQueryQueue(print);
}
...
hit.queryTime = dbHit.queryT1;
hit.queryF1 = dbHit.queryF1;
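
A rough sketch of how the queue side could look under this suggestion. Everything here is hypothetical: `QueryQueueSketch`, `QueryPrint`, `StoredPrint` and `DbHit` are illustrative names, and the in-memory map only stands in for the real storage, whose API is not shown in this issue.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class QueryQueueSketch {
    // Hypothetical query-side print: the whole print is queued, not just the hash.
    static final class QueryPrint {
        final long hash; final int f1; final int t1;
        QueryPrint(long hash, int f1, int t1) {
            this.hash = hash; this.f1 = f1; this.t1 = t1;
        }
    }

    // Hypothetical reference entry as it might sit in storage.
    static final class StoredPrint {
        final int resourceId; final int t1; final int f1;
        StoredPrint(int resourceId, int t1, int f1) {
            this.resourceId = resourceId; this.t1 = t1; this.f1 = f1;
        }
    }

    // Hypothetical hit that carries the query-side values along with the match.
    static final class DbHit {
        final int resourceId; final int matchT1; final int matchF1;
        final int queryT1; final int queryF1;
        DbHit(StoredPrint stored, QueryPrint query) {
            this.resourceId = stored.resourceId;
            this.matchT1 = stored.t1;
            this.matchF1 = stored.f1;
            this.queryT1 = query.t1;
            this.queryF1 = query.f1;
        }
    }

    // In-memory stand-in for the key-value store (assumption).
    private final Map<Long, List<StoredPrint>> index = new HashMap<>();
    private final List<QueryPrint> queue = new ArrayList<>();

    void store(long hash, StoredPrint print) {
        index.computeIfAbsent(hash, h -> new ArrayList<>()).add(print);
    }

    void addToQueryQueue(QueryPrint print) {
        queue.add(print);
    }

    // Each queued print yields its own hits, so no side map is needed
    // to recover t1/f1 afterwards.
    List<DbHit> processQueryQueue() {
        List<DbHit> hits = new ArrayList<>();
        for (QueryPrint query : queue) {
            List<StoredPrint> matches =
                    index.getOrDefault(query.hash, Collections.<StoredPrint>emptyList());
            for (StoredPrint stored : matches) {
                hits.add(new DbHit(stored, query));
            }
        }
        queue.clear();
        return hits;
    }
}
```

In this sketch two query prints with the same hash each produce a hit with their own queryT1/queryF1, at the cost of one extra lookup per duplicate hash.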

@JorenSix (Owner)

JorenSix commented Sep 6, 2022

Hi, thanks for the suggestion.

The reason for not allowing duplicate hashes is twofold (and is reflected at the storage side, it is essentially the same as #37):

If a hash is common it means (almost by definition) that it does not have much discriminative power. The idea implemented here is that such hashes can be safely ignored.

Another reason is performance: not wasting storage space or computation on hashes with little discriminative power. While some hash collisions are allowed, having too many could hurt query performance.

However, letting users choose would indeed be a good improvement. For small collections or powerful servers the collisions may not be that big of a problem. Storing the temporary prints in either a Set (to avoid duplicates) or an Array (to allow them) could indeed be an option.
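
As an illustration of what such a choice could look like, here is a small sketch. The `allowDuplicateHashes` flag is hypothetical and not an existing Panako option, and `Print` is a stand-in for the real fingerprint class. A LinkedHashMap is used instead of a Set so that the "no duplicates" branch keeps the current last-print-wins behaviour.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DuplicatePolicySketch {
    // Hypothetical stand-in for the fingerprint class.
    static final class Print {
        final long hash; final int f1; final int t1;
        Print(long hash, int f1, int t1) {
            this.hash = hash; this.f1 = f1; this.t1 = t1;
        }
    }

    // allowDuplicateHashes is an assumed, user-configurable flag.
    static List<Print> printsToQuery(List<Print> prints, boolean allowDuplicateHashes) {
        if (allowDuplicateHashes) {
            // List behaviour: every print is queried, duplicates included.
            return new ArrayList<>(prints);
        }
        // Set-like behaviour: one print per hash, the last one wins,
        // which matches what the HashMap-based printMap does today.
        Map<Long, Print> unique = new LinkedHashMap<>();
        for (Print print : prints) {
            unique.put(print.hash, print);
        }
        return new ArrayList<>(unique.values());
    }
}
```

A small collection could then run with the flag on, while large deployments keep the default and avoid the extra lookups.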
