Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add indices field to _matchesPosition to specify where in an array a match comes from #5005

Merged
merged 2 commits into from
Nov 20, 2024

Conversation

LukasKalbertodt
Copy link
Contributor

Pull Request

Related issue

https://github.com/orgs/meilisearch/discussions/550

What does this PR do?

This adds an indices fields to the objects returned in _matchesPosition. The new field describes in what array element in the document the match was found. This was impossible before and users simply did not know where a match originated inside an array.

Example document:

{
  "id": "123",
  "names": ["foo", "bar", "catnip"],
  "noarray": "dog cat fox",
  "nested": [
    ["dog", "cat"],
    ["fox", "bear"]
  ]
}

Searching for cat now returns this:

"_matchesPosition": {
  "names": [
    {
      "start": 0,
      "length": 3,
      "indices": [2]
    }
  ],
  "nested": [
    {
      "start": 0,
      "length": 3,
      "indices": [0, 1]
    }
  ],
  "noarray": [
    {
      "start": 4,
      "length": 3
    }
  ]
}

Having indices be an array is required due to nested arrays, so one sometimes needs multiple indices to know what data the match comes from.

Alternative API designs

An alternative design would be to include the indices inside the key of _matchesPosition, e.g. foo.bar[2].baz. This is more intuitive to me and puts all "location inside document" information into one place, but has some disadvantages: the index needs to be parsed out of the key, which is annoying for end users. Also, JSON fields can have keys containing [2], so escaping would be necessary. Or one could insert . dots before the [2] (e.g. foo.bar.[2].baz) which might make parsing easier.

Another alternative could be to just include the full path inside the object, e.g.:

{
  "start": 4,
  "length": 3,
  "path": ["foo", "bar", 2, "baz"]
}

String elements would mean fields in an object, numbers would mean indices into an array. This is nice as all path information is in one place. This this, unlike with the current design (of this PR), you would also not need to know the document structure to understand at what levels the indices actually apply.

Of course, the disadvantage is that there is duplication with the keys inside _matchesPosition. One could also convert _matchesPosition to an array, but that's quite the breaking change and it would make some use cases more annoying.

In summary: I personally am fine with all three designs. The implicit "you have to know where the indices go" of the current design is not too bad; the parsing of the foo.bar[2].baz approach also seems ok; adding the nicely typed full path also probably doesn't hurt too much thanks to compression. Let me know what you think! I can change this PR to switch to another approach.

@ManyTheFish
Copy link
Member

Hello @LukasKalbertodt,
is your PR ready for review? If it's not, could you please convert it as a draft PR?

Thanks!

@LukasKalbertodt LukasKalbertodt force-pushed the array-indices-for-matches branch from 9fd1116 to 257a8f6 Compare October 15, 2024 11:34
@LukasKalbertodt
Copy link
Contributor Author

@ManyTheFish The PR is ready for review (now that CI should be fixed...). The alternatives in the descriptions are just considerations for you, to decide what path to choose.

@LukasKalbertodt LukasKalbertodt force-pushed the array-indices-for-matches branch from 257a8f6 to 2606275 Compare November 4, 2024 08:19
@ManyTheFish
Copy link
Member

Hello, @LukasKalbertodt, there are some conflicts with your PR. Could you fix them? :)

For matches inside arrays, this field holds the indices of the array
elements that matched. For example, searching for `cat` inside
`{ "a": ["dog", "cat", "fox"] }` would return `indices: [1]`. For nested
arrays, this contains multiple indices, starting with the one for the
top-most array. For matches in fields without arrays, `indices` is not
serialized (does not exist) to save space.
@LukasKalbertodt LukasKalbertodt force-pushed the array-indices-for-matches branch from 2606275 to 9de947f Compare November 6, 2024 09:50
Copy link
Member

@ManyTheFish ManyTheFish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you @LukasKalbertodt for your work,

bors merge

Copy link
Contributor

meili-bors bot commented Nov 18, 2024

🕐 Waiting for PR status (Github check) to be set, probably by CI. Bors will automatically try to run when all required PR statuses are set.

@ManyTheFish ManyTheFish added this to the v1.12.0 milestone Nov 18, 2024
@ManyTheFish
Copy link
Member

bors merge

1 similar comment
@curquiza
Copy link
Member

bors merge

@dureuill
Copy link
Contributor

looks like bors is encountering an issue 🤔

bors merge

@curquiza
Copy link
Member

Yes it's a common issue we have often with bors with external contributions only. Unfortunately I don't know how to solve it, except by bypassing bors. So merging by hand. What I do now because the branch is up to date with main and tests are green

@curquiza curquiza merged commit 057fcb3 into meilisearch:main Nov 20, 2024
9 of 10 checks passed
@LukasKalbertodt LukasKalbertodt deleted the array-indices-for-matches branch November 20, 2024 08:04
@meili-bot meili-bot added the v1.12.0 PRs/issues solved in v1.12.0 released on 2024-12-23 label Dec 23, 2024
LukasKalbertodt added a commit to LukasKalbertodt/meilisearch-rust that referenced this pull request Jan 14, 2025
LukasKalbertodt added a commit to LukasKalbertodt/meilisearch-rust that referenced this pull request Jan 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
v1.12.0 PRs/issues solved in v1.12.0 released on 2024-12-23
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants