Find indexes of item in list #20812

rjthoen · 2025-01-20T21:07:32Z

Description

In #19894 index_of was introduced, which gets the index of the first occurrence of an item in a column/list when available. This is a very helpful feature, but why only return the first occurrence and not the first n or all of them?

Could index_of be changed to indexes_of, returning all occurrences of the item in the list, or an empty list when the item is not found? To recover the current behavior (apart from the return type), an optional keyword argument n, the number of indexes to return, could be added to this expression or to a separate indexes_of_exact (as in split_exact).

Expanding on the example in the documentation for index_of we would get the following usage:

>>> df = pl.DataFrame({"a": [1, None, 17, 1]})
>>> df.select(
...    [
...        pl.col("a").indexes_of(1).alias("one"),
...        pl.col("a").indexes_of(17).alias("seventeen"),
...        pl.col("a").indexes_of(None).alias("null"),
...        pl.col("a").indexes_of(55).alias("fiftyfive"),
...    ]
... )

shape: (1, 4)
┌───────────┬───────────┬───────────┬───────────┐
│ one       ┆ seventeen ┆ null      ┆ fiftyfive │
│ ---       ┆ ---       ┆ ---       ┆ ---       │
│ list[u32] ┆ list[u32] ┆ list[u32] ┆ list[u32] │
╞═══════════╪═══════════╪═══════════╪═══════════╡
│ [0, 3]    ┆ [2]       ┆ [1]       ┆ []        │
└───────────┴───────────┴───────────┴───────────┘
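To pin down the proposed semantics, including the optional n, here is a plain-Python sketch. Both indexes_of and n are the hypothetical names from the proposal above, not an existing polars API, and the function works on an ordinary list rather than a column.

```python
from typing import Any, Optional


def indexes_of(values: list[Any], item: Any, n: Optional[int] = None) -> list[int]:
    """Hypothetical reference semantics: every position where `item` occurs.

    Python's `==` treats None as equal to None, which mirrors the proposal
    that a null item matches null entries. `n` caps the number of positions.
    """
    hits = [i for i, v in enumerate(values) if v == item]
    return hits if n is None else hits[:n]


a = [1, None, 17, 1]
print(indexes_of(a, 1))       # [0, 3]
print(indexes_of(a, 17))      # [2]
print(indexes_of(a, None))    # [1]
print(indexes_of(a, 55))      # []
print(indexes_of(a, 1, n=1))  # [0]
```

With n=1 the result is a one-element list, which is the "current behavior minus the return type" mentioned above.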

rjthoen · 2025-01-20T21:09:38Z

The output of the proposed indexes_of expression can already be achieved with polars:

import polars as pl

name_value_map = {
    "one": 1,
    "seventeen": 17,
    "null": None,
    "fiftyfive": 55
}

df = pl.DataFrame({"a": [1, None, 17, 1]})

(
    df
    .with_row_index()
    .select(
        [
            pl.col("index")
            .filter(
                # Option A: with `index_of` (each row is its own group,
                # so a non-null result means that row matches)
                pl.col("a").index_of(value).over("index").is_not_null()

                # Option B: without `index_of` (use instead of Option A)
                # (pl.col("a") == value) |
                # (pl.col("a").is_null() & (value is None))
            )
            .implode()
            .alias(name)
            for name, value in name_value_map.items()
        ]
    )
)

cmdlineluser · 2025-01-20T22:25:12Z

There is also .arg_true()

df.select(
    pl.col("a").eq_missing(value).arg_true().implode()
      .alias(name)
    for name, value in name_value_map.items()
)

# shape: (1, 4)
# ┌───────────┬───────────┬───────────┬───────────┐
# │ one       ┆ seventeen ┆ null      ┆ fiftyfive │
# │ ---       ┆ ---       ┆ ---       ┆ ---       │
# │ list[u32] ┆ list[u32] ┆ list[u32] ┆ list[u32] │
# ╞═══════════╪═══════════╪═══════════╪═══════════╡
# │ [0, 3]    ┆ [2]       ┆ [1]       ┆ []        │
# └───────────┴───────────┴───────────┴───────────┘

rjthoen · 2025-01-21T07:55:11Z

> There is also .arg_true()

Thanks @cmdlineluser, would you say that using the compound expression should be favored over adding a dedicated expression like indexes_of?

The result of index_of could (inefficiently) be replicated in a similar way:

df.select(
    pl.col("a").eq_missing(value).arg_true().first()
      .alias(name)
    for name, value in name_value_map.items()
)
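The "inefficiently" is presumably because arg_true collects every matching position before first() throws away all but one, whereas a dedicated first-match search can stop at the first hit. The following plain-Python sketch only illustrates that difference; it is an assumption about the cost model, not a description of polars internals, and both function names are made up.

```python
from typing import Any, Optional


def first_index_full_scan(values: list[Any], item: Any) -> Optional[int]:
    # Collect every matching position, then keep only the first one.
    hits = [i for i, v in enumerate(values) if v == item]
    return hits[0] if hits else None


def first_index_early_exit(values: list[Any], item: Any) -> Optional[int]:
    # Stop at the first match; later elements are never inspected.
    for i, v in enumerate(values):
        if v == item:
            return i
    return None


a = [1, None, 17, 1]
print(first_index_full_scan(a, 1), first_index_early_exit(a, 1))    # 0 0
print(first_index_full_scan(a, 55), first_index_early_exit(a, 55))  # None None
```

Both return the same answer; only the amount of work differs when the first match is near the start of a long column.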

If there is no intention to add an expression that returns all the indexes of an item in a column/list, then I would suggest:

1. Renaming index_of to first_index_of to make the name more descriptive.
2. Mentioning in the documentation of (first_)index_of how to use the compound expression to find all indexes, since that is likely a relatively common and closely related use case.

rjthoen added the "enhancement" label (New feature or an improvement of an existing feature) on Jan 20, 2025
