
[QST] Performance differences between encode() vs __call__() on tf Encoder block in CPU #1213

Open

lecardozo opened this issue Oct 4, 2023 · 4 comments

@lecardozo
❓ Questions & Help

What is the preferred way of generating predictions from a trained Encoder of a TwoTowerModelV2? There seem to be at least two ways of doing it, with apparently huge performance differences between them.

Details

After training a TwoTowerModelV2, I noticed a huge performance difference between calling each tower's encode() method (model.query_encoder.encode()) and calling the tower directly (model.query_encoder()) on a single CPU-only node.

Setup

import pandas as pd
import nvtabular as nvt
from merlin.schema import Tags

# Encoder
query_encoder = trained_two_tower_model.query_encoder

# Raw features
features = pd.DataFrame(...)

# Features transformed with an nvt.Workflow
query_preprocessor = workflow.get_subworkflow("query_preprocessor")
data = nvt.Dataset(features, schema=self._user_schema)
transformed_data = query_preprocessor.transform(data)

Calling encode()

This takes more than 1 hour for 434,457 rows. Resource usage metrics show that the CPU is idle most of the time, which is quite unexpected.

outputs = query_encoder.encode(transformed_data, batch_size=1024, index=Tags.USER_ID).compute()

I tried increasing the number of partitions of the transformed dataset and setting .compute(scheduler='processes') to benefit from Dask's parallelization, but that didn't work (it failed with serialization issues). See the sketch below.
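For reference, this is roughly what the failed attempt looked like (a reconstruction from memory, assuming the repartition() method on the merlin Dataset; the npartitions value is arbitrary):

transformed_data = transformed_data.repartition(npartitions=8)
outputs = query_encoder.encode(
    transformed_data, batch_size=1024, index=Tags.USER_ID
).compute(scheduler="processes")  # this is where it failed with serialization errors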

Calling __call__() with Loader

This takes ~30 seconds for the same 434,457 rows. As my data fits into memory, this ended up being the clear winner.

import numpy as np
import merlin.models.tf as mm

outputs = []
for inputs, _ in mm.Loader(transformed_data, batch_size=1024, shuffle=False):
    outputs.append(query_encoder(inputs))

output = np.concatenate(outputs)

Is this difference expected or am I doing something wrong?

@rnyak
Contributor

rnyak commented Oct 4, 2023

@lecardozo you can check out the Generate top-K recommendations section in this example nb, which showcases how to generate top-K recommendations for a given batch. You can then loop over the batches and concatenate the outputs, along the lines of the sketch below.
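Roughly like this (a sketch from memory of that notebook, so treat the exact names and arguments as assumptions; candidate_features and user_dataset stand in for your own data):

import merlin.models.tf as mm

# Build a top-K retrieval model over the candidate set
topk_model = model.to_top_k_encoder(candidate_features, k=10, batch_size=128)
topk_model.compile(run_eagerly=False)

all_scores, all_ids = [], []
for inputs, _ in mm.Loader(user_dataset, batch_size=1024, shuffle=False):
    scores, ids = topk_model(inputs)  # top-K scores and candidate ids per query
    all_scores.append(scores.numpy())
    all_ids.append(ids.numpy())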

@lecardozo
Author

Thanks for the answer, @rnyak!

Sorry, I think I wasn't clear before. I'm looking specifically for a way of generating embeddings for queries/candidates independently, rather than generating recommendations. The idea is to index the candidate embeddings in an external vector search engine and use ANN for retrieval later, roughly as sketched below.
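For context, the downstream flow I have in mind looks something like this (faiss is just an illustrative stand-in for the external engine, and candidate_embs/query_embs are hypothetical float32 numpy arrays holding the encoder outputs):

import faiss
import numpy as np

# Index the candidate embeddings for inner-product (dot-product) retrieval.
# IndexFlatIP is exact; at scale an ANN index such as IndexHNSWFlat would replace it.
dim = candidate_embs.shape[1]
index = faiss.IndexFlatIP(dim)
index.add(candidate_embs.astype(np.float32))

# Later: retrieve the top-10 candidates for a batch of query embeddings
scores, neighbors = index.search(query_embs.astype(np.float32), 10)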

@rnyak
Contributor

rnyak commented Oct 5, 2023

@lecardozo the same notebook shows how to generate candidate and query embeddings.

from merlin.io import Dataset
from merlin.models.utils.dataset import unique_rows_by_features

queries = model.query_embeddings(
    Dataset(user_features, schema=schema.select_by_tag(Tags.USER)),
    batch_size=1024, index=Tags.USER_ID,
)
query_embs_df = queries.compute(scheduler="synchronous").reset_index()

item_features = (
    unique_rows_by_features(train, Tags.ITEM, Tags.ITEM_ID).compute().reset_index(drop=True)
)
item_embs = model.candidate_embeddings(
    Dataset(item_features, schema=schema.select_by_tag(Tags.ITEM)),
    batch_size=1024, index=Tags.ITEM_ID,
)
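And analogously for the item side, to materialize the candidate embeddings (this line is not in the snippet above but follows the same pattern as the query side):

item_embs_df = item_embs.compute(scheduler="synchronous").reset_index()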

hope that helps.

@lecardozo
Author

That was my first try, as I followed along with the whole notebook. Since those methods are just thin wrappers around Encoder.encode(), we end up with the same performance issues I mentioned before (which is what made me look at the source code of these methods in the first place).
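For now I've settled on wrapping the Loader-based approach in a small helper (the helper name is mine, and it assumes eager execution so .numpy() is available on the encoder output):

import numpy as np
import merlin.models.tf as mm

def encode_in_batches(encoder, dataset, batch_size=1024):
    # Iterate over the transformed dataset in batches and call the tower directly
    outputs = []
    for inputs, _ in mm.Loader(dataset, batch_size=batch_size, shuffle=False):
        outputs.append(encoder(inputs).numpy())
    return np.concatenate(outputs)

query_embeddings = encode_in_batches(query_encoder, transformed_data)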
