
factors for cold start items #44

Closed
RileyChiu opened this issue Aug 8, 2023 · 5 comments

Comments

@RileyChiu

RileyChiu commented Aug 8, 2023

Hi,

I have the following questions:

  1. I want to use CMF_implicit for the item cold-start problem, but I don't know which built-in function generates factors for cold-start (new) items.
    Assume that I have three matrices:
  • X: user-item interactions of shape 100*50, with 100 users and 50 old items
  • I: item attributes of shape 50*30, with 50 old items and 30 item features (side information)
  • I_new: new item attributes of shape 10*30, with 10 new items and the same 30 item features

I first trained the model by calling model.fit(X=X, I=I). This gave me the factor matrices A, B, D, where X ~ AB^T and I ~ BD^T.
How can I get the new item factors B_new such that I_new ~ B_new D^T? factors_multiple seems like the most likely candidate, but it requires the matrix X, so I am not sure.

  2. Have you or other users experienced longer training times since 7/17? Model fitting has been about 16x slower for me since then. I noticed that the Cython dependency released version 3.0.0 on 7/17.
    https://pypi.org/project/Cython/#history
    However, after rolling back to the old version, fitting is still very slow.

Thanks in advance.
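To illustrate the relationship asked about in question 1: if D is held fixed, one standard way to obtain B_new is a ridge-regularized least-squares fit of each new item's attributes onto D, i.e. B_new = I_new D (D^T D + lam * Id_k)^(-1), where Id_k is the k x k identity. This is a minimal sketch under assumptions (squared loss, one hypothetical regularization strength `lam`, no intercepts or weights), not cmfrec's internal code; the function name `cold_item_factors` is hypothetical, and the library's own cold-start routine may treat regularization and weighting differently.

```python
import numpy as np

def cold_item_factors(I_new, D, lam=1.0):
    """Hypothetical helper: B_new minimizing
    ||I_new - B_new @ D.T||_F^2 + lam * ||B_new||_F^2 with D fixed."""
    I_new = np.asarray(I_new, dtype=float)
    D = np.asarray(D, dtype=float)
    k = D.shape[1]
    # Normal equations: (D^T D + lam * Id_k) @ B_new^T = D^T @ I_new^T
    lhs = D.T @ D + lam * np.eye(k)
    rhs = D.T @ I_new.T
    # Solve instead of forming an explicit inverse, then transpose back
    return np.linalg.solve(lhs, rhs).T

# Shapes from the question: 30 attributes, 10 new items; k = 5 is an assumption
rng = np.random.default_rng(0)
D = rng.normal(size=(30, 5))
I_new = rng.normal(size=(10, 30))
B_new = cold_item_factors(I_new, D, lam=1.0)
print(B_new.shape)  # (10, 5)
```

With `lam=0` and a full-column-rank D this reduces to ordinary least squares; a positive `lam` keeps the system well-conditioned when attributes are sparse or collinear.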

@david-cortes
Owner

  1. You can use item_factors_cold or swap_users_and_items.
  2. In order to make it use older cython, you'll need to build the package with that older version, not just have it as a run-time dependency. That is, you'll first need to install an older cython (say, cython==0.29.36), some numpy+scipy at your preferred version, and then install this package, with arguments like --no-use-pep517 or similar:
pip install cython==0.29.36 numpy scipy
pip uninstall -y cmfrec # if you had installed it before
pip install --no-use-pep517 cmfrec

Nevertheless, I did some testing on my end and did not find any speed difference between older and newer Cython. Are you able to provide an example and timings that show a slowdown?
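A small harness like the following could help produce the requested timings in a comparable way. It is a sketch with hypothetical helper names (`installed_versions`, `median_fit_time`), using only the standard library. Note the caveat from the reply above: the versions it reports are what is installed at run time, which is not necessarily the Cython version the package was built with.

```python
import statistics
import time
from importlib.metadata import PackageNotFoundError, version

def installed_versions(packages=("Cython", "numpy", "scipy", "cmfrec")):
    """Return {package: installed version or None}.
    Run-time view only: it does not reveal which Cython compiled cmfrec."""
    out = {}
    for name in packages:
        try:
            out[name] = version(name)
        except PackageNotFoundError:
            out[name] = None
    return out

def median_fit_time(fit_fn, repeats=3):
    """Call fit_fn() `repeats` times; return the median wall-clock seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fit_fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)

if __name__ == "__main__":
    print(installed_versions())
    # Placeholder workload; swap in the real call, e.g. lambda: model.fit(X=X, I=I)
    print(median_fit_time(lambda: sum(range(100_000))))
```

Reporting the median of several runs, together with the package versions, makes it easier to tell a real regression apart from run-to-run noise.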

@david-cortes
Owner

Also, I've just pushed some updates for the newer Cython's idiosyncrasies, just in case. Could you give it a try and confirm whether it also runs slower for you?

pip install git+https://github.com/david-cortes/cmfrec.git

@RileyChiu
Author

Hi David,

Thanks, I am still collecting data and examples. A follow-up question about item_factors_cold:
I am running model.item_factors_cold(I=I_new) and getting this error:


ValueError Traceback (most recent call last)
Cell In[27], line 1
----> 1 model.item_factors_cold(I=I_new)

File ~/miniconda3/lib/python3.11/site-packages/cmfrec/__init__.py:5091, in CMF_implicit.item_factors_cold(self, I, I_col, I_val)
5058 def item_factors_cold(self, I=None, I_col=None, I_val=None):
5059 """
5060 Determine item-factors from new data, given I
5061
(...)
5089 The item-factors as determined by the model.
5090 """
-> 5091 return self._item_factors_cold(I=I, I_bin=None, I_col=I_col, I_val=I_val)

File ~/miniconda3/lib/python3.11/site-packages/cmfrec/__init__.py:1960, in _CMF._item_factors_cold(self, I, I_bin, I_col, I_val)
1957 l1_lambda = self.l1_lambda
1958 lambda_bias = self.l1_lambda
-> 1960 I, I_col, I_val, I_bin = self._process_new_U(U=I, U_col=I_col, U_val=I_val, U_bin=I_bin, is_I=True)
1962 c_funs = wrapper_float if self.use_float else wrapper_double
1964 if (not self._implicit):

File ~/miniconda3/lib/python3.11/site-packages/cmfrec/__init__.py:552, in _CMF._process_new_U(self, U, U_col, U_val, U_bin, is_I)
550 U = np.array(U).reshape(-1).astype(self.dtype)
551 if U.shape[0] != Mat.shape[0]:
--> 552 raise ValueError("Dimensions of %s don't match with earlier data."
553 % letter)
554 else:
555 U = np.empty(0, dtype=self.dtype)

ValueError: Dimensions of I don't match with earlier data.

Here I_new is a NumPy array of shape (10, 30). What did I do wrong, and what does "earlier data" mean here? I assumed that only the feature dimensions need to match (both are 30).
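The traceback shows `_process_new_U` flattening its input with `np.array(U).reshape(-1)` before the length check. The arithmetic below shows what that flattening does to a 10 x 30 array versus a single row. It suggests one plausible reading of the error (the check sees 300 values instead of one item's 30), but what `Mat.shape[0]` holds is not visible in the excerpt, so treat this as an assumption.

```python
import numpy as np

I_new = np.zeros((10, 30))  # stand-in for the question's 10 new items x 30 attributes

flat_all = np.array(I_new).reshape(-1)     # same flattening as in the traceback
flat_one = np.array(I_new[0]).reshape(-1)  # a single item's attribute row

print(flat_all.shape[0])  # 300 (10 * 30)
print(flat_one.shape[0])  # 30
```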

@david-cortes
Owner

Thanks for pointing this out. There was a bug (different from the error you're seeing there), which is now fixed:

pip install cmfrec==3.5.1.post6

Note that item_factors_cold takes a single item as input, so you'll need one call per row of I_new.
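Following that note, a small wrapper can loop over the rows of I_new and stack the per-item results into a B_new-like matrix. `item_factors_cold_batch` is a hypothetical helper, not part of cmfrec; it assumes, per the signature shown in the traceback above, that `item_factors_cold` accepts one item's attributes via `I=` and returns a 1-D array of factors.

```python
import numpy as np

def item_factors_cold_batch(model, I_new):
    """Hypothetical helper: call model.item_factors_cold once per row of
    I_new and stack the returned 1-D factor vectors into a 2-D array."""
    I_new = np.asarray(I_new)
    # One call per new item, as the method handles a single item at a time
    rows = [np.asarray(model.item_factors_cold(I=row)) for row in I_new]
    return np.vstack(rows)
```

With a fitted CMF_implicit model from the question, `B_new = item_factors_cold_batch(model, I_new)` would give one row per new item; the number of columns depends on the model's configuration.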

@RileyChiu
Author

Thanks. I confirmed that version 3.5.1.post6 works for item_factors_cold and also runs as fast as before.
