Adding Stemming and Lemmatization #281
Labels: cleanup-stay (issues that won't be removed as part of cleanup), enhancement (new feature or request), question (further information is requested)
Adding an option for stemming and/or lemmatization is important when using count, hash, and tf-idf vectorizers: it shrinks the vocabulary by mapping words that share a stem or a lemma, respectively, to a single token. It also makes the patterns within a dataset more visible to the model.
Stemming
Lemmatization
I believe a good plan is to begin with stemming and then move on to lemmatization.
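To illustrate the vocabulary-size argument, here is a minimal sketch in plain Python. It is not a proposal for the actual API: `toy_stem`, `SUFFIXES`, and `build_vocabulary` are hypothetical names, and the suffix-stripping rule is a deliberately naive stand-in for a real stemmer such as Porter's.

```python
from collections import Counter

# Hypothetical toy suffix list; a real stemmer (e.g. Porter) uses far more careful rules.
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_vocabulary(docs, preprocess=None):
    """Count whitespace-separated tokens, optionally normalizing each one first."""
    counts = Counter()
    for doc in docs:
        for token in doc.lower().split():
            counts[preprocess(token) if preprocess else token] += 1
    return counts


docs = ["the model learns", "the model learned", "models keep learning"]

raw_vocab = build_vocabulary(docs)
stemmed_vocab = build_vocabulary(docs, preprocess=toy_stem)

print(len(raw_vocab), len(stemmed_vocab))  # 7 4
print(stemmed_vocab["learn"])              # 3
```

Without stemming, "learns", "learned", and "learning" occupy three separate vocabulary entries; with it they collapse into one entry whose count is 3, which is the effect a count or tf-idf vectorizer would benefit from.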