Stemming algorithm that produces real words
I need to take a paragraph of text and extract a list of "tags" from it. Most of this is quite straightforward. However, I need some help stemming the resulting word list to avoid duplicates. Example: community / communities
I've used an implementation of the Porter stemmer algorithm (I'm writing in PHP, by the way).
This works, up to a point, but it doesn't return "real" words. The example above is stemmed to "commun".
I've also tried "Snowball" (suggested in another Stack Overflow thread).
For my example (community / communities), Snowball stems to "communiti".
Question
Are there any other stemming algorithms that will do this? Has anyone else solved this problem?
My current thinking is that I could use a stemming algorithm to avoid duplicates and then pick the shortest word I encounter as the actual word to display.
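To make that idea concrete, here is a rough sketch of the "group by stem, display the shortest word" approach. It is written in Python purely for illustration; the same grouping should translate to PHP arrays. Note that `crude_stem` and `dedupe_tags` are hypothetical names, and `crude_stem` is only a toy stand-in that strips a few suffixes. In practice a real Porter or Snowball implementation would be passed in instead.

```python
from collections import defaultdict


def crude_stem(word):
    """Hypothetical stand-in for a real Porter/Snowball stemmer.

    It only strips a few common English suffixes so that the example
    runs without third-party packages.
    """
    word = word.lower()
    for suffix in ("ies", "es", "s", "y"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def dedupe_tags(words, stem=crude_stem):
    """Group words by stem and keep the shortest surface form per group."""
    groups = defaultdict(list)
    for word in words:
        groups[stem(word)].append(word.lower())
    # min() with key=len picks the shortest word; ties keep the first seen.
    return sorted(min(forms, key=len) for forms in groups.values())


tags = dedupe_tags(["Community", "communities", "tags", "tag"])
print(tags)
```

The stem is used only as a grouping key and is never shown, so it does not matter that it is not a real word. The displayed tag is always a word that actually appeared in the text.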