Algorithms or libraries for textual analysis, specifically: dominant words, phrases across text, and collections of text
I'm working on a project where I need to analyze a page of text, and collections of pages of text, to determine the dominant words. I'd like to know if there is a library (I prefer C# or Java) that will handle the heavy lifting for me. If not, is there an algorithm, or several, that would achieve my goals below?
What I want is similar to the word clouds built from a URL or RSS feed that you find on the web, except I don't want the visualization. They are often used for analyzing presidential candidate speeches to see what the theme or the most used words are.
The complication is that I need to do this on thousands of short documents, and then on collections or categories of these documents.
My initial plan was to parse the document out, then filter common words (of, the, he, she, etc.), then count the number of times the remaining words show up in the text (and in the overall collection/category).
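A minimal sketch of that plan in Java, assuming a hand-written stop list and a simple regex tokenizer. The class name `WordCounter`, its methods, and the stop list contents are hypothetical and only for illustration; a real stop list would be much longer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class WordCounter {
    // Tiny illustrative stop list (assumption); extend it for real use.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
        "of", "the", "he", "she", "a", "an", "and", "to", "in", "is", "it"));

    // Count the non-stop words in one document.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        // Split on runs of anything that is not a letter, digit, or apostrophe.
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}']+");
        for (String token : tokens) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    // Fold one document's counts into a running total for a collection/category.
    static void addAll(Map<String, Integer> total, Map<String, Integer> doc) {
        doc.forEach((word, n) -> total.merge(word, n, Integer::sum));
    }

    // Return the k most frequent entries, ties broken alphabetically.
    static List<Map.Entry<String, Integer>> top(Map<String, Integer> counts, int k) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(entries.subList(0, Math.min(k, entries.size())));
    }
}
```

Because each document yields its own map, the per-document counts can be kept and also merged into a per-category total with `addAll`, which should scale to thousands of short documents without holding all the text in memory.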
The problem is that in the future I would like to handle stemming, plural forms, etc. I would also like to see if there is a way to identify important phrases (instead of a count of a word, the count of a phrase being 2-3 words together).
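One possible way to approach both points, sketched in Java under loud assumptions: phrases are treated as plain word n-grams (2 or 3 adjacent tokens), and plural handling is a crude suffix rule, not a real stemmer such as Porter or Snowball. `PhraseCounter` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class PhraseCounter {
    // Lowercase, split, and normalize each token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}']+")) {
            if (!t.isEmpty()) {
                tokens.add(normalize(t));
            }
        }
        return tokens;
    }

    // Crude plural folding (assumption): "families" -> "family", "cuts" -> "cut".
    // It will mangle some words (e.g. "does"); swap in a real stemmer for serious use.
    static String normalize(String w) {
        if (w.length() > 4 && w.endsWith("ies")) {
            return w.substring(0, w.length() - 3) + "y";
        }
        if (w.length() > 3 && w.endsWith("s")
                && !w.endsWith("ss") && !w.endsWith("us") && !w.endsWith("is")) {
            return w.substring(0, w.length() - 1);
        }
        return w;
    }

    // Count every run of n adjacent tokens as one phrase.
    static Map<String, Integer> countPhrases(List<String> tokens, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            String phrase = String.join(" ", tokens.subList(i, i + n));
            counts.merge(phrase, 1, Integer::sum);
        }
        return counts;
    }
}
```

Calling `countPhrases(tokens, 2)` and `countPhrases(tokens, 3)` on the same token list gives the 2-word and 3-word counts. Raw n-gram counts are dominated by phrases containing common words, so some filtering (for example, dropping n-grams that start or end with a stop word) is likely needed before the counts are useful.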
Any guidance on a strategy, libraries, or algorithms that would help is appreciated.