Tuesday, April 14 • 11:40am - 12:30pm
Filtering Twitter with UIMA - Neal Lewis, IBM Watson Group

What's the best movie to see this weekend? This common question might be answered by asking, "What does everyone on Twitter like?" But it turns out that writing a system to answer it is complicated. First you pull an initial set of data based on keywords. Then you see that most of your millions of tweets are noise and spam. Now you need filtering before you can do any decision making. The filtering can combine heuristics (e.g., posters with no followers are probably spammers) and traditional NLP (e.g., a tweet that talks about a movie in the future tense is about one the poster has not yet seen).

Apache UIMA (tm) provides an ideal framework for developing and deploying such a system.

We demo a system that takes a large pull from Twitter, removes noise, and calculates sentiment. We will show how a pipeline of about six analytics can remove the majority of the junk and spam from the feed and produce useful results.
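
To make the filtering idea concrete, here is a minimal sketch of two such filters chained together: one follower-count heuristic and one crude future-tense check. It is plain Python, not UIMA and not the system shown in the talk. The `Tweet` class, the filter names, the cue-word list, and the sample tweets are all hypothetical and exist only for illustration. A real pipeline would likely use a proper tense or part-of-speech analyzer instead of a regular expression.

```python
import re
from dataclasses import dataclass


@dataclass
class Tweet:
    """Hypothetical minimal tweet record (not a real Twitter API type)."""
    text: str
    follower_count: int


# Assumed cue phrases that suggest the poster has NOT seen the movie yet.
FUTURE_CUES = re.compile(
    r"\b(will|gonna|going to|want to|plan to|planning to)\s+(see|watch)\b",
    re.IGNORECASE,
)


def has_followers(tweet: Tweet) -> bool:
    """Heuristic: posters with no followers are probably spammers."""
    return tweet.follower_count > 0


def not_future_tense(tweet: Tweet) -> bool:
    """Crude stand-in for NLP: reject tweets about seeing a movie later."""
    return FUTURE_CUES.search(tweet.text) is None


# Filters run in order; a tweet is kept only if every filter accepts it.
FILTERS = [has_followers, not_future_tense]


def run_pipeline(tweets, filters=FILTERS):
    """Return the tweets that pass every filter."""
    return [t for t in tweets if all(f(t) for f in filters)]


# Made-up sample data.
sample = [
    Tweet("Saw the new movie last night, loved it", 120),
    Tweet("Buy cheap tickets now!!!", 0),
    Tweet("I am going to see the new movie tomorrow", 45),
]

for tweet in run_pipeline(sample):
    print(tweet.text)
```

Each filter is an independent function, so more checks can be appended to the list without changing the rest of the code. That loosely mirrors the way separate analytics are chained in a pipeline.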


Neal Lewis

Neal Lewis is a Research Engineer for the IBM Watson Group, focusing on statistical methods in Natural Language Processing for improving Text Analytic outcomes in multiple domains, including Social Media and Healthcare. He has spoken at countless engagements within IBM, as well as at seminars and presentations at universities and conferences. He also performs improv comedy for public audiences in San Jose, CA.

Tuesday April 14, 2015 11:40am - 12:30pm
Texas I

