ApacheCon NA 2015 has ended

Tuesday, April 14

11:40am CDT

Filtering Twitter with UIMA - Neal Lewis, IBM Watson Group
What's the best movie to see this weekend? This common question might be answered by asking, "What does everyone on Twitter like?" But it turns out that writing a system to answer it is complicated. First you pull an initial set of data based on keywords. Then you see that most of your millions of tweets are noise and spam. Now you need filtering before you can do any decision making. This can be a combination of heuristics (e.g., posters with no followers are probably spammers) and traditional NLP (e.g., tweets talking about movies in the future tense are not about movies the poster has already seen).

Apache UIMA (tm) provides an ideal framework for developing and deploying such a system.

We demo a system that takes a large pull from Twitter, removes noise, and calculates sentiment. We will show how a pipeline of ~6 analytics can remove the majority of the junk and spam from the feed and get useful results.
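To make the filtering idea concrete, here is a minimal sketch of the two example filters from the abstract, chained as pipeline stages. It is plain Java and does not use the UIMA API; the `Tweet` class, the stage names, and the keyword-based future-tense check are all assumptions made for illustration (a real system would use a proper NLP analytic for tense), and the talk's actual pipeline may look quite different.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class TweetFilterSketch {

    // Hypothetical minimal tweet; real data would carry many more fields.
    static final class Tweet {
        final String text;
        final int followerCount;

        Tweet(String text, int followerCount) {
            this.text = text;
            this.followerCount = followerCount;
        }
    }

    // Heuristic from the abstract: posters with no followers are probably spammers.
    static final Predicate<Tweet> HAS_FOLLOWERS = t -> t.followerCount > 0;

    // Crude keyword stand-in (an assumption) for an NLP tense check:
    // drop tweets that talk about seeing a movie in the future.
    static final Predicate<Tweet> NOT_FUTURE_TENSE = t -> {
        String s = t.text.toLowerCase(Locale.ROOT);
        return !(s.contains("will see") || s.contains("going to see"));
    };

    // Keep only the tweets that pass every stage, applied in order.
    static List<Tweet> filter(List<Tweet> tweets, List<Predicate<Tweet>> stages) {
        Predicate<Tweet> all = stages.stream().reduce(t -> true, Predicate::and);
        return tweets.stream().filter(all).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Tweet> pulled = Arrays.asList(
            new Tweet("Loved that movie last night", 120),
            new Tweet("Loved that movie, click here", 0),
            new Tweet("I will see that movie on Friday", 45));

        List<Tweet> kept = filter(pulled, Arrays.asList(HAS_FOLLOWERS, NOT_FUTURE_TENSE));
        for (Tweet t : kept) {
            System.out.println(t.text);
        }
    }
}
```

Each stage is an independent predicate, so more filters can be added to the list without touching the others, which loosely mirrors the idea of a pipeline of separate analytics.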


Neal Lewis

Neal Lewis is a Research Engineer for the IBM Watson Group focusing on statistical methods in Natural Language Processing for improving Text Analytic outcomes in multiple domains including Social Media and Healthcare. His speaking experience includes countless speaking engagements...

Tuesday April 14, 2015 11:40am - 12:30pm CDT
Texas I
Wednesday, April 15

11:15am CDT

Apache CXF, Tika and Lucene: The Power of Search the JAX-RS Way - Andriy Redko, AppDirect
I would like to present the work the Apache CXF team has done on integrating with Apache Tika for binary content extraction and with Apache Lucene for full-text search, using the JAX-RS/REST search extensions.


Andriy Redko

Professional software developer, currently employed by AppDirect in Montreal, Canada. Joined the Apache Software Foundation and the Apache CXF project a year ago, actively participating in the development process. No prior experience speaking at conferences of this level.

Wednesday April 15, 2015 11:15am - 12:05pm CDT
Texas I

1:15pm CDT

Storm-Crawler: Real-Time Web Crawling on Apache Storm - Jake Dodd, Ontopic
It’s 2015, and the Web is a dynamic place. The web crawlers of old tackled the problems of batch-based page discovery and indexing. A modern web crawler must be able to handle real-time and unbounded streams of new content.

Storm-Crawler is a next-generation web crawler that discovers and processes content on the Web in real time with low latency. This open source (and Apache-licensed) project is built on the Apache Storm framework, which provides a great foundation for a distributed real-time web crawler.

In this presentation, Jake Dodd will deliver a conceptual and technical overview of Storm-Crawler, demonstrate its use in a production environment, and discuss the project’s ongoing and future development.


Jake Dodd

My name is Jake Dodd, and I’m a co-founder of a software company based in Santa Monica, California. I attended the University of Southern California (B.S./M.S. Astronautical Engineering, 2011/2012). After receiving my B.S., I co-founded a company and then worked for a contractor...

Wednesday April 15, 2015 1:15pm - 2:05pm CDT
Texas I