ApacheCon NA 2015 has ended
Wednesday, April 15 • 1:15pm - 2:05pm
Storm-Crawler: Real-Time Web Crawling on Apache Storm - Jake Dodd, Ontopic

It’s 2015, and the Web is a dynamic place. The web crawlers of old tackled the problems of batch-based page discovery and indexing. A modern web crawler must be able to handle real-time, unbounded streams of new content.

Storm-Crawler is a next-generation web crawler that discovers and processes content on the Web in real time, with low latency. This open source, Apache-licensed project is built on the Apache Storm framework, which provides a great foundation for a distributed real-time web crawler.

In this presentation, Jake Dodd will deliver a conceptual and technical overview of Storm-Crawler, demonstrate its use in a production environment, and discuss the project’s ongoing and future development.


Jake Dodd

My name is Jake Dodd, and I’m a co-founder of a software company based in Santa Monica, California. I attended the University of Southern California (B.S./M.S. Astronautical Engineering, 2011/2012). After receiving my B.S., I co-founded a company and then worked for a contractor...

Wednesday April 15, 2015 1:15pm - 2:05pm CDT
Texas I
