Wednesday, April 15 • 1:15pm - 2:05pm
Storm-Crawler: Real-Time Web Crawling on Apache Storm - Jake Dodd, Ontopic

It’s 2015, and the Web is a dynamic place. The web crawlers of old tackled the problems of batch-based page discovery and indexing. A modern web crawler must be able to handle real-time, unbounded streams of new content.

Storm-Crawler is a next-generation web crawler that discovers and processes content on the Web in real time, with low latency. This open-source (and Apache-licensed) project is built on the Apache Storm framework, which provides a solid foundation for a distributed, real-time web crawler.

In this presentation, Jake Dodd will deliver a conceptual and technical overview of Storm-Crawler, demonstrate its use in a production environment, and discuss the project’s ongoing and future development.

Speakers
Jake Dodd

Ontopic
My name is Jake Dodd, and I’m a co-founder of a software company based in Santa Monica, California. I attended the University of Southern California (B.S./M.S. Astronautical Engineering, 2011/2012). After receiving my B.S., I co-founded a company and then worked for a contractor on a national security space program at Air Force SMC. In my time there, I built a modeling/sim application that receives ongoing use for a number of...


Wednesday April 15, 2015 1:15pm - 2:05pm
Texas I
