ApacheCon NA 2015 has ended
Wednesday, April 15 • 3:15pm - 4:05pm
Real-time Big Data Analytics with Apache Spark and Apache Solr - Timothy Potter, Lucidworks

Apache Solr has been adopted by all major Hadoop platform vendors because of its ability to scale horizontally to meet even the most demanding big data search problems. Apache Spark has emerged as the leading platform for real-time big data analytics and machine learning. In this presentation, Timothy Potter presents several common use cases for integrating Solr and Spark.

Specifically, Tim covers how to populate Solr from a Spark Streaming job as well as how to expose the results of any Solr query as an RDD. The Solr RDD makes efficient use of deep paging cursors and SolrCloud sharding to maximize parallel computation in Spark. After covering basic use cases, Tim digs a little deeper to show how to use MLlib to enrich documents before indexing them in Solr, using techniques such as sentiment analysis (logistic regression), language detection, topic modeling (LDA), and document classification.
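
As a rough illustration of the deep paging cursors mentioned above, the following Python sketch walks a result set with Solr's `cursorMark` / `nextCursorMark` protocol: start at `*`, sort on the unique key, and stop when the returned mark equals the one that was sent. This is not the talk's Solr RDD code. The helper names `make_fake_shard` and `iter_cursor` are hypothetical, the shard is an in-memory stand-in rather than a live Solr node, and the fake uses numeric offsets where real Solr returns opaque cursor strings.

```python
from typing import Callable, Dict, Iterator, List


def make_fake_shard(docs: List[Dict]) -> Callable[[Dict], Dict]:
    """Hypothetical stand-in for one Solr shard's /select handler.

    Real Solr returns opaque cursor marks; this fake encodes an offset.
    """
    ordered = sorted(docs, key=lambda d: d["id"])  # mimics sort=id asc

    def select(params: Dict) -> Dict:
        rows = int(params["rows"])
        mark = params["cursorMark"]
        start = 0 if mark == "*" else int(mark)
        page = ordered[start:start + rows]
        return {
            "response": {"docs": page},
            "nextCursorMark": str(start + len(page)),
        }

    return select


def iter_cursor(select: Callable[[Dict], Dict], query: str,
                rows: int = 2) -> Iterator[Dict]:
    """Yield every matching document by following cursor marks."""
    # Cursor paging requires a sort that includes the unique key field.
    params = {"q": query, "sort": "id asc", "rows": rows, "cursorMark": "*"}
    while True:
        rsp = select(dict(params))
        yield from rsp["response"]["docs"]
        next_mark = rsp["nextCursorMark"]
        # Solr signals the end by returning the same mark that was sent.
        if next_mark == params["cursorMark"]:
            break
        params["cursorMark"] = next_mark


if __name__ == "__main__":
    shard = make_fake_shard([{"id": f"doc{i}"} for i in range(5)])
    for doc in iter_cursor(shard, "*:*"):
        print(doc["id"])
```

Running one such cursor per SolrCloud shard, with each shard mapped to its own partition, is one plausible way to read the abstract's point about sharding and parallel computation; the abstract itself does not spell out the mechanism.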

Speakers
Timothy Potter

Senior Software Engineer, Lucidworks
Timothy Potter is a senior member of the engineering team at Lucidworks and a PMC member of the Apache Lucene/Solr project. At Lucidworks, Tim leads a team that builds tools to empower business analysts and data scientists to search, analyze, and visualize large-scale enterprise data...


Wednesday April 15, 2015 3:15pm - 4:05pm CDT
Texas VI
