Wednesday, April 15 • 3:15pm - 4:05pm
Real-time Big Data Analytics with Apache Spark and Apache Solr - Timothy Potter, LucidWorks

Apache Solr has been adopted by all major Hadoop platform vendors because of its ability to scale horizontally to meet even the most demanding big data search problems. Apache Spark has emerged as the leading platform for real-time big data analytics and machine learning. In this presentation, Timothy Potter walks through several common use cases for integrating Solr and Spark.

Specifically, Tim covers how to populate Solr from a Spark Streaming job and how to expose the results of any Solr query as an RDD. The Solr RDD makes efficient use of deep-paging cursors and SolrCloud sharding to maximize parallel computation in Spark. After covering the basic use cases, Tim digs deeper to show how to use MLlib to enrich documents before indexing them in Solr, using techniques such as sentiment analysis (logistic regression), language detection, topic modeling (LDA), and document classification.
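The cursor-plus-sharding idea behind the Solr RDD can be illustrated without a running cluster. The Python below is a minimal in-memory simulation, not the Spark-Solr API: `fake_shard_query`, `read_partition` and `solr_rdd_like` are hypothetical names, each "shard" is just a list of dicts, and the cursor mark is simply the last `id` seen rather than the opaque token Solr actually returns. It assumes Solr's cursor contract (start with `cursorMark=*`, sort on the uniqueKey, stop when `nextCursorMark` comes back unchanged) and models one Spark partition per shard with a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, Iterator, List, Tuple

Doc = Dict[str, Any]


def fake_shard_query(docs: List[Doc], cursor_mark: str, rows: int) -> Tuple[List[Doc], str]:
    """Hypothetical stand-in for one Solr request against a single shard.

    Simplified cursorMark contract: results are sorted on the uniqueKey
    ("id" here), "*" means "start at the beginning", and the returned mark
    encodes the sort value of the last document in the page. Real Solr
    returns an opaque token; this sketch uses the last id directly.
    """
    ordered = sorted(docs, key=lambda d: d["id"])
    if cursor_mark != "*":
        ordered = [d for d in ordered if d["id"] > cursor_mark]
    page = ordered[:rows]
    next_mark = page[-1]["id"] if page else cursor_mark
    return page, next_mark


def read_partition(shard_docs: List[Doc], rows: int) -> Iterator[Doc]:
    """Stream every document in one shard, one cursor page at a time."""
    cursor = "*"
    while True:
        page, next_cursor = fake_shard_query(shard_docs, cursor, rows)
        yield from page
        # Solr signals "no more results" by handing back the same cursor mark.
        if next_cursor == cursor:
            break
        cursor = next_cursor


def solr_rdd_like(shards: Dict[str, List[Doc]], rows: int = 2) -> Dict[str, List[Doc]]:
    """Read each shard as its own partition in parallel (threads stand in for Spark tasks)."""
    def collect(docs: List[Doc]) -> List[Doc]:
        return list(read_partition(docs, rows))

    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        futures = {name: pool.submit(collect, docs) for name, docs in shards.items()}
        return {name: fut.result() for name, fut in futures.items()}


if __name__ == "__main__":
    shards = {
        "shard1": [{"id": "a1"}, {"id": "a3"}, {"id": "a2"}],
        "shard2": [{"id": "b2"}, {"id": "b1"}],
    }
    partitions = solr_rdd_like(shards, rows=2)
    print({name: [d["id"] for d in docs] for name, docs in partitions.items()})
    # {'shard1': ['a1', 'a2', 'a3'], 'shard2': ['b1', 'b2']}
```

Sorting on the uniqueKey gives a total order, so a cursor never skips or repeats a document. Cursors also avoid the cost of large `start` offsets, where each shard must collect and sort `start + rows` documents just to return the last `rows` of them.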

Speakers

Timothy Potter

Senior Software Engineer, Lucidworks
Timothy Potter is a senior member of the engineering team at Lucidworks and PMC member of the Apache Lucene/Solr project. At Lucidworks, Tim leads a team that builds tools to empower business analysts and data scientists to search, analyze, and visualize large-scale enterprise data sets using Fusion. Tim is the original designer of the Spark-Solr open source project and actively contributes to a number of open source projects for integrating Solr...


Wednesday April 15, 2015 3:15pm - 4:05pm
Texas VI