ApacheCon NA 2015 has ended

Big Data: Big Picture
Tuesday, April 14

4:20pm CDT

Apache Ignite (incubating): Anatomy of an In-Memory Data Fabric - Dmitriy Setrakyan, GridGain
In this presentation, we will describe the strategy and architecture behind Apache Ignite™ (incubating), a high-performance, distributed in-memory data management software layer designed to operate between both new and existing data sources and applications, boosting application performance and scale by orders of magnitude. We will dive into the technical details of distributed clusters and compute grids as well as distributed data grids, and provide code samples for each. We will also cover distributed streaming, CEP and Hadoop acceleration as integral parts of an In-Memory Data Fabric. This presentation is particularly relevant for software developers and architects who work on the front lines of high-speed, low-latency big data systems, high-performance transactional systems and real-time analytics applications. - Apache Ignite is either a registered trademark or a trademark of the Apache Software Foundation in the United States and/or other countries.

Dmitriy Setrakyan

Co-Founder and EVP of Engineering, GridGain
Dmitriy Setrakyan is co-founder and EVP of Engineering at GridGain Systems. Dmitriy has been designing, architecting and developing software and applications for over 15 years and has expertise in the development of distributed computing systems, middleware platforms, financial trading...

Tuesday April 14, 2015 4:20pm - 5:10pm CDT
Texas I
Wednesday, April 15

9:00am CDT

Kafka at Scale: Multi-Tier Architectures - Todd Palino, LinkedIn
If data is the lifeblood of high technology, Apache Kafka is the circulatory system in use at LinkedIn. It is used for moving every type of data between systems, and it touches virtually every server, every day. This can only be accomplished with multiple Kafka clusters, installed at several sites, and they must all work together to ensure no message loss and almost no message duplication. In this presentation, we will discuss the architectural choices behind how the clusters are deployed, and the tools and processes that have been developed to manage them. Todd Palino will also discuss some of the challenges of running Kafka at this scale, and how they are being addressed both operationally and in the Kafka development community.

Todd Palino

Staff Site Reliability Engineer, http://linkedin.com/
Todd Palino is a Staff Site Reliability Engineer at LinkedIn, tasked with keeping ZooKeeper, Kafka, and Samza deployments fed and watered. He is responsible for architecture, day-to-day operations, and tools development, including the creation of an advanced monitoring and notification...

Wednesday April 15, 2015 9:00am - 9:50am CDT
Texas VI

10:00am CDT

From MapReduce to Spark with Apache Crunch - Micah Whitacre, Cerner Corporation
With companies having made heavy investments in MapReduce, the emergence of Apache Spark as a new processing platform is both tempting and daunting. Refactoring code or altering processing steps can be a significant investment. The Apache Crunch project can help with the transition through its built-in support for reusing code in both execution environments. Teams can incrementally migrate their processing workflows or use the appropriate execution engine for each use case, while still relying on a common set of concepts provided by Apache Crunch. The presentation will cover the basics of Apache Spark, how to reuse the same code in both MapReduce and Spark, and the differences between using Apache Crunch and plain Apache Spark.

Micah Whitacre

Software Architect, Cerner Corporation
Micah is a committer on the Apache Crunch project as well as a Software Architect for Cerner Corporation, a leading provider of healthcare technology. For almost a decade he has worked on building infrastructure and reusable assets. In the last few years his focus has shifted towards...

Wednesday April 15, 2015 10:00am - 10:50am CDT
Texas VI

11:15am CDT

Unleashing the Silicon Forest Fire - the Open Sourcing of GemFire - Brian Dunlap, Southwest Airlines; Sudhir Menon, Pivotal; Jags Ramnarayan, Pivotal; Dan Smith, Pivotal
Pivotal GemFire has had a long and winding journey, starting in 2002, passing through VMware and Pivotal, and finding its way to Apache in 2015. Companies using GemFire have deployed it in some of the most mission-critical, time-sensitive applications in their enterprises, making sure tickets are purchased in a timely fashion, hotel rooms are booked, trades are made, and credit card transactions are cleared. Come to this session to understand:
  • A brief history of GemFire
  • Architecture and use cases
  • Why we are taking GemFire Open Source
  • Design philosophy and principles
But most importantly: how you can join this exciting community to work on a bleeding-edge in-memory platform.

Brian Dunlap

Senior Software Engineer, Southwest Airlines
As a tech lead at Southwest Airlines, Brian has more than 15 years of experience in domains including crew scheduling, passenger reservations, flight operations, and optimization. He is currently using GemFire on a large-scale project that will replace several legacy operational...

Sudhir Menon

Sudhir Menon is one of the key architects of GemFire & SQLFire. Sudhir is the Head of Products for all Real Time and Big Data products at Pivotal. He holds multiple patents in the areas of scaled up networking systems. His expertise in distributed data management spans multiple...

Jags Ramnarayan

Jags is the Chief Architect for “fast data” products (GemFire) at Pivotal and serves in the extended leadership team of the company. At Pivotal and previously at VMware he led the technology direction for its high-performance distributed data grid and in-memory DB products. He...

Dan Smith

Staff Engineer, Pivotal
Dan Smith has been writing code ever since he typed in some BASIC from the back of a magazine in elementary school. For the last 10 years Dan has been working in distributed systems development. He's currently a Staff Engineer at Pivotal working on GemFire.

Wednesday April 15, 2015 11:15am - 12:05pm CDT
Texas VI

1:15pm CDT

Delivering Systems of Insight by Leveraging the Hadoop Ecosystem - Eberhard Hechler, IBM Germany R&D Lab
This presentation will illustrate how to complement existing 'traditional' analytical capabilities with Big Data analytics, e.g. by using text analytics and Natural Language Processing (NLP) as part of IBM InfoSphere BigInsights. This approach leverages key Hadoop components (the MapReduce programming model, HDFS, HBase, ZooKeeper, etc.) to analyse data from enterprise-owned systems of engagement (e.g. call center transcripts, e-mail traffic, Facebook) and data from external social media sites (e.g. Twitter tweets, Facebook sites, blogs), and puts this in context with transaction insight from data on IBM z Systems. We will provide examples of how Hadoop systems (using HBase and Hive with corresponding connectors to existing systems, and Big SQL on HDFS and Hive) will enrich analytical insight.


Eberhard Hechler

Executive Architect, IBM Germany R&D Lab
Eberhard is an Executive Architect working at the IBM Germany R&D Lab. He is a member of IBM DB2 Analytics Accelerator development. After 2.5 years at the IBM Kingston Development Lab in New York, he worked in software development, performance optimization and benchmarking, IT/solution...

Wednesday April 15, 2015 1:15pm - 2:05pm CDT
Texas VI

3:15pm CDT

Real-time Big Data Analytics with Apache Spark and Apache Solr - Timothy Potter, Lucidworks
Apache Solr has been adopted by all major Hadoop platform vendors because of its ability to scale horizontally to meet even the most demanding big data search problems. Apache Spark has emerged as the leading platform for real-time big data analytics and machine learning. In this presentation, Timothy Potter presents several common use cases for integrating Solr and Spark.

Specifically, Tim covers how to populate Solr from a Spark streaming job as well as how to expose the results of any Solr query as an RDD. The Solr RDD makes efficient use of deep paging cursors and SolrCloud sharding to maximize parallel computation in Spark. After covering basic use cases, Tim digs a little deeper to show how to use MLlib to enrich documents before indexing them in Solr, using techniques such as sentiment analysis (logistic regression), language detection, topic modeling (LDA), and document classification.

Timothy Potter

Senior Software Engineer, Lucidworks
Timothy Potter is a senior member of the engineering team at Lucidworks and PMC member of the Apache Lucene/Solr project. At Lucidworks, Tim leads a team that builds tools to empower business analysts and data scientists to search, analyze, and visualize large-scale enterprise data...

Wednesday April 15, 2015 3:15pm - 4:05pm CDT
Texas VI