ApacheCon NA 2015 has ended

Big Data Technologies
Tuesday, April 14

10:40am CDT

Apache Slider Makes Running Applications on YARN a Breeze - Zhihong Yu, Hortonworks
The YARN framework is becoming more popular as a foundation for managing cluster resources.
However, developing / deploying / managing distributed applications on a YARN cluster requires expertise.

Apache Slider is a YARN application that deploys existing distributed applications on YARN, monitors them, and makes them larger or smaller as desired, even while the application is running.
Slider allows users to create on-demand applications in a YARN cluster. It allows users to configure different application instances differently. Application instances can be stopped / suspended / resumed as needed. Docker-based app packaging is also supported.

In this presentation, we will review what applications need when deployed on YARN, discuss how Slider makes application deployment and management easier, examine the challenges Slider faces, and showcase applications that are ready to be deployed through Slider.


Zhihong Yu

Staff Engineer, VMware
I have been an Apache HBase PMC member for five and a half years. I am also a committer for Apache Slider and Apache Bahir. I contribute to Apache Phoenix and Apache Spark. I have presented at the past 3 ApacheCon NA events.

Tuesday April 14, 2015 10:40am - 11:30am CDT
Texas VI

11:40am CDT

Apache Flink: Fast and Reliable Large-Scale Data Processing - Fabian Hueske, Data Artisans
Apache Flink is one of the latest additions to the Apache family of data processing engines. Flink’s design aims to provide a system that is as fast as in-memory engines, while providing the reliability of Hadoop. Flink contains programming APIs in Java and Scala that unify batch processing and data streaming applications, a translation stack for transforming these programs to parallel data flows, and a runtime that supports both proper streaming and batch processing for executing these data flows in large compute clusters. Flink is compatible with the Hadoop ecosystem, and has a growing community of currently more than 70 contributors from industry and academia. In this presentation, Fabian will provide an overview of Flink both from the user standpoint and the system’s internal model, and discuss the project’s technical roadmap for the future.

Fabian Hueske

Co-Founder & Software Engineer, data Artisans
Fabian Hueske is a committer and PMC member of Apache Flink and co-founder of data Artisans, a Berlin-based company that is developing and contributing to Apache Flink. As a PhD student at Technische Universität Berlin he was part of the team that initiated and built the Stratosphere...

Tuesday April 14, 2015 11:40am - 12:30pm CDT
Texas VI

2:00pm CDT

Introduction to Apache Kafka - Jun Rao, Confluent
Apache Kafka is used by a growing number of companies such as LinkedIn, Netflix, and Uber. I will first describe a common pattern of how those companies are using Kafka. All data, including business metrics, operational metrics, logs and database records, are collected as structured data into Kafka in real time. These data are then fed into batch processing systems such as Hadoop and data warehouses, as well as various real-time systems such as search indexes, stream processing frameworks, graph libraries, and monitoring engines.
Next, I will explain some of the underlying technologies in Kafka that enable this common usage pattern. In particular, I will cover (1) the scale-out architecture of Kafka; (2) how Kafka achieves high throughput for both real-time and non-real-time consumption; (3) how Kafka provides durability and availability.


Jun Rao

Jun Rao is currently a co-founder of Confluent, a company that provides a stream data platform on top of Apache Kafka. Before Confluent, Jun Rao was a senior staff engineer at LinkedIn where he led the development of Kafka. Before LinkedIn, Jun Rao was a researcher at IBM's Almaden...

Tuesday April 14, 2015 2:00pm - 2:50pm CDT
Texas VI

3:00pm CDT

Keep Me in the Loop: INotify in the Apache Hadoop Distributed Filesystem - Colin McCabe, Cloudera
An elephant never forgets, at least not if that elephant is Apache Hadoop. The Hadoop Distributed Filesystem (HDFS) can store petabytes of data. Services that run on top of HDFS often want to cache or index some of that data. When files in HDFS change, or when more files are added, these services need to update their caches and indices.

The new HDFS inotify API allows applications to listen for changes to files stored in HDFS. Instead of periodically rescanning the filesystem, applications can simply receive notifications about changes. In this talk, I will cover the design goals for INotify and how we accomplished them. I will talk about how other projects can make effective use of the new API. Finally, I'll discuss some ideas we might explore in the future.


Colin McCabe

Software Engineer, Cloudera
Colin McCabe is a Platform Software Engineer at Cloudera, where he works on HDFS and related technologies. He is a committer on HDFS. Prior to joining Cloudera, he worked on the Ceph Distributed Filesystem, and the Linux kernel, among other things. He studied Computer Science and...

Tuesday April 14, 2015 3:00pm - 3:50pm CDT
Texas VI

4:20pm CDT

Significantly Speedup Real-World Big Data Applications Using Apache Spark - Grace Huang, Intel SSG
With the rise of Apache Spark, many big data applications are moving to Spark in pursuit of a better user experience. We have partnered with several top Chinese internet companies to build their next-generation big data engines on Spark, covering graph analysis, interactive and batch OLAP/BI, and real-time analytics. In this talk, we will share our experience optimizing both these real-world applications and Apache Spark itself, which brought 5x to 100x speedups over the original MapReduce implementations. We will also share several lessons learned about improving user experience when building real-world Spark applications in production environments.


Grace Huang

Grace Huang is currently an engineering manager in Intel SSG (Software and Services Group), responsible for advanced Big Data technology enhancement and optimization, including Hadoop, Spark, etc. Prior to that, she had been working in the big data area in Intel for over 6 years...

Tuesday April 14, 2015 4:20pm - 5:10pm CDT
Texas VI

5:20pm CDT

Apache Bigtop: In-Memory Analytic Software stack.Next - Konstantin Boudnik, Apache Software Foundation
Apache Bigtop has created the de facto standard for how Hadoop-based stacks are developed, delivered, and managed. Now we are doing this again! This time we are going to deliver Bigtop 1.x, which is focused not just on BigData, but on FastData. The next generation of the Apache data processing stack will focus on in-memory and transactional processing of large amounts of data.

Konstantin Boudnik

CEO, Memcore
Dr. Konstantin Boudnik, co-founder and CEO of Memcore Inc, is one of the early developers of Hadoop and a co-author of Apache BigTop, the open source framework and the community around creation of software stacks for data processing projects. With more than 20 years of experience in...

Tuesday April 14, 2015 5:20pm - 6:10pm CDT
Texas VI
Wednesday, April 15

2:15pm CDT

Mesos + YARN = Myriad. Why This is a Game Changer for Big Data Developers - Adam Bordelon, Mesosphere
It has become common practice to statically partition a datacenter into siloed clusters for each application. But there is an increasing need to integrate Apache Hadoop with other datacenter services, ideally co-locating the data in HDFS/HBase with the services that need it. Myriad, recently submitted to the Apache Incubator, integrates Apache YARN into Apache Mesos, allowing Apache Hadoop jobs to run alongside other applications, all dynamically sharing a single pool of resources. Apache Mesos enables efficient resource sharing and isolation across a variety of distributed applications including Apache Spark, MPI, Jenkins, traditional Linux applications, and Docker images. In this talk, Adam will explain how Myriad enables Apache YARN and Apache Mesos to share the same physical datacenter resources, improving overall cluster utilization and operational efficiency.


Adam Bordelon

Adam is a distributed systems architect at Mesosphere and an Apache Mesos committer. Before joining Mesosphere, Adam was lead developer on the Hadoop core team at MapR Technologies, he developed distributed systems for personalized recommendations at Amazon, and he re-architected...

Wednesday April 15, 2015 2:15pm - 3:05pm CDT
Texas II

3:15pm CDT

Implementing a Highly-Scalable Stock Prediction System with R, GemFire and Spring XD - Fred Melo, Pivotal and William Markito, Pivotal

Financial market prediction has always been one of the hottest topics in Data Science and Machine Learning. However, the prediction algorithm is just a small piece of the puzzle. Building a data stream pipeline that constantly combines the latest price info with high-volume historical data is extremely challenging using traditional platforms, requiring a lot of code and thinking about how to scale or move into the cloud. This session will walk through the architecture and implementation details of an application built on top of open-source tools that demonstrates how to easily build a stock prediction solution with no source code, except a few lines of R and the web interface that will consume data through a RESTful endpoint in real time. The solution leverages in-memory data grid technology for high-speed ingestion, combining streaming of real-time data and distributed processing for stock indicator algorithms.

Fred Melo

Director, Product Management and Tech Marketing, Pivotal
Fred has been in the software industry for more than 15 years. Currently working as a Director of Product Management and Tech Marketing for Pivotal, his job is to help customers from all industries build business-relevant Big Data, Fast Data, Mobile and IoT solutions. In the recent past, he led...

William Markito Oliveira

Enterprise Architect, Pivotal
After spending years focusing on Enterprise Integration Systems, William has narrowed his focus and specialized in Java development, with emphasis on Service Oriented Architectures (SOA), Distributed Systems and Open source. Currently working at Pivotal helping customers mainly on...

Wednesday April 15, 2015 3:15pm - 4:05pm CDT
Texas II