ApacheCon NA 2015 has ended

Content
Tuesday, April 14

10:40am CDT

If You Have The Content, Then Apache Has the Technology! - Chris Mattmann, NASA
Within the ASF, there are a wide variety of projects with technologies
to help you store, retrieve, host, transform and generate content. This
talk will review the landscape of Apache content technologies, provide a
quick introduction to the more common and more interesting projects, and
flag up new and innovative features within them. It'll also highlight
talks from the rest of the week on many of the projects covered, so that
you'll know where and when to go to learn more about those projects and
technologies which catch your eye!

This begins a new era for the famous talk that Nick Burch has given at many ApacheCons to date.

Chris Mattmann

Chief Architect & Adjunct Associate Professor, NASA Jet Propulsion Laboratory & USC
Chris Mattmann has a wealth of experience in software design, and in the construction of large-scale data-intensive systems. His work has infected a broad set of communities, ranging from helping NASA unlock data from its next generation of earth science system satellites, to assisting...

Tuesday April 14, 2015 10:40am - 11:30am CDT
Texas I

11:40am CDT

Filtering Twitter with UIMA - Neal Lewis, IBM Watson Group
What's the best movie to see this weekend? This common question might be answered by asking "What does everyone on Twitter like?" But it turns out that writing a system to answer it is complicated. First you pull an initial set of data based on keywords. Then you see that most of your millions of tweets are noise and spam. Now you need filtering before you can do any decision making. This can be a combination of heuristics (e.g., posters with no followers are probably spammers) and traditional NLP (e.g., tweets talking about movies in the future tense are not about movies the poster has already seen).

Apache UIMA (tm) provides an ideal framework for developing and deploying such a system.

We demo a system that takes a large pull from Twitter, removes noise, and calculates sentiment. We will show how a pipeline of about six analytics can remove the majority of the junk and spam from the feed and get useful results.


Neal Lewis

Neal Lewis is a Research Engineer for the IBM Watson Group focusing on statistical methods in Natural Language Processing for improving Text Analytic outcomes in multiple domains including Social Media and Healthcare. His speaking experience includes countless speaking engagements...

Tuesday April 14, 2015 11:40am - 12:30pm CDT
Texas I

2:00pm CDT

Development of IBM Watson with UIMA DUCC - Eddie Epstein, IBM Watson Group
DUCC is a new Linux cluster controller designed to scale out any Apache UIMA (tm) pipeline for high-throughput collection processing jobs as well as for low-latency real-time applications. DUCC stands for Distributed UIMA Cluster Computing. DUCC runs on clusters ranging from a single machine to many hundreds of machines.

This talk will cover the motivations that led to the creation of DUCC (the IBM Watson Jeopardy! Challenge), DUCC's benefits to developers and to computing cluster administrators, and demos of what you can do with it. It will explain why DUCC is well suited to running large-memory Java analytics in multiple threads in ways that fully utilize modern multi-core machines.

Attendees will leave with an appreciation of where DUCC "fits" in the UIMA set of subprojects, and an understanding of the value and applicability of using DUCC as part of their UIMA infrastructure deployments.


Eddie Epstein

IBM Watson Group
Eddie Epstein is a development manager in the IBM Watson Group and committer on the Apache UIMA (tm) project. For the past 9 years he has been manager of the IBM team doing ongoing development of Apache UIMA. The team's current focus is facilitating UIMA-based processing on large...

Tuesday April 14, 2015 2:00pm - 2:50pm CDT
Texas I

3:00pm CDT

Big Data Graphs and Apache TinkerPop 3 - David Robinson, IBM
Learn how Apache TinkerPop 3, a recent Incubator addition, facilitates the inclusion of graph system technologies into production or data science environments. Graph systems have experienced a renaissance due to a renewed focus on understanding connections between data features in data sets. TinkerPop 3 supports both OLAP graph processors and OLTP graph databases, which are two ways of interacting with graphs. Learn how TinkerPop's vendor-neutral graph APIs, its Gremlin domain-specific query language, and its graph computation model work together to provide a comprehensive approach for interacting with graph systems. TinkerPop is already supported by numerous commercial and open source graph databases and processors, including Apache Giraph and, in the future, Apache Spark. Discover why TinkerPop 3 makes graph technology accessible and interchangeable in an analytics/data solution.

David Robinson

Software Engineer, IBM
David Robinson is currently a software engineer with IBM. David uses open source software to build big data/analytics solutions for business. Graph technologies in the context of analytics and machine learning are of particular interest to him. He has been an architect and developer...

Tuesday April 14, 2015 3:00pm - 3:50pm CDT
Texas I

5:20pm CDT

Super8: Delivering HTTP Adaptive Streaming Video for all of Comcast - Neill A. Kipp, Comcast
The Video IP Engineering and Research (VIPER) team at Comcast is responsible for HTTP video delivery that exceeds 500M transactions per day. Our DASH VOD Origin is a Java Tomcat application built with Maven. Our Super8 just-in-time packager is an Apache HTTP module written in C that uses Apache Portable Runtime. We implement our forward and reverse caching proxies using Apache Traffic Server, and our browser PlayerPlatformAPI is an Apache Flex application. We ingest and maintain 70,000 hours of VOD content, compress it using H.264/AVC, and store it on a 2PB network attached storage system. Sourcing our content in DASH (Dynamic Adaptive Streaming over HTTP) lets our Super8 packager easily convert video into proprietary formats such as Apple HTTP Live Streaming (HLS) and Adobe HTTP Dynamic Streaming (HDS) for video playback on mobile, browser, and IP set-top devices all across the country.

Neill A. Kipp

Distinguished Engineer, Comcast SPACE
Neill A. Kipp is a Distinguished Engineer for Comcast Video IP Engineering and Research (VIPER). Kipp designed and developed VIPER's Super8 video origination system that serves IP video for Xfinity TV and TV Go apps. Prior to joining Comcast, Kipp developed IPTV set-top guide applications...

Tuesday April 14, 2015 5:20pm - 6:10pm CDT
Texas I
Wednesday, April 15

9:00am CDT

What's With The 1S And 0S? Making Sense Of Binary Data At Scale With Tika And Friends - Nick Burch, Quanticate
If you have one or two files, you can take the time to manually work out what they are, what they contain, and how to get the useful bits out (probably....). However, this approach really doesn't scale, mechanical turks or no! Luckily, there are Apache projects out there which can help!

In this talk, we'll first look at how we can work out what a given blob of 1s and 0s actually is, be it textual or binary. We'll then see how to extract common metadata from it, along with text, embedded resources, images, and maybe even the kitchen sink! We'll see how to do all of this with Apache Tika, and how to dive down to the underlying libraries (including its Apache friends like POI and PDFBox) for specialist cases. Finally, we'll look briefly at how to roll this all out for a Big Data or large-scale search use case.

Nick Burch

CTO, Quanticate
Nick began contributing to Apache projects in 2003, and hasn't looked back since! Most of the projects Nick has worked in belong in the "Content" space, such as Apache POI (ex-PMC Chair), Apache Tika and Apache Chemistry. As well as coding projects, Nick is also involved in a number...

Wednesday April 15, 2015 9:00am - 9:50am CDT
Texas I

10:00am CDT

Evaluating Text Extraction: Developing a Toolkit for Apache Tika™ - Tim Allison, The MITRE Corporation
Text extraction tools are essential for obtaining the textual content and metadata of computer files for use in a wide variety of applications, including search and natural language processing tools. Techniques and tools for evaluating text extraction tools are missing from academia and industry. Apache Tika™ detects file types and extracts metadata and text from many file types. Tika is a crucial component in a wide variety of tools, including Solr™, Nutch™, Alfresco, Elasticsearch and Sleuth Kit®/Autopsy®. In this talk, we will give an overview of a new initiative within Tika to create an evaluation toolkit that allows integrators to evaluate Tika and other content extraction systems on client-specific documents. This talk will end with a brief discussion of a related initiative to take this evaluation methodology public and evaluate Tika on large batches of public domain documents.

Note: This talk was co-authored with Paul M. Herceg, Lead Artificial Intelligence Engineer, The MITRE Corporation. Paul holds an M.S. in Computer Science and a B.S. in Computer Science-Mathematics, both from the State University of New York at Binghamton.

Tim Allison

Principal Artificial Intelligence Engineer, The MITRE Corporation
Tim has been working in natural language processing since 2002. In recent years, his focus has shifted to advanced search and content/metadata extraction. Tim is a committer and PMC member on Apache PDFBox (since September 2016), and on Apache POI and Apache Tika (since July 2013...

Wednesday April 15, 2015 10:00am - 10:50am CDT
Texas I

11:15am CDT

Apache CXF, Tika and Lucene: The Power of Search the JAX-RS Way - Andriy Redko, AppDirect
I would like to present the work the Apache CXF team has done on integrating with Apache Tika for binary content extraction and with Apache Lucene for full-text search, using JAX-RS/REST search extensions.

Andriy Redko

Professional software developer, currently employed by AppDirect in Montreal, Canada. Joined the Apache Software Foundation and the Apache CXF project a year ago, and actively participates in the development process. Has no prior experience speaking at conferences of this level.

Wednesday April 15, 2015 11:15am - 12:05pm CDT
Texas I

1:15pm CDT

Storm-Crawler: Real-Time Web Crawling on Apache Storm - Jake Dodd, Ontopic
It’s 2015, and the Web is a dynamic place. The web crawlers of old tackled the problems of batch-based page discovery and indexing. A modern web crawler must be able to handle real-time and unbounded streams of new content.

Storm-Crawler is a next-generation web crawler that discovers and processes content on the Web in real time with low latency. This open source (and Apache-licensed) project is built on the Apache Storm framework, which provides a great foundation for a distributed real-time web crawler.

In this presentation, Jake Dodd will deliver a conceptual and technical overview of Storm-Crawler, demonstrate its use in a production environment, and discuss the project’s ongoing and future development.


Jake Dodd

My name is Jake Dodd, and I’m a co-founder of a software company based in Santa Monica, California. I attended the University of Southern California (B.S./M.S. Astronautical Engineering, 2011/2012). After receiving my B.S., I co-founded a company and then worked for a contractor...

Wednesday April 15, 2015 1:15pm - 2:05pm CDT
Texas I

2:15pm CDT

SQL over Anything with Apache Calcite - Tom Barber, Meteorite Consulting
Apache Calcite is already used in a number of high-profile Apache projects. Calcite allows you to create SQL (JDBC-compliant) interfaces over pretty much any inspectable object you want.

During this presentation we'll look at the history of Apache Calcite, various use cases, and existing adapters. We'll also take a look at how to create simple interfaces to various objects, how to join data sources using data federation, and the caching options available to improve performance.

Tom Barber

Technical Director, Spicule LTD
Tom Barber is the director of Meteorite BI and Spicule BI. A member of the Apache Software Foundation and regular speaker at ApacheCon, Tom has a passion for simplifying technology. The creator of Saiku Analytics and open source stalwart, when not working for NASA, Tom currently deals...

Wednesday April 15, 2015 2:15pm - 3:05pm CDT
Texas I

3:15pm CDT

How Apache Gets GoT to Your iPad - Philip Sorber, Comcast
Comcast has millions of customers nationwide, and serving them "over the top" video and other content efficiently is a daunting task. In this talk, Phil Sorber will explain how Comcast does this by leveraging Apache projects and commodity hardware. He will explain why decisions were made and what was learned from trying to execute this monumental task.


Philip Sorber

Principal Engineer, Comcast
Phil Sorber is employed by the next generation content delivery service team at Comcast to work on ATS integration. He is an ATS PMC member and ASF Member. He has spoken at ApacheCon in the past as well as other conferences. He is an avid Open Source proponent and has contributed...

Wednesday April 15, 2015 3:15pm - 4:05pm CDT
Texas I