ApacheCon NA 2015 has ended
Tuesday, April 14
 

2:00pm CDT

Development of IBM Watson with UIMA DUCC - Eddie Epstein, IBM Watson Group
DUCC (Distributed UIMA Cluster Computing) is a new Linux cluster controller designed to scale out any Apache UIMA (tm) pipeline, both for high-throughput collection processing jobs and for low-latency real-time applications. DUCC runs on clusters ranging from a single machine to many hundreds of machines.

This talk will cover the motivations that led to the creation of DUCC (the IBM Watson Jeopardy! Challenge), DUCC's benefits to developers and to computing cluster administrators, and demos of what you can do with it. It will explain why DUCC is well suited to running large-memory Java analytics in multiple threads in ways that fully utilize modern multi-core machines.

Attendees will leave with an appreciation of where DUCC "fits" in the UIMA set of subprojects, and an understanding of the value and applicability of using DUCC as part of their UIMA infrastructure deployments.

Speakers

Eddie Epstein

IBM Watson Group
Eddie Epstein is a development manager in the IBM Watson Group and a committer on the Apache UIMA (tm) project. For the past 9 years he has managed the IBM team doing ongoing development of Apache UIMA. The team's current focus is facilitating UIMA-based processing on large...


Tuesday April 14, 2015 2:00pm - 2:50pm CDT
Texas I

5:20pm CDT

Super8: Delivering HTTP Adaptive Streaming Video for all of Comcast - Neill A. Kipp, Comcast
The Video IP Engineering and Research (VIPER) team at Comcast is responsible for HTTP video delivery that exceeds 500M transactions per day. Our DASH VOD Origin is a Java Tomcat application built with Maven. Our Super8 just-in-time packager is an Apache HTTP module written in C that uses Apache Portable Runtime. We implement our forward and reverse caching proxies using Apache Traffic Server, and our browser PlayerPlatformAPI is an Apache Flex application. We ingest and maintain 70,000 hours of VOD content, compress it using H.264/AVC, and store it on a 2PB network attached storage system. Sourcing our content in DASH (Dynamic Adaptive Streaming over HTTP) lets our Super8 packager easily convert video into proprietary formats such as Apple HTTP Live Streaming (HLS) and Adobe HTTP Dynamic Streaming (HDS) for video playback on mobile, browser, and IP set-top devices all across the country.

Speakers

Neill A. Kipp

Distinguished Engineer, Comcast SPACE
Neill A. Kipp is a Distinguished Engineer for Comcast Video IP Engineering and Research (VIPER). Kipp designed and developed VIPER's Super8 video origination system that serves IP video for Xfinity TV and TV Go apps. Prior to joining Comcast, Kipp developed IPTV set-top guide applications...


Tuesday April 14, 2015 5:20pm - 6:10pm CDT
Texas I
 
Wednesday, April 15
 

10:00am CDT

Evaluating Text Extraction: Developing a Toolkit for Apache Tika™ - Tim Allison, The MITRE Corporation
Text extraction tools are essential for obtaining the textual content and metadata of computer files for use in a wide variety of applications, including search and natural language processing tools. Techniques and tools for evaluating text extraction tools are missing from academia and industry. Apache Tika™ detects file types and extracts metadata and text from many file types. Tika is a crucial component in a wide variety of tools, including Solr™, Nutch™, Alfresco, Elasticsearch and Sleuth Kit®/Autopsy®. In this talk, we will give an overview of a new initiative within Tika to create an evaluation toolkit that allows integrators to evaluate Tika and other content extraction systems on client-specific documents. This talk will end with a brief discussion of a related initiative to take this evaluation methodology public and evaluate Tika on large batches of public domain documents.

Note: This talk was co-authored with Paul M. Herceg, Lead Artificial Intelligence Engineer, The MITRE Corporation. Paul holds an M.S. in Computer Science and a B.S. in Computer Science-Mathematics, both from the State University of New York at Binghamton.

Speakers

Tim Allison

Principal Artificial Intelligence Engineer, The MITRE Corporation
Tim has been working in natural language processing since 2002. In recent years, his focus has shifted to advanced search and content/metadata extraction. Tim is a committer and PMC member on Apache PDFBox (since September 2016) and on Apache POI and Apache Tika (since July 2013...


Wednesday April 15, 2015 10:00am - 10:50am CDT
Texas I
 