Loading…
This event has ended. View the official site or create your own event → Check it out
This event has ended. Create your own
View analytic
Monday, April 13 • 5:00pm - 5:50pm
Content Extraction from Images and Video in Tika - Chris Mattmann, NASA

Sign up or log in to save this to your schedule and see who's attending!

The DARPA Memex project and NSF Polar Cyber Infrastructure project have been funding a ton of improvements in the Apache Tika framework. Apache Tika is a content detection and analysis toolkit that has support for file type identification (MIME identification) for over 1200 types of files; extraction of text and metadata and language information from those files; even translation!

Though Tika supports all those file types, its support for extraction from images, and videos has been lacking. Via the Memex and NSF projects, we have expanded Tika to extract text from images (using Tesseract OCR); and are actively integrating other analyses (Visual Sentiment analysis; geo-location using toolkits like GDAL; and analyes of scenes and objects).

I'll tell you all about how to install and use these improvements and even illustrate them in a cool example from Memex and NSF Polar.

Speakers
avatar for Chris Mattmann

Chris Mattmann

Chief Architect & Adjunct Associate Professor, NASA Jet Propulsion Laboratory & USC
Chris Mattmann has a wealth of experience in software design, and in the construction of large-scale data-intensive systems. His work has infected a broad set of communities, ranging from helping NASA unlock data from its next generation of earth science system satellites, to assisting graduate students at the University of Southern California (his Alma mater) in the study of software architecture, all the way to helping industry and open... Read More →


Monday April 13, 2015 5:00pm - 5:50pm
Texas II

Attendees (23)