Monday, April 13 • 11:45am - 12:35pm
Applying Apache Hadoop to NASA’s Big Climate Data - Glenn Tamkin, NASA


The NASA Center for Climate Simulation (NCCS) is using Apache Hadoop for high-performance analytics because it optimizes computer clusters and combines distributed storage of large data sets with parallel computation. We have built a platform for developing new climate analysis capabilities with Hadoop.

Hadoop is best known for text-based problems, but our scenario involves binary data, so we created custom Java applications to read and write data during the MapReduce process. Our solution is unique because it: a) uses a custom composite key design for fast data access, and b) utilizes the Hadoop Bloom filter, a data structure designed to test rapidly and memory-efficiently whether an element is present in a set.
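As a rough illustration of the two ideas named above, the following sketch pairs a composite key with a small Bloom filter in plain Java. It is not the NCCS implementation: the key fields (`variable`, `yyyymm`), the class names, and the hashing scheme are all assumptions made for this example, and a hand-rolled `BitSet` filter stands in for Hadoop's own Bloom filter class so the code runs without a Hadoop dependency.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical composite key: a variable name plus a time step. */
final class CompositeKey implements Comparable<CompositeKey> {
    final String variable; // e.g. "T" for temperature (assumed naming)
    final int yyyymm;      // time step encoded as year * 100 + month (assumed)

    CompositeKey(String variable, int yyyymm) {
        this.variable = variable;
        this.yyyymm = yyyymm;
    }

    /** Sort by variable first, then by time, so related records are adjacent. */
    @Override
    public int compareTo(CompositeKey other) {
        int c = variable.compareTo(other.variable);
        return c != 0 ? c : Integer.compare(yyyymm, other.yyyymm);
    }

    /** Byte form of the key, used as input to the filter's hash functions. */
    byte[] toBytes() {
        return (variable + ":" + yyyymm).getBytes(StandardCharsets.UTF_8);
    }
}

/** Minimal Bloom filter: k bit positions per key, derived by double hashing. */
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int numHashes;

    SimpleBloomFilter(int size, int numHashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.numHashes = numHashes;
    }

    /** 32-bit FNV-1a hash, used as the second hash function. */
    private static int fnv1a(byte[] data) {
        int h = 0x811c9dc5;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }

    /** i-th bit position for the given bytes: (h1 + i * h2) mod size. */
    private int index(byte[] data, int i) {
        int h1 = Arrays.hashCode(data);
        int h2 = fnv1a(data);
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(CompositeKey key) {
        byte[] data = key.toBytes();
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(data, i));
        }
    }

    /** False means definitely absent; true means possibly present. */
    boolean mightContain(CompositeKey key) {
        byte[] data = key.toBytes();
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(data, i))) {
                return false;
            }
        }
        return true;
    }
}

class CompositeKeyBloomDemo {
    public static void main(String[] args) {
        SimpleBloomFilter filter = new SimpleBloomFilter(1 << 16, 3);
        filter.add(new CompositeKey("T", 201504));
        // A key that was added is always reported as possibly present.
        System.out.println(filter.mightContain(new CompositeKey("T", 201504)));
        // A key that was never added is usually, but not always, rejected.
        System.out.println(filter.mightContain(new CompositeKey("U", 199001)));
    }
}
```

The point of the combination is that a cheap membership check on the key can rule out data that cannot contain a requested variable and time step before any binary records are read. A Bloom filter never produces false negatives, but it can produce false positives, so a positive answer still has to be confirmed against the data itself.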

This presentation, which touches on motivation, use cases, and lessons learned, will explore the software architecture, including all Apache contributions (Avro, Maven, etc.).

Speakers

Glenn Tamkin

NASA
Mr. Tamkin is the lead software engineer and architect for the NASA Center for Climate Simulation’s (NCCS) Climate Informatics project. Recently, he has built a Hadoop-based system designed to perform analytics across NASA’s Big Climate Data. Prior endeavors extended from spacecraft flight dynamics to space shuttle support spanning 17 years at NASA. Mr. Tamkin has also architected one of the first nation-wide web-service based...


Monday April 13, 2015 11:45am - 12:35pm
Texas II
