Scientists study and monitor Earth to better understand the planet’s environmental systems and the impact of climate change. As the volume of data to analyze grows, new approaches are needed to extract knowledge from large datasets. One such approach is to use foundation models.
Foundation models are AI models trained on a broad set of unlabeled data. They can be adapted to many different tasks, applying what they learn in one situation to another. These models have rapidly advanced natural language processing (NLP) over the past half decade, and IBM is pioneering applications of foundation models beyond language.
Through a new collaboration, IBM and NASA’s Marshall Space Flight Center plan to develop several technologies that apply AI foundation models to NASA's Earth-observing satellite data to extract new insights from Earth observations.
"Foundation models have proven successful in natural language processing, and it's time to expand that to new domains and modalities important for business and society," said Raghu Ganti, principal researcher at IBM.
One project is to train an IBM geospatial intelligence foundation model on NASA's Harmonized Landsat Sentinel-2 (HLS) dataset, a record of land cover and land use changes captured by Earth-orbiting satellites. The model is intended to help researchers analyze the planet's environmental systems by processing petabytes of satellite data to identify changes in the geographic footprint of phenomena such as natural disasters, cyclical crop yields and wildlife habitats.
A second expected output of the collaboration is an easily searchable corpus of Earth science literature. IBM has trained an NLP model on nearly 300,000 Earth science journal articles to organize the literature and make new knowledge easier to discover.
The NLP model was trained using Red Hat’s OpenShift software and uses PrimeQA, IBM's open-source multilingual question-answering system. Beyond serving as a resource for researchers, the new Earth science language model could be integrated into NASA's scientific data management and stewardship processes.
IBM and NASA have other joint projects planned as well, including building a foundation model for weather and climate prediction using MERRA-2, NASA's atmospheric reanalysis dataset built from observations.
"Applying foundation models to geospatial, event-sequence, time-series, and other non-language factors within Earth science data could make enormously valuable insights," said Ganti. "Ultimately, it could facilitate a larger number of people working on some of our most pressing climate issues."
The collaboration between NASA and IBM is part of NASA's Open-Source Science Initiative, a commitment to building an inclusive, transparent and collaborative open science community over the next decade.
Edited by Greg Tavarez