Professors Xinbing Wang and Xiaoying Gan's team makes significant progress in interdisciplinary research on data-driven geoscience knowledge discovery

Recently, the research team led by Professor Xinbing Wang and Professor Xiaoying Gan from the Department of Electronic Engineering at Shanghai Jiao Tong University, in collaboration with the university's School of Oceanography, proposed OxyGenerator, a deep graph learning model driven by sparse ocean observation data. Using these observations, OxyGenerator reconstructed global ocean dissolved oxygen from 1920 to 2023, providing data support for analyzing the complex oxygen cycle and its role in climate regulation. The work is a promising step toward integrating artificial intelligence with oceanography. The results were published under the title "OxyGenerator: Reconstructing Global Ocean Deoxygenation Over a Century with Deep Learning" at the International Conference on Machine Learning (ICML).


Figure 1. Information on the published paper.


Research Background

Dissolved oxygen in the ocean is a key factor in keeping marine ecosystems functioning. However, as global warming and human activities intensify, the ocean has shown a deoxygenation trend in recent years, with serious consequences for fisheries, climate regulation, and other areas. To understand ocean deoxygenation comprehensively and uncover the patterns of the oxygen cycle and its changes from the available data, Schmidtko et al. published the study "Decline in global oceanic oxygen content during the past five decades" in Nature in 2017. That study was the first to use spatial interpolation to reconstruct and quantitatively analyze global ocean dissolved oxygen data since 1960. However, a dissolved oxygen record covering only the past fifty years is far from sufficient for assessing the specific impacts of human activities since the Industrial Revolution. The highly sparse historical observations and the limited accuracy of spatial interpolation have become the major bottlenecks in addressing this issue.


Results and Impacts

To address these challenges, the research team gathered a total of 6 billion ocean dissolved oxygen records dating back to 1900, including survey data from research vessels, Argo float observations, and real-time deep-sea mooring observations, amounting to approximately 2 TB of stored data, and applied unified quality control to them. Because ocean water bodies have irregular boundaries and the highly sparse observations are non-uniformly distributed, the team modeled the data as a four-dimensional spatiotemporal graph. This representation accounts for geographic spatial correlations and for high-value observational samples, and it allows information to be passed across time and space between observed and missing data points.
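As a rough illustration of the graph modeling idea described above, the sketch below links observations that are close in horizontal distance, depth, and time. It is not the team's implementation: the node layout `(lat, lon, depth, year)`, the function names, and the thresholds `max_km`, `max_depth_m`, and `max_years` are all assumptions made for this example, and the sample observations are invented. It also ignores land barriers, which a real ocean graph would need to respect.

```python
import math
from itertools import combinations


def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine distance in kilometres between two points on a sphere."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def build_edges(nodes, max_km=500.0, max_depth_m=200.0, max_years=1):
    """Connect node pairs that are close in space, depth, and time.

    All three thresholds are hypothetical values chosen for illustration.
    """
    edges = []
    for i, j in combinations(range(len(nodes)), 2):
        a, b = nodes[i], nodes[j]
        close_horizontally = great_circle_km(a[0], a[1], b[0], b[1]) <= max_km
        close_in_depth = abs(a[2] - b[2]) <= max_depth_m
        close_in_time = abs(a[3] - b[3]) <= max_years
        if close_horizontally and close_in_depth and close_in_time:
            edges.append((i, j))
    return edges


# Invented observations: (latitude, longitude, depth in metres, year).
nodes = [
    (10.0, 140.0, 100.0, 1950),
    (11.0, 141.0, 150.0, 1950),  # near node 0 in space, depth, and time
    (11.0, 141.0, 150.0, 1960),  # same place as node 1, ten years later
    (-40.0, 20.0, 100.0, 1950),  # a different ocean basin
]
edges = build_edges(nodes)
print(edges)
```

With these thresholds only nodes 0 and 1 are linked. Widening `max_years` would also link the 1950 and 1960 observations at the same site, which is how a graph of this kind can carry information across time as well as space.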


Figure 2. The five main oceanic observation databases: World Ocean Database 2018, CLIVAR and Carbon Hydrographic Database, Argo, Global Ocean Data Analysis Project version 2.2022, and GEOTRACES IDP.


Because changes in ocean dissolved oxygen concentration are influenced by both physical and biochemical variables, the team first used a multilayer perceptron (MLP) to extract nonlinear features from the multivariate data. They then used a bidirectional long short-term memory (BiLSTM) network to capture how the dissolved oxygen observations vary over time. In addition, since the global ocean exhibits heterogeneous spatiotemporal correlations across historical periods and regions, they proposed a Zoning-Varying Message-Passing mechanism inspired by the concept of oceanographic zonation. This mechanism uses a hypernetwork to generate parameters that apply a zone-specific affine transformation to graph messages, so that information is transmitted in a way that adapts to each zone. Finally, domain knowledge from oceanography was used to help calibrate the uncertainty of the neural network. The study incorporated the Redfield Ratio, which represents the ideal balance of nitrogen, phosphorus, and oxygen in the ocean, into a gradient regularization method with embedded chemical knowledge, with the aim of minimizing anomalous signals in the reconstructed results.
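The zone-dependent affine transformation of graph messages can be pictured with a small sketch. It is only a toy under stated assumptions, not the OxyGenerator architecture: the hypernetwork here is a single fixed linear map from a zone embedding to a per-feature scale and shift, the transformation is chosen by the zone of the receiving node, aggregation is a plain mean, and every name and number (`hypernetwork`, `zone_of`, `w_scale`, `w_shift`, the feature values) is hypothetical. In a trained model these weights and embeddings would be learned.

```python
def hypernetwork(zone_embedding, w_scale, w_shift):
    """Map a zone embedding to a per-feature (scale, shift) pair.

    The scale is parameterised as 1 + linear(embedding), so a zero
    output leaves messages unchanged.
    """
    scale = [1.0 + sum(w * e for w, e in zip(row, zone_embedding)) for row in w_scale]
    shift = [sum(w * e for w, e in zip(row, zone_embedding)) for row in w_shift]
    return scale, shift


def zone_varying_message_passing(features, edges, zone_of, zone_embeddings, w_scale, w_shift):
    """One round of mean aggregation with zone-specific affine messages."""
    inbox = [[] for _ in features]
    for i, j in edges:  # edges are treated as undirected
        for src, dst in ((i, j), (j, i)):
            # Assumption: the receiving node's zone selects the transform.
            scale, shift = hypernetwork(zone_embeddings[zone_of[dst]], w_scale, w_shift)
            inbox[dst].append([s * x + b for s, x, b in zip(scale, features[src], shift)])
    updated = []
    for node, messages in enumerate(inbox):
        if not messages:  # an isolated node keeps its own features
            updated.append(list(features[node]))
        else:
            updated.append([sum(col) / len(messages) for col in zip(*messages)])
    return updated


# Three nodes with two features each; nodes 0 and 1 are in zone 0, node 2 in zone 1.
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
edges = [(0, 1), (1, 2)]
zone_of = [0, 0, 1]
zone_embeddings = [[1.0, 0.0], [0.0, 1.0]]
w_scale = [[0.0, 1.0], [0.0, 1.0]]   # one row per feature
w_shift = [[0.0, 0.5], [0.0, -0.5]]

updated = zone_varying_message_passing(
    features, edges, zone_of, zone_embeddings, w_scale, w_shift
)
print(updated)
```

With these weights, zone 0 receives messages unchanged, while zone 1 receives them doubled and shifted. The same neighbor feature therefore contributes differently depending on where it arrives, which is the behavior the zoning idea relies on.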


Figure 3. The framework of the proposed OxyGenerator.


In multi-fold cross-validation against observations and in comparison with three expert-driven CMIP6 numerical models, OxyGenerator achieved the best performance on all four reconstruction evaluation metrics. The Mean Absolute Percentage Error (MAPE) was reduced by 38.77%, with reconstruction errors in open sea areas significantly lowered. OxyGenerator performed particularly well in regions with abundant observational data, such as the Western Pacific, and in areas affected by special environmental conditions, such as the Black Sea, and its performance remained stable across the century. The results also reproduced the disturbances to dissolved oxygen distribution caused by historical El Niño/La Niña events and reflected long-term water movement characteristics such as the thermohaline circulation.
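For readers unfamiliar with the metric, the sketch below shows how MAPE is computed and how a relative reduction between two models could be derived. The article does not state whether 38.77% is a relative or an absolute reduction; the relative form used here is an assumption, and the observed and predicted values are invented for illustration.

```python
def mape(observed, predicted):
    """Mean absolute percentage error, in percent.

    Observed values must be nonzero, since each error is divided by them.
    """
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((o - p) / o) for o, p in zip(observed, predicted))
    return 100.0 * total / len(observed)


def relative_reduction(baseline_error, new_error):
    """Percentage by which new_error is lower than baseline_error."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Invented dissolved oxygen values in µmol/kg.
observed = [200.0, 100.0, 50.0]
baseline_prediction = [220.0, 80.0, 60.0]
model_prediction = [210.0, 90.0, 55.0]

baseline_mape = mape(observed, baseline_prediction)
model_mape = mape(observed, model_prediction)
reduction = relative_reduction(baseline_mape, model_mape)
print(baseline_mape, model_mape, reduction)
```

Because each error is scaled by the observed value, MAPE weights a 10 µmol/kg miss in low-oxygen water more heavily than the same miss in well-oxygenated water.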

The reconstructed data indicate that over the past century the oxygen minimum zone (OMZ30), where dissolved oxygen is below 30 µmol/kg, has expanded rapidly. By 2023, its area had more than tripled compared with 1920. This finding is significant for understanding long-term changes in the OMZ and can support better ocean monitoring and conservation in the future. Looking ahead, the team plans to continue advancing interdisciplinary research in data-driven geoscientific discovery and to develop advanced technologies that empower research in the AI for Science field.


Figure 4. Reconstructed oxygen minimum zone (OMZ) of the ocean from 1920 to 2023. The yellow contour lines mark the area where the dissolved oxygen minimum is below 30 µmol/kg.
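An OMZ30 area of the kind described above could be estimated from a gridded reconstruction roughly as follows. This is a sketch under assumptions rather than the team's procedure: it treats the Earth as a sphere, uses a regular 1° latitude-longitude grid, and flags a water column when its minimum oxygen over depth falls below 30 µmol/kg. The function names and the two small sets of profiles are invented and do not represent the reconstructed data.

```python
import math

EARTH_RADIUS_KM = 6371.0
OMZ_THRESHOLD = 30.0  # µmol/kg, the OMZ30 definition given in the article


def cell_area_km2(lat_south, lat_north, dlon_deg):
    """Area of a latitude-longitude cell on a sphere."""
    band = math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))
    return EARTH_RADIUS_KM ** 2 * math.radians(dlon_deg) * band


def omz_area_km2(columns, dlat_deg=1.0, dlon_deg=1.0):
    """Total area of grid cells whose oxygen minimum over depth is below the threshold.

    columns: list of (cell-centre latitude, [oxygen at each depth level]).
    Land cells are assumed to be left out of the list.
    """
    total = 0.0
    for lat_center, profile in columns:
        if min(profile) < OMZ_THRESHOLD:
            half = dlat_deg / 2
            total += cell_area_km2(lat_center - half, lat_center + half, dlon_deg)
    return total


# Invented profiles for three grid cells at two points in time.
early = [(0.5, [200.0, 25.0, 80.0]), (10.5, [210.0, 60.0, 90.0]), (20.5, [220.0, 45.0, 100.0])]
late = [(0.5, [190.0, 20.0, 75.0]), (10.5, [200.0, 28.0, 85.0]), (20.5, [210.0, 29.0, 95.0])]

area_early = omz_area_km2(early)
area_late = omz_area_km2(late)
ratio = area_late / area_early
print(area_early, area_late, ratio)
```

Weighting each cell by its true area matters because grid cells shrink toward the poles, so simply counting flagged cells would overstate expansion at high latitudes.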


About the Research Team

Bin Lu, a PhD student in the Department of Electronic Engineering at Shanghai Jiao Tong University, is the first author of the paper, and Ze Zhao, a master's student, is the second author. Professor Xiaoying Gan is the corresponding author. The research was guided by Professor Xinbing Wang from the Department of Electronic Engineering; Academician Jing Zhang, Professor Lei Zhou, and Professor Yuntao Zhou from the School of Oceanography; and Academician Chenghu Zhou from the Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences. This work was supported by the National Natural Science Foundation of China and the National Key R&D Program.


Figure 5. From left to right: Bin Lu, Ze Zhao, Xiaoying Gan, Xinbing Wang, Jing Zhang, Chenghu Zhou.



[ 2024-07-03 ]