Modeling, Simulation & Gaming Working Group

September 28, 2022
11:30 am - 1:00 pm ET



Continuing the Tech Talk Series on Synthetic Data for AI, Dr. Edgar Bernal, Chief Scientist, FLX AI, joins us to discuss "Bridging the Real-to-Synthetic Cross-Domain Gap in a Data-Efficient Manner."

Dr. Edgar Bernal, Chief Scientist, FLX AI

In recent years, advances in machine learning and deep learning have enabled the automation of certain processing and analysis tasks on volumes of data beyond the scale of human capacity. Despite their popularity, deep learning algorithms have shortcomings that are yet to be fully addressed. Most notably, training the traditionally large networks requires substantial quantities of data in order to converge to robust, well-performing models. This requirement becomes particularly problematic in scenarios where data is inherently scarce or where data acquisition and/or labeling demand expert involvement, as is often the case in geospatial tasks. One approach that has shown promise in ameliorating the impact of these data and labeling requirements relies on the generation of synthetic data/label pairs. This approach, however, has limitations, including the significant time and expertise required to render realistic, useful data, as well as the inherent cross-domain gap between synthetically generated and real-world data. In this work, we propose to bridge that gap in a data-efficient manner by implementing and deploying a suite of metrics aimed at quantifying the extent of the cross-domain discrepancies as well as the statistical properties of the data. We show how the insights yielded by these metrics can be leveraged to estimate the relation between the parameters of the synthetic data generator and the semantic properties of the synthetic data, which in turn can be used to design a synthetic data generation strategy that maximizes the discriminability of the generated samples, thus leading to significantly improved data efficiency over current approaches.

Edgar A. Bernal's work focuses on achieving high-level image and video abstraction and understanding through statistical learning. His current interests include image and video analytics, computer vision, image- and video-based action and activity recognition, machine and deep learning, and multimodal fusion.

The Modeling, Simulation & Gaming Working Group (MS&G WG) collaborates with industry, academia, and government leaders to advance GEOINT tradecraft capable of delivering authoritative and relevant information at the point of need.