尹志 (2022-06-27 08:22):
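#paper doi:10.1016/j.tics.2021.11.008 Trends in Cognitive Sciences, Vol 26, Issue 2, 2022, Next-generation deep learning based on simulators and synthetic data. Today's mainstream deep learning applications rely mostly on supervised learning, which requires large amounts of labeled data. Because collecting labeled data at that scale is costly and slow, this has become a bottleneck for the field. One possible solution is to make full use of synthetic data, and this paper reviews that topic. The paper groups sources of synthetic data into three types: data produced by rendering pipelines, data produced by generative models, and data produced by fusion models. More specifically, the first type comes from simulating and modeling the generating process, so it has a sound physical grounding and a well-defined pipeline; the second comes from statistical generative models that estimate the data distribution; the third comes from fusing data from different domains, for example by combining a foreground domain with a background domain in various ways. Since many gaps remain between synthetic and real data, techniques such as domain adaptation are also advancing so that synthetic data can be used more effectively. In addition, these data-generation schemes borrow heavily from how humans learn naturally, which has led to development in both directions: synthesis methods keep drawing on characteristics of natural learning, and research on data synthesis in turn advances our understanding of the properties of biological systems. Finally, the paper summarizes the characteristics and challenges of using synthetic data in areas such as scientific exploration, physics research, and multimodal learning. This part is very condensed; readers interested in these topics can follow the references for more, and they offer very valuable research leads.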
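To make the third type (fusion of a foreground domain and a background domain) more concrete, here is a minimal toy sketch. It is not the method of the paper: the function names `fuse` and `make_dataset`, the nested-list image format, and the random placement strategy are all assumptions chosen for illustration. The point it tries to show is that the label (here a binary mask) comes for free from the synthesis step.

```python
import random


def fuse(foreground, background, top, left):
    """Paste a foreground patch onto a copy of the background.

    Images are nested lists of pixel values (a hypothetical toy format).
    Returns the fused image and a binary mask marking the pasted pixels.
    """
    height, width = len(background), len(background[0])
    fh, fw = len(foreground), len(foreground[0])
    if top < 0 or left < 0 or top + fh > height or left + fw > width:
        raise ValueError("foreground does not fit at the requested position")

    image = [row[:] for row in background]      # copy, keep the input intact
    mask = [[0] * width for _ in range(height)]  # 1 where the foreground lands
    for r in range(fh):
        for c in range(fw):
            image[top + r][left + c] = foreground[r][c]
            mask[top + r][left + c] = 1
    return image, mask


def make_dataset(foreground, backgrounds, n_samples, seed=0):
    """Build (image, mask) pairs by pasting the foreground at random positions.

    Assumes every background is at least as large as the foreground.
    """
    rng = random.Random(seed)
    fh, fw = len(foreground), len(foreground[0])
    samples = []
    for _ in range(n_samples):
        background = rng.choice(backgrounds)
        top = rng.randint(0, len(background) - fh)
        left = rng.randint(0, len(background[0]) - fw)
        samples.append(fuse(foreground, background, top, left))
    return samples
```

Images produced this way tend to look pasted and differ statistically from real ones, which is the kind of gap that domain adaptation is meant to reduce.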
Next-generation deep learning based on simulators and synthetic data
Abstract:
Deep learning (DL) is being successfully applied across multiple domains, yet these models learn in a most artificial way: they require large quantities of labeled data to grasp even simple concepts. Thus, the main bottleneck is often access to supervised data. Here, we highlight a trend in a potential solution to this challenge: synthetic data. Synthetic data are becoming accessible due to progress in rendering pipelines, generative adversarial models, and fusion models. Moreover, advancements in domain adaptation techniques help close the statistical gap between synthetic and real data. Paradoxically, this artificial solution is also likely to enable more natural learning, as seen in biological systems, including continual, multimodal, and embodied learning. Complementary to this, simulators and deep neural networks (DNNs) will also have a critical role in providing insight into the cognitive and neural functioning of biological systems. We also review the strengths of, and opportunities and novel challenges associated with, synthetic data.