林海onrush (2023-12-30 00:06):
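#paper, Using sequences of life-events to predict human lives. Nat Comput Sci (2023). https://doi.org/10.1038/s43588-023-00573-5 Can large language models now tell fortunes accurately? Yes! This paper, published in Nature Computational Science, proposes a model for predicting the course of a person's life. It represents human lives in a way that is structurally similar to language, turning a series of life events into a "life sequence". The paper introduces a deep learning model called life2vec that predicts various outcomes of a life trajectory, such as early mortality risk and personality traits. The model is based on the Transformer architecture and learns dense vector representations of life-event sequences. The study builds these sequences from detailed labor and health records covering roughly 6 million residents of Denmark over nearly 10 years. The life2vec (L2V) model reaches an accuracy of 78.8% (0.788 [0.782, 0.794]).
The model has three components: an embedding layer, an encoder, and a task-specific decoder. It is first pretrained on a masked language modeling task and a sequence ordering prediction task, which teach it event representations and sequence structure. It is then fine-tuned on downstream tasks such as early mortality prediction and personality trait prediction, where it learns a vector representation of the whole life trajectory. The results show that the model predicts outcomes accurately across quite different domains, and on early mortality prediction it clearly outperforms current state-of-the-art methods.
The study also analyzes the event representation space and the person representation space that the model learns, and finds that both are clearly structured and reflect semantic relationships between events. It further shows that Transformer models and large-scale datasets can be used to predict and understand individual life trajectories, opening new possibilities for research in the social sciences and health care. Note that the model is currently for research purposes only, and any real-world use raises many ethical considerations that need careful handling. So here is the question: what is left that large models cannot do?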
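To make the masked language modeling pretraining step more concrete, here is a minimal sketch in plain Python of masking a life-event sequence. It is not the paper's code: the token names, the `mask_events` helper, and the masking probability are all hypothetical, and it uses simple `[MASK]` replacement only.

```python
import random

MASK = "[MASK]"  # placeholder token (assumed name, not taken from the paper)


def mask_events(events, mask_prob=0.15, seed=0):
    """Mask a life-event sequence for a masked-language-modeling objective.

    Returns (masked_sequence, targets), where targets maps each masked
    position to the original token the model would be asked to recover.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    masked, targets = [], {}
    for i, tok in enumerate(events):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[i] = tok
        else:
            masked.append(tok)
    return masked, targets


# Hypothetical event tokens; real sequences would come from registry records.
life = ["INCOME_Q3", "JOB_TEACHER", "CITY_AARHUS",
        "DIAG_J45", "INCOME_Q4", "JOB_MANAGER"]

masked, targets = mask_events(life, mask_prob=0.3, seed=1)
print(masked)
print(targets)
```

During pretraining, the encoder would see `masked` and be trained to predict the tokens stored in `targets`, which is how it picks up relationships between events without any outcome labels.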
Using sequences of life-events to predict human lives
Abstract:
Here we represent human lives in a way that shares structural similarity to language, and we exploit this similarity to adapt natural language processing techniques to examine the evolution and predictability of human lives based on detailed event sequences. We do this by drawing on a comprehensive registry dataset, which is available for Denmark across several years, and that includes information about life-events related to health, education, occupation, income, address and working hours, recorded with day-to-day resolution. We create embeddings of life-events in a single vector space, showing that this embedding space is robust and highly structured. Our models allow us to predict diverse outcomes ranging from early mortality to personality nuances, outperforming state-of-the-art models by a wide margin. Using methods for interpreting deep learning models, we probe the algorithm to understand the factors that enable our predictions. Our framework allows researchers to discover potential mechanisms that impact life outcomes as well as the associated possibilities for personalized interventions.