Causality (physics)
Divergence (linguistics)
Relation (database)
Extraction (chemistry)
Transfer learning
Relation extraction
Computer science
Cognitive psychology
Psychology
Econometrics
Artificial intelligence
Mathematics
Chemistry
Chromatography
Data mining
Physics
Philosophy
Linguistics
Quantum mechanics
Authors
Seethalakshmi Gopalakrishnan, Victor Zitian Chen, Wenwen Dou, Wlodek Zadrozny
Identifier
DOI: 10.1016/j.nlp.2024.100055
Abstract
The problem of extracting causal relations from text remains a challenging task, even in the age of Large Language Models (LLMs). A key factor that impedes progress in this research is the limited availability of annotated data and the lack of common labeling methods. We investigate the applicability of transfer learning (domain adaptation) to address these impediments in experiments with three publicly available datasets: FinCausal, SCITE, and Organizational. We perform pairwise transfer experiments between the datasets using DistilBERT, BERT, and SpanBERT (variants of BERT) and measure the performance of the resulting models. To understand the relationship between datasets and performance, we measure the differences between the vocabulary distributions of the datasets using four methods: Kullback–Leibler (K-L) divergence, the Wasserstein metric, Maximum Mean Discrepancy, and the Kolmogorov–Smirnov test, and we estimate the predictive capability of each measure using linear regression. Our results show that the K-L divergence between the vocabulary distributions of the data predicts the performance of transfer learning with R² = 0.746. Surprisingly, the predictive value of the Wasserstein distance is low (R² = 0.52912), as is that of the Kolmogorov–Smirnov test (R² = 0.40025979). This is confirmed in a series of experiments. For example, with variants of BERT we observe an increase of almost 29% to 32% in the macro-average F1-score when the gap between the training and test distributions is small according to the K-L divergence, the best-performing predictor on this task. We also discuss these results in the context of the sub-par performance of some large language models on causality extraction tasks. Finally, we report the results of transfer learning informed by K-L divergence; namely, we show a 12% to 63% increase in performance when a small portion of the test data is added to the training data. This shows that corpus expansion and n-shot learning benefit when the examples are chosen to maximize their information content, as measured by the K-L divergence.
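To make the distribution comparison concrete, below is a minimal sketch (not the authors' implementation) of how the K-L divergence between the vocabulary distributions of two corpora could be computed, using add-alpha smoothing over a shared vocabulary. The tokenization, the toy sentences, and the smoothing constant are illustrative assumptions; in the paper, divergence values for dataset pairs are related to the observed transfer performance via linear regression to obtain the reported R² values.

```python
# Minimal sketch (illustrative, not the authors' code): K-L divergence between
# the unigram vocabulary distributions of a source and a target corpus.
from collections import Counter
import math

def vocab_distribution(tokens, vocab, alpha=1e-6):
    """Smoothed unigram probability distribution over a shared vocabulary.

    alpha is an assumed add-alpha smoothing constant so that words missing
    from one corpus still receive nonzero probability.
    """
    counts = Counter(tokens)
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def kl_divergence(p, q):
    """D_KL(P || Q) over the shared vocabulary, in nats."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)

# Toy corpora standing in for a source and a target dataset
# (e.g. FinCausal-style vs. SCITE-style sentences).
source_tokens = "falling revenue caused the share price to drop".split()
target_tokens = "the mutation causes a measurable change in protein folding".split()

vocab = sorted(set(source_tokens) | set(target_tokens))
p = vocab_distribution(source_tokens, vocab)
q = vocab_distribution(target_tokens, vocab)

print(f"K-L divergence (source || target): {kl_divergence(p, q):.4f}")
```

In practice the same computation would run over the full vocabularies of the FinCausal, SCITE, and Organizational corpora; the smoothing constant only matters for words that appear in one corpus but not the other, and note that K-L divergence is asymmetric, so the source-to-target and target-to-source values generally differ.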