SiaKey: A Method for Improving Few-shot Learning with Clinical Domain Information


Zhuochun Li; Khushboo Thaker; Daqing He

Abstract


Few-Shot Learning (FSL) is advancing the field of Natural Language Processing (NLP) by achieving significant improvements with only a small amount of labeled data. Supervised models usually require a huge amount of annotated data and are computationally expensive to train. However, annotation is difficult and time-consuming for clinical data found in large-scale electronic health records (EHRs) and online posts, because only specialists with professional clinical knowledge can annotate it manually. On the other hand, fine-tuning a Pretrained Language Model (PLM) typically performs poorly on few-shot training data. Thus, we introduce a novel FSL technique named SiaKey, which utilizes Siamese Networks and integrates Keyphrase Extraction and Domain Knowledge. Post classification is challenging because online posts typically contain more irrelevant information than traditional EHRs. We tested SiaKey under 5-, 10-, 15-, and 20-shot settings on health-related online post classification tasks. Our experimental results demonstrate SiaKey's effectiveness in capturing text features and indicate superior performance compared to BioBERT on similar FSL tasks. This paper introduces a novel and efficient approach to automatically identifying patients' disease trajectories from their clinical descriptions and provides inspiration for other related NLP tasks.
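The abstract does not describe SiaKey's implementation in detail, so the following is only a minimal, hypothetical sketch of the general idea it names: a shared (Siamese) encoder embeds both an input post and keyphrase-based class descriptions, and the predicted label is the class with the most similar embedding. All names here (ToyEncoder, siamese_predict, the placeholder token ids) are illustrative assumptions, not the authors' code; in the paper the encoder would be a clinical PLM and the keyphrases would come from a keyphrase extraction step.

```python
# Hypothetical sketch of a Siamese few-shot classifier: one shared encoder embeds
# both the post and each class's keyphrase text; prediction = most similar class.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyEncoder(nn.Module):
    """Shared text encoder (a stand-in for a PLM such as BioBERT)."""

    def __init__(self, vocab_size: int = 5000, dim: int = 64):
        super().__init__()
        # Bag-of-tokens mean pooling keeps the sketch self-contained and runnable.
        self.embed = nn.EmbeddingBag(vocab_size, dim)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.embed(token_ids), dim=-1)


def siamese_predict(encoder: nn.Module,
                    post_ids: torch.Tensor,
                    class_keyphrase_ids: torch.Tensor) -> torch.Tensor:
    """Score a post against every class's keyphrase text with the same encoder."""
    post_vec = encoder(post_ids)                # (1, dim)
    class_vecs = encoder(class_keyphrase_ids)   # (num_classes, dim)
    sims = post_vec @ class_vecs.T              # cosine similarity (vectors are normalized)
    return sims.argmax(dim=-1)


if __name__ == "__main__":
    torch.manual_seed(0)
    encoder = ToyEncoder()
    # Random placeholder token ids; in practice these would come from a tokenizer
    # applied to the post and to the extracted keyphrases for each class.
    post = torch.randint(0, 5000, (1, 20))
    class_keyphrases = torch.randint(0, 5000, (4, 10))  # 4 candidate classes
    print(siamese_predict(encoder, post, class_keyphrases))
```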

Keywords: Health Informatics; Social Media; Few-shot Learning
