A Brief Survey of Semi-Supervised Learning and Its Application in Natural Language Processing

COMP 599 - Statistical Learning Theory (taught by Profs. Adam Oberman and Prakash Panangaden) at McGill University, 2020

Abstract: We survey the motivations, key notions, methods, and theoretical results that underpin semi-supervised learning. Because work on learning from labelled and unlabelled data is extensive and spans several decades, we highlight specific semi-supervised methods and their theoretical analyses rather than attempting a broad overview of the topic. We discuss self-training, generative models, discriminative models, and word embeddings. We also highlight the relevance of these methods to natural language processing, a field in which manual annotation of data is expensive and inter-annotator agreement is often low.