Dual Representations of Words with Asymmetric Contexts

Undergraduate Summer Research Report, 2019

Abstract: Pre-trained word embeddings are typically learned from co-occurrence statistics gathered in a symmetric context window around each focal word. It is also standard practice to keep only a single representation per word after training, even though a separate set of context vectors is learned alongside the word vectors. In this work, we show how retaining two vector representations per word allows asymmetric relationships between words, such as ordering and dependency, to be modelled. We also investigate whether training two vector representations per word is necessary in the first place.
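
To make the core idea concrete, the following is a minimal illustrative sketch (not code from the report; all names and vectors are hypothetical) of why keeping both the word (input) and context (output) vectors permits an asymmetric score between a pair of words, whereas collapsing to a single vector per word forces the score to be symmetric:

    # Illustrative sketch only: separate word and context vectors give an
    # asymmetric score; a single vector per word gives a symmetric one.
    import numpy as np

    rng = np.random.default_rng(0)
    vocab = ["cat", "sat", "mat"]
    dim = 4

    # Hypothetical trained parameters: one word vector and one context vector per word.
    W = {w: rng.normal(size=dim) for w in vocab}   # word (input) vectors
    C = {w: rng.normal(size=dim) for w in vocab}   # context (output) vectors

    def score(a, b):
        """Score for word b appearing in the context of focal word a."""
        return float(W[a] @ C[b])

    # Generally unequal, so ordering/dependency direction can be modelled:
    print(score("cat", "sat"), score("sat", "cat"))

    # Standard practice keeps a single vector per word (e.g. W, or W + C),
    # so the only available score is symmetric in its two arguments:
    V = {w: W[w] + C[w] for w in vocab}
    print(float(V["cat"] @ V["sat"]), float(V["sat"] @ V["cat"]))  # always equal
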