Word Sense Disambiguation with Semi-Supervised Convolutional Neural Networks
Abstract:
To address the difficulty of acquiring tagged corpora, a Chinese word sense disambiguation method based on semi-supervised learning with convolutional neural networks (CNN) is proposed. First, the word, part of speech, and semantic category of the two word units on each side (left and right) of the ambiguous word are extracted as discriminative features, and a word vector tool is used to represent these features as vectors. Second, the tagged corpus is preprocessed to obtain the initial cluster centers and thresholds, and it is also used to train the CNN. The optimized CNN is then applied to determine the semantic categories of ambiguous words in the untagged corpus, and high-confidence instances that satisfy the threshold conditions are added to the training corpus. This process is repeated until the training corpus no longer expands. Finally, SemEval-2007 Task #5 is used as the tagged corpus, and an unannotated corpus from Harbin Institute of Technology is used as the untagged corpus. Experimental results show that the proposed method improves the disambiguation accuracy of the CNN by 3.1%.
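The iterative expansion of the training corpus described in the abstract can be illustrated with a minimal self-training sketch. This is not the authors' implementation: the CNN is replaced by a simple nearest-centroid classifier so that the example stays self-contained, the confidence score (a softmax over negative distances) is an assumption, and the names `self_train` and `fit_centroids` as well as the threshold value are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Example = Tuple[Vector, int]
Predictor = Callable[[Vector], Tuple[int, float]]


def fit_centroids(train: List[Example]) -> Predictor:
    """Stand-in for CNN training (assumption): one mean vector per sense label."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

    def predict(x: Vector) -> Tuple[int, float]:
        # Confidence is a softmax over negative distances (illustrative choice).
        weights = {y: math.exp(-math.dist(x, c)) for y, c in centroids.items()}
        total = sum(weights.values())
        label = max(weights, key=weights.get)
        return label, weights[label] / total

    return predict


def self_train(
    labeled: List[Example],
    unlabeled: List[Vector],
    fit: Callable[[List[Example]], Predictor],
    threshold: float,
    max_rounds: int = 50,
) -> Tuple[List[Example], List[Vector]]:
    """Repeat: train, label the pool, keep confident instances, until no growth."""
    train = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        predict = fit(train)
        confident: List[Example] = []
        remaining: List[Vector] = []
        for x in pool:
            label, conf = predict(x)
            if conf >= threshold:
                confident.append((x, label))
            else:
                remaining.append(x)
        if not confident:
            break  # the training corpus is no longer expanded
        train.extend(confident)
        pool = remaining
    return train, pool


# Toy usage with 2-dimensional feature vectors and two senses (0 and 1).
labeled = [((0.0, 0.0), 0), ((4.0, 4.0), 1)]
unlabeled = [(0.5, 0.0), (3.5, 4.0), (2.0, 2.0)]
train, pool = self_train(labeled, unlabeled, fit_centroids, threshold=0.9)
```

In this toy run, the two points close to a labeled example are absorbed into the training set, while the point equidistant from both centroids never reaches the threshold and stays in the unlabeled pool. In the method described above, the `fit` argument would correspond to training the CNN on the current training corpus.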