
Word Sense Disambiguation with Semi-Supervised Convolutional Neural Networks

2024-05-19 14:46 | Source: compiled from the web

Abstract:

To address the difficulty of acquiring tagged corpora, a Chinese word sense disambiguation method based on semi-supervised convolutional neural networks (CNN) is proposed. First, the word, part of speech, and semantic category of the two word units adjacent to the ambiguous word on each of its left and right sides are extracted as discriminative features, and a word vector tool is used to represent these features as vectors. Second, the tagged corpus is preprocessed to obtain the initial clustering centers and thresholds, and it is also used to train the CNN. The optimized CNN is then applied to determine the semantic categories of ambiguous words in the untagged corpus. Instances classified with high confidence that satisfy the threshold conditions are added to the training corpus. This process is repeated until the training corpus no longer grows. Finally, SemEval-2007: Task#5 is used as the tagged corpus, and the unannotated corpus from Harbin Institute of Technology is used as the untagged corpus. Experimental results show that the proposed method improves the disambiguation accuracy of the CNN by 3.1%.
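The iterative expansion of the training corpus described in the abstract can be illustrated with a minimal self-training sketch. This is not the authors' implementation: a nearest-centroid classifier stands in for the CNN, the confidence score is an assumed softmax over negative distances, and the names `train_centroids`, `predict`, `self_train`, and the `threshold` value of 0.8 are all hypothetical. Only the loop structure (train, label the untagged pool, keep high-confidence instances, repeat until nothing is added) follows the abstract.

```python
import math


def train_centroids(labeled):
    """Stand-in for CNN training (assumption): one mean vector per sense."""
    sums, counts = {}, {}
    for vec, sense in labeled:
        acc = sums.setdefault(sense, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[sense] = counts.get(sense, 0) + 1
    return {s: [v / counts[s] for v in acc] for s, acc in sums.items()}


def predict(centroids, vec):
    """Return (sense, confidence).

    Confidence is a softmax over negative Euclidean distances, an assumed
    scoring rule chosen only for illustration.
    """
    scores = {s: math.exp(-math.dist(vec, c)) for s, c in centroids.items()}
    total = sum(scores.values())
    best = max(scores, key=scores.get)
    return best, scores[best] / total


def self_train(labeled, unlabeled, threshold=0.8, max_rounds=20):
    """Grow the training set with confidently labeled instances."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = train_centroids(labeled)
        accepted, remaining = [], []
        for vec in pool:
            sense, confidence = predict(model, vec)
            if confidence >= threshold:
                accepted.append((vec, sense))
            else:
                remaining.append(vec)
        if not accepted:
            # The training corpus no longer expands, so stop.
            break
        labeled.extend(accepted)
        pool = remaining
    return train_centroids(labeled), labeled, pool


# Toy feature vectors; real inputs would be word-vector features.
seed = [((0.0, 0.0), "A"), ((10.0, 10.0), "B")]
untagged = [(0.5, 0.5), (9.5, 9.0), (5.0, 5.0)]
model, expanded, leftover = self_train(seed, untagged)
```

In this toy run the two points near a seed example are absorbed into the training set, while the ambiguous midpoint never reaches the threshold and stays in the untagged pool, which is the stopping behavior the abstract describes.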

 


