MongoDB可以在基本多语言平面之外存储和操纵UTF | 您所在的位置:网站首页 › mongodb多实例 › MongoDB可以在基本多语言平面之外存储和操纵UTF |
百度翻译此文
有道翻译此文
问题描述
In MongoDB 2.0.6, when attempting to store documents or query documents that contain string fields, where the value of a string include characters outside the BMP, I get a raft of errors like: "Not proper UTF-16: 55357", or "buffer too small" What settings, changes, or recommendations are there to permit storage and query of multi-lingual strings in Mongo, particularly ones that include these characters above 0xFFFF? Thanks. 推荐答案There are several issues here: 1) Please be aware that MongoDB stores all documents using the BSON format. Also note that the BSON spec referes to a UTF-8 string encoding, not a UTF-16 encoding. Ref: http://bsonspec.org/#/specification 2) All of the drivers, including the JavaScript driver in the mongo shell, should properly handle strings that are encoded as UTF-8. (If they don't then it's a bug!) Many of the drivers happen to handle UTF-16 properly, as well, although as far as I know, UTF-16 isn't officially supported. 3) When I tested this with the Python driver, MongoDB could successfully load and return a string value that contained a broken UTF-16 code pair. However, I couldn't load a broken code pair using the mongo shell, nor could I store a string containing a broken code pair into a JavaScript variable in the shell. 4) mapReduce() runs correctly on string data using a correct UTF-16 code pair, but it will generate an error when trying to run mapReduce() on string data containing a broken code pair. It appears that the mapReduce() is failing when MongoDB is trying to convert the BSON to a JavaScript variable for use by the JavaScript engine. 5) I've filed Jira issue SERVER-6747 for this issue. Feel free to follow it and vote it up. |
CopyRight 2018-2019 实验室设备网 版权所有 |