Ceph Series, Part 3: Calculating Available Space

2024-07-03 01:26

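In day-to-day work with Ceph, we usually run ceph -s to check cluster status and basic capacity, and ceph df for a more precise view of capacity. So what is the difference between the two? As more files are stored in the cluster, why do the available-capacity figures they report drift apart, and which one should we trust?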

1. ceph df

Getting Ceph pool information

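Object data is stored in the data pool by default, so we start by looking up that pool's settings. The output shows that the pool keeps only 2 replicas. This is a test cluster; for production, 3 replicas are recommended for higher reliability.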

[root@test-01 ~]# ceph osd dump | grep pool | grep buckets.data
pool 3 'default.rgw.buckets.data' replicated size 2 min_size 1 crush_ruleset 5 object_hash rjenkins pg_num 256 pgp_num 256 last_change 194 flags hashpspool stripe_width 0

Getting cluster capacity

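The output has two major sections, GLOBAL and POOLS. As the name suggests, GLOBAL holds cluster-wide figures: SIZE (total capacity), AVAIL (total available capacity), RAW USED (raw capacity used) and %RAW USED (percentage of raw capacity used). POOLS lists usage per pool: USED (capacity used), %USED (percentage used), MAX AVAIL (maximum usable capacity) and OBJECTS (number of objects).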

[root@test-01 ~]# ceph df
GLOBAL:
    SIZE     AVAIL     RAW USED     %RAW USED
    299G      273G       27337M          8.90
POOLS:
    NAME                          ID      USED     %USED     MAX AVAIL     OBJECTS
    .rgw.root                     11      1588         0          122G           4
    default.rgw.control           12         0         0          122G           8
    default.rgw.data.root         13     77090         0          122G         222
    default.rgw.gc                14         0         0          122G          32
    default.rgw.log               15         0         0          122G         127
    default.rgw.intent-log        16         0         0          122G           0
    default.rgw.usage             17         0         0          122G          24
    default.rgw.users.keys        18      3602         0          122G         122
    default.rgw.users.email       19         0         0          122G           0
    default.rgw.users.swift       20         0         0          122G           0
    default.rgw.users.uid         21     49345         0          122G         209
    default.rgw.buckets.index     22         0         0          122G         111
    default.rgw.buckets.data      23      206G     62.78          122G        4643
    default.rgw.meta              24         0         0          122G           0
    default.rgw.buckets.non-ec    25         0         0          122G         358
    rbd-01                        26         0         0        83743M           0

Looking at this output, you have probably noticed a few oddities. For example: why do AVAIL and MAX AVAIL differ? And why is MAX AVAIL the same for nearly every pool?

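Let us answer the second question first: why do the pools report the same MAX AVAIL, even though the sum of those values far exceeds AVAIL? The reason is that pools placed by the same CRUSH rule share the same free space rather than each owning a slice of it. Writing MAX AVAIL worth of data consumes MAX AVAIL × replica count of raw disk space, so while the cluster holds little data, MAX AVAIL × replica count ≈ AVAIL. In the output above, 122G × 2 = 244G, which is in the same range as the 273G reported as AVAIL.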

AVAIL

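Both the documentation and the code make it clear that the GLOBAL figures are gathered from the underlying filesystem. Ceph's FileStore, for example, ultimately calls the ::statfs() system call to obtain them. In FileStore::statfs, shown below, basedir.c_str() is the OSD's data directory. RAW USED is therefore the sum of the disk usage of every OSD data directory, and likewise AVAIL is the sum of their available disk space.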
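As a rough illustration of this kind of accounting, the sketch below queries ::statfs for a list of directories and sums the results. It is not Ceph code: DirSpace, query_dir_space and sum_dir_space are hypothetical names, the header is Linux-specific, and the choice of f_blocks and f_bavail as the fields behind SIZE and AVAIL is an assumption made for the example.

```cpp
#include <sys/statfs.h>  // Linux-specific declaration of ::statfs

#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Capacity figures for one directory, in bytes (hypothetical type).
struct DirSpace {
  std::uint64_t size = 0;   // total capacity of the backing filesystem
  std::uint64_t avail = 0;  // space available to unprivileged users
};

// Query a single directory. Returns false if ::statfs fails.
bool query_dir_space(const std::string& path, DirSpace* out) {
  assert(out != nullptr);
  struct statfs buf;
  if (::statfs(path.c_str(), &buf) < 0) {
    return false;
  }
  const std::uint64_t block = static_cast<std::uint64_t>(buf.f_bsize);
  out->size = block * static_cast<std::uint64_t>(buf.f_blocks);
  out->avail = block * static_cast<std::uint64_t>(buf.f_bavail);
  return true;
}

// Sum the figures over several directories, skipping unreadable ones.
DirSpace sum_dir_space(const std::vector<std::string>& paths) {
  DirSpace total;
  for (const std::string& p : paths) {
    DirSpace one;
    if (query_dir_space(p, &one)) {
      total.size += one.size;
      total.avail += one.avail;
    }
  }
  return total;
}
```

Passing one data directory per OSD to sum_dir_space gives totals analogous to SIZE and AVAIL. Two directories on the same filesystem would be counted twice, so the sketch only makes sense when each path sits on its own disk.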

int FileStore::statfs(struct statfs *buf)
{
  if (::statfs(basedir.c_str(), buf) < 0) {
    int r = -errno;
    assert(!m_filestore_fail_eio || r != -EIO);
    assert(r != -ENOENT);
    return r;
  }
  return 0;
}

MAX AVAIL

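Computing MAX AVAIL is more involved. Reading the source, the part to focus on is the pmap output of get_rule_weight_osd_map. It is simply a map from osd_id to weight/sum, that is, each OSD's share of the total weight reachable by the rule. Readers who are interested can dig into the rest of the code.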

int CrushWrapper::get_rule_weight_osd_map(unsigned ruleno, map<int,float> *pmap)
{
  if (ruleno >= crush->max_rules)
    return -ENOENT;
  if (crush->rules[ruleno] == NULL)
    return -ENOENT;
  crush_rule *rule = crush->rules[ruleno];

  // build a weight map for each TAKE in the rule, and then merge them
  for (unsigned i=0; i<rule->len; ++i) {
    map<int,float> m;
    float sum = 0;
    if (rule->steps[i].op == CRUSH_RULE_TAKE) {
      int n = rule->steps[i].arg1;
      if (n >= 0) {
        m[n] = 1.0;
        sum = 1.0;
      } else {
        list<int> q;
        q.push_back(n);
        //breadth first iterate the OSD tree
        while (!q.empty()) {
          int bno = q.front();
          q.pop_front();
          crush_bucket *b = crush->buckets[-1-bno];
          assert(b);
          for (unsigned j=0; j<b->size; ++j) {
            int item_id = b->items[j];
            if (item_id >= 0) { //it's an OSD
              float w = crush_get_bucket_item_weight(b, j);
              m[item_id] = w;
              sum += w;
            } else { //not an OSD, expand the child later
              q.push_back(item_id);
            }
          }
        }
      }
    }
    for (map<int,float>::iterator p = m.begin(); p != m.end(); ++p) {
      map<int,float>::iterator q = pmap->find(p->first);
      if (q == pmap->end()) {
        (*pmap)[p->first] = p->second / sum;
      } else {
        q->second += p->second / sum;
      }
    }
  }
  return 0;
}
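Stripped of the CRUSH tree traversal, the normalize-and-merge step at the end of the loop can be sketched on its own. The helper below is hypothetical (add_take_shares is not a Ceph function) and assumes the OSD weights for one TAKE step have already been collected into a map. Unlike the original, it guards against a zero weight sum.

```cpp
#include <cassert>
#include <map>

// Hypothetical helper: convert the absolute weights collected for one TAKE
// step into shares of their total and merge those shares into *pmap,
// mirroring the "p->second / sum" logic of get_rule_weight_osd_map.
void add_take_shares(const std::map<int, float>& weights,
                     std::map<int, float>* pmap) {
  assert(pmap != nullptr);
  float sum = 0;
  for (const auto& kv : weights) {
    assert(kv.second >= 0);  // the sketch expects non-negative weights
    sum += kv.second;
  }
  if (sum <= 0) {
    return;  // nothing to distribute (a guard the original does not have)
  }
  for (const auto& kv : weights) {
    // operator[] value-initializes a missing entry to 0, so this line
    // covers both the "insert" and the "accumulate" branch.
    (*pmap)[kv.first] += kv.second / sum;
  }
}
```

For three OSDs with weights 1, 1 and 2, the resulting shares are 0.25, 0.25 and 0.5. If a second TAKE step reaches an OSD again, its share from that step is added to the existing entry.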

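From the code we can see that get_rule_avail obtains the weight shares by calling osdmap.crush->get_rule_weight_osd_map. Looking at get_rule_avail itself: for each OSD it takes the free space (kb_avail) and subtracts the portion that can never be filled because of mon_osd_full_ratio, namely kb × (1 − mon_osd_full_ratio). The remainder, converted to bytes, is divided by that OSD's entry in wm (the pmap filled in by get_rule_weight_osd_map) to give proj, the amount of data the whole rule could accept before this OSD fills up. The smallest proj across all OSDs is kept in min, which the function returns and which MAX AVAIL is derived from. Since this value is raw space, the MAX AVAIL × replica count ≈ AVAIL relation above implies that a replicated pool's displayed MAX AVAIL is this value divided by the replica count.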

int64_t PGMonitor::get_rule_avail(OSDMap& osdmap, int ruleno) const
{
  map<int,float> wm;
  int r = osdmap.crush->get_rule_weight_osd_map(ruleno, &wm);
  if (r < 0) {
    return r;
  }
  if (wm.empty()) {
    return 0;
  }

  int64_t min = -1;
  for (map<int,float>::iterator p = wm.begin(); p != wm.end(); ++p) {
    ceph::unordered_map<int32_t,osd_stat_t>::const_iterator osd_info =
      pg_map.osd_stat.find(p->first);
    if (osd_info != pg_map.osd_stat.end()) {
      if (osd_info->second.kb == 0 || p->second == 0) {
        // osd must be out, hence its stats have been zeroed
        // (unless we somehow managed to have a disk with size 0...)
        //
        // (p->second == 0), if osd weight is 0, no need to
        // calculate proj below.
        continue;
      }
      double unusable = (double)osd_info->second.kb *
        (1.0 - g_conf->mon_osd_full_ratio);
      double avail = MAX(0.0, (double)osd_info->second.kb_avail - unusable);
      avail *= 1024.0;
      int64_t proj = (int64_t)(avail / (double)p->second);
      if (min < 0 || proj < min) {
        min = proj;
      }
    } else {
      dout(0)
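The projection logic can be tried in isolation with a sketch along the following lines. It follows the arithmetic of the listing above but is not Ceph code: OsdStat and projected_rule_avail are hypothetical names, the per-OSD stats and weight shares are passed in as plain maps, and the full ratio is a parameter rather than the mon_osd_full_ratio configuration value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-in for the two osd_stat_t fields used above.
struct OsdStat {
  std::int64_t kb;        // total size in KiB
  std::int64_t kb_avail;  // free space in KiB
};

// Returns the smallest per-OSD projection in bytes, 0 if there are no
// shares at all, or -1 if every OSD was skipped.
std::int64_t projected_rule_avail(const std::map<int, float>& shares,
                                  const std::map<int, OsdStat>& stats,
                                  double full_ratio) {
  assert(full_ratio > 0.0 && full_ratio <= 1.0);
  if (shares.empty()) {
    return 0;
  }
  std::int64_t min = -1;
  for (const auto& kv : shares) {
    auto it = stats.find(kv.first);
    if (it == stats.end()) {
      continue;  // no stats known for this OSD
    }
    const OsdStat& st = it->second;
    if (st.kb == 0 || kv.second == 0) {
      continue;  // OSD is out or carries no weight
    }
    // Space that must stay free so the OSD never exceeds the full ratio.
    double unusable = static_cast<double>(st.kb) * (1.0 - full_ratio);
    double avail =
        std::max(0.0, static_cast<double>(st.kb_avail) - unusable);
    avail *= 1024.0;  // KiB -> bytes
    // Data the whole rule can take before this OSD's share fills it up.
    std::int64_t proj =
        static_cast<std::int64_t>(avail / static_cast<double>(kv.second));
    if (min < 0 || proj < min) {
      min = proj;
    }
  }
  return min;
}
```

Take three OSDs with shares 0.25, 0.25 and 0.5, sizes of 1000, 1000 and 2000 KiB, free space of 800, 600 and 1800 KiB, and a full ratio of 0.5 (chosen only to keep the arithmetic exact). The projections are 1228800, 409600 and 1638400 bytes, so the function returns 409600. The fullest OSD relative to its share is the one that limits the whole rule, which is why an unbalanced cluster shows a MAX AVAIL well below what AVAIL alone would suggest.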
