Split Data into Groups and Calculate Statistics

2023-05-25 10:37 | Source: compiled from the web

Load Patient Data

Load sample data collected from 100 patients.

load patients

Convert Gender and SelfAssessedHealthStatus to categorical arrays.

Gender = categorical(Gender);
SelfAssessedHealthStatus = categorical(SelfAssessedHealthStatus);
whos

  Name                            Size            Bytes  Class          Attributes

  Age                           100x1               800  double
  Diastolic                     100x1               800  double
  Gender                        100x1               330  categorical
  Height                        100x1               800  double
  LastName                      100x1             11616  cell
  Location                      100x1             14208  cell
  SelfAssessedHealthStatus      100x1               560  categorical
  Smoker                        100x1               100  logical
  Systolic                      100x1               800  double
  Weight                        100x1               800  double

Calculate Mean Weights

Use the Smoker variable to split the patients into nonsmokers and smokers. Calculate the mean weight of each group.

[G,smoker] = findgroups(Smoker);
meanWeight = splitapply(@mean,Weight,G)

meanWeight = 2×1

  149.9091
  161.9412

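The findgroups function returns G, a vector of group numbers created from Smoker. The splitapply function uses G to split Weight into two groups. splitapply then applies the mean function to each group and concatenates the mean weights into a vector.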

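findgroups returns a vector of group identifiers as its second output argument. The group identifiers are logical values because Smoker contains logical values. The patients in the first group are nonsmokers, and the patients in the second group are smokers.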
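To make the roles of the group numbers and the group identifiers concrete, here is a minimal sketch on toy data. The values and the variable names isSmoker, weight, groupID, meanBySplitapply, and meanByLoop are made up for illustration and are not part of the patients data set. The loop is only one way to picture what splitapply does with G, not a description of how it is implemented.

```matlab
% Toy data (hypothetical values, not from the patients data set)
isSmoker = [false; true; false; true; false];
weight   = [120; 180; 130; 170; 140];

% Group numbers follow the sorted unique values: false -> 1, true -> 2
[G,groupID] = findgroups(isSmoker);

% Apply mean to the weights of each group
meanBySplitapply = splitapply(@mean,weight,G);

% The same result with an explicit loop over the group numbers
meanByLoop = zeros(max(G),1);
for k = 1:max(G)
    meanByLoop(k) = mean(weight(G == k));
end
```

Here G has one entry per row of the data, while groupID has one entry per group, so groupID(k) labels the k-th element of the result.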

smoker

smoker = 2x1 logical array

   0
   1

Split the patient weights by both gender and smoking status, and calculate the mean weights.

G = findgroups(Gender,Smoker);
meanWeight = splitapply(@mean,Weight,G)

meanWeight = 4×1

  130.3250
  130.9231
  180.0385
  181.1429

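The unique combinations of Gender and Smoker identify four groups of patients: female nonsmokers, female smokers, male nonsmokers, and male smokers. Summarize the four groups and their mean weights in a table.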

[G,gender,smoker] = findgroups(Gender,Smoker);
T = table(gender,smoker,meanWeight)

T=4×3 table
    gender    smoker    meanWeight
    ______    ______    __________

    Female    false       130.32
    Female    true        130.92
    Male      false       180.04
    Male      true        181.14

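T.gender contains categorical values, and T.smoker contains logical values. The data types of these table variables match the data types of Gender and Smoker, respectively.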
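The following sketch shows, on a small hypothetical data set, how findgroups numbers the combinations of two grouping variables and returns one identifier output per grouping variable. The values and the names gender, smoker, genderID, and smokerID are illustrative assumptions, not data from the patients file.

```matlab
% Toy data (hypothetical values, not from the patients data set)
gender = categorical({'Male';'Female';'Female';'Male';'Female'});
smoker = [true; false; true; false; false];

% One group per combination that occurs in the data, sorted by the
% first grouping variable and then by the second
[G,genderID,smokerID] = findgroups(gender,smoker);

% Each identifier output keeps the type of its grouping variable
genderClass = class(genderID);   % 'categorical'
smokerClass = class(smokerID);   % 'logical'
```

In this toy case the groups are numbered Female/false, Female/true, Male/false, Male/true, so the first row (a male smoker) falls in group 4.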

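Calculate the body mass index (BMI) for the four groups of patients. Define a function that takes Height and Weight as its two input arguments and calculates the mean BMI of a group. The factor 703 in the function converts heights in inches and weights in pounds to the standard BMI scale.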

meanBMIfcn = @(h,w)mean((w ./ (h.^2)) * 703);
BMI = splitapply(meanBMIfcn,Height,Weight,G)

BMI = 4×1

   21.6721
   21.6686
   26.5775
   26.4584

Group Patients Based on Self-Reports

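Calculate the proportion of patients who report their health as Poor or Fair. First, use splitapply to count the number of patients in each group: female nonsmokers, female smokers, male nonsmokers, and male smokers. Then, use logical indexing on S and G to count only those patients who report their health as Poor or Fair. From these two sets of counts, calculate the proportion for each group, expressed as a fraction between 0 and 1.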

[G,gender,smoker] = findgroups(Gender,Smoker);
S = SelfAssessedHealthStatus;
I = ismember(S,{'Poor','Fair'});
numPatients = splitapply(@numel,S,G);
numPF = splitapply(@numel,S(I),G(I));
numPF./numPatients

ans = 4×1

    0.2500
    0.3846
    0.3077
    0.1429
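Counting with S(I) and G(I) works here because every group contains at least one patient who reports Poor or Fair health. As far as I know, splitapply expects every group number from 1 up to the largest one to appear in its grouping vector, so an empty subgroup would be a problem. A minimal sketch of one alternative, on hypothetical toy data, is to sum the logical mask within each group instead. The names status, isPF, numInGroup, numPFbySum, and fractionPF are assumptions made for this illustration.

```matlab
% Toy data (hypothetical values, not from the patients data set)
G = [1; 1; 2; 2; 2; 3];   % group numbers for three groups
status = categorical({'Poor';'Good';'Fair';'Good';'Excellent';'Good'});
isPF = ismember(status,{'Poor','Fair'});

% Count all members of each group
numInGroup = splitapply(@numel,status,G);

% Count Poor/Fair members by summing the mask per group;
% group 3 has none, so its count is 0
numPFbySum = splitapply(@sum,isPF,G);

fractionPF = numPFbySum ./ numInGroup;
```

Because the mask is summed over the full grouping vector, every group keeps its position in the result even when its count is zero.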

Compare the standard deviation of the Diastolic readings for patients who report Poor or Fair health with the standard deviation for patients who report Good or Excellent health.

stdDiastolicPF = splitapply(@std,Diastolic(I),G(I));
stdDiastolicGE = splitapply(@std,Diastolic(~I),G(~I));

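Collect the results in a table. Among these patients, the female nonsmokers who report Poor or Fair health show the widest variation in diastolic blood pressure readings.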

T = table(gender,smoker,numPatients,numPF,stdDiastolicPF,stdDiastolicGE,BMI)

T=4×7 table
    gender    smoker    numPatients    numPF    stdDiastolicPF    stdDiastolicGE     BMI
    ______    ______    ___________    _____    ______________    ______________    ______

    Female    false         40          10          6.8872            3.9012        21.672
    Female    true          13           5          5.4129            5.0409        21.669
    Male      false         26           8          4.2678            4.8159        26.578
    Male      true          21           3          5.6862             5.258        26.458
