Models sub-selection (models.chosen)
Applying the get_built_models function to the bm.mod object returns the names of the single models created with the BIOMOD_Modeling function. The models.chosen argument accepts either a sub-selection of these single model names or the default value all, and determines which single models are used to build the ensemble models.

Models assembly rules (em.by)
Single models built with the BIOMOD_Modeling function can be combined in 5 different ways to obtain ensemble models:
- PA+run : each combination of pseudo-absence and repetition datasets is done, merging algorithms together
- PA+algo : each combination of pseudo-absence and algorithm datasets is done, merging repetitions together
- PA : pseudo-absence datasets are considered individually, merging algorithms and repetitions together
- algo : algorithm datasets are considered individually, merging pseudo-absence and repetitions together
- all : all models are combined into one
Hence, the number of ensemble models built depends on the chosen method.
Be aware that if no evaluation data was given to the BIOMOD_FormatingData function, some ensemble model evaluations may be biased, because the single models were not all evaluated on the same data.
Be aware that all of these combinations are allowed, but some may not make sense, depending mainly on how the pseudo-absence datasets have been built and whether all of them have been used for all single models (see the PA.nb.absences and models.pa parameters in the BIOMOD_FormatingData and BIOMOD_Modeling functions respectively).

Evaluation metrics
- metric.select : the selected metrics must be chosen among the ones used within the BIOMOD_Modeling function to build the model.output object, unless metric.select = 'user.defined', in which case the values are provided through the metric.select.table parameter.
  When several metrics are selected, they are used at different steps of the ensemble modeling function:
    - to remove low quality single models, i.e. those with a score lower than metric.select.thresh
    - to perform the binary transformation needed if 'EMca' was given to argument em.algo
    - to weight models if 'EMwmean' was given to argument em.algo
- metric.select.thresh : as many values as evaluation metrics selected with the metric.select parameter, defining the corresponding quality thresholds below which single models are excluded from the ensemble model building.
- metric.select.table : a data.frame must be given if metric.select = 'user.defined', to allow the use of evaluation metrics other than those calculated within biomod2. The data.frame must contain as many columns as models.chosen, with matching names, and as many rows as evaluation metrics to be used. The number of rows must match the length of the metric.select.thresh parameter. The values contained in the data.frame are compared to those defined in metric.select.thresh to remove low quality single models from the ensemble model building.
- metric.select.dataset : a character string specifying the dataset whose evaluation metrics are used to filter and/or weight the single models. It should be one of evaluation, validation or calibration. By default, BIOMOD_EnsembleModeling uses the validation dataset, unless no validation data is available, in which case the calibration dataset is used.
- metric.eval : the selected metrics are used to validate/evaluate the ensemble models built

Ensemble-models algorithms
The set of ensemble algorithms to be applied to the selected single models. 6 techniques are currently available:
- EMmean : Mean of probabilities over the selected models.
  Old name: prob.mean
- EMmedian : Median of probabilities over the selected models.
  The median is less sensitive to outliers than the mean; however, it requires more computation time and memory, as it loads all predictions (unlike the mean or the weighted mean).
  Old name: prob.median
- EMcv : Coefficient of variation (sd / mean) of probabilities over the selected models.
  This model is not scaled. It is evaluated like all other ensemble models, although its interpretation is different: CV is a measure of uncertainty rather than a measure of probability of occurrence. If the CV gets a high evaluation score, it means that the uncertainty is high where the species is observed (which might not be a good feature of the model). The lower the score, the better the models. CV is a useful complement to the mean probability.
  Old name: prob.cv
- EMci & EMci.alpha : Confidence interval around the mean of probabilities of the selected models.
  It is also a useful complement to the mean probability. It creates 2 ensemble models:
    - LOWER : there is less than a 100 * EMci.alpha / 2 % chance of getting probabilities lower than the given ones
    - UPPER : there is less than a 100 * EMci.alpha / 2 % chance of getting probabilities higher than the given ones
  These intervals are calculated with the following formula:
  $$I_c = \left[ \bar{x} - \frac{t_\alpha \, sd}{\sqrt{n}} ; \; \bar{x} + \frac{t_\alpha \, sd}{\sqrt{n}} \right]$$
  where $\bar{x}$ and $sd$ are the mean and standard deviation of the probabilities over the $n$ selected models, and $t_\alpha$ is the Student's t quantile associated with EMci.alpha.
  Old parameter name: prob.ci & prob.ci.alpha
- EMca : Probabilities from the selected models are first transformed into binary data according to the thresholds defined when building the model.output object with the BIOMOD_Modeling function, maximizing the evaluation metric score over the testing dataset. The committee averaging score is obtained by taking the average of these binary predictions.
  It is built on the analogy of a simple vote:
    - each single model votes for the species being either present (1) or absent (0)
    - the sum of the 1s is then divided by the number of single models voting
  The interesting feature of this measure is that it gives both a prediction and a measure of uncertainty. When the prediction is close to 0 or 1, it means that all models agree to predict 0 or 1 respectively. When the prediction is around 0.5, it means that half the models predict 1 and the other half 0.
  Old parameter name: committee.averaging
- EMwmean & EMwmean.decay : Probabilities from the selected models are weighted according to their evaluation scores obtained when building the model.output object with the BIOMOD_Modeling function (the better a model is, the more importance it has in the ensemble) and summed.
  Old parameter name: prob.mean.weight & prob.mean.weight.decay
  The EMwmean.decay is the ratio between a weight and the next or previous one. The formula is: W = W(-1) * EMwmean.decay. For example, with a value of 1.6 and 4 weights wanted, the relative importance of the weights will be 1/1.6/2.56(=1.6*1.6)/4.096(=2.56*1.6) from the weakest to the strongest, which gives approximately 0.108/0.173/0.277/0.443 once the weights are rescaled to sum to one (1 + 1.6 + 2.56 + 4.096 = 9.256). The lower the EMwmean.decay, the smaller the differences between the weights, and hence the weaker the discrimination between models.
  If EMwmean.decay = 'proportional', the weights are assigned to each model proportionally to their evaluation scores. The discrimination is fairer than with the decay method: with a decay, models with close scores can receive strongly diverging weights, whereas the proportional method assigns them similar weights.
  It is also possible to define the EMwmean.decay parameter as a function that will be applied to the single models scores to transform them into weights. For example, if EMwmean.decay = function(x) {x^2}, the square of the evaluation score of each model will be used to weight the models predictions.
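
The three EMwmean.decay options described above can be illustrated with a small numeric sketch. It is written in Python purely to check the arithmetic and is not biomod2 code: the names `ensemble_weights` and `weighted_mean` are hypothetical, and the way a numeric decay is applied here (models ranked by score from weakest to strongest, ties not treated specially) is an assumption made for the illustration.

```python
from typing import Callable, List, Sequence, Union

# Hypothetical type for the three kinds of decay described in the text:
# a number, the string "proportional", or a function applied to each score.
Decay = Union[float, str, Callable[[float], float]]


def ensemble_weights(scores: Sequence[float], decay: Decay) -> List[float]:
    """Turn single-model evaluation scores into weights that sum to 1."""
    if callable(decay):
        # User-supplied function: applied to each score, e.g. lambda x: x ** 2.
        raw = [decay(s) for s in scores]
    elif decay == "proportional":
        # Weights proportional to the evaluation scores themselves.
        raw = [float(s) for s in scores]
    else:
        # Numeric decay (assumption: models are ranked by score, ties ignored).
        # The weakest model gets 1, and each next one gets previous * decay.
        order = sorted(range(len(scores)), key=lambda i: scores[i])
        raw = [0.0] * len(scores)
        for rank, i in enumerate(order):
            raw[i] = float(decay) ** rank
    total = sum(raw)
    return [w / total for w in raw]


def weighted_mean(predictions: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of single-model probabilities for one site."""
    return sum(p * w for p, w in zip(predictions, weights))


# The worked example from the text: decay of 1.6 and 4 models.
example = ensemble_weights([0.6, 0.7, 0.8, 0.9], 1.6)
print([round(w, 3) for w in example])  # [0.108, 0.173, 0.277, 0.443]
```

With the same four scores, the 'proportional' option gives 0.2/0.233/0.267/0.3, which shows how much less it separates models with close scores than a decay of 1.6 does.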