Model Evaluation

A model evaluation task runs a trained model on a dataset, much like an inference task, but instead of returning the model's outputs it returns a set of metrics describing the model's performance on that dataset.

Metrics

The metrics returned depend on the type of model detected. The algorithm looks for the presence of ClassifierMixIn, RegressorMixIn or SegmentationMixIn in the model's inheritance hierarchy and uses whichever mixin it finds to determine the type of metrics to return. The RegressorMixIn and SegmentationMixIn mixins are currently used only for tagging purposes and have no configuration options, whereas the ClassifierMixIn contains logic for determining the type of classification problem, which in turn determines the metrics returned.

| Model Type | Metrics |
| --- | --- |
| Binary Classification | Accuracy, Precision, Recall, F1 Score, ROC AUC, Brier Loss |
| Multiclass Classification | Accuracy, Precision, Recall, F1 Score, ROC AUC |
| Multilabel Classification | Accuracy, Precision, Recall, F1 Score, ROC AUC |
| Regression | Mean Absolute Error, Mean Squared Error, R2 Score, Root Mean Squared Error, Kolmogorov-Smirnov Test |
| Segmentation | IoU, Dice Coefficients, Dice Score |
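The binary classification metrics in the table are the standard definitions. As a minimal sketch, the threshold-based ones can be computed by hand for a toy set of labels and predictions (the data and values below are illustrative and are not produced by bitfount):

```python
# Illustrative only: compute accuracy, precision, recall and F1 by hand for
# a toy set of binary labels; in practice the evaluation task computes these
# for you over the selected dataset.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

metrics = {
    "accuracy": (tp + tn) / len(y_true),
    "precision": tp / (tp + fp),
    "recall": tp / (tp + fn),
}
# F1 is the harmonic mean of precision and recall.
metrics["f1"] = (
    2 * metrics["precision"] * metrics["recall"]
    / (metrics["precision"] + metrics["recall"])
)
print(metrics)
```

ROC AUC and Brier Loss additionally require the model's predicted probabilities rather than hard labels, which is why they appear only for classification model types.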
Tip: Mixin classes must be specified first in the model's inheritance hierarchy.
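To illustrate the required inheritance order, the sketch below uses stub classes as stand-ins for the real bitfount ClassifierMixIn and model base class (the class names and bodies here are hypothetical; consult the bitfount SDK for the actual classes to subclass):

```python
# Self-contained sketch: stub classes stand in for the real bitfount mixin
# and model base class, purely to show the inheritance order.
class ClassifierMixIn:
    """Stand-in for bitfount's ClassifierMixIn."""

class BitfountModel:
    """Stand-in for bitfount's model base class."""

# The mixin is listed FIRST, ahead of the model base class, so that type
# detection finds it in the inheritance hierarchy.
class HeartDiseaseModel(ClassifierMixIn, BitfountModel):
    """A hypothetical classifier; a real model would implement training
    and inference logic."""

# Detection then reduces to a subclass check against the known mixins.
print(issubclass(HeartDiseaseModel, ClassifierMixIn))
```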

Results

As the name implies, the bitfount.ResultsOnly protocol simply returns the results from the model evaluation task. The results are returned as a dictionary mapping metric names (strings) to metric values (floats). By default, the results are not persisted anywhere. If you run the protocol via the SDK, this behaviour may be fine because the results are returned to a variable you can access. However, if you run the protocol as part of a task in the app, the results are lost unless you specify a save location by setting the save_location argument on the bitfount.ResultsOnly protocol. The available save locations are:

  • Worker: Save the results to the worker side.
  • Modeller: Save the results to the modeller side.

Both locations can be specified together to save the results to both the worker and modeller sides.
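Because the returned object is just a dictionary of metric names to floats, it can be handled like any Python mapping when running via the SDK. A sketch with made-up metric names and values (not actual bitfount output):

```python
# Illustrative shape of a returned results dictionary; the actual keys depend
# on the detected model type and these values are fabricated for the example.
results = {
    "AUC": 0.87,
    "F1": 0.78,
    "Precision": 0.80,
    "Recall": 0.76,
}

# As a plain dict of metric name -> float, the results can be logged, compared
# against thresholds, or saved manually if no save_location was configured.
passing = {name: value for name, value in results.items() if value >= 0.8}
print(sorted(passing))
```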

Example

An example task file for using a model evaluation task is shown below:

```yaml
pods:
  identifiers:
    - <replace-with-dataset-identifier>

modeller:
  identity_verification_method: key-based

task:
  protocol:
    name: bitfount.ResultsOnly
    arguments:
      save_location:
        - Worker
        - Modeller
  algorithm:
    - name: bitfount.ModelEvaluation
      arguments:
        model:
          bitfount_model:
            username: amin-nejad
            model_ref: HeartDiseaseModel
            model_version: 3
  data_structure:
    select:
      include:
        - Age
        - Gender
        - Chest_Pain_Type
        - Resting_Blood_Pressure
        - Cholesterol
        - Fasting_Blood_Sugar
        - Resting_ECG
        - Max_Heart_Rate
        - Exercise_Induced_Angina
        - ST_Depression
        - ST_Slope
        - Number_of_Major_Vessels
        - Thalassemia
    assign:
      target: Heart_Disease
  data_split:
    args:
      shuffle: true
      test_percentage: 0
      validation_percentage: 100 # 100% of the data is used for the evaluation task
    data_splitter: percentage
```