Model Evaluation

A model evaluation task runs a trained model on a dataset, much like an inference task, but instead of returning the model's outputs it returns a set of metrics describing the model's performance on that dataset.

Metrics

The metrics returned depend on the type of model detected. The algorithm looks for the presence of ClassifierMixIn, RegressorMixIn or SegmentationMixIn in the model's inheritance hierarchy and uses whichever mixin it finds to determine the type of metrics to return. The RegressorMixIn and SegmentationMixIn mixins are currently used only for tagging purposes and have no configuration options, whereas the ClassifierMixIn contains logic for determining the type of classification problem, which in turn determines the metrics returned.

| Model Type | Metrics |
| --- | --- |
| Binary Classification | Accuracy, Precision, Recall, F1 Score, ROC AUC, Brier Loss |
| Multiclass Classification | Accuracy, Precision, Recall, F1 Score, ROC AUC |
| Multilabel Classification | Accuracy, Precision, Recall, F1 Score, ROC AUC |
| Regression | Mean Absolute Error, Mean Squared Error, R2 Score, Root Mean Squared Error, Kolmogorov-Smirnov Test |
| Segmentation | IoU, Dice Coefficients, Dice Score |
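The binary classification metrics in the table are the standard definitions. As a minimal sketch, the threshold-based ones can be computed by hand for a toy set of labels and predictions (the data and values below are illustrative and are not produced by bitfount):

```python
# Illustrative only: compute accuracy, precision, recall and F1 by hand for
# a toy set of binary labels; in practice the evaluation task computes these
# for you over the selected dataset.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

metrics = {
    "accuracy": (tp + tn) / len(y_true),
    "precision": tp / (tp + fp),
    "recall": tp / (tp + fn),
}
# F1 is the harmonic mean of precision and recall.
metrics["f1"] = (
    2 * metrics["precision"] * metrics["recall"]
    / (metrics["precision"] + metrics["recall"])
)
print(metrics)
```

ROC AUC and Brier Loss additionally require the model's predicted probabilities rather than hard labels, which is why they appear only for classification model types.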
Tip: Mixin classes must be specified first in the model's inheritance hierarchy.
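To illustrate the required inheritance order, the sketch below uses stub classes as stand-ins for the real bitfount ClassifierMixIn and model base class (the class names and bodies here are hypothetical; consult the bitfount SDK for the actual classes to subclass):

```python
# Self-contained sketch: stub classes stand in for the real bitfount mixin
# and model base class, purely to show the inheritance order.
class ClassifierMixIn:
    """Stand-in for bitfount's ClassifierMixIn."""

class BitfountModel:
    """Stand-in for bitfount's model base class."""

# The mixin is listed FIRST, ahead of the model base class, so that type
# detection finds it in the inheritance hierarchy.
class HeartDiseaseModel(ClassifierMixIn, BitfountModel):
    """A hypothetical classifier; a real model would implement training
    and inference logic."""

# Detection then reduces to a subclass check against the known mixins.
print(issubclass(HeartDiseaseModel, ClassifierMixIn))
```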

Results

As the name implies, the bitfount.ResultsOnly protocol simply returns the results from the model evaluation task. The results are returned as a dictionary mapping metric names (strings) to metric values (floats). By default, the results are not persisted anywhere. If you run the protocol via the SDK, this behaviour may be fine because the results are returned to a variable you can access. However, if you run the protocol as part of a task in the app, the results are lost unless you specify a save location by setting the save_location argument on the bitfount.ResultsOnly protocol. The available save locations are:

  • Worker: Save the results to the worker side.
  • Modeller: Save the results to the modeller side.

Both locations can be specified together to save the results to both the worker and modeller sides.
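Because the returned object is just a dictionary of metric names to floats, it can be handled like any Python mapping when running via the SDK. A sketch with made-up metric names and values (not actual bitfount output):

```python
# Illustrative shape of a returned results dictionary; the actual keys depend
# on the detected model type and these values are fabricated for the example.
results = {
    "AUC": 0.87,
    "F1": 0.78,
    "Precision": 0.80,
    "Recall": 0.76,
}

# As a plain dict of metric name -> float, the results can be logged, compared
# against thresholds, or saved manually if no save_location was configured.
passing = {name: value for name, value in results.items() if value >= 0.8}
print(sorted(passing))
```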

Example

An example task file for using a model evaluation task is shown below:

```yaml
pods:
  identifiers:
    - <replace-with-dataset-identifier>

modeller:
  identity_verification_method: key-based

task:
  protocol:
    name: bitfount.ResultsOnly
    arguments:
      save_location:
        - Worker
        - Modeller
  algorithm:
    - name: bitfount.ModelEvaluation
      arguments:
        model:
          bitfount_model:
            username: amin-nejad
            model_ref: HeartDiseaseModel
            model_version: 3
  data_structure:
    select:
      include:
        - Age
        - Gender
        - Chest_Pain_Type
        - Resting_Blood_Pressure
        - Cholesterol
        - Fasting_Blood_Sugar
        - Resting_ECG
        - Max_Heart_Rate
        - Exercise_Induced_Angina
        - ST_Depression
        - ST_Slope
        - Number_of_Major_Vessels
        - Thalassemia
    assign:
      target: Heart_Disease
  data_split:
    args:
      shuffle: true
      test_percentage: 0
      validation_percentage: 100 # 100% of the data is used for the evaluation task
    data_splitter: percentage
```