models

Backend-agnostic models that have no extra requirements.

info

Models defined here cannot be trained in a federated manner.

Classes

LogisticRegressionClassifier

class LogisticRegressionClassifier(
    datastructure: DataStructure,
    schema: BitfountSchema,
    inverse_regularisation: Optional[float] = None,
    max_steps: Optional[int] = None,
    model_type: Optional[str] = None,
    penalty: Optional[str] = None,
    early_stopping_tolerance: Optional[float] = None,
    verbose: Optional[int] = None,
    multilabel: bool = False,
    param_clipping: Optional[Dict[str, int]] = None,
    seed: Optional[int] = None,
)

Wrapper around sklearn.linear_model.LogisticRegression model.

For more details on the parameters, go to the scikit-learn documentation.

Arguments

  • datastructure: DataStructure to be passed to the model when initialised.
  • early_stopping_tolerance: Tolerance for early stopping. Defaults to 1e-05.
  • inverse_regularisation: Inverse regularisation parameter. Defaults to 0.0001.
  • max_steps: Maximum number of steps to take. Defaults to 10000.
  • model_type: Type of solver to use. Defaults to "lbfgs".
  • multilabel: Whether the problem is a multi-label problem, i.e. each datapoint can belong to multiple classes. Defaults to False.
  • param_clipping: Arguments for clipping BatchNorm parameters. Used for federated models with secure aggregation. It should contain the SecureShare variables and the number of workers in a dictionary, e.g. {"prime_q": 13, "precision": 10**3, "num_workers": 2}.
  • penalty: Penalty to use. Defaults to "l2".
  • schema: The BitfountSchema object associated with the datasource on which the model will be trained.
  • seed: Random number seed. Used for setting the random seed for all libraries. Defaults to None.
  • verbose: Verbosity level. Defaults to 0.
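A minimal construction sketch, not taken from the Bitfount docs: the import paths, the CSVSource constructor, and the file, table, and column names are all assumptions made for illustration.

from bitfount import BitfountSchema, CSVSource, DataStructure  # assumed import path
from bitfount.models import LogisticRegressionClassifier  # assumed import path

# Hypothetical tabular datasource with a categorical "species" target.
datasource = CSVSource("penguins.csv")
schema = BitfountSchema(datasource, table_name="penguins")
datastructure = DataStructure(target="species", table="penguins")

model = LogisticRegressionClassifier(
    datastructure=datastructure,
    schema=schema,
    max_steps=1000,
    penalty="l2",
    seed=42,
)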

Attributes

  • early_stopping_tolerance: Tolerance for early stopping.
  • fields_dict: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type (e.g. fields_dict = {"class_name": fields.Str()}).
  • inverse_regularisation: Inverse regularisation parameter.
  • max_steps: Maximum number of steps to take.
  • model_type: Type of solver to use.
  • multilabel: Whether the problem is a multi-label problem.
  • n_classes: Number of classes in the problem.
  • nested_fields: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes. (e.g. nested_fields = {"datastructure": datastructure.registry})
  • penalty: Penalty to use.
  • verbose: Verbosity level.

Variables

  • static nested_fields : ClassVar[Dict[str, Mapping[str, Any]]]

Methods


deserialize

def deserialize(self, content: Union[str, os.PathLike, bytes]) -> None:

Deserialize model.

danger

This should not be used on a model file that has been received across a trust boundary due to underlying use of pickle.

Arguments

  • content: Byte stream or path to file containing serialized model.

evaluate

def evaluate(
    self, test_dl: Optional[BitfountDataLoader] = None, *args: Any, **kwargs: Any,
) -> Tuple[numpy.ndarray, numpy.ndarray]:

Perform inference on test set and return predictions and targets.

Arguments

  • test_dl: Optional BitfountDataLoader object containing test data. If this is not provided, the test set from the BaseSource used to train the model is used if present.

Returns

A tuple of numpy arrays containing the predicted and actual values.

Raises

  • ValueError: If there is no test data to evaluate the model on.

fit

def fit(self, data: Optional[BaseSource] = None, *args: Any, **kwargs: Any) -> None:

Trains a model using the training set provided by data.

info

The validation set in data is not used when training this model.

Arguments

  • data: BaseSource object containing training data.
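Taken together with evaluate() above, a sketch of the local train-then-evaluate flow; datasource and model are the hypothetical objects from the construction sketch earlier on this page.

# Continuing the hypothetical example above.
model.fit(data=datasource)

# With no test_dl argument, evaluate() falls back to the test set
# from the BaseSource the model was trained on, if one is present.
preds, targets = model.evaluate()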

initialise_model

def initialise_model(
    self, data: Optional[BaseSource] = None, context: Optional[TaskContext] = None,
) -> None:

Can be implemented to initialise model if necessary.

This is automatically called by the fit() method if necessary.

Arguments

  • data: The data used for model training.
  • context: Indicates if the model is running as a modeller or worker. If None, there is no difference between modeller and worker. Defaults to None.

predict

def predict(self, *args: Any, **kwargs: Any) -> List[numpy.ndarray]:

Returns model predictions. Not implemented yet.

serialize

def serialize(self, filename: Union[str, os.PathLike]) -> None:

Serialize model to file with provided filename.

Arguments

  • filename: Path to file to save serialized model.
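A serialize/deserialize round trip; the filename is hypothetical, and the pickle caveat from the deserialize() section applies.

model.serialize("logreg.pkl")

# Safe only because this file was produced locally: deserialize() uses
# pickle under the hood, which can execute arbitrary code when loading
# a file received across a trust boundary.
model.deserialize("logreg.pkl")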

set_number_of_classes

def set_number_of_classes(self, schema: TableSchema) -> None:

Inherited from:

ClassifierMixIn.set_number_of_classes:

Sets the target number of classes for the classifier.

If the data is a multi-label problem, the number of classes is set to the number of target columns as specified in the DataStructure. Otherwise, the number of classes is set to the number of unique values in the target column as specified in the BitfountSchema. The value is stored in the n_classes attribute.
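A pseudocode sketch of that rule, not the actual ClassifierMixIn implementation; the attribute and method names used on datastructure and schema are assumptions.

# Illustrative only: attribute and method names are assumed.
if self.multilabel:
    # Multi-label: one class per target column in the DataStructure.
    self.n_classes = len(self.datastructure.target)
else:
    # Single-label: number of unique values of the target column in the schema.
    self.n_classes = schema.get_categorical_feature_size(self.datastructure.target)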

RegBoostRegressor

class RegBoostRegressor(
    datastructure: DataStructure,
    schema: BitfountSchema,
    learning_rate: float = 0.1,
    max_depth: int = 10,
    min_data_points_per_node: int = 5,
    stepwise_regression: Literal['forward', 'backward'] = 'forward',
    stepwise_regression_threshold: float = 0.15,
    seed: Optional[int] = None,
    param_clipping: Optional[Dict[str, int]] = None,
)

Gradient Boosted Linear Regression Model.

Implementation of "RegBoost: a gradient boosted multivariate regression algorithm" by Li et al. (2020). For more details, see the paper: https://www.emerald.com/insight/content/doi/10.1108/IJCS-10-2019-0029/full/html
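For intuition, the core boosting recurrence the model builds on: each stage fits a linear model to the residuals of the ensemble so far, and its contribution is shrunk by the learning rate. This generic sketch uses scikit-learn's LinearRegression as the base learner and deliberately omits RegBoost's stepwise feature selection and per-node data partitioning.

import numpy as np
from sklearn.linear_model import LinearRegression

def boosted_linear_fit(X, y, n_stages=10, learning_rate=0.1):
    """Fit a chain of linear models, each on the previous stage's residuals."""
    stages = []
    residual = y.astype(float)
    for _ in range(n_stages):
        stage = LinearRegression().fit(X, residual)
        # Shrink this stage's contribution and update the residuals.
        residual = residual - learning_rate * stage.predict(X)
        stages.append(stage)
    return stages

def boosted_linear_predict(stages, X, learning_rate=0.1):
    # The ensemble prediction is the learning-rate-weighted sum of all stages.
    return sum(learning_rate * stage.predict(X) for stage in stages)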

Arguments

  • datastructure: DataStructure to be passed to the model when initialised.
  • learning_rate: Learning rate for gradient boosting. Defaults to 0.1.
  • max_depth: Maximum depth of the tree (number of nodes between root and leaf). A depth of 0 is equivalent to a single Linear Regression model. Defaults to 10.
  • min_data_points_per_node: Minimum number of data points required to split a node. Defaults to 5.
  • param_clipping: Arguments for clipping BatchNorm parameters. Used for federated models with secure aggregation. It should contain the SecureShare variables and the number of workers in a dictionary, e.g. {"prime_q": 13, "precision": 10**3, "num_workers": 2}. Defaults to None.
  • schema: The BitfountSchema object associated with the datasource on which the model will be trained.
  • seed: Random number seed. Used for setting the random seed for all libraries. Defaults to None.
  • stepwise_regression: Whether stepwise regression should go "forward" or "backward". Defaults to "forward".
  • stepwise_regression_threshold: Threshold for stepwise regression. Defaults to 0.15.
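A construction sketch mirroring the classifier example above; the import path is assumed, and datastructure, schema, and datasource are the same hypothetical objects.

from bitfount.models import RegBoostRegressor  # assumed import path

model = RegBoostRegressor(
    datastructure=datastructure,
    schema=schema,
    learning_rate=0.1,
    max_depth=5,
    stepwise_regression="forward",
    seed=42,
)
model.fit(data=datasource)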

Attributes

  • fields_dict: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type (e.g. fields_dict = {"class_name": fields.Str()}).
  • learning_rate: Learning rate for gradient boosting.
  • max_depth: Maximum depth of tree (number of nodes between root and leaf).
  • min_data_points_per_node: Minimum number of data points required to split a node.
  • nested_fields: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes. (e.g. nested_fields = {"datastructure": datastructure.registry})
  • stepwise_regression: Whether stepwise regression should go "forward" or "backward".
  • stepwise_regression_threshold: Threshold for stepwise regression.

Variables

  • static nested_fields : ClassVar[Dict[str, Mapping[str, Any]]]

Methods


deserialize

def deserialize(self, content: Union[str, os.PathLike, bytes]) -> None:

Deserialize model.

danger

This should not be used on a model file that has been received across a trust boundary due to underlying use of pickle.

Arguments

  • content: Byte stream or path to file containing serialized model.

evaluate

def evaluate(
    self, test_dl: Optional[BitfountDataLoader] = None, *args: Any, **kwargs: Any,
) -> Tuple[numpy.ndarray, numpy.ndarray]:

Perform inference on test set and return predictions and targets.

Arguments

  • test_dl: Optional BitfountDataLoader object containing test data. If this is not provided, the test set from the BaseSource used to train the model is used if present.

Returns

A tuple of numpy arrays containing the predicted and actual values.

Raises

  • ValueError: If there is no test data to evaluate the model on.

fit

def fit(
    self,
    data: Optional[BaseSource] = None,
    metrics: Optional[Mapping[str, Metric]] = None,
    *args: Any,
    **kwargs: Any,
) -> None:

Trains a model using the training set provided by the BaseSource object.

initialise_model

def initialise_model(
    self, data: Optional[BaseSource] = None, context: Optional[TaskContext] = None,
) -> None:

Can be implemented to initialise model if necessary.

This is automatically called by the fit() method if necessary.

Arguments

  • data: The data used for model training.
  • context: Indicates if the model is running as a modeller or worker. If None, there is no difference between modeller and worker. Defaults to None.

predict

def predict(self, *args: Any, **kwargs: Any) -> List[numpy.ndarray]:

Returns model predictions. Not implemented yet.

serialize

def serialize(self, filename: Union[str, os.PathLike]) -> None:

Serialize model to file with provided filename.

Arguments

  • filename: Path to file to save serialized model.