models
Backend-agnostic models that have no extra requirements.
Models defined here cannot be trained in a federated manner.
Classes
LogisticRegressionClassifier
class LogisticRegressionClassifier( datastructure: DataStructure, schema: BitfountSchema, inverse_regularisation: Optional[float] = None, max_steps: Optional[int] = None, model_type: Optional[str] = None, penalty: Optional[str] = None, early_stopping_tolerance: Optional[float] = None, verbose: Optional[int] = None, multilabel: bool = False, param_clipping: Optional[Dict[str, int]] = None, seed: Optional[int] = None,):
Wrapper around the sklearn.linear_model.LogisticRegression model.
For more details on the parameters, see the scikit-learn documentation.
Arguments
datastructure
: DataStructure to be passed to the model when initialised.
early_stopping_tolerance
: Tolerance for early stopping. Defaults to 1e-05.
inverse_regularisation
: Inverse regularisation parameter. Defaults to 0.0001.
max_steps
: Maximum number of steps to take. Defaults to 10000.
model_type
: Type of solver to use. Defaults to "lbfgs".
multilabel
: Whether the problem is a multi-label problem, i.e. each datapoint can belong to multiple classes.
param_clipping
: Arguments for clipping BatchNorm parameters. Used for federated models with secure aggregation. Should contain the SecureShare variables and the number of workers in a dictionary, e.g. {"prime_q": 13, "precision": 10**3, "num_workers": 2}.
penalty
: Penalty to use. Defaults to "l2".
schema
: The BitfountSchema object associated with the datasource on which the model will be trained.
seed
: Random number seed. Used for setting the random seed for all libraries. Defaults to None.
verbose
: Verbosity level. Defaults to 0.
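Since this class wraps sklearn.linear_model.LogisticRegression, the constructor arguments above plausibly correspond to sklearn parameters as sketched below. The mapping is an assumption inferred from the argument names and defaults, not taken from the bitfount source:

```python
# Hypothetical mapping from LogisticRegressionClassifier arguments to the
# underlying sklearn.linear_model.LogisticRegression parameters.
# This is an inferred sketch, not the library's actual code.
SKLEARN_PARAM_MAP = {
    "inverse_regularisation": "C",      # inverse of regularisation strength
    "max_steps": "max_iter",            # solver iteration cap
    "model_type": "solver",             # e.g. "lbfgs"
    "penalty": "penalty",               # e.g. "l2"
    "early_stopping_tolerance": "tol",  # stopping tolerance
    "verbose": "verbose",
    "seed": "random_state",
}

def to_sklearn_kwargs(wrapper_kwargs: dict) -> dict:
    """Translate wrapper argument names to sklearn names, dropping None values."""
    return {
        SKLEARN_PARAM_MAP[k]: v
        for k, v in wrapper_kwargs.items()
        if k in SKLEARN_PARAM_MAP and v is not None
    }
```

For example, `to_sklearn_kwargs({"max_steps": 10000, "penalty": "l2", "seed": None})` yields `{"max_iter": 10000, "penalty": "l2"}`, with the unset seed omitted so sklearn's own default applies.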
Attributes
early_stopping_tolerance
: Tolerance for early stopping.
fields_dict
: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type (e.g. fields_dict = {"class_name": fields.Str()}).
inverse_regularisation
: Inverse regularisation parameter.
max_steps
: Maximum number of steps to take.
model_type
: Type of solver to use.
multilabel
: Whether the problem is a multi-label problem.
n_classes
: Number of classes in the problem.
nested_fields
: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes (e.g. nested_fields = {"datastructure": datastructure.registry}).
penalty
: Penalty to use.
verbose
: Verbosity level.
Ancestors
- ClassifierMixIn
- bitfount.models.base_models._BaseModel
- bitfount.models.base_models._BaseModelRegistryMixIn
- bitfount.types._BaseSerializableObjectMixIn
- abc.ABC
- typing.Generic
Variables
- static
datastructure : DataStructure
- set in _BaseModel
- static
fields_dict : ClassVar[Dict[str, marshmallow.fields.Field]]
- static
nested_fields : ClassVar[Dict[str, Mapping[str, Any]]]
- static
schema : BitfountSchema
- set in _BaseModel
Methods
deserialize
def deserialize(self, content: Union[str, os.PathLike, bytes]) ‑> None:
Deserialize model.
This should not be used on a model file that has been received across a trust boundary due to the underlying use of pickle.
Arguments
content
: Byte stream or path to file containing serialized model.
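The pickle warning above matters because unpickling untrusted data can execute arbitrary code. A minimal illustration of the underlying risk (not part of the bitfount API): an Unpickler subclass that refuses to resolve any global, which still round-trips plain containers of built-in types but rejects anything that would trigger an import.

```python
import io
import pickle

class NoGlobalsUnpickler(pickle.Unpickler):
    """Unpickler that rejects any attempt to import a class or function.

    Plain built-in containers (dicts, lists, numbers, strings) never call
    find_class, so they still deserialize; anything that would trigger an
    import -- including a malicious payload -- raises instead.
    """

    def find_class(self, module, name):
        raise pickle.UnpicklingError(
            f"refusing to load global {module}.{name} from untrusted data"
        )

def safe_loads(data: bytes):
    """Deserialize bytes while blocking all global lookups."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()
```

`safe_loads(pickle.dumps({"weights": [0.1, 0.2]}))` returns the dict, while a pickle referencing any class raises UnpicklingError. A full model pickle necessarily references classes and cannot be filtered this way, which is why deserialize should only ever see trusted files.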
evaluate
def evaluate( self, test_dl: Optional[BitfountDataLoader] = None, *args: Any, **kwargs: Any,) ‑> Tuple[numpy.ndarray, numpy.ndarray]:
Perform inference on test set and return predictions and targets.
Arguments
test_dl
: Optional BitfountDataLoader object containing test data. If this is not provided, the test set from the BaseSource used to train the model is used, if present.
Returns
A tuple of numpy arrays containing the predicted and actual values.
Raises
ValueError
: If there is no test data to evaluate the model on.
fit
def fit(self, data: Optional[BaseSource] = None, *args: Any, **kwargs: Any) ‑> None:
Trains a model using the training set provided by data.
The validation set in data is not used when training this model.
Arguments
data
: BaseSource object containing training data.
initialise_model
def initialise_model( self, data: Optional[BaseSource] = None, context: Optional[TaskContext] = None,) ‑> None:
Can be implemented to initialise the model if necessary.
This is automatically called by the fit() method if necessary.
Arguments
data
: The data used for model training.
context
: Indicates if the model is running as a modeller or worker. If None, there is no difference between modeller and worker. Defaults to None.
predict
def predict(self, *args: Any, **kwargs: Any) ‑> List[numpy.ndarray]:
Returns model predictions. Not implemented yet.
serialize
def serialize(self, filename: Union[str, os.PathLike]) ‑> None:
Serialize model to file with the provided filename.
Arguments
filename
: Path to file to save serialized model.
set_number_of_classes
def set_number_of_classes(self, schema: TableSchema) ‑> None:
Inherited from:
ClassifierMixIn.set_number_of_classes :
Sets the target number of classes for the classifier.
If the data is a multi-label problem, the number of classes is set to the number of target columns as specified in the DataStructure. Otherwise, the number of classes is set to the number of unique values in the target column as specified in the BitfountSchema. The value is stored in the n_classes attribute.
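The rule described above can be sketched in isolation. This is a simplification of ClassifierMixIn.set_number_of_classes, with the schema and datastructure reduced to plain Python values:

```python
def number_of_classes(multilabel, target_columns, target_values):
    """Mirror the class-counting rule described above, outside the real class.

    multilabel      -- whether each datapoint can belong to several classes
    target_columns  -- target column names from the DataStructure
    target_values   -- values of the (single) target column from the schema
    """
    if multilabel:
        # multi-label: one output per target column
        return len(target_columns)
    # single-label: one class per unique target value
    return len(set(target_values))
```

For instance, three target columns in a multi-label setup give 3 classes, while a single-label column with values ["a", "b", "a", "c"] also gives 3.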
RegBoostRegressor
class RegBoostRegressor( datastructure: DataStructure, schema: BitfountSchema, learning_rate: float = 0.1, max_depth: int = 10, min_data_points_per_node: int = 5, stepwise_regression: "Literal[('forward', 'backward')]" = 'forward', stepwise_regression_threshold: float = 0.15, seed: Optional[int] = None, param_clipping: Optional[Dict[str, int]] = None,):
Gradient Boosted Linear Regression Model.
Implementation of "RegBoost: a gradient boosted multivariate regression algorithm" by Li et al. (2020). For more details, see the paper: https://www.emerald.com/insight/content/doi/10.1108/IJCS-10-2019-0029/full/html
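The core idea of gradient boosting with a linear base learner, fitting successive linear models to the residuals of the current ensemble and adding them scaled by the learning rate, can be shown in one dimension. This is an illustrative reduction, not the RegBoost algorithm itself, which additionally partitions the data and runs stepwise multivariate regression at each node:

```python
def fit_linear(xs, ys):
    """Closed-form least-squares fit of y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def boost(xs, ys, learning_rate=0.1, rounds=50):
    """Fit a chain of linear models, each to the residuals of the last.

    The ensemble prediction is the learning-rate-weighted sum of the
    per-round linear fits, as in gradient boosting with a linear base learner.
    """
    pred = [0.0] * len(xs)
    models = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, pred)]
        a, b = fit_linear(xs, residuals)
        models.append((a, b))
        pred = [p + learning_rate * (a * x + b) for p, x in zip(pred, xs)]
    return models, pred
```

On exactly linear data each round shrinks the residuals by a factor of (1 - learning_rate), so the ensemble converges geometrically to the true line.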
Arguments
datastructure
: DataStructure to be passed to the model when initialised.
learning_rate
: Learning rate for gradient boosting. Defaults to 0.1.
max_depth
: Maximum depth of the tree (number of nodes between root and leaf). A depth of 0 is equivalent to a single Linear Regression model. Defaults to 10.
min_data_points_per_node
: Minimum number of data points required to split a node. Defaults to 5.
param_clipping
: Arguments for clipping BatchNorm parameters. Used for federated models with secure aggregation. Should contain the SecureShare variables and the number of workers in a dictionary, e.g. {"prime_q": 13, "precision": 10**3, "num_workers": 2}. Defaults to None.
schema
: The BitfountSchema object associated with the datasource on which the model will be trained.
seed
: Random number seed. Used for setting the random seed for all libraries. Defaults to None.
stepwise_regression
: Whether stepwise regression should go "forward" or "backward". Defaults to "forward".
stepwise_regression_threshold
: Threshold for stepwise regression. Defaults to 0.15.
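The forward variant of stepwise regression can be sketched as a greedy loop. Here stepwise_regression_threshold is interpreted as a minimum score improvement, which is an assumption for illustration; in the paper the threshold governs significance testing of candidate predictors:

```python
def forward_stepwise(features, score, threshold=0.15):
    """Greedy forward selection: repeatedly add the feature that most
    improves score(selected), stopping once the best improvement no
    longer exceeds threshold.

    score is a caller-supplied function from a feature subset to a number
    (e.g. goodness-of-fit of a regression on that subset).
    """
    selected, remaining = [], list(features)
    while remaining:
        best = max(remaining, key=lambda f: score(selected + [f]))
        gain = score(selected + [best]) - score(selected)
        if gain <= threshold:
            break
        selected.append(best)
        remaining.remove(best)
    return selected
```

With a toy additive score where features "a", "b", "c" contribute 0.5, 0.2, and 0.05, the loop keeps "a" and "b" and stops before "c", whose gain falls below the 0.15 threshold. The backward variant runs the same loop in reverse, starting from all features and removing the least useful one.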
Attributes
fields_dict
: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type (e.g. fields_dict = {"class_name": fields.Str()}).
learning_rate
: Learning rate for gradient boosting.
max_depth
: Maximum depth of the tree (number of nodes between root and leaf).
min_data_points_per_node
: Minimum number of data points required to split a node.
nested_fields
: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes (e.g. nested_fields = {"datastructure": datastructure.registry}).
stepwise_regression
: Whether stepwise regression should go "forward" or "backward".
stepwise_regression_threshold
: Threshold for stepwise regression.
Ancestors
- RegressorMixIn
- bitfount.models.base_models._BaseModel
- bitfount.models.base_models._BaseModelRegistryMixIn
- bitfount.types._BaseSerializableObjectMixIn
- abc.ABC
- typing.Generic
Variables
- static
fields_dict : ClassVar[Dict[str, marshmallow.fields.Field]]
- static
nested_fields : ClassVar[Dict[str, Mapping[str, Any]]]
Methods
deserialize
def deserialize(self, content: Union[str, os.PathLike, bytes]) ‑> None:
Deserialize model.
This should not be used on a model file that has been received across a trust boundary due to the underlying use of pickle.
Arguments
content
: Byte stream or path to file containing serialized model.
evaluate
def evaluate( self, test_dl: Optional[BitfountDataLoader] = None, *args: Any, **kwargs: Any,) ‑> Tuple[numpy.ndarray, numpy.ndarray]:
Perform inference on test set and return predictions and targets.
Arguments
test_dl
: Optional BitfountDataLoader object containing test data. If this is not provided, the test set from the BaseSource used to train the model is used, if present.
Returns
A tuple of numpy arrays containing the predicted and actual values.
Raises
ValueError
: If there is no test data to evaluate the model on.
fit
def fit( self, data: Optional[BaseSource] = None, metrics: Optional[Mapping[str, Metric]] = None, *args: Any, **kwargs: Any,) ‑> None:
Trains a model using the training set provided by the BaseSource object.
initialise_model
def initialise_model( self, data: Optional[BaseSource] = None, context: Optional[TaskContext] = None,) ‑> None:
Can be implemented to initialise the model if necessary.
This is automatically called by the fit() method if necessary.
Arguments
data
: The data used for model training.
context
: Indicates if the model is running as a modeller or worker. If None, there is no difference between modeller and worker. Defaults to None.
predict
def predict(self, *args: Any, **kwargs: Any) ‑> List[numpy.ndarray]:
Returns model predictions. Not implemented yet.
serialize
def serialize(self, filename: Union[str, os.PathLike]) ‑> None:
Serialize model to file with the provided filename.
Arguments
filename
: Path to file to save serialized model.