
algorithms

Algorithms for remote processing of data.

Federated algorithm plugins can also be imported from this package.

Module

Submodules

Classes

BaseAlgorithmFactory

class BaseAlgorithmFactory(**kwargs: Any):

Base algorithm factory from which all other algorithms must inherit.

Attributes

  • class_name: The name of the algorithm class.

Ancestors

  • abc.ABC
  • bitfount.federated.roles._RolesMixIn
  • bitfount.types._BaseSerializableObjectMixIn

Variables

  • static nested_fields : ClassVar[Dict[str, Mapping[str, Any]]]

CSVReportAlgorithm

class CSVReportAlgorithm(
    save_path: Optional[Union[str, os.PathLike]] = None,
    original_cols: Optional[List[str]] = None,
    filter: Optional[List[ColumnFilter]] = None,
    **kwargs: Any,
):

Algorithm for generating the CSV results reports.

Arguments

  • save_path: The folder path where the CSV report should be saved. The CSV report will have the same name as the task ID.
  • original_cols: The tabular columns from the datasource to include in the report. If not specified, all tabular columns from the datasource are included.
  • filter: A list of ColumnFilter instances used to filter the data. Defaults to None. If supplied, columns are added to the output CSV indicating the records that match the specified criteria. If more than one ColumnFilter is given, an additional column is added indicating the records that match all given criteria (as well as the individual matches).
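The filter behaviour described above can be sketched in plain Python. This is a minimal illustration, not Bitfount's implementation: the `Filter` dataclass is a hypothetical stand-in for ColumnFilter (its real fields may differ), and the generated column names, including "Matches all criteria", are assumptions.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


# Hypothetical stand-in for ColumnFilter; the real class may use different fields.
@dataclass
class Filter:
    column: str
    op: str
    value: Any


_OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def add_filter_columns(
    rows: List[Dict[str, Any]], filters: List[Filter]
) -> List[Dict[str, Any]]:
    """Return copies of `rows` with one boolean column per filter.

    When more than one filter is given, an extra column records whether
    the row matched every filter. Column names are illustrative only.
    """
    report = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        matches = []
        for f in filters:
            hit = _OPERATORS[f.op](row[f.column], f.value)
            new_row[f"{f.column} {f.op} {f.value}"] = hit
            matches.append(hit)
        if len(filters) > 1:
            new_row["Matches all criteria"] = all(matches)
        report.append(new_row)
    return report
```

For example, filtering `[{"age": 70, "sex": "F"}, {"age": 40, "sex": "F"}]` with `Filter("age", ">", 65)` and `Filter("sex", "==", "F")` marks only the first record as matching all criteria.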

Ancestors

Variables

  • static fields_dict : ClassVar[T_FIELDS_DICT]

Methods


modeller

def modeller(self, **kwargs: Any) -> bitfount.federated.algorithms.csv_report_algorithm._ModellerSide:

Modeller-side of the algorithm.

worker

def worker(self, **kwargs: Any) -> bitfount.federated.algorithms.csv_report_algorithm._WorkerSide:

Worker-side of the algorithm.

ColumnAverage

class ColumnAverage(*, field: str, table_name: str):

Simple algorithm for taking the arithmetic mean of a column in a table.

Arguments

  • field: The name of the column to take the mean of.
  • table_name: The name of the table on which the column average will be computed.

Attributes

  • class_name: The name of the algorithm class.
  • field: The name of the column to take the mean of.
  • fields_dict: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type. (e.g. fields_dict = {"class_name": fields.Str()}).
  • nested_fields: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes. (e.g. nested_fields = {"datastructure": datastructure.registry})
  • table_name: The name of the table on which the column average will be computed.

Ancestors

Variables

Methods


create

def create(self, role: Union[str, Role], **kwargs: Any) -> Any:

Create an instance representing the role specified.

modeller

def modeller(self, **kwargs: Any) -> bitfount.federated.algorithms.column_avg._ModellerSide:

Returns the modeller side of the ColumnAverage algorithm.

worker

def worker(self, **kwargs: Any) -> bitfount.federated.algorithms.column_avg._WorkerSide:

Returns the worker side of the ColumnAverage algorithm.

ComputeIntersectionRSA

class ComputeIntersectionRSA(
    datasource_columns: Optional[List[str]] = None,
    datasource_table: Optional[str] = None,
    pod_columns: Optional[List[str]] = None,
    pod_table: Optional[str] = None,
):

Algorithm for computing the private set intersection with RSA blinding.

caution

This algorithm does not work with iterable datasources such as multi-table databases.
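To illustrate what "RSA blinding" means here, the following is a toy sketch of the textbook RSA blind-signature approach to private set intersection. It is not Bitfount's implementation: the key is deliberately tiny and insecure, the hashing is simplified, and having the modeller play the "client" (since execute returns the modeller's matching records) while the pod plays the "server" is an assumption.

```python
import hashlib
import math
import random
from typing import Iterable, List

# Toy RSA key from the Mersenne primes 2**31 - 1 and 2**61 - 1.
# Far too small to be secure; used only so the example runs instantly.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)


def hash_to_group(item: str) -> int:
    """Hash an item to an integer modulo N (simplified, not a true full-domain hash)."""
    return int.from_bytes(hashlib.sha256(item.encode()).digest(), "big") % N


def tag(signature: int) -> str:
    """Second hash applied to signatures before they are compared."""
    return hashlib.sha256(str(signature).encode()).hexdigest()


def psi_rsa(client_items: Iterable[str], server_items: Iterable[str], seed: int = 0) -> List[str]:
    rng = random.Random(seed)
    client_items = list(client_items)

    # 1. Client blinds each hashed item with a random r coprime to N.
    blinded, inverses = [], []
    for item in client_items:
        r = rng.randrange(2, N)
        while math.gcd(r, N) != 1:
            r = rng.randrange(2, N)
        blinded.append(hash_to_group(item) * pow(r, E, N) % N)
        inverses.append(pow(r, -1, N))

    # 2. Server signs the blinded values without learning the items.
    signed = [pow(b, D, N) for b in blinded]

    # 3. Client unblinds: (h * r^E)^D * r^-1 = h^D mod N.
    client_tags = {
        tag(s * inv % N): item
        for s, inv, item in zip(signed, inverses, client_items)
    }

    # 4. Server sends tags of its own signed items; client keeps the matches.
    server_tags = {tag(pow(hash_to_group(item), D, N)) for item in server_items}
    return sorted(item for t, item in client_tags.items() if t in server_tags)
```

Calling `psi_rsa(["alice", "bob", "carol"], ["bob", "carol", "dave"])` should yield `["bob", "carol"]`: the client learns only which of its own items the server also holds, and the server never sees the client's unblinded hashes.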

Arguments

  • datasource_columns: The columns of the modeller's datasource on which the private set intersection will be computed, as a list of strings. Defaults to None.
  • datasource_table: If the modeller's datasource is multi-table, the table on which the private set intersection will be computed, as a string. Defaults to None.
  • pod_columns: The columns of the pod's datasource on which the private set intersection will be computed, as a list of strings. Defaults to None.
  • pod_table: If the pod's datasource is multi-table, the table on which the private set intersection will be computed, as a string. Defaults to None.

Attributes

  • class_name: The name of the algorithm class.
  • fields_dict: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type. (e.g. fields_dict = {"class_name": fields.Str()}).
  • nested_fields: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes. (e.g. nested_fields = {"datastructure": datastructure.registry})
  • pod_columns: The columns of the pod's datasource on which the private set intersection will be computed, as a list of strings. Defaults to None.
  • pod_table: If the pod's datasource is multi-table, the table on which the private set intersection will be computed, as a string. Defaults to None.

Ancestors

  • BaseAlgorithmFactory
  • bitfount.federated.mixins._PSIAlgorithmsMixIn
  • abc.ABC
  • bitfount.federated.roles._RolesMixIn
  • bitfount.types._BaseSerializableObjectMixIn
  • bitfount.federated.types._DataLessAlgorithm

Variables

Methods


create

def create(self, role: Union[str, Role], **kwargs: Any) -> Any:

Create an instance representing the role specified.

execute

def execute(
    self: _PSICompatibleAlgoFactory_,
    pod_identifiers: List[str],
    datasource: BaseSource,
    datasource_columns: Optional[List[str]] = None,
    datasource_table: Optional[str] = None,
    pod_columns: Optional[List[str]] = None,
    pod_table: Optional[str] = None,
    username: Optional[str] = None,
    bitfounthub: Optional[BitfountHub] = None,
    ms_config: Optional[MessageServiceConfig] = None,
    message_service: Optional[_MessageService] = None,
    pod_public_key_paths: Optional[Mapping[str, Path]] = None,
    identity_verification_method: IdentityVerificationMethod = IdentityVerificationMethod.OIDC_DEVICE_CODE,
    private_key_or_file: Optional[Union[RSAPrivateKey, Path]] = None,
    idp_url: Optional[str] = None,
    require_all_pods: bool = False,
    project_id: Optional[str] = None,
) -> List[pd.DataFrame]:

Execute PSI compatible algorithm.

Syntactic sugar to allow the modeller to call .intersect(...) on PrivateSetIntersection compatible algorithms.

Arguments

  • pod_identifiers: The pod identifiers for the private set intersection, as a list.
  • datasource: The modeller's datasource on which the private set intersection will be performed.
  • datasource_columns: The columns of the modeller's datasource on which the private set intersection will be computed, as a list of strings. Defaults to None.
  • datasource_table: If the modeller's datasource is multi-table, the table on which the private set intersection will be computed, as a string. Defaults to None.
  • pod_columns: The columns of the pod's datasource on which the private set intersection will be computed, as a list of strings. Defaults to None.
  • pod_table: If the pod's datasource is multi-table, the table on which the private set intersection will be computed, as a string. Defaults to None.
  • username: The modeller's username. Defaults to None.
  • bitfounthub: The Bitfount hub instance. Defaults to None.
  • ms_config: Configuration for the message service. Defaults to None.
  • message_service: The message service to use. Defaults to None.
  • pod_public_key_paths: The paths to the pods' public keys. Used when authentication with the pod is done with its public key. Defaults to None.
  • identity_verification_method: The identity verification method to use. Defaults to IdentityVerificationMethod.OIDC_DEVICE_CODE.
  • private_key_or_file: The private key or path to the private key. Defaults to None.
  • idp_url: The identity provider URL. Defaults to None.
  • require_all_pods: Whether all pods are needed for the algorithm. Defaults to False.

Returns

The records from the modeller dataset which were found in the intersection.

Raises

  • PSIMultiplePodsError: If execute is called on multiple pods.

modeller

def modeller(self, **kwargs: Any) -> bitfount.federated.algorithms.compute_intersection_rsa._ModellerSide:

Returns the modeller side of the ComputeIntersectionRSA algorithm.

worker

def worker(self, **kwargs: Any) -> bitfount.federated.algorithms.compute_intersection_rsa._WorkerSide:

Returns the worker side of the ComputeIntersectionRSA algorithm.

FederatedModelTraining

class FederatedModelTraining(
    *,
    model: _DistributedModelTypeOrReference,
    modeller_checkpointing: bool = True,
    checkpoint_filename: Optional[str] = None,
    pretrained_file: Optional[Union[str, os.PathLike]] = None,
    project_id: Optional[str] = None,
):

Algorithm for training a model remotely and returning its updated parameters.

This algorithm is designed to be compatible with the FederatedAveraging protocol.
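The aggregation that the FederatedAveraging protocol applies to the returned parameters is not specified on this page. As an illustration only, the sketch below shows the standard federated averaging rule, a weighted element-wise mean of each parameter across workers. It assumes weights such as each worker's number of training samples and parameters represented as plain lists of floats; both are assumptions for this sketch.

```python
from typing import Dict, List, Sequence

Params = Dict[str, List[float]]


def federated_average(updates: Sequence[Params], weights: Sequence[float]) -> Params:
    """Weighted element-wise mean of parameter dictionaries from several workers."""
    if not updates or len(updates) != len(weights):
        raise ValueError("Expected a non-empty list of updates with one weight each.")
    total = sum(weights)
    if total <= 0:
        raise ValueError("Weights must sum to a positive value.")
    averaged: Params = {}
    for name, values in updates[0].items():
        averaged[name] = [
            sum(w * update[name][i] for update, w in zip(updates, weights)) / total
            for i in range(len(values))
        ]
    return averaged
```

For instance, averaging `{"w": [1.0, 2.0]}` and `{"w": [3.0, 4.0]}` with weights `[1, 3]` gives `{"w": [2.5, 3.5]}`, so the worker with more data pulls the result towards its own parameters.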

Arguments

  • model: The model to train on remote data.
  • pretrained_file: A file path or a string containing a pre-trained model. Defaults to None.

Attributes

  • checkpoint_filename: The filename for the last checkpoint. Defaults to the task id and the last iteration number, i.e., {taskid}-iteration-{iteration_number}.pt.
  • class_name: The name of the algorithm class.
  • fields_dict: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type. (e.g. fields_dict = {"class_name": fields.Str()}).
  • model: The model to train on remote data.
  • modeller_checkpointing: Whether to save the last checkpoint on the modeller side. Defaults to True.
  • nested_fields: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes. (e.g. nested_fields = {"datastructure": datastructure.registry})
  • pretrained_file: A file path or a string containing a pre-trained model. Defaults to None.

Ancestors

  • bitfount.federated.algorithms.model_algorithms.base._BaseModelAlgorithmFactory
  • BaseAlgorithmFactory
  • abc.ABC
  • bitfount.federated.roles._RolesMixIn
  • bitfount.types._BaseSerializableObjectMixIn

Variables

  • static nested_fields : ClassVar[Dict[str, Mapping[str, Any]]]

Methods


create

def create(self, role: Union[str, Role], **kwargs: Any) -> Any:

Create an instance representing the role specified.

modeller

def modeller(self, **kwargs: Any) -> bitfount.federated.algorithms.model_algorithms.federated_training._ModellerSide:

Returns the modeller side of the FederatedModelTraining algorithm.

worker

def worker(self, hub: BitfountHub, **kwargs: Any) -> bitfount.federated.algorithms.model_algorithms.federated_training._WorkerSide:

Returns the worker side of the FederatedModelTraining algorithm.

Arguments

  • hub: BitfountHub object to use for communication with the hub.

HuggingFaceImageClassificationInference

class HuggingFaceImageClassificationInference(
    model_id: str,
    image_column_name: str,
    seed: int = 42,
    apply_softmax_to_predictions: bool = True,
    batch_size: int = 1,
    top_k: int = 5,
):

Inference for pre-trained Hugging Face image classification models.

Arguments

  • batch_size: The batch size for inference. Defaults to 1.
  • image_column_name: The image column on which the inference should be done.
  • model_id: The model id to use for image classification inference. The model id is of a pretrained model hosted inside a model repo on huggingface.co. Accepts ResNet models.
  • seed: Sets the seed of the algorithm. For reproducible behavior it defaults to 42.
  • top_k: The number of top labels that will be returned by the pipeline. If the provided number is higher than the number of labels available in the model configuration, it will default to the number of labels. Defaults to 5.
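How top_k interacts with the number of labels, together with the optional softmax controlled by apply_softmax_to_predictions in the signature, can be sketched as below. This is a plain-Python illustration assuming the model yields one raw score (logit) per label; it is not the Hugging Face pipeline's code.

```python
import math
from typing import List, Tuple


def top_k_predictions(
    logits: List[float],
    labels: List[str],
    top_k: int = 5,
    apply_softmax: bool = True,
) -> List[Tuple[str, float]]:
    """Return the top_k (label, score) pairs, highest score first."""
    if apply_softmax:
        # Subtract the max logit for numerical stability before exponentiating.
        largest = max(logits)
        exps = [math.exp(x - largest) for x in logits]
        total = sum(exps)
        scores = [e / total for e in exps]
    else:
        scores = list(logits)
    # A top_k larger than the number of labels falls back to the number of labels.
    k = min(top_k, len(labels))
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

With three labels and `top_k=10`, the sketch returns all three labels rather than failing, mirroring the fallback described for top_k.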

Attributes

  • batch_size: The batch size for inference. Defaults to 1.
  • class_name: The name of the algorithm class.
  • fields_dict: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type. (e.g. fields_dict = {"class_name": fields.Str()}).
  • image_column_name: The image column on which the inference should be done.
  • model_id: The model id to use for image classification inference. The model id is of a pretrained model hosted inside a model repo on huggingface.co. Accepts ResNet models.
  • nested_fields: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes. (e.g. nested_fields = {"datastructure": datastructure.registry})
  • seed: Sets the seed of the algorithm. For reproducible behavior it defaults to 42.
  • top_k: The number of top labels that will be returned by the pipeline. If the provided number is higher than the number of labels available in the model configuration, it will default to the number of labels. Defaults to 5.

Ancestors

Variables

  • static fields_dict : ClassVar[T_FIELDS_DICT]

Methods


create

def create(self, role: Union[str, Role], **kwargs: Any) -> Any:

Create an instance representing the role specified.

modeller

def modeller(self, **kwargs: Any) -> bitfount.federated.algorithms.hugging_face_algorithms.hugging_face_image_classification._ModellerSide:

Returns the modeller side of the HuggingFaceImageClassificationInference algorithm.

worker

def worker(self, **kwargs: Any) -> bitfount.federated.algorithms.hugging_face_algorithms.hugging_face_image_classification._WorkerSide:

Returns the worker side of the HuggingFaceImageClassificationInference algorithm.

HuggingFaceImageSegmentationInference

class HuggingFaceImageSegmentationInference(
    model_id: str,
    image_column_name: str,
    alpha: float = 0.3,
    batch_size: int = 1,
    dataframe_output: bool = False,
    mask_threshold: float = 0.5,
    overlap_mask_area_threshold: float = 0.5,
    save_path: Union[str, os.PathLike] = PosixPath('.'),
    seed: int = 42,
    subtask: Optional[_Subtask] = None,
    threshold: float = 0.9,
):

Inference for pre-trained Hugging Face image segmentation models.

Performs segmentation (detects masks and classes) on the image(s) passed as inputs.

Arguments

  • alpha: The alpha value for the mask overlay. Defaults to 0.3.
  • batch_size: The batch size for inference. Defaults to 1.
  • dataframe_output: Whether to output the prediction results in a dataframe format. Defaults to False.
  • image_column_name: The image column on which the inference should be done.
  • mask_threshold: Threshold to use when turning the predicted masks into binary values. Defaults to 0.5.
  • model_id: The model id to use for image segmentation inference. The model id is of a pretrained model hosted inside a model repo on huggingface.co. Accepts ResNet models.
  • overlap_mask_area_threshold: Mask overlap threshold to eliminate small, disconnected segments. Defaults to 0.5.
  • save_path: The folder path where the images with masks drawn on them should be saved. Defaults to the current working directory.
  • seed: Sets the seed of the algorithm. For reproducible behavior it defaults to 42.
  • subtask: The segmentation task to perform: "semantic", "instance" or "panoptic", depending on model capabilities. If not set, the pipeline will attempt to resolve it in the following order: panoptic, instance, semantic.
  • threshold: Probability threshold to filter out predicted masks. Defaults to 0.9.
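The roles of mask_threshold and alpha can be illustrated with a small sketch: predicted mask probabilities are binarised against mask_threshold, and masked pixels are blended with an overlay colour using alpha. The pure-Python image representation, the strict `>` comparison and the standard alpha-blending formula are assumptions for illustration; the actual drawing code may differ.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def overlay_mask(
    image: List[List[Pixel]],
    mask_probs: List[List[float]],
    color: Pixel,
    alpha: float = 0.3,
    mask_threshold: float = 0.5,
) -> List[List[Pixel]]:
    """Blend `color` into every pixel whose mask probability exceeds the threshold."""
    blended_image = []
    for image_row, prob_row in zip(image, mask_probs):
        blended_row = []
        for pixel, prob in zip(image_row, prob_row):
            if prob > mask_threshold:  # binarise the predicted mask
                # Standard alpha blending: (1 - alpha) * original + alpha * overlay.
                pixel = tuple(
                    round((1 - alpha) * channel + alpha * overlay)
                    for channel, overlay in zip(pixel, color)
                )
            blended_row.append(pixel)
        blended_image.append(blended_row)
    return blended_image
```

A higher alpha makes the overlay colour dominate the masked region; pixels below mask_threshold are left unchanged.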

Attributes

  • alpha: The alpha value for the mask overlay. Defaults to 0.3.
  • batch_size: The batch size for inference. Defaults to 1.
  • class_name: The name of the algorithm class.
  • dataframe_output: Whether to output the prediction results in a dataframe format. Defaults to False.
  • fields_dict: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type. (e.g. fields_dict = {"class_name": fields.Str()}).
  • image_column_name: The image column on which the inference should be done.
  • mask_threshold: Threshold to use when turning the predicted masks into binary values. Defaults to 0.5.
  • model_id: The model id to use for image segmentation inference. The model id is of a pretrained model hosted inside a model repo on huggingface.co. Accepts ResNet models.
  • nested_fields: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes. (e.g. nested_fields = {"datastructure": datastructure.registry})
  • overlap_mask_area_threshold: Mask overlap threshold to eliminate small, disconnected segments. Defaults to 0.5.
  • save_path: The folder path where the images with masks drawn on them should be saved. Defaults to the current working directory.
  • seed: Sets the seed of the algorithm. For reproducible behavior it defaults to 42.
  • subtask: The segmentation task to perform: "semantic", "instance" or "panoptic", depending on model capabilities. If not set, the pipeline will attempt to resolve it in the following order: panoptic, instance, semantic.
  • threshold: Probability threshold to filter out predicted masks. Defaults to 0.9.

Ancestors

Variables

Methods


create

def create(self, role: Union[str, Role], **kwargs: Any) -> Any:

Create an instance representing the role specified.

modeller

def modeller(self, **kwargs: Any) -> bitfount.federated.algorithms.hugging_face_algorithms.hugging_face_image_segmentation._ModellerSide:

Returns the modeller side of the HuggingFaceImageSegmentationInference algorithm.

worker

def worker(self, **kwargs: Any) -> bitfount.federated.algorithms.hugging_face_algorithms.hugging_face_image_segmentation._WorkerSide:

Returns the worker side of the HuggingFaceImageSegmentationInference algorithm.

HuggingFacePerplexityEvaluation

class HuggingFacePerplexityEvaluation(model_id: str, text_column_name: str, stride: int = 512, seed: int = 42):

Algorithm for evaluating the perplexity of pre-trained Hugging Face causal language models.

Arguments

  • model_id: The model id to use for evaluating its perplexity. The model id is of a pretrained model hosted inside a model repo on huggingface.co. Accepts models with a causal language modeling head.
  • seed: Sets the seed of the algorithm. For reproducible behaviour it defaults to 42.
  • stride: Sets the stride of the algorithm. Defaults to 512.
  • text_column_name: The single column to query against. Should contain the text whose perplexity will be evaluated.
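The page does not spell out how stride is used. A common approach, described in the Hugging Face guide on the perplexity of fixed-length models, slides a window of the model's maximum context length over the tokenised text in steps of stride and scores only the tokens not already scored by the previous window; perplexity is then the exponential of the mean negative log-likelihood per token. The sketch below shows only that bookkeeping, assuming this scheme and a hypothetical max_length; it does not load a model.

```python
import math
from typing import List, Tuple


def sliding_windows(seq_len: int, max_length: int, stride: int) -> List[Tuple[int, int, int]]:
    """Return (begin, end, target_len) per window; only the last target_len tokens are scored.

    Assumes stride <= max_length so that every token is scored exactly once.
    """
    windows = []
    prev_end = 0
    for begin in range(0, seq_len, stride):
        end = min(begin + max_length, seq_len)
        windows.append((begin, end, end - prev_end))
        prev_end = end
        if end == seq_len:
            break
    return windows


def perplexity(token_nlls: List[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per scored token)."""
    return math.exp(sum(token_nlls) / len(token_nlls))
```

A smaller stride gives each scored token more preceding context (usually lowering perplexity) at the cost of more forward passes.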

Attributes

  • class_name: The name of the algorithm class.
  • fields_dict: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type. (e.g. fields_dict = {"class_name": fields.Str()}).
  • model_id: The model id to use for evaluation. The model id is of a pretrained model hosted inside a model repo on huggingface.co. Accepts models with a causal language modeling head.
  • nested_fields: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes. (e.g. nested_fields = {"datastructure": datastructure.registry})
  • seed: Sets the seed of the algorithm. For reproducible behaviour it defaults to 42.
  • stride: Sets the stride of the algorithm. Defaults to 512.
  • text_column_name: The single column to query against. Should contain the text whose perplexity will be evaluated.

Ancestors

Variables

Methods


create

def create(self, role: Union[str, Role], **kwargs: Any) -> Any:

Create an instance representing the role specified.

modeller

def modeller(self, **kwargs: Any) -> bitfount.federated.algorithms.hugging_face_algorithms.hugging_face_perplexity._ModellerSide:

Returns the modeller side of the HuggingFacePerplexityEvaluation algorithm.

worker

def worker(self, **kwargs: Any) -> bitfount.federated.algorithms.hugging_face_algorithms.hugging_face_perplexity._WorkerSide:

Returns the worker side of the HuggingFacePerplexityEvaluation algorithm.

HuggingFaceTextClassificationInference

class HuggingFaceTextClassificationInference(
    model_id: str,
    target_column_name: str,
    batch_size: int = 1,
    function_to_apply: Optional[_FunctionToApply] = None,
    seed: int = 42,
    top_k: int = 1,
):

Inference for pre-trained Hugging Face text classification models.

Arguments

  • batch_size: The batch size for inference. Defaults to 1.
  • function_to_apply: The function to apply to the model outputs in order to retrieve the scores. If not specified (None, the default), the function is chosen according to the number of labels: sigmoid if the model has a single label, softmax if it has several. Otherwise, one of:
      • "sigmoid": Applies the sigmoid function on the output.
      • "softmax": Applies the softmax function on the output.
      • "none": Does not apply any function on the output.
  • model_id: The model id to use for text classification inference. The model id is of a pretrained model hosted inside a model repo on huggingface.co.
  • seed: Sets the seed of the algorithm. For reproducible behavior it defaults to 42.
  • target_column_name: The target column on which the inference should be done.
  • top_k: The number of top labels that will be returned by the pipeline. Defaults to 1.
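The default selection rule for function_to_apply described above can be written out as follows. This is an illustrative re-implementation assuming the model returns one raw score per label; it is not the Hugging Face pipeline source.

```python
import math
from typing import List, Optional


def apply_function(logits: List[float], function_to_apply: Optional[str] = None) -> List[float]:
    """Turn raw model outputs into scores following the rules described above."""
    if function_to_apply is None:
        # Default: sigmoid for a single-label model, softmax otherwise.
        function_to_apply = "sigmoid" if len(logits) == 1 else "softmax"
    if function_to_apply == "sigmoid":
        return [1.0 / (1.0 + math.exp(-x)) for x in logits]
    if function_to_apply == "softmax":
        largest = max(logits)
        exps = [math.exp(x - largest) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]
    if function_to_apply == "none":
        return list(logits)
    raise ValueError(f"Unsupported function_to_apply: {function_to_apply!r}")
```

The sigmoid scores each label independently, which suits single-label (or multi-label) heads, while softmax makes the scores of mutually exclusive labels sum to 1.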

Attributes

  • batch_size: The batch size for inference. Defaults to 1.
  • class_name: The name of the algorithm class.
  • fields_dict: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type. (e.g. fields_dict = {"class_name": fields.Str()}).
  • function_to_apply: The function to apply to the model outputs in order to retrieve the scores. If not specified (None, the default), the function is chosen according to the number of labels: sigmoid if the model has a single label, softmax if it has several. Otherwise, one of:
      • "sigmoid": Applies the sigmoid function on the output.
      • "softmax": Applies the softmax function on the output.
      • "none": Does not apply any function on the output.
  • model_id: The model id to use for text classification inference. The model id is of a pretrained model hosted inside a model repo on huggingface.co.
  • nested_fields: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes. (e.g. nested_fields = {"datastructure": datastructure.registry})
  • seed: Sets the seed of the algorithm. For reproducible behavior it defaults to 42.
  • target_column_name: The target column on which the inference should be done.
  • top_k: The number of top labels that will be returned by the pipeline. Defaults to 1.

Ancestors

Variables

Methods


create

def create(self, role: Union[str, Role], **kwargs: Any) -> Any:

Create an instance representing the role specified.

modeller

def modeller(self, **kwargs: Any) -> bitfount.federated.algorithms.hugging_face_algorithms.hugging_face_text_classification._ModellerSide:

Returns the modeller side of the HuggingFaceTextClassificationInference algorithm.

worker

def worker(self, **kwargs: Any) -> bitfount.federated.algorithms.hugging_face_algorithms.hugging_face_text_classification._WorkerSide:

Returns the worker side of the HuggingFaceTextClassificationInference algorithm.

HuggingFaceTextGenerationInference

class HuggingFaceTextGenerationInference(
    model_id: str,
    text_column_name: Optional[str] = None,
    prompt_format: Optional[str] = None,
    max_length: int = 50,
    num_return_sequences: int = 1,
    seed: int = 42,
    min_new_tokens: int = 1,
    repetition_penalty: float = 1.0,
    num_beams: int = 1,
    early_stopping: bool = True,
    pad_token_id: Optional[int] = None,
    eos_token_id: Optional[int] = None,
    device: Optional[str] = None,
    torch_dtype: "Literal['bfloat16', 'float16', 'float32', 'float64']" = 'float32',
):

Inference for pre-trained Hugging Face text generation models.

Arguments

  • device: The device to use for the model. Defaults to None. On the worker side, will be set to the environment variable BITFOUNT_DEFAULT_TORCH_DEVICE if specified, otherwise "cpu".
  • early_stopping: Whether to stop the generation as soon as there are num_beams complete candidates. Defaults to True.
  • eos_token_id: The id of the token to use as the last token for each sequence. If None (default), it will default to the eos_token_id of the tokenizer.
  • max_length: The maximum length of the sequence to be generated. Defaults to 50.
  • min_new_tokens: The minimum number of new tokens to add to the prompt. Defaults to 1.
  • model_id: The model id to use for text generation. The model id is of a pretrained model hosted inside a model repo on huggingface.co. Accepts models with a causal language modeling head.
  • num_beams: Number of beams for beam search. 1 means no beam search. Defaults to 1.
  • num_return_sequences: The number of sequence candidates to return for each input. Defaults to 1.
  • pad_token_id: The id of the token to use as padding token. If None (default), it will default to the pad_token_id of the tokenizer.
  • prompt_format: The format of the prompt as a string with a single {context} placeholder, which is where the pod's input will be inserted. For example, "You are a Language Model. This is the context: {context}. Please summarize it." This only applies if text_column_name is provided; it is not used for dynamic prompting. Defaults to None.
  • repetition_penalty: The parameter for repetition penalty. 1.0 means no penalty. Defaults to 1.0.
  • seed: Sets the seed of the algorithm. For reproducible behaviour it defaults to 42.
  • text_column_name: The single column to query against. Should contain text for generation. If not provided, the algorithm must be used with a protocol which dynamically provides the text to be used for prompting.
  • torch_dtype: The torch dtype to use for the model. Defaults to "float32".
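The worker-side fallback described for device amounts to a simple lookup. The helper below is a hypothetical sketch of that rule, assuming that an explicitly passed device takes precedence over the BITFOUNT_DEFAULT_TORCH_DEVICE environment variable.

```python
import os
from typing import Optional


def resolve_device(device: Optional[str] = None) -> str:
    """Pick the torch device string used on the worker side (illustrative)."""
    if device is not None:
        return device  # assumption: an explicit setting wins
    # Otherwise use the environment variable if set, else fall back to "cpu".
    return os.environ.get("BITFOUNT_DEFAULT_TORCH_DEVICE", "cpu")
```

Setting the environment variable on a pod machine (for example to "cuda") therefore changes the device without changing the task definition.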

Attributes

  • class_name: The name of the algorithm class.
  • device: The device to use for the model. Defaults to None. On the worker side, will be set to the environment variable BITFOUNT_DEFAULT_TORCH_DEVICE if specified, otherwise "cpu".
  • early_stopping: Whether to stop the generation as soon as there are num_beams complete candidates. Defaults to True.
  • eos_token_id: The id of the token to use as the last token for each sequence. If None (default), it will default to the eos_token_id of the tokenizer.
  • fields_dict: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type. (e.g. fields_dict = {"class_name": fields.Str()}).
  • max_length: The maximum length of the sequence to be generated. Defaults to 50.
  • min_new_tokens: The minimum number of new tokens to add to the prompt. Defaults to 1.
  • model_id: The model id to use for text generation. The model id is of a pretrained model hosted inside a model repo on huggingface.co. Accepts models with a causal language modeling head.
  • nested_fields: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes. (e.g. nested_fields = {"datastructure": datastructure.registry})
  • num_beams: Number of beams for beam search. 1 means no beam search. Defaults to 1.
  • num_return_sequences: The number of sequence candidates to return for each input. Defaults to 1.
  • pad_token_id: The id of the token to use as padding token. If None (default), it will default to the pad_token_id of the tokenizer.
  • prompt_format: The format of the prompt as a string with a single {context} placeholder, which is where the pod's input will be inserted. For example, "You are a Language Model. This is the context: {context}. Please summarize it." This only applies if text_column_name is provided; it is not used for dynamic prompting. Defaults to None.
  • repetition_penalty: The parameter for repetition penalty. 1.0 means no penalty. Defaults to 1.0.
  • seed: Sets the seed of the algorithm. For reproducible behaviour it defaults to 42.
  • text_column_name: The single column to query against. Should contain text for generation. If not provided, the algorithm must be used with a protocol which dynamically provides the text to be used for prompting.
  • torch_dtype: The torch dtype to use for the model. Defaults to "float32".

Raises

  • ValueError: If prompt_format is provided without text_column_name.
  • ValueError: If prompt_format does not contain a single {context} placeholder.
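The two ValueError conditions and the {context} substitution can be sketched as below. The function names are hypothetical. The sketch uses str.replace rather than str.format so that any other literal braces in the prompt are left untouched; that is a design choice of this sketch, not documented behaviour.

```python
from typing import Optional

PLACEHOLDER = "{context}"


def validate_prompt_format(prompt_format: Optional[str], text_column_name: Optional[str]) -> None:
    """Raise ValueError under the two conditions listed above."""
    if prompt_format is None:
        return
    if text_column_name is None:
        raise ValueError("prompt_format requires text_column_name to be set.")
    if prompt_format.count(PLACEHOLDER) != 1:
        raise ValueError("prompt_format must contain exactly one {context} placeholder.")


def build_prompt(prompt_format: Optional[str], text: str) -> str:
    """Insert the pod's text into the prompt, or use the text as-is."""
    if prompt_format is None:
        return text
    return prompt_format.replace(PLACEHOLDER, text)
```

For example, `build_prompt("Context: {context}. Summarize it.", "abc")` produces `"Context: abc. Summarize it."`.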

Ancestors

Variables

Methods


create

def create(self, role: Union[str, Role], **kwargs: Any) -> Any:

Create an instance representing the role specified.

modeller

def modeller(self, **kwargs: Any) -> bitfount.federated.algorithms.hugging_face_algorithms.hugging_face_text_generation._ModellerSide:

Returns the modeller side of the HuggingFaceTextGenerationInference algorithm.

worker

def worker(self, **kwargs: Any) -> bitfount.federated.algorithms.hugging_face_algorithms.hugging_face_text_generation._WorkerSide:

Returns the worker side of the HuggingFaceTextGenerationInference algorithm.

HuggingFaceZeroShotImageClassificationInference

class HuggingFaceZeroShotImageClassificationInference(
    model_id: str,
    image_column_name: str,
    candidate_labels: List[str],
    batch_size: int = 1,
    class_outputs: Optional[List[str]] = None,
    hypothesis_template: Optional[str] = None,
    seed: int = 42,
):

Inference for pre-trained Hugging Face zero shot image classification models.

Arguments

  • batch_size: The batch size for inference. Defaults to 1.
  • candidate_labels: The candidate labels against which each image is classified.
  • class_outputs: A list of strings corresponding to prediction outputs. If provided, the model will return a dataframe of results with the class outputs list elements as columns. Defaults to None.
  • hypothesis_template: The sentence template used together with candidate_labels: its placeholder is replaced with each candidate label, and the likelihood of each label is then estimated from logits_per_image. Defaults to None.
  • image_column_name: The image column on which the inference should be done.
  • model_id: The model id to use for zero shot image classification inference. The model id is of a pretrained model hosted inside a model repo on huggingface.co.
  • seed: Sets the seed of the algorithm. For reproducible behavior it defaults to 42.
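The hypothesis_template mechanism can be illustrated as follows: each candidate label is substituted into the template, the model scores the image against every resulting sentence (logits_per_image), and a softmax turns those scores into label probabilities. The fallback template "This is a photo of {}." is the Hugging Face zero-shot image classification pipeline default; that Bitfount uses it when hypothesis_template is None is an assumption, and the logits below are made-up inputs rather than model output.

```python
import math
from typing import Dict, List, Optional, Sequence

# Assumed fallback: the Hugging Face pipeline's default template.
DEFAULT_TEMPLATE = "This is a photo of {}."


def build_hypotheses(
    candidate_labels: Sequence[str], hypothesis_template: Optional[str] = None
) -> List[str]:
    """Fill the template's {} placeholder with each candidate label."""
    template = hypothesis_template or DEFAULT_TEMPLATE
    return [template.format(label) for label in candidate_labels]


def label_probabilities(
    logits_per_image: Sequence[float], candidate_labels: Sequence[str]
) -> Dict[str, float]:
    """Softmax over the image-text similarity scores, one per candidate label."""
    largest = max(logits_per_image)
    exps = [math.exp(x - largest) for x in logits_per_image]
    total = sum(exps)
    return {label: e / total for label, e in zip(candidate_labels, exps)}
```

A domain-specific template (for example, "An X-ray showing {}.") can make the hypotheses closer to the captions the model was trained on.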

Attributes

  • batch_size: The batch size for inference. Defaults to 1.
  • candidate_labels: The candidate labels against which each image is classified.
  • class_name: The name of the algorithm class.
  • class_outputs: A list of strings corresponding to prediction outputs. If provided, the model will return a dataframe of results with the class outputs list elements as columns. Defaults to None.
  • fields_dict: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type. (e.g. fields_dict = {"class_name": fields.Str()}).
  • hypothesis_template: The sentence template used together with candidate_labels: its placeholder is replaced with each candidate label, and the likelihood of each label is then estimated from logits_per_image. Defaults to None.
  • image_column_name: The image column on which the inference should be done.
  • model_id: The model id to use for zero shot image classification inference. The model id is of a pretrained model hosted inside a model repo on huggingface.co.
  • nested_fields: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes. (e.g. nested_fields = {"datastructure": datastructure.registry})
  • seed: Sets the seed of the algorithm. For reproducible behavior it defaults to 42.

Ancestors

Variables

Methods


create

def create(self, role: Union[str, Role], **kwargs: Any) -> Any:

Create an instance representing the role specified.

modeller

def modeller(self, **kwargs: Any) -> bitfount.federated.algorithms.hugging_face_algorithms.hugging_face_zero_shot_image_classification._ModellerSide:

Returns the modeller side of the HuggingFaceZeroShotImageClassificationInference algorithm.

worker

def worker(self, **kwargs: Any) -> bitfount.federated.algorithms.hugging_face_algorithms.hugging_face_zero_shot_image_classification._WorkerSide:

Returns the worker side of the HuggingFaceZeroShotImageClassificationInference algorithm.

ModelEvaluation

class ModelEvaluation(*, model: _DistributedModelTypeOrReference, pretrained_file: Optional[Union[str, os.PathLike]] = None, project_id: Optional[str] = None):

Algorithm for evaluating a model and returning metrics.

note

The metrics cannot currently be specified by the user.

Arguments

  • model: The model to evaluate on remote data.
  • pretrained_file: A file path or a string containing a pre-trained model. Defaults to None.

Attributes

  • class_name: The name of the algorithm class.
  • fields_dict: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type. (e.g. fields_dict = {"class_name": fields.Str()}).
  • model: The model to evaluate on remote data.
  • nested_fields: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes. (e.g. nested_fields = {"datastructure": datastructure.registry})
  • pretrained_file: A file path or a string containing a pre-trained model. Defaults to None.
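The registry lookup described by nested_fields can be sketched like this. It is a hypothetical, stdlib-only illustration: `DataStructureA`, `datastructure_registry` and `load_nested` are invented names, and it assumes each serialized nested object carries its `class_name` alongside its constructor arguments.

```python
from typing import Any, Dict, Mapping, Type


class DataStructureA:
    # Hypothetical serializable class used only for this illustration.
    def __init__(self, table: str) -> None:
        self.table = table


# Hypothetical registry mapping class names to classes.
datastructure_registry: Dict[str, Type[Any]] = {"DataStructureA": DataStructureA}

# nested_fields maps an attribute name to the registry used to rebuild it.
nested_fields: Dict[str, Mapping[str, Type[Any]]] = {
    "datastructure": datastructure_registry
}


def load_nested(field_name: str, payload: Dict[str, Any]) -> Any:
    # Pick the registry for this attribute, then the class named in the payload.
    registry = nested_fields[field_name]
    data = dict(payload)
    cls = registry[data.pop("class_name")]
    return cls(**data)


ds = load_nested("datastructure", {"class_name": "DataStructureA", "table": "df"})
```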

Ancestors

  • bitfount.federated.algorithms.model_algorithms.base._BaseModelAlgorithmFactory
  • BaseAlgorithmFactory
  • abc.ABC
  • bitfount.federated.roles._RolesMixIn
  • bitfount.types._BaseSerializableObjectMixIn

Variables

  • static nested_fields : ClassVar[Dict[str, Mapping[str, Any]]]

Methods


create

def create(self, role: Union[str, Role], **kwargs: Any) -> Any:

Create an instance representing the role specified.

modeller

def modeller(self, **kwargs: Any) -> bitfount.federated.algorithms.model_algorithms.evaluate._ModellerSide:

Returns the modeller side of the ModelEvaluation algorithm.

worker

def worker(self, hub: BitfountHub, **kwargs: Any) -> bitfount.federated.algorithms.model_algorithms.evaluate._WorkerSide:

Returns the worker side of the ModelEvaluation algorithm.

Arguments

  • hub: BitfountHub object to use for communication with the hub.

ModelInference

class ModelInference(*, model: _DistributedModelTypeOrReference, class_outputs: Optional[List[str]] = None, pretrained_file: Optional[Union[str, os.PathLike]] = None, project_id: Optional[str] = None):

Algorithm for running inference on a model and returning the predictions.

danger

This algorithm could potentially return the data unfiltered, so it should only be used when the other party is trusted.

Arguments

  • class_outputs: A list of strings corresponding to prediction outputs. If provided, the model will return a dataframe of results with the class outputs list elements as columns. Defaults to None.
  • model: The model to infer on remote data.
  • pretrained_file: A file path or a string containing a pre-trained model. Defaults to None.

Attributes

  • class_name: The name of the algorithm class.
  • class_outputs: A list of strings corresponding to prediction outputs. If provided, the model will return a dataframe of results with the class outputs list elements as columns. Defaults to None.
  • fields_dict: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type. (e.g. fields_dict = {"class_name": fields.Str()}).
  • model: The model to infer on remote data.
  • nested_fields: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes. (e.g. nested_fields = {"datastructure": datastructure.registry})
  • pretrained_file: A file path or a string containing a pre-trained model. Defaults to None.

Ancestors

  • bitfount.federated.algorithms.model_algorithms.base._BaseModelAlgorithmFactory
  • BaseAlgorithmFactory
  • abc.ABC
  • bitfount.federated.roles._RolesMixIn
  • bitfount.types._BaseSerializableObjectMixIn

Variables

  • static fields_dict : ClassVar[T_FIELDS_DICT]

Methods


create

def create(self, role: Union[str, Role], **kwargs: Any) -> Any:

Create an instance representing the role specified.

modeller

def modeller(self, **kwargs: Any) -> bitfount.federated.algorithms.model_algorithms.inference._ModellerSide:

Returns the modeller side of the ModelInference algorithm.

worker

def worker(self, hub: BitfountHub, **kwargs: Any) -> bitfount.federated.algorithms.model_algorithms.inference._WorkerSide:

Returns the worker side of the ModelInference algorithm.

Arguments

  • hub: BitfountHub object to use for communication with the hub.

ModelTrainingAndEvaluation

class ModelTrainingAndEvaluation(*, model: _DistributedModelTypeOrReference, pretrained_file: Optional[Union[str, os.PathLike]] = None, project_id: Optional[str] = None):

Algorithm for training a model, evaluating it and returning metrics.

note

The metrics cannot currently be specified by the user.

Arguments

  • model: The model to train and evaluate on remote data.
  • pretrained_file: A file path or a string containing a pre-trained model. Defaults to None.

Attributes

  • class_name: The name of the algorithm class.
  • fields_dict: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type. (e.g. fields_dict = {"class_name": fields.Str()}).
  • model: The model to train and evaluate on remote data.
  • nested_fields: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes. (e.g. nested_fields = {"datastructure": datastructure.registry})
  • pretrained_file: A file path or a string containing a pre-trained model. Defaults to None.

Ancestors

  • bitfount.federated.algorithms.model_algorithms.base._BaseModelAlgorithmFactory
  • BaseAlgorithmFactory
  • abc.ABC
  • bitfount.federated.roles._RolesMixIn
  • bitfount.types._BaseSerializableObjectMixIn

Variables

  • static nested_fields : ClassVar[Dict[str, Mapping[str, Any]]]

Methods


create

def create(self, role: Union[str, Role], **kwargs: Any) -> Any:

Create an instance representing the role specified.

modeller

def modeller(self, **kwargs: Any) -> bitfount.federated.algorithms.model_algorithms.train_and_evaluate._ModellerSide:

Returns the modeller side of the ModelTrainingAndEvaluation algorithm.

worker

def worker(self, hub: BitfountHub, **kwargs: Any) -> bitfount.federated.algorithms.model_algorithms.train_and_evaluate._WorkerSide:

Returns the worker side of the ModelTrainingAndEvaluation algorithm.

Arguments

  • hub: BitfountHub object to use for communication with the hub.

PrivateSqlQuery

class PrivateSqlQuery(*, query: str, epsilon: float, delta: float, column_ranges: dict, table: Optional[str] = None, db_schema: Optional[str] = None):

Simple algorithm for running a SQL query on a table, with privacy.

note

The values provided for the privacy budget (i.e. epsilon and delta) will be applied individually to every column included in the provided SQL query. If the total epsilon and delta exceed the maximum allowed by the pod, the provided values will be reduced to the largest values that remain within the allowed privacy budget.
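The budget reduction described in the note above can be sketched as follows. This is a hypothetical illustration, not the pod's actual logic: `clamp_privacy_budget` is an invented helper, and it assumes that per-column budgets add up linearly across columns (basic sequential composition) and that the pod's maximums are totals.

```python
from typing import Tuple


def clamp_privacy_budget(
    epsilon: float,
    delta: float,
    n_columns: int,
    max_epsilon: float,
    max_delta: float,
) -> Tuple[float, float]:
    """Return per-column (epsilon, delta) whose totals fit the pod's maximums.

    Assumes totals compose additively across columns; the real behaviour may differ.
    """
    if n_columns < 1:
        raise ValueError("n_columns must be at least 1")
    # Only shrink values whose total would exceed the pod limit; never increase them.
    if epsilon * n_columns > max_epsilon:
        epsilon = max_epsilon / n_columns
    if delta * n_columns > max_delta:
        delta = max_delta / n_columns
    return epsilon, delta


# Three queried columns at epsilon=1.0 each would total 3.0, above a limit of 2.0.
eps, dlt = clamp_privacy_budget(1.0, 1e-6, n_columns=3, max_epsilon=2.0, max_delta=1e-5)
```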

Arguments

  • column_ranges: A dictionary of column names and their ranges.
  • db_schema: The name of the schema for a database connection. If not provided, it will be set to the default schema name for the database.
  • delta: The target delta to use for the privacy budget.
  • epsilon: The maximum epsilon to use for the privacy budget.
  • query: The SQL query to execute.
  • table: The target table name. For single table pod datasources, this will default to the pod name.

Attributes

  • class_name: The name of the algorithm class.
  • column_ranges: A dictionary of column names and their ranges.
  • db_schema: The name of the schema for a database connection. If not provided, it will be set to the default schema name for the database.
  • delta: The target delta to use for the privacy budget.
  • epsilon: The maximum epsilon to use for the privacy budget.
  • fields_dict: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type. (e.g. fields_dict = {"class_name": fields.Str()}).
  • nested_fields: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes. (e.g. nested_fields = {"datastructure": datastructure.registry})
  • query: The SQL query to execute.
  • table: The target table name. For single table pod datasources, this will default to the pod name.

Raises

  • DatabaseSchemaNotFoundError: If a non-existent db_schema name is provided.
  • PrivateSqlError: If there is an error executing the private SQL query (e.g. DP misconfiguration or bad query specified).
  • ValueError: If a pod identifier is not supplied, or if a join is attempted.

Ancestors

  • BaseAlgorithmFactory
  • bitfount.federated.mixins._ModellessAlgorithmMixIn
  • abc.ABC
  • bitfount.federated.roles._RolesMixIn
  • bitfount.types._BaseSerializableObjectMixIn

Variables

Methods


create

def create(self, role: Union[str, Role], **kwargs: Any) -> Any:

Create an instance representing the role specified.

execute

def execute(self, pod_identifiers: List[str], username: Optional[str] = None, bitfounthub: Optional[BitfountHub] = None, ms_config: Optional[MessageServiceConfig] = None, message_service: Optional[_MessageService] = None, pod_public_key_paths: Optional[Mapping[str, Path]] = None, identity_verification_method: IdentityVerificationMethod = IdentityVerificationMethod.OIDC_DEVICE_CODE, private_key_or_file: Optional[Union[RSAPrivateKey, Path]] = None, idp_url: Optional[str] = None, require_all_pods: bool = False, aggregator: Optional[_BaseAggregatorFactory] = None, project_id: Optional[str] = None) -> List[pd.DataFrame]:

Execute a ResultsOnly-compatible algorithm.

Syntactic sugar that allows the modeller to call .execute(...) on ResultsOnly-compatible algorithms.

modeller

def modeller(self, **kwargs: Any) -> bitfount.federated.algorithms.private_sql_query._ModellerSide:

Returns the modeller side of the PrivateSqlQuery algorithm.

Arguments

  • **kwargs: Additional keyword arguments to pass to the modeller side.

worker

def worker(self, **kwargs: Any) -> bitfount.federated.algorithms.private_sql_query._WorkerSide:

Returns the worker side of the PrivateSqlQuery algorithm.

Arguments

  • **kwargs: Additional keyword arguments to pass to the worker side. These must include hub, which provides a BitfountHub instance.

SqlQuery

class SqlQuery(*, query: str, table: Optional[str] = None):

Simple algorithm for running a SQL query on a table.

info

The default table for single-table datasources is the pod identifier without the username, enclosed in backticks (``). Please ensure your SQL query operates on that table. Put the table name inside backticks in the query statement so that it is parsed correctly, e.g. SELECT MAX(G) AS MAX_OF_G FROM `df`. This is the standard quoting mechanism used by MySQL (and also supported by SQLite).
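The backtick-quoted query from the note above can be tried locally against SQLite, which accepts MySQL-style backtick identifiers. This is a standalone sketch, not a Bitfount call: `default_table_name` is a hypothetical helper, it assumes pod identifiers take the form "username/pod-name", and the table contents are made up.

```python
import sqlite3


def default_table_name(pod_identifier: str) -> str:
    # Assumes identifiers look like "username/pod-name"; keep the part after "/".
    return pod_identifier.split("/")[-1]


table = default_table_name("alice/df")  # hypothetical pod identifier

conn = sqlite3.connect(":memory:")
# Backticks quote the table name, exactly as in the SqlQuery example.
conn.execute(f"CREATE TABLE `{table}` (G INTEGER)")
conn.executemany(f"INSERT INTO `{table}` (G) VALUES (?)", [(3,), (7,), (5,)])
max_of_g = conn.execute(f"SELECT MAX(G) AS MAX_OF_G FROM `{table}`").fetchone()[0]
conn.close()
print(max_of_g)  # 7
```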

info

If you are using a multi-table datasource, ensure that your SQL query syntax matches the syntax required by the Pod database backend.

Arguments

  • query: The SQL query to execute.
  • table: The target table name. For single table pod datasources, this will default to the pod name.

Attributes

  • class_name: The name of the algorithm class.
  • fields_dict: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type. (e.g. fields_dict = {"class_name": fields.Str()}).
  • nested_fields: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes. (e.g. nested_fields = {"datastructure": datastructure.registry})
  • query: The SQL query to execute.
  • table: The target table name. For single table pod datasources, this will default to the pod name.

Ancestors

  • BaseAlgorithmFactory
  • bitfount.federated.mixins._ModellessAlgorithmMixIn
  • abc.ABC
  • bitfount.federated.roles._RolesMixIn
  • bitfount.types._BaseSerializableObjectMixIn
  • bitfount.federated.types._DataLessAlgorithm

Variables

Methods


create

def create(self, role: Union[str, Role], **kwargs: Any) -> Any:

Create an instance representing the role specified.

execute

def execute(self, pod_identifiers: List[str], username: Optional[str] = None, bitfounthub: Optional[BitfountHub] = None, ms_config: Optional[MessageServiceConfig] = None, message_service: Optional[_MessageService] = None, pod_public_key_paths: Optional[Mapping[str, Path]] = None, identity_verification_method: IdentityVerificationMethod = IdentityVerificationMethod.OIDC_DEVICE_CODE, private_key_or_file: Optional[Union[RSAPrivateKey, Path]] = None, idp_url: Optional[str] = None, require_all_pods: bool = False, aggregator: Optional[_BaseAggregatorFactory] = None, project_id: Optional[str] = None) -> List[pd.DataFrame]:

Execute a ResultsOnly-compatible algorithm.

Syntactic sugar that allows the modeller to call .execute(...) on ResultsOnly-compatible algorithms.

modeller

def modeller(self, **kwargs: Any) -> bitfount.federated.algorithms.sql_query._ModellerSide:

Returns the modeller side of the SqlQuery algorithm.

worker

def worker(self, **kwargs: Any) -> bitfount.federated.algorithms.sql_query._WorkerSide:

Returns the worker side of the SqlQuery algorithm.

TIMMFineTuning

class TIMMFineTuning(model_id: str, schema: Optional[BitfountSchema] = None, datastructure: Optional[DataStructure] = None, image_column_name: Optional[str] = None, target_column_name: Optional[str] = None, labels: Optional[List[str]] = None, args: Optional[TIMMTrainingConfig] = None, batch_transformations: Optional[Union[List[Union[str, Dict[str, Any]]], Dict[Literal['train', 'validation'], List[Union[str, Dict[str, Any]]]]]] = None, return_weights: bool = False, save_path: Optional[Union[str, os.PathLike]] = None):

HuggingFace TIMM Fine Tuning Algorithm.

Arguments

  • **kwargs: Additional keyword arguments passed to the Worker side.
  • args: The training configuration.
  • batch_transformations: The batch transformations to apply to the batches. This can be a list of strings or dictionaries, which will be applied to both training and validation, or a dictionary with the keys "train" and "validation", each mapped to a list of strings or dictionaries specifying the batch transformations to apply at that step. They are only applied if datastructure is not passed. Defaults to applying DEFAULT_IMAGE_TRANSFORMATIONS to both training and validation.
  • datastructure: The datastructure relating to the dataset to be trained on. Defaults to None.
  • image_column_name: The column name of the image column used in training. Defaults to None.
  • labels: The labels of the target column. Defaults to None.
  • model_id: The Hugging Face model ID.
  • return_weights: Whether to return the weights of the model. Defaults to False.
  • save_path: The path to save the model to. Defaults to None.
  • schema: The schema of the dataset to be trained on. Defaults to None.
  • target_column_name: The column name of the target column. Defaults to None.
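The accepted shapes of batch_transformations can be sketched as a normalisation step. This is a hypothetical illustration: `resolve_batch_transformations` is an invented helper, the `DEFAULT_IMAGE_TRANSFORMATIONS` value below is a placeholder (the real constant is defined by the library), and treating a missing "train" or "validation" key as "no transformations" is an assumption of this sketch.

```python
from typing import Any, Dict, List, Optional, Union

Transformation = Union[str, Dict[str, Any]]
BatchTransformations = Union[List[Transformation], Dict[str, List[Transformation]]]

# Placeholder value; the real DEFAULT_IMAGE_TRANSFORMATIONS comes from the library.
DEFAULT_IMAGE_TRANSFORMATIONS: List[Transformation] = ["placeholder_default"]


def resolve_batch_transformations(
    batch_transformations: Optional[BatchTransformations],
    datastructure: Optional[object] = None,
) -> Optional[Dict[str, List[Transformation]]]:
    # Transformations are only applied when no datastructure is supplied.
    if datastructure is not None:
        return None
    if batch_transformations is None:
        return {
            "train": list(DEFAULT_IMAGE_TRANSFORMATIONS),
            "validation": list(DEFAULT_IMAGE_TRANSFORMATIONS),
        }
    if isinstance(batch_transformations, list):
        # A single list applies to both training and validation.
        return {
            "train": list(batch_transformations),
            "validation": list(batch_transformations),
        }
    unknown = set(batch_transformations) - {"train", "validation"}
    if unknown:
        raise ValueError(f"Unexpected keys: {sorted(unknown)}")
    # Assumption: a missing step gets no transformations.
    return {
        step: list(batch_transformations.get(step, []))
        for step in ("train", "validation")
    }


per_step = resolve_batch_transformations({"train": ["Resize"]})
```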

Attributes

  • class_name: The name of the algorithm class.
  • fields_dict: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type. (e.g. fields_dict = {"class_name": fields.Str()}).
  • nested_fields: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes. (e.g. nested_fields = {"datastructure": datastructure.registry})

Ancestors

Variables

  • static nested_fields : ClassVar[Dict[str, Mapping[str, Any]]]

Methods


create

def create(self, role: Union[str, Role], **kwargs: Any) -> Any:

Create an instance representing the role specified.

modeller

def modeller(self, **kwargs: Any) -> bitfount.federated.algorithms.hugging_face_algorithms.timm_fine_tuning._ModellerSide:

Returns the modeller side of the TIMMFineTuning algorithm.

worker

def worker(self, **kwargs: Any) -> bitfount.federated.algorithms.hugging_face_algorithms.timm_fine_tuning._WorkerSide:

Returns the worker side of the TIMMFineTuning algorithm.

TIMMInference

class TIMMInference(model_id: str, image_column_name: str, num_classes: Optional[int] = None, batch_transformations: Optional[List[Dict[str, Dict[str, Any]]]] = None, batch_size: int = 1, checkpoint_path: Optional[Union[str, os.PathLike]] = None, class_outputs: Optional[List[str]] = None):

HuggingFace TIMM Inference Algorithm.

Arguments

  • checkpoint_path: The path to a checkpoint file local to the Pod. Defaults to None.
  • class_outputs: A list of explicit class outputs to use as labels. Defaults to None.
  • image_column_name: The column name of the image paths.
  • model_id: The model id to use from the Hugging Face Hub.
  • num_classes: The number of classes in the model. Defaults to None.
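How num_classes and class_outputs might relate when labelling predictions can be sketched as below. This is a hypothetical helper, not the TIMMInference output format (which is not described here): it assumes one score per class for each image, checks that class_outputs has num_classes entries, and falls back to the class index when no labels are given.

```python
from typing import List, Optional, Sequence


def label_predictions(
    scores: Sequence[Sequence[float]],
    num_classes: Optional[int] = None,
    class_outputs: Optional[List[str]] = None,
) -> List[str]:
    # The label list, if given, must match the model's class count.
    if (
        num_classes is not None
        and class_outputs is not None
        and len(class_outputs) != num_classes
    ):
        raise ValueError("class_outputs must have num_classes entries")
    labels = []
    for row in scores:
        # Index of the highest score for this image.
        best = max(range(len(row)), key=row.__getitem__)
        labels.append(class_outputs[best] if class_outputs else str(best))
    return labels


predicted = label_predictions(
    [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]], num_classes=3, class_outputs=["a", "b", "c"]
)
```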

Attributes

  • checkpoint_path: The path to a checkpoint file local to the Pod. Defaults to None.
  • class_name: The name of the algorithm class.
  • class_outputs: A list of explicit class outputs to use as labels. Defaults to None.
  • fields_dict: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type. (e.g. fields_dict = {"class_name": fields.Str()}).
  • image_column_name: The column name of the image paths.
  • model_id: The model id to use from the Hugging Face Hub.
  • nested_fields: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes. (e.g. nested_fields = {"datastructure": datastructure.registry})
  • num_classes: The number of classes in the model. Defaults to None.

Ancestors

Variables

Methods


create

def create(self, role: Union[str, Role], **kwargs: Any) -> Any:

Create an instance representing the role specified.

modeller

def modeller(self, **kwargs: Any) -> bitfount.federated.algorithms.hugging_face_algorithms.timm_inference._ModellerSide:

Returns the modeller side of the TIMMInference algorithm.

worker

def worker(self, **kwargs: Any) -> bitfount.federated.algorithms.hugging_face_algorithms.timm_inference._WorkerSide:

Returns the worker side of the TIMMInference algorithm.

_BaseModelAlgorithmFactory

class _BaseModelAlgorithmFactory(*, model: _DistributedModelTypeOrReference, pretrained_file: Optional[Union[str, os.PathLike]] = None, project_id: Optional[str] = None):

Base factory for algorithms involving an underlying model.

Arguments

  • model: The model for the federated algorithm.
  • pretrained_file: A file path or a string containing a pre-trained model. Defaults to None.

Attributes

  • model: The model for the federated algorithm.
  • pretrained_file: A file path or a string containing a pre-trained model. Defaults to None.

Ancestors

Variables

  • static nested_fields : ClassVar[Dict[str, Mapping[str, Any]]]
  • model_schema : Dict[str, Any] - Returns the underlying model schema.