algorithms
Algorithms for remote processing of data.
Federated algorithm plugins can also be imported from this package.
Module
Submodules
- bitfount.federated.algorithms.base - Base classes for all algorithms.
- bitfount.federated.algorithms.column_avg - Column averaging algorithm.
- bitfount.federated.algorithms.compute_intersection_rsa - RSA Blinding Private Set intersection.
- bitfount.federated.algorithms.csv_report_algorithm - Algorithm for outputting results to CSV on the pod-side.
- bitfount.federated.algorithms.hugging_face_algorithms - Algorithms for remote Hugging Face models.
- bitfount.federated.algorithms.model_algorithms - Algorithms for remote/federated model training on data.
- bitfount.federated.algorithms.private_sql_query - Private SQL query algorithm.
- bitfount.federated.algorithms.sql_query - SQL query algorithm.
Classes
BaseAlgorithmFactory
class BaseAlgorithmFactory(**kwargs: Any):
Base algorithm factory from which all other algorithms must inherit.
Attributes
class_name
: The name of the algorithm class.
Subclasses
- ColumnAverage
- ComputeIntersectionRSA
- CSVReportAlgorithm
- HuggingFaceImageClassificationInference
- HuggingFaceImageSegmentationInference
- HuggingFacePerplexityEvaluation
- HuggingFaceTextClassificationInference
- HuggingFaceTextGenerationInference
- HuggingFaceZeroShotImageClassificationInference
- TIMMFineTuning
- TIMMInference
- bitfount.federated.algorithms.model_algorithms.base._BaseModelAlgorithmFactory
- PrivateSqlQuery
- SqlQuery
Variables
- static
fields_dict : ClassVar[Dict[str, marshmallow.fields.Field]]
- static
nested_fields : ClassVar[Dict[str, Mapping[str, Any]]]
CSVReportAlgorithm
class CSVReportAlgorithm( save_path: Optional[Union[str, os.PathLike]] = None, original_cols: Optional[List[str]] = None, filter: Optional[List[ColumnFilter]] = None, **kwargs: Any,):
Algorithm for generating the CSV results reports.
Arguments
save_path
: The folder path where the CSV report should be saved. The CSV report will have the same name as the task ID.
original_cols
: The tabular columns from the datasource to include in the report. If not specified, all tabular columns from the datasource are included.
filter
: A list of ColumnFilter instances used to filter the data. Defaults to None. If supplied, columns are added to the output CSV indicating which records match each specified criterion. If more than one ColumnFilter is given, an additional column is added to the output CSV indicating the records that match all given criteria (as well as the individual matches).
Ancestors
- BaseAlgorithmFactory
- abc.ABC
- bitfount.federated.roles._RolesMixIn
- bitfount.types._BaseSerializableObjectMixIn
Variables
- static
fields_dict : ClassVar[T_FIELDS_DICT]
Methods
modeller
def modeller( self, **kwargs: Any,) ‑> bitfount.federated.algorithms.csv_report_algorithm._ModellerSide:
Modeller-side of the algorithm.
worker
def worker( self, **kwargs: Any,) ‑> bitfount.federated.algorithms.csv_report_algorithm._WorkerSide:
Worker-side of the algorithm.
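The filter behaviour described for CSVReportAlgorithm above can be illustrated with a small standalone sketch. It does not use the Bitfount API: `ToyColumnFilter`, `build_report`, the column labels, and the name of the combined column are all hypothetical, chosen only to show one match column per filter plus an extra "matches all" column when more than one filter is given.

```python
import csv
import io
import operator
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class ToyColumnFilter:
    """Hypothetical stand-in for a column filter: compare one column to a value."""

    column: str
    op: Callable[[Any, Any], bool]
    value: Any

    @property
    def label(self) -> str:
        # Header used for this filter's match column (an assumed naming scheme).
        return f"{self.column} {self.op.__name__} {self.value}"


def build_report(
    rows: List[Dict[str, Any]],
    original_cols: List[str],
    filters: List[ToyColumnFilter],
) -> str:
    """Return CSV text with the selected columns plus one column per filter."""
    fieldnames = list(original_cols) + [f.label for f in filters]
    if len(filters) > 1:
        fieldnames.append("matches all criteria")

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        out = {col: row[col] for col in original_cols}
        matches = [f.op(row[f.column], f.value) for f in filters]
        for f, matched in zip(filters, matches):
            out[f.label] = matched
        if len(filters) > 1:
            # Extra column: True only when every individual filter matched.
            out["matches all criteria"] = all(matches)
        writer.writerow(out)
    return buffer.getvalue()


rows = [
    {"id": 1, "age": 70, "score": 0.9},
    {"id": 2, "age": 40, "score": 0.95},
]
filters = [
    ToyColumnFilter("age", operator.gt, 65),
    ToyColumnFilter("score", operator.ge, 0.9),
]
report = build_report(rows, ["id", "age"], filters)
```

Note that a filter may reference a column (here `score`) that is not itself listed in `original_cols`; whether the real algorithm allows this is not stated above.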
ColumnAverage
class ColumnAverage(*, field: str, table_name: str):
Simple algorithm for taking the arithmetic mean of a column in a table.
Arguments
field
: The name of the column to take the mean of.
table_name
: The name of the table on which the column average will be computed.
Attributes
class_name
: The name of the algorithm class.
field
: The name of the column to take the mean of.
fields_dict
: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type (e.g. fields_dict = {"class_name": fields.Str()}).
nested_fields
: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes (e.g. nested_fields = {"datastructure": datastructure.registry}).
table_name
: The name of the table on which the column average will be computed.
Ancestors
- BaseAlgorithmFactory
- abc.ABC
- bitfount.federated.roles._RolesMixIn
- bitfount.types._BaseSerializableObjectMixIn
Variables
- static
fields_dict : ClassVar[Dict[str, marshmallow.fields.Field]]
Methods
create
def create(self, role: Union[str, Role], **kwargs: Any) ‑> Any:
Create an instance representing the role specified.
modeller
def modeller( self, **kwargs: Any,) ‑> bitfount.federated.algorithms.column_avg._ModellerSide:
Returns the modeller side of the ColumnAverage algorithm.
worker
def worker(self, **kwargs: Any) ‑> bitfount.federated.algorithms.column_avg._WorkerSide:
Returns the worker side of the ColumnAverage algorithm.
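To make the factory pattern used by ColumnAverage concrete (one factory object, a modeller side, a worker side, and `create` dispatching on the role), here is a minimal standalone sketch. All `Toy*` classes and their `run` methods are hypothetical and are not the Bitfount implementation; in particular, dispatching `create` by method name is an assumption.

```python
from statistics import fmean
from typing import Any, Dict, List


class ToyWorkerSide:
    """Runs where the data lives and computes the local column mean."""

    def __init__(self, field: str, table_name: str) -> None:
        self.field = field
        self.table_name = table_name

    def run(self, tables: Dict[str, List[Dict[str, float]]]) -> Dict[str, float]:
        rows = tables[self.table_name]
        return {"mean": fmean(row[self.field] for row in rows)}


class ToyModellerSide:
    """Runs on the requesting side and simply collects worker results."""

    def run(self, results: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
        return dict(results)


class ToyColumnAverage:
    """Hypothetical factory mirroring the keyword-only constructor above."""

    def __init__(self, *, field: str, table_name: str) -> None:
        self.field = field
        self.table_name = table_name

    def modeller(self, **kwargs: Any) -> ToyModellerSide:
        return ToyModellerSide()

    def worker(self, **kwargs: Any) -> ToyWorkerSide:
        return ToyWorkerSide(self.field, self.table_name)

    def create(self, role: str, **kwargs: Any) -> Any:
        # Assumed dispatch: pick the side whose method name matches the role.
        if role not in ("modeller", "worker"):
            raise ValueError(f"Unknown role: {role}")
        return getattr(self, role)(**kwargs)


algo = ToyColumnAverage(field="age", table_name="patients")
tables = {"patients": [{"age": 30.0}, {"age": 40.0}, {"age": 50.0}]}
worker_result = algo.create("worker").run(tables)
collected = algo.create("modeller").run({"pod-a": worker_result})
```

The point of the split is that only the aggregate (`{"mean": ...}`) leaves the worker side; the rows themselves stay with the data owner.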
ComputeIntersectionRSA
class ComputeIntersectionRSA( datasource_columns: Optional[List[str]] = None, datasource_table: Optional[str] = None, pod_columns: Optional[List[str]] = None, pod_table: Optional[str] = None,):
Algorithm for computing the private set intersection with RSA blinding.
This algorithm does not work with iterable datasources such as multi-table databases.
Arguments
datasource_columns
: The modeller's columns from their datasource on which the private set intersection will be computed as a list of strings. Defaults to None.
datasource_table
: The modeller's table from their datasource, if the datasource is multitable, on which the private set intersection will be computed as a string. Defaults to None.
pod_columns
: The pod's columns from their datasource on which the private set intersection will be computed as a list of strings. Defaults to None.
pod_table
: The pod's table from their datasource, if the datasource is multitable, on which the private set intersection will be computed as a string. Defaults to None.
Attributes
class_name
: The name of the algorithm class.
fields_dict
: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type (e.g. fields_dict = {"class_name": fields.Str()}).
nested_fields
: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes (e.g. nested_fields = {"datastructure": datastructure.registry}).
pod_columns
: The pod's columns from their datasource on which the private set intersection will be computed as a list of strings. Defaults to None.
pod_table
: The pod's table from their datasource, if the datasource is multitable, on which the private set intersection will be computed as a string. Defaults to None.
Ancestors
- BaseAlgorithmFactory
- bitfount.federated.mixins._PSIAlgorithmsMixIn
- abc.ABC
- bitfount.federated.roles._RolesMixIn
- bitfount.types._BaseSerializableObjectMixIn
- bitfount.federated.types._DataLessAlgorithm
Variables
- static
fields_dict : ClassVar[Dict[str, marshmallow.fields.Field]]
Methods
create
def create(self, role: Union[str, Role], **kwargs: Any) ‑> Any:
Create an instance representing the role specified.
execute
def execute( self: _PSICompatibleAlgoFactory_, pod_identifiers: List[str], datasource: BaseSource, datasource_columns: Optional[List[str]] = None, datasource_table: Optional[str] = None, pod_columns: Optional[List[str]] = None, pod_table: Optional[str] = None, username: Optional[str] = None, bitfounthub: Optional[BitfountHub] = None, ms_config: Optional[MessageServiceConfig] = None, message_service: Optional[_MessageService] = None, pod_public_key_paths: Optional[Mapping[str, Path]] = None, identity_verification_method: IdentityVerificationMethod = IdentityVerificationMethod.OIDC_DEVICE_CODE, private_key_or_file: Optional[Union[RSAPrivateKey, Path]] = None, idp_url: Optional[str] = None, require_all_pods: bool = False, project_id: Optional[str] = None,) ‑> List[pd.DataFrame]:
Execute PSI compatible algorithm.
Syntactic sugar to allow the modeller to call .intersect(...) on PrivateSetIntersection compatible algorithms.
Arguments
pod_identifiers
: The pod identifier for the Private Set Intersection as a list.
datasource
: The modeller datasource on which the Private Set Intersection will be performed.
datasource_columns
: The modeller's columns from their datasource on which the private set intersection will be computed as a list of strings. Defaults to None.
datasource_table
: The modeller's table from their datasource, if the datasource is multitable, on which the private set intersection will be computed as a string. Defaults to None.
pod_columns
: The pod's columns from their datasource on which the private set intersection will be computed as a list of strings. Defaults to None.
pod_table
: The pod's table from their datasource, if the datasource is multitable, on which the private set intersection will be computed as a string. Defaults to None.
username
: The modeller's username. Defaults to None.
bitfounthub
: The Bitfount hub instance. Defaults to None.
ms_config
: Configuration for the message service. Defaults to None.
message_service
: The message service to use. Defaults to None.
pod_public_key_paths
: The path for the pod public key. Used when authentication with the pod is done with the public key. Defaults to None.
identity_verification_method
: The identity verification method to use. Defaults to IdentityVerificationMethod.OIDC_DEVICE_CODE.
private_key_or_file
: The private key or path to the private key. Defaults to None.
idp_url
: The identity provider URL. Defaults to None.
require_all_pods
: Whether all pods are needed for the algorithm. Defaults to False.
Returns
The records from the modeller dataset which were found in the intersection.
Raises
PSIMultiplePodsError
: If execute is called on multiple pods.
modeller
def modeller( self, **kwargs: Any,) ‑> bitfount.federated.algorithms.compute_intersection_rsa._ModellerSide:
Returns the modeller side of the ComputeIntersectionRSA algorithm.
worker
def worker( self, **kwargs: Any,) ‑> bitfount.federated.algorithms.compute_intersection_rsa._WorkerSide:
Returns the worker side of the ComputeIntersectionRSA algorithm.
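For readers unfamiliar with private set intersection via RSA blinding, the following toy sketch shows the general textbook idea behind the ComputeIntersectionRSA description above. It is an assumption that the library follows this exact scheme; every name here is hypothetical, the key is deliberately tiny and insecure, and the pod's signing step is a direct function call rather than a network message.

```python
import hashlib
import secrets
from math import gcd
from typing import Iterable, List, Set

# Toy RSA key built from two Mersenne primes: far too small for real use.
P, Q = 2**61 - 1, 2**31 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, held by the pod only


def hash_to_int(item: str) -> int:
    """Map an item to an integer modulo N."""
    return int.from_bytes(hashlib.sha256(item.encode()).digest(), "big") % N


def tag(signature: int) -> str:
    """Second hash applied to a signature before it is compared."""
    return hashlib.sha256(str(signature).encode()).hexdigest()


def pod_tags(pod_items: Iterable[str]) -> Set[str]:
    # The pod signs its own hashed items and shares only hashes of the signatures.
    return {tag(pow(hash_to_int(x), D, N)) for x in pod_items}


def pod_sign(blinded: int) -> int:
    # The pod signs a blinded value without learning the underlying item.
    return pow(blinded, D, N)


def modeller_intersection(items: List[str], tags: Set[str]) -> List[str]:
    found = []
    for item in items:
        # Pick a random blinding factor that is invertible modulo N.
        r = secrets.randbelow(N - 2) + 2
        while gcd(r, N) != 1:
            r = secrets.randbelow(N - 2) + 2
        blinded = (hash_to_int(item) * pow(r, E, N)) % N  # the pod sees only this
        signed = pod_sign(blinded)  # equals H(item)^D * r mod N
        unblinded = (signed * pow(r, -1, N)) % N  # equals H(item)^D mod N
        if tag(unblinded) in tags:
            found.append(item)
    return found


tags = pod_tags(["alice@example.com", "bob@example.com", "carol@example.com"])
result = modeller_intersection(
    ["bob@example.com", "dave@example.com", "alice@example.com"], tags
)
```

The blinding factor `r` hides each modeller item from the pod, while the pod's private exponent prevents the modeller from testing arbitrary guesses against the pod's tags offline.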
FederatedModelTraining
class FederatedModelTraining( *, model: _DistributedModelTypeOrReference, modeller_checkpointing: bool = True, checkpoint_filename: Optional[str] = None, pretrained_file: Optional[Union[str, os.PathLike]] = None, project_id: Optional[str] = None,):
Algorithm for training a model remotely and returning its updated parameters.
This algorithm is designed to be compatible with the FederatedAveraging protocol.
Arguments
model
: The model to train on remote data.
pretrained_file
: A file path or a string containing a pre-trained model. Defaults to None.
Attributes
checkpoint_filename
: The filename for the last checkpoint. Defaults to the task id and the last iteration number, i.e., {taskid}-iteration-{iteration_number}.pt.
class_name
: The name of the algorithm class.
fields_dict
: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type (e.g. fields_dict = {"class_name": fields.Str()}).
model
: The model to train on remote data.
modeller_checkpointing
: Whether to save the last checkpoint on the modeller side. Defaults to True.
nested_fields
: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes (e.g. nested_fields = {"datastructure": datastructure.registry}).
pretrained_file
: A file path or a string containing a pre-trained model. Defaults to None.
Ancestors
- bitfount.federated.algorithms.model_algorithms.base._BaseModelAlgorithmFactory
- BaseAlgorithmFactory
- abc.ABC
- bitfount.federated.roles._RolesMixIn
- bitfount.types._BaseSerializableObjectMixIn
Variables
- static
nested_fields : ClassVar[Dict[str, Mapping[str, Any]]]
Methods
create
def create(self, role: Union[str, Role], **kwargs: Any) ‑> Any:
Create an instance representing the role specified.
modeller
def modeller( self, **kwargs: Any,) ‑> bitfount.federated.algorithms.model_algorithms.federated_training._ModellerSide:
Returns the modeller side of the FederatedModelTraining algorithm.
worker
def worker( self, hub: BitfountHub, **kwargs: Any,) ‑> bitfount.federated.algorithms.model_algorithms.federated_training._WorkerSide:
Returns the worker side of the FederatedModelTraining algorithm.
Arguments
hub
: BitfountHub object to use for communication with the hub.
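FederatedModelTraining returns updated parameters from each worker, and the default checkpoint name follows the {taskid}-iteration-{iteration_number}.pt pattern. The sketch below illustrates both ideas in plain Python. The helper names are hypothetical, and the unweighted mean is only one common way an averaging protocol might combine updates; it is not taken from the library.

```python
from typing import Dict, List


def average_parameters(updates: List[Dict[str, List[float]]]) -> Dict[str, List[float]]:
    """Element-wise unweighted mean of parameter updates from several workers."""
    averaged: Dict[str, List[float]] = {}
    for name in updates[0]:
        # Group the i-th value of this parameter across all workers.
        columns = zip(*(update[name] for update in updates))
        averaged[name] = [sum(values) / len(updates) for values in columns]
    return averaged


def default_checkpoint_filename(task_id: str, iteration_number: int) -> str:
    """Build a filename following the documented default pattern."""
    return f"{task_id}-iteration-{iteration_number}.pt"


worker_updates = [
    {"w": [1.0, 2.0], "b": [0.0]},
    {"w": [3.0, 4.0], "b": [1.0]},
]
new_parameters = average_parameters(worker_updates)
filename = default_checkpoint_filename("task123", 5)
```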
HuggingFaceImageClassificationInference
class HuggingFaceImageClassificationInference( model_id: str, image_column_name: str, seed: int = 42, apply_softmax_to_predictions: bool = True, batch_size: int = 1, top_k: int = 5,):
Inference for pre-trained Hugging Face image classification models.
Arguments
batch_size
: The batch size for inference. Defaults to 1.
image_column_name
: The image column on which the inference should be done.
model_id
: The model id to use for image classification inference. The model id is of a pretrained model hosted inside a model repo on huggingface.co. Accepts resnet models.
seed
: Sets the seed of the algorithm. For reproducible behavior it defaults to 42.
top_k
: The number of top labels that will be returned by the pipeline. If the provided number is higher than the number of labels available in the model configuration, it will default to the number of labels. Defaults to 5.
Attributes
batch_size
: The batch size for inference. Defaults to 1.
class_name
: The name of the algorithm class.
fields_dict
: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type (e.g. fields_dict = {"class_name": fields.Str()}).
image_column_name
: The image column on which the inference should be done.
model_id
: The model id to use for image classification inference. The model id is of a pretrained model hosted inside a model repo on huggingface.co. Accepts resnet models.
nested_fields
: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes (e.g. nested_fields = {"datastructure": datastructure.registry}).
seed
: Sets the seed of the algorithm. For reproducible behavior it defaults to 42.
top_k
: The number of top labels that will be returned by the pipeline. If the provided number is higher than the number of labels available in the model configuration, it will default to the number of labels. Defaults to 5.
Ancestors
- BaseAlgorithmFactory
- abc.ABC
- bitfount.federated.roles._RolesMixIn
- bitfount.types._BaseSerializableObjectMixIn
Variables
- static
fields_dict : ClassVar[T_FIELDS_DICT]
Methods
create
def create(self, role: Union[str, Role], **kwargs: Any) ‑> Any:
Create an instance representing the role specified.
modeller
def modeller( self, **kwargs: Any,) ‑> bitfount.federated.algorithms.hugging_face_algorithms.hugging_face_image_classification._ModellerSide:
Returns the modeller side of the HuggingFaceImageClassificationInference algorithm.
worker
def worker( self, **kwargs: Any,) ‑> bitfount.federated.algorithms.hugging_face_algorithms.hugging_face_image_classification._WorkerSide:
Returns the worker side of the HuggingFaceImageClassificationInference algorithm.
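The interaction between apply_softmax_to_predictions and top_k described for HuggingFaceImageClassificationInference can be sketched without any model. The functions below are hypothetical helpers, not library code; they show softmax normalisation and the clamping of top_k to the number of available labels.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the maximum first for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_predictions(
    logits: Sequence[float],
    labels: Sequence[str],
    top_k: int = 5,
    apply_softmax: bool = True,
) -> List[Tuple[str, float]]:
    scores = softmax(logits) if apply_softmax else list(logits)
    # If top_k exceeds the number of labels, fall back to the number of labels.
    k = min(top_k, len(labels))
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


predictions = top_k_predictions([2.0, 1.0, 0.1], ["cat", "dog", "bird"], top_k=5)
```

With three labels and top_k=5, all three labels are returned, ordered from most to least likely.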
HuggingFaceImageSegmentationInference
class HuggingFaceImageSegmentationInference( model_id: str, image_column_name: str, alpha: float = 0.3, batch_size: int = 1, dataframe_output: bool = False, mask_threshold: float = 0.5, overlap_mask_area_threshold: float = 0.5, save_path: Union[str, os.PathLike] = PosixPath('.'), seed: int = 42, subtask: Optional[_Subtask] = None, threshold: float = 0.9,):
Inference for pre-trained Hugging Face image segmentation models.
Perform segmentation (detect masks & classes) in the image(s) passed as inputs.
Arguments
alpha
: The alpha for the mask overlay.
batch_size
: The batch size for inference. Defaults to 1.
dataframe_output
: Whether to output the prediction results in a dataframe format. Defaults to False.
image_column_name
: The image column on which the inference should be done.
mask_threshold
: Threshold to use when turning the predicted masks into binary values. Defaults to 0.5.
model_id
: The model id to use for image segmentation inference. The model id is of a pretrained model hosted inside a model repo on huggingface.co. Accepts resnet models.
overlap_mask_area_threshold
: Mask overlap threshold to eliminate small, disconnected segments. Defaults to 0.5.
save_path
: The folder path where the images with masks drawn on them should be saved. Defaults to the current working directory.
seed
: Sets the seed of the algorithm. For reproducible behavior it defaults to 42.
subtask
: Segmentation task to be performed. Choose one of semantic, instance, or panoptic, depending on model capabilities. If not set, the pipeline will attempt to resolve in the following order: panoptic, instance, semantic.
threshold
: Probability threshold to filter out predicted masks. Defaults to 0.9.
Attributes
alpha
: The alpha for the mask overlay.
batch_size
: The batch size for inference. Defaults to 1.
class_name
: The name of the algorithm class.
dataframe_output
: Whether to output the prediction results in a dataframe format. Defaults to False.
fields_dict
: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type (e.g. fields_dict = {"class_name": fields.Str()}).
image_column_name
: The image column on which the inference should be done.
mask_threshold
: Threshold to use when turning the predicted masks into binary values. Defaults to 0.5.
model_id
: The model id to use for image segmentation inference. The model id is of a pretrained model hosted inside a model repo on huggingface.co. Accepts resnet models.
nested_fields
: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes (e.g. nested_fields = {"datastructure": datastructure.registry}).
overlap_mask_area_threshold
: Mask overlap threshold to eliminate small, disconnected segments. Defaults to 0.5.
save_path
: The folder path where the images with masks drawn on them should be saved. Defaults to the current working directory.
seed
: Sets the seed of the algorithm. For reproducible behavior it defaults to 42.
subtask
: Segmentation task to be performed. Choose one of semantic, instance, or panoptic, depending on model capabilities. If not set, the pipeline will attempt to resolve in the following order: panoptic, instance, semantic.
threshold
: Probability threshold to filter out predicted masks. Defaults to 0.9.
Ancestors
- BaseAlgorithmFactory
- abc.ABC
- bitfount.federated.roles._RolesMixIn
- bitfount.types._BaseSerializableObjectMixIn
Variables
- static
fields_dict : ClassVar[Dict[str, marshmallow.fields.Field]]
Methods
create
def create(self, role: Union[str, Role], **kwargs: Any) ‑> Any:
Create an instance representing the role specified.
modeller
def modeller( self, **kwargs: Any,) ‑> bitfount.federated.algorithms.hugging_face_algorithms.hugging_face_image_segmentation._ModellerSide:
Returns the modeller side of the HuggingFaceImageSegmentationInference algorithm.
worker
def worker( self, **kwargs: Any,) ‑> bitfount.federated.algorithms.hugging_face_algorithms.hugging_face_image_segmentation._WorkerSide:
Returns the worker side of the HuggingFaceImageSegmentationInference algorithm.
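The two thresholds documented for HuggingFaceImageSegmentationInference play different roles: mask_threshold turns per-pixel probabilities into a binary mask, while threshold discards whole predicted segments with a low score. The standalone sketch below illustrates that distinction. The helper names and the use of strict "greater than" comparisons are assumptions, not taken from the library.

```python
from typing import Dict, List, Union


def binarize_mask(
    prob_mask: List[List[float]], mask_threshold: float = 0.5
) -> List[List[int]]:
    """Turn per-pixel probabilities into 0/1 values."""
    return [[1 if p > mask_threshold else 0 for p in row] for row in prob_mask]


def filter_segments(
    segments: List[Dict[str, Union[str, float]]], threshold: float = 0.9
) -> List[Dict[str, Union[str, float]]]:
    """Keep only segments whose score exceeds the probability threshold."""
    return [seg for seg in segments if seg["score"] > threshold]


binary = binarize_mask([[0.2, 0.7], [0.5, 0.95]])
kept = filter_segments(
    [{"label": "road", "score": 0.97}, {"label": "sky", "score": 0.42}]
)
```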
HuggingFacePerplexityEvaluation
class HuggingFacePerplexityEvaluation( model_id: str, text_column_name: str, stride: int = 512, seed: int = 42,):
Hugging Face Perplexity Algorithm.
Arguments
model_id
: The model id to use for evaluating its perplexity. The model id is of a pretrained model hosted inside a model repo on huggingface.co. Accepts models with a causal language modeling head.
seed
: Sets the seed of the algorithm. For reproducible behaviour it defaults to 42.
stride
: Sets the stride of the algorithm. Defaults to 512.
text_column_name
: The single column to query against. Should contain text for generation.
Attributes
class_name
: The name of the algorithm class.
fields_dict
: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type (e.g. fields_dict = {"class_name": fields.Str()}).
model_id
: The model id to use for evaluation. The model id is of a pretrained model hosted inside a model repo on huggingface.co. Accepts models with a causal language modeling head.
nested_fields
: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes (e.g. nested_fields = {"datastructure": datastructure.registry}).
seed
: Sets the seed of the algorithm. For reproducible behaviour it defaults to 42.
stride
: Sets the stride of the algorithm. Defaults to 512.
text_column_name
: The single column to query against. Should contain text for generation.
Ancestors
- BaseAlgorithmFactory
- abc.ABC
- bitfount.federated.roles._RolesMixIn
- bitfount.types._BaseSerializableObjectMixIn
Variables
- static
fields_dict : ClassVar[Dict[str, marshmallow.fields.Field]]
Methods
create
def create(self, role: Union[str, Role], **kwargs: Any) ‑> Any:
Create an instance representing the role specified.
modeller
def modeller( self, **kwargs: Any,) ‑> bitfount.federated.algorithms.hugging_face_algorithms.hugging_face_perplexity._ModellerSide:
Returns the modeller side of the HuggingFacePerplexityEvaluation algorithm.
worker
def worker( self, **kwargs: Any,) ‑> bitfount.federated.algorithms.hugging_face_algorithms.hugging_face_perplexity._WorkerSide:
Returns the worker side of the HuggingFacePerplexityEvaluation algorithm.
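The reference above does not define perplexity or say how stride is used, so the following is a sketch under stated assumptions: perplexity is taken as the exponential of the mean negative log-likelihood per token, and stride is assumed to control a sliding evaluation window of the kind commonly used for fixed-length causal language models. Both helpers and the `max_length` parameter are hypothetical.

```python
import math
from typing import List, Tuple


def perplexity(token_log_probs: List[float]) -> float:
    """exp of the mean negative log-likelihood over the scored tokens."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def strided_windows(seq_len: int, max_length: int, stride: int) -> List[Tuple[int, int, int]]:
    """Return (begin, end, newly_scored) for each sliding window.

    Each window covers at most max_length tokens; only the tokens not
    already scored by the previous window count towards the loss.
    """
    windows = []
    prev_end = 0
    for begin in range(0, seq_len, stride):
        end = min(begin + max_length, seq_len)
        windows.append((begin, end, end - prev_end))
        prev_end = end
        if end == seq_len:
            break
    return windows


ppl = perplexity([math.log(0.5)] * 4)
windows = strided_windows(seq_len=10, max_length=4, stride=2)
```

A smaller stride gives each scored token more preceding context at the cost of more forward passes.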
HuggingFaceTextClassificationInference
class HuggingFaceTextClassificationInference( model_id: str, target_column_name: str, batch_size: int = 1, function_to_apply: Optional[_FunctionToApply] = None, seed: int = 42, top_k: int = 1,):
Inference for pre-trained Hugging Face text classification models.
Arguments
batch_size
: The batch size for inference. Defaults to 1.
function_to_apply
: The function to apply to the model outputs in order to retrieve the scores. Accepts four different values. If this argument is not specified, the function is chosen according to the number of labels: if the model has a single label, the sigmoid function is applied to the output; if the model has several labels, the softmax function is applied to the output. Possible values are:
- "sigmoid": Applies the sigmoid function on the output.
- "softmax": Applies the softmax function on the output.
- "none": Does not apply any function on the output.
Defaults to None.
model_id
: The model id to use for text classification inference. The model id is of a pretrained model hosted inside a model repo on huggingface.co. Accepts resnet models.
seed
: Sets the seed of the algorithm. For reproducible behavior it defaults to 42.
target_column_name
: The target column on which the inference should be done.
top_k
: The number of top labels that will be returned by the pipeline. Defaults to 1.
Attributes
batch_size
: The batch size for inference. Defaults to 1.
class_name
: The name of the algorithm class.
fields_dict
: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type (e.g. fields_dict = {"class_name": fields.Str()}).
function_to_apply
: The function to apply to the model outputs in order to retrieve the scores. Accepts four different values. If this argument is not specified, the function is chosen according to the number of labels: if the model has a single label, the sigmoid function is applied to the output; if the model has several labels, the softmax function is applied to the output. Possible values are:
- "sigmoid": Applies the sigmoid function on the output.
- "softmax": Applies the softmax function on the output.
- "none": Does not apply any function on the output.
Defaults to None.
model_id
: The model id to use for text classification inference. The model id is of a pretrained model hosted inside a model repo on huggingface.co. Accepts resnet models.
nested_fields
: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes (e.g. nested_fields = {"datastructure": datastructure.registry}).
seed
: Sets the seed of the algorithm. For reproducible behavior it defaults to 42.
target_column_name
: The target column on which the inference should be done.
top_k
: The number of top labels that will be returned by the pipeline. Defaults to 1.
Ancestors
- BaseAlgorithmFactory
- abc.ABC
- bitfount.federated.roles._RolesMixIn
- bitfount.types._BaseSerializableObjectMixIn
Variables
- static
fields_dict : ClassVar[Dict[str, marshmallow.fields.Field]]
Methods
create
def create(self, role: Union[str, Role], **kwargs: Any) ‑> Any:
Create an instance representing the role specified.
modeller
def modeller( self, **kwargs: Any,) ‑> bitfount.federated.algorithms.hugging_face_algorithms.hugging_face_text_classification._ModellerSide:
Returns the modeller side of the HuggingFaceTextClassificationInference algorithm.
worker
def worker( self, **kwargs: Any,) ‑> bitfount.federated.algorithms.hugging_face_algorithms.hugging_face_text_classification._WorkerSide:
Returns the worker side of the HuggingFaceTextClassificationInference algorithm.
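The selection rule documented for function_to_apply in HuggingFaceTextClassificationInference (sigmoid for a single label, softmax for several, unless a value is given explicitly) can be written out as a short standalone sketch. `apply_function` is a hypothetical helper and not part of the library.

```python
import math
from typing import List, Optional, Sequence


def apply_function(
    logits: Sequence[float], function_to_apply: Optional[str] = None
) -> List[float]:
    if function_to_apply is None:
        # Unspecified: choose based on the number of labels.
        function_to_apply = "sigmoid" if len(logits) == 1 else "softmax"

    if function_to_apply == "sigmoid":
        return [1.0 / (1.0 + math.exp(-x)) for x in logits]
    if function_to_apply == "softmax":
        peak = max(logits)  # subtract the maximum for numerical stability
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]
    if function_to_apply == "none":
        return list(logits)
    raise ValueError(f"Unsupported function_to_apply: {function_to_apply!r}")


single_label = apply_function([0.0])
multi_label = apply_function([1.0, 1.0])
raw = apply_function([3.0, -1.0], "none")
```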
HuggingFaceTextGenerationInference
class HuggingFaceTextGenerationInference( model_id: str, text_column_name: Optional[str] = None, prompt_format: Optional[str] = None, max_length: int = 50, num_return_sequences: int = 1, seed: int = 42, min_new_tokens: int = 1, repetition_penalty: float = 1.0, num_beams: int = 1, early_stopping: bool = True, pad_token_id: Optional[int] = None, eos_token_id: Optional[int] = None, device: Optional[str] = None, torch_dtype: "Literal['bfloat16', 'float16', 'float32', 'float64']" = 'float32',):
Hugging Face Text Generation Algorithm.
Arguments
device
: The device to use for the model. Defaults to None. On the worker side, will be set to the value of the environment variable BITFOUNT_DEFAULT_TORCH_DEVICE if specified, otherwise "cpu".
early_stopping
: Whether to stop the generation as soon as there are num_beams complete candidates. Defaults to True.
eos_token_id
: The id of the token to use as the last token for each sequence. If None (default), it will default to the eos_token_id of the tokenizer.
max_length
: The maximum length of the sequence to be generated. Defaults to 50.
min_new_tokens
: The minimum number of new tokens to add to the prompt. Defaults to 1.
model_id
: The model id to use for text generation. The model id is of a pretrained model hosted inside a model repo on huggingface.co. Accepts models with a causal language modeling head.
num_beams
: Number of beams for beam search. 1 means no beam search. Defaults to 1.
num_return_sequences
: The number of sequence candidates to return for each input. Defaults to 1.
pad_token_id
: The id of the token to use as padding token. If None (default), it will default to the pad_token_id of the tokenizer.
prompt_format
: The format of the prompt as a string with a single {context} placeholder, which is where the pod's input will be inserted. For example, "You are a Language Model. This is the context: {context}. Please summarize it." This only applies if text_column_name is provided; it is not used for dynamic prompting. Defaults to None.
repetition_penalty
: The parameter for repetition penalty. 1.0 means no penalty. Defaults to 1.0.
seed
: Sets the seed of the algorithm. For reproducible behaviour it defaults to 42.
text_column_name
: The single column to query against. Should contain text for generation. If not provided, the algorithm must be used with a protocol which dynamically provides the text to be used for prompting.
torch_dtype
: The torch dtype to use for the model. Defaults to "float32".
Attributes
class_name
: The name of the algorithm class.
device
: The device to use for the model. Defaults to None. On the worker side, will be set to the value of the environment variable BITFOUNT_DEFAULT_TORCH_DEVICE if specified, otherwise "cpu".
early_stopping
: Whether to stop the generation as soon as there are num_beams complete candidates. Defaults to True.
eos_token_id
: The id of the token to use as the last token for each sequence. If None (default), it will default to the eos_token_id of the tokenizer.
fields_dict
: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type (e.g. fields_dict = {"class_name": fields.Str()}).
max_length
: The maximum length of the sequence to be generated. Defaults to 50.
min_new_tokens
: The minimum number of new tokens to add to the prompt. Defaults to 1.
model_id
: The model id to use for text generation. The model id is of a pretrained model hosted inside a model repo on huggingface.co. Accepts models with a causal language modeling head.
nested_fields
: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes (e.g. nested_fields = {"datastructure": datastructure.registry}).
num_beams
: Number of beams for beam search. 1 means no beam search. Defaults to 1.
num_return_sequences
: The number of sequence candidates to return for each input. Defaults to 1.
pad_token_id
: The id of the token to use as padding token. If None (default), it will default to the pad_token_id of the tokenizer.
prompt_format
: The format of the prompt as a string with a single {context} placeholder, which is where the pod's input will be inserted. For example, "You are a Language Model. This is the context: {context}. Please summarize it." This only applies if text_column_name is provided; it is not used for dynamic prompting. Defaults to None.
repetition_penalty
: The parameter for repetition penalty. 1.0 means no penalty. Defaults to 1.0.
seed
: Sets the seed of the algorithm. For reproducible behaviour it defaults to 42.
text_column_name
: The single column to query against. Should contain text for generation. If not provided, the algorithm must be used with a protocol which dynamically provides the text to be used for prompting.
torch_dtype
: The torch dtype to use for the model. Defaults to "float32".
Raises
ValueError
: If prompt_format is provided without text_column_name.
ValueError
: If prompt_format does not contain a single {context} placeholder.
Ancestors
- BaseAlgorithmFactory
- abc.ABC
- bitfount.federated.roles._RolesMixIn
- bitfount.types._BaseSerializableObjectMixIn
Variables
- static
fields_dict : ClassVar[Dict[str, marshmallow.fields.Field]]
Methods
create
def create(self, role: Union[str, Role], **kwargs: Any) ‑> Any:
Create an instance representing the role specified.
modeller
def modeller( self, **kwargs: Any,) ‑> bitfount.federated.algorithms.hugging_face_algorithms.hugging_face_text_generation._ModellerSide:
Returns the modeller side of the HuggingFaceTextGenerationInference algorithm.
worker
def worker( self, **kwargs: Any,) ‑> bitfount.federated.algorithms.hugging_face_algorithms.hugging_face_text_generation._WorkerSide:
Returns the worker side of the HuggingFaceTextGenerationInference algorithm.
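The prompt_format rules for HuggingFaceTextGenerationInference (exactly one {context} placeholder, and only valid together with text_column_name) can be expressed as a small validation sketch. The helper names are hypothetical, the error messages are invented for illustration, and the real class may perform these checks differently.

```python
from typing import Optional


def validate_prompt_format(
    prompt_format: Optional[str], text_column_name: Optional[str]
) -> None:
    if prompt_format is None:
        return
    if text_column_name is None:
        raise ValueError("prompt_format requires text_column_name to be set.")
    if prompt_format.count("{context}") != 1:
        raise ValueError("prompt_format must contain exactly one {context} placeholder.")


def build_prompt(prompt_format: Optional[str], context: str) -> str:
    # Without a format, the column text is used as the prompt unchanged.
    if prompt_format is None:
        return context
    # str.replace avoids surprises if the text itself contains braces.
    return prompt_format.replace("{context}", context)


template = "You are a Language Model. This is the context: {context}. Please summarize it."
validate_prompt_format(template, "notes")
prompt = build_prompt(template, "Patient reports mild headache")
```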
HuggingFaceZeroShotImageClassificationInference
class HuggingFaceZeroShotImageClassificationInference( model_id: str, image_column_name: str, candidate_labels: List[str], batch_size: int = 1, class_outputs: Optional[List[str]] = None, hypothesis_template: Optional[str] = None, seed: int = 42,):
Inference for pre-trained Hugging Face zero shot image classification models.
Arguments
batch_size
: The batch size for inference. Defaults to 1.
candidate_labels
: The candidate labels for this image.
class_outputs
: A list of strings corresponding to prediction outputs. If provided, the model will return a dataframe of results with the class outputs list elements as columns. Defaults to None.
hypothesis_template
: The sentence used in conjunction with candidate_labels to attempt the image classification by replacing the placeholder with the candidate_labels. Then likelihood is estimated by using logits_per_image. Defaults to None.
image_column_name
: The image column on which the inference should be done.
model_id
: The model id to use for zero shot image classification inference. The model id is of a pretrained model hosted inside a model repo on huggingface.co.
seed
: Sets the seed of the algorithm. For reproducible behavior it defaults to 42.
Attributes
batch_size
: The batch size for inference. Defaults to 1.
candidate_labels
: The candidate labels for this image.
class_name
: The name of the algorithm class.
class_outputs
: A list of strings corresponding to prediction outputs. If provided, the model will return a dataframe of results with the class outputs list elements as columns. Defaults to None.
fields_dict
: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type (e.g. fields_dict = {"class_name": fields.Str()}).
hypothesis_template
: The sentence used in conjunction with candidate_labels to attempt the image classification by replacing the placeholder with the candidate_labels. Then likelihood is estimated by using logits_per_image. Defaults to None.
image_column_name
: The image column on which the inference should be done.
model_id
: The model id to use for zero shot image classification inference. The model id is of a pretrained model hosted inside a model repo on huggingface.co.
nested_fields
: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes (e.g. nested_fields = {"datastructure": datastructure.registry}).
seed
: Sets the seed of the algorithm. For reproducible behavior it defaults to 42.
Ancestors
- BaseAlgorithmFactory
- abc.ABC
- bitfount.federated.roles._RolesMixIn
- bitfount.types._BaseSerializableObjectMixIn
Variables
- static
fields_dict : ClassVar[Dict[str, marshmallow.fields.Field]]
Methods
create
def create(self, role: Union[str, Role], **kwargs: Any) ‑> Any:
Create an instance representing the role specified.
modeller
def modeller( self, **kwargs: Any,) ‑> bitfount.federated.algorithms.hugging_face_algorithms.hugging_face_zero_shot_image_classification._ModellerSide:
Returns the modeller side of the HuggingFaceZeroShotImageClassificationInference algorithm.
worker
def worker( self, **kwargs: Any,) ‑> bitfount.federated.algorithms.hugging_face_algorithms.hugging_face_zero_shot_image_classification._WorkerSide:
Returns the worker side of the HuggingFaceZeroShotImageClassificationInference algorithm.
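The create, modeller and worker methods listed for each algorithm suggest a simple role-based dispatch. The following is a minimal sketch of that pattern, not the library's implementation: the Role enum values, the ToyAlgorithmFactory class and its return values are all hypothetical stand-ins.

```python
from enum import Enum
from typing import Any, Union


class Role(Enum):
    """Hypothetical stand-in for the library's Role type."""

    MODELLER = "modeller"
    WORKER = "worker"


class ToyAlgorithmFactory:
    """Toy factory exposing one method per role plus a dispatcher."""

    def modeller(self, **kwargs: Any) -> str:
        return "modeller-side"

    def worker(self, **kwargs: Any) -> str:
        return "worker-side"

    def create(self, role: Union[str, Role], **kwargs: Any) -> Any:
        # Accept either the enum member or its string value.
        resolved = Role(role) if isinstance(role, str) else role
        # Dispatch to the method named after the role.
        return getattr(self, resolved.value)(**kwargs)


factory = ToyAlgorithmFactory()
worker_side = factory.create("worker")
modeller_side = factory.create(Role.MODELLER)
print(worker_side, modeller_side)
```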
ModelEvaluation
class ModelEvaluation( *, model: _DistributedModelTypeOrReference, pretrained_file: Optional[Union[str, os.PathLike]] = None, project_id: Optional[str] = None,):
Algorithm for evaluating a model and returning metrics.
The metrics cannot currently be specified by the user.
Arguments
model
: The model to evaluate on remote data.
pretrained_file
: A file path or a string containing a pre-trained model. Defaults to None.
Attributes
class_name
: The name of the algorithm class.
fields_dict
: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type (e.g. fields_dict = {"class_name": fields.Str()}).
model
: The model to evaluate on remote data.
nested_fields
: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes (e.g. nested_fields = {"datastructure": datastructure.registry}).
pretrained_file
: A file path or a string containing a pre-trained model. Defaults to None.
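To make the nested_fields description more concrete, here is a small sketch of how a registry of class names might be used to rebuild a nested object from serialized data. It uses plain dictionaries instead of marshmallow, and ToyDataStructure, registry and rebuild_nested are hypothetical names; the library's actual (de)serialization logic may work differently.

```python
from typing import Any, Dict, Mapping


class ToyDataStructure:
    """Hypothetical nested object that records its target column."""

    def __init__(self, target: str) -> None:
        self.target = target


# Registry: class names mapped to the respective classes.
registry: Dict[str, type] = {"ToyDataStructure": ToyDataStructure}

# Nested attribute name mapped to the registry used to resolve it.
nested_fields: Dict[str, Mapping[str, Any]] = {"datastructure": registry}


def rebuild_nested(attribute: str, payload: Dict[str, Any]) -> Any:
    """Look up the class by its serialized name and instantiate it."""
    cls = nested_fields[attribute][payload["class_name"]]
    kwargs = {key: value for key, value in payload.items() if key != "class_name"}
    return cls(**kwargs)


rebuilt = rebuild_nested(
    "datastructure", {"class_name": "ToyDataStructure", "target": "label"}
)
print(type(rebuilt).__name__, rebuilt.target)
```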
Ancestors
- bitfount.federated.algorithms.model_algorithms.base._BaseModelAlgorithmFactory
- BaseAlgorithmFactory
- abc.ABC
- bitfount.federated.roles._RolesMixIn
- bitfount.types._BaseSerializableObjectMixIn
Methods
create
def create(self, role: Union[str, Role], **kwargs: Any) ‑> Any:
Create an instance representing the role specified.
modeller
def modeller( self, **kwargs: Any,) ‑> bitfount.federated.algorithms.model_algorithms.evaluate._ModellerSide:
Returns the modeller side of the ModelEvaluation algorithm.
worker
def worker( self, hub: BitfountHub, **kwargs: Any,) ‑> bitfount.federated.algorithms.model_algorithms.evaluate._WorkerSide:
Returns the worker side of the ModelEvaluation algorithm.
Arguments
hub
: BitfountHub object to use for communication with the hub.
ModelInference
class ModelInference( *, model: _DistributedModelTypeOrReference, class_outputs: Optional[List[str]] = None, pretrained_file: Optional[Union[str, os.PathLike]] = None, project_id: Optional[str] = None,):
Algorithm for running inference on a model and returning the predictions.
This algorithm could potentially return the data unfiltered so should only be used when the other party is trusted.
Arguments
class_outputs
: A list of strings corresponding to prediction outputs. If provided, the model will return a dataframe of results with the class outputs list elements as columns. Defaults to None.
model
: The model to infer on remote data.
pretrained_file
: A file path or a string containing a pre-trained model. Defaults to None.
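The effect of class_outputs can be pictured as labelling each prediction vector with the supplied names. The sketch below uses plain dictionaries rather than a dataframe to stay dependency-free; label_predictions and the example scores are hypothetical and only illustrate the column-naming idea.

```python
from typing import Dict, List, Sequence


def label_predictions(
    predictions: Sequence[Sequence[float]], class_outputs: List[str]
) -> List[Dict[str, float]]:
    """Pair each score in a prediction with the matching class output name."""
    rows: List[Dict[str, float]] = []
    for scores in predictions:
        if len(scores) != len(class_outputs):
            raise ValueError("Each prediction needs one score per class output.")
        rows.append(dict(zip(class_outputs, scores)))
    return rows


# Hypothetical model scores for two records and two classes.
rows = label_predictions([[0.9, 0.1], [0.2, 0.8]], ["healthy", "disease"])
print(rows)
```

Each dictionary corresponds to one row, with the class_outputs elements acting as column names.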
Attributes
class_name
: The name of the algorithm class.
class_outputs
: A list of strings corresponding to prediction outputs. If provided, the model will return a dataframe of results with the class outputs list elements as columns. Defaults to None.
fields_dict
: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type (e.g. fields_dict = {"class_name": fields.Str()}).
model
: The model to infer on remote data.
nested_fields
: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes (e.g. nested_fields = {"datastructure": datastructure.registry}).
pretrained_file
: A file path or a string containing a pre-trained model. Defaults to None.
Ancestors
- bitfount.federated.algorithms.model_algorithms.base._BaseModelAlgorithmFactory
- BaseAlgorithmFactory
- abc.ABC
- bitfount.federated.roles._RolesMixIn
- bitfount.types._BaseSerializableObjectMixIn
Variables
- static
fields_dict : ClassVar[T_FIELDS_DICT]
Methods
create
def create(self, role: Union[str, Role], **kwargs: Any) ‑> Any:
Create an instance representing the role specified.
modeller
def modeller( self, **kwargs: Any,) ‑> bitfount.federated.algorithms.model_algorithms.inference._ModellerSide:
Returns the modeller side of the ModelInference algorithm.
worker
def worker( self, hub: BitfountHub, **kwargs: Any,) ‑> bitfount.federated.algorithms.model_algorithms.inference._WorkerSide:
Returns the worker side of the ModelInference algorithm.
Arguments
hub
: BitfountHub object to use for communication with the hub.
ModelTrainingAndEvaluation
class ModelTrainingAndEvaluation( *, model: _DistributedModelTypeOrReference, pretrained_file: Optional[Union[str, os.PathLike]] = None, project_id: Optional[str] = None,):
Algorithm for training a model, evaluating it and returning metrics.
The metrics cannot currently be specified by the user.
Arguments
model
: The model to train and evaluate on remote data.
pretrained_file
: A file path or a string containing a pre-trained model. Defaults to None.
Attributes
class_name
: The name of the algorithm class.
fields_dict
: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type (e.g. fields_dict = {"class_name": fields.Str()}).
model
: The model to train and evaluate on remote data.
nested_fields
: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes (e.g. nested_fields = {"datastructure": datastructure.registry}).
pretrained_file
: A file path or a string containing a pre-trained model. Defaults to None.
Ancestors
- bitfount.federated.algorithms.model_algorithms.base._BaseModelAlgorithmFactory
- BaseAlgorithmFactory
- abc.ABC
- bitfount.federated.roles._RolesMixIn
- bitfount.types._BaseSerializableObjectMixIn
Methods
create
def create(self, role: Union[str, Role], **kwargs: Any) ‑> Any:
Create an instance representing the role specified.
modeller
def modeller( self, **kwargs: Any,) ‑> bitfount.federated.algorithms.model_algorithms.train_and_evaluate._ModellerSide:
Returns the modeller side of the ModelTrainingAndEvaluation algorithm.
worker
def worker( self, hub: BitfountHub, **kwargs: Any,) ‑> bitfount.federated.algorithms.model_algorithms.train_and_evaluate._WorkerSide:
Returns the worker side of the ModelTrainingAndEvaluation algorithm.
Arguments
hub
: BitfountHub object to use for communication with the hub.
PrivateSqlQuery
class PrivateSqlQuery( *, query: str, epsilon: float, delta: float, column_ranges: dict, table: Optional[str] = None, db_schema: Optional[str] = None,):
Simple algorithm for running a SQL query on a table, with privacy.
The values provided for the privacy budget (i.e. epsilon and delta) will be applied individually to all columns included in the SQL query provided. If the total values of the epsilon and delta exceed the maximum allowed by the pod, the provided values will be reduced to the maximum values required to remain within the allowed privacy budget.
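The budget reduction described above can be sketched as a simple clamp. This is an illustration under stated assumptions, not the pod's actual accounting: it assumes the total spend is the per-column value multiplied by the number of columns, and clamp_budget, max_epsilon and max_delta are hypothetical names.

```python
from typing import Tuple


def clamp_budget(
    epsilon: float,
    delta: float,
    n_columns: int,
    max_epsilon: float,
    max_delta: float,
) -> Tuple[float, float]:
    """Reduce per-column epsilon and delta so the assumed totals fit the maxima."""
    if n_columns < 1:
        raise ValueError("The query must reference at least one column.")
    # Assumption: total spend = per-column value * number of columns.
    per_column_epsilon = min(epsilon, max_epsilon / n_columns)
    per_column_delta = min(delta, max_delta / n_columns)
    return per_column_epsilon, per_column_delta


# Requested epsilon is too large for 4 columns, so it is reduced.
reduced = clamp_budget(2.0, 1e-5, n_columns=4, max_epsilon=4.0, max_delta=1e-4)
# Requested values already fit, so they are returned unchanged.
unchanged = clamp_budget(0.5, 1e-6, n_columns=2, max_epsilon=4.0, max_delta=1e-4)
print(reduced, unchanged)
```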
Arguments
column_ranges
: A dictionary of column names and their ranges.
db_schema
: The name of the schema for a database connection. If not provided, it will be set to the default schema name for the database.
delta
: The target delta to use for the privacy budget.
epsilon
: The maximum epsilon to use for the privacy budget.
query
: The SQL query to execute.
table
: The target table name. For single table pod datasources, this will default to the pod name.
Attributes
class_name
: The name of the algorithm class.
column_ranges
: A dictionary of column names and their ranges.
db_schema
: The name of the schema for a database connection. If not provided, it will be set to the default schema name for the database.
delta
: The target delta to use for the privacy budget.
epsilon
: The maximum epsilon to use for the privacy budget.
fields_dict
: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type (e.g. fields_dict = {"class_name": fields.Str()}).
nested_fields
: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes (e.g. nested_fields = {"datastructure": datastructure.registry}).
query
: The SQL query to execute.
table
: The target table name. For single table pod datasources, this will default to the pod name.
Raises
DatabaseSchemaNotFoundError
: If a non-existent db_schema name is provided.
PrivateSqlError
: If there is an error executing the private SQL query (e.g. DP misconfiguration or bad query specified).
ValueError
: If a pod identifier is not supplied, or if a join is attempted.
Ancestors
- BaseAlgorithmFactory
- bitfount.federated.mixins._ModellessAlgorithmMixIn
- abc.ABC
- bitfount.federated.roles._RolesMixIn
- bitfount.types._BaseSerializableObjectMixIn
Variables
- static
fields_dict : ClassVar[Dict[str, marshmallow.fields.Field]]
Methods
create
def create(self, role: Union[str, Role], **kwargs: Any) ‑> Any:
Create an instance representing the role specified.
execute
def execute( self, pod_identifiers: List[str], username: Optional[str] = None, bitfounthub: Optional[BitfountHub] = None, ms_config: Optional[MessageServiceConfig] = None, message_service: Optional[_MessageService] = None, pod_public_key_paths: Optional[Mapping[str, Path]] = None, identity_verification_method: IdentityVerificationMethod = IdentityVerificationMethod.OIDC_DEVICE_CODE, private_key_or_file: Optional[Union[RSAPrivateKey, Path]] = None, idp_url: Optional[str] = None, require_all_pods: bool = False, aggregator: Optional[_BaseAggregatorFactory] = None, project_id: Optional[str] = None,) ‑> List[pd.DataFrame]:
Execute ResultsOnly compatible algorithm.
Syntactic sugar to allow the modeller to call .execute(...) on ResultsOnly compatible algorithms.
modeller
def modeller( self, **kwargs: Any,) ‑> bitfount.federated.algorithms.private_sql_query._ModellerSide:
Returns the modeller side of the PrivateSqlQuery algorithm.
Arguments
**kwargs
: Additional keyword arguments to pass to the modeller side.
worker
def worker( self, **kwargs: Any,) ‑> bitfount.federated.algorithms.private_sql_query._WorkerSide:
Returns the worker side of the PrivateSqlQuery algorithm.
Arguments
**kwargs
: Additional keyword arguments to pass to the worker side. hub must be one of these keyword arguments and must provide a BitfountHub instance.
SqlQuery
class SqlQuery(*, query: str, table: Optional[str] = None):
Simple algorithm for running a SQL query on a table.
The default table for single-table datasources is the pod identifier without the username, enclosed in backticks (``). Please ensure your SQL query operates on that table. The table name should be put inside backticks (``) in the query statement to make sure it is correctly parsed, e.g. SELECT MAX(G) AS MAX_OF_G FROM `df`. This is the standard quoting mechanism used by MySQL (and also included in SQLite).
If you are using a multi-table datasource, ensure that your SQL query syntax matches the syntax required by the Pod database backend.
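The backtick quoting rule can be tried locally with SQLite, which accepts MySQL-style backticks. The sketch below is only an illustration: it assumes a pod identifier of the form "username/pod-name", and the identifier, column and data are made up. It does not call the SqlQuery class itself.

```python
import sqlite3

# Hypothetical pod identifier; the "username/pod-name" form is an assumption.
pod_identifier = "alice/census-data"
# Default table name: the pod identifier without the username.
table_name = pod_identifier.split("/", 1)[-1]

conn = sqlite3.connect(":memory:")
# Backticks let the hyphenated name be parsed as a single identifier.
conn.execute(f"CREATE TABLE `{table_name}` (G INTEGER)")
conn.executemany(
    f"INSERT INTO `{table_name}` (G) VALUES (?)", [(3,), (7,), (5,)]
)

query = f"SELECT MAX(G) AS MAX_OF_G FROM `{table_name}`"
max_of_g = conn.execute(query).fetchone()[0]
conn.close()
print(query, max_of_g)
```

Without the backticks, a name containing a hyphen would be read as a subtraction expression, which is one reason the quoting matters.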
Arguments
query
: The SQL query to execute.
table
: The target table name. For single table pod datasources, this will default to the pod name.
Attributes
class_name
: The name of the algorithm class.
fields_dict
: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type (e.g. fields_dict = {"class_name": fields.Str()}).
nested_fields
: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes (e.g. nested_fields = {"datastructure": datastructure.registry}).
query
: The SQL query to execute.
table
: The target table name. For single table pod datasources, this will default to the pod name.
Ancestors
- BaseAlgorithmFactory
- bitfount.federated.mixins._ModellessAlgorithmMixIn
- abc.ABC
- bitfount.federated.roles._RolesMixIn
- bitfount.types._BaseSerializableObjectMixIn
- bitfount.federated.types._DataLessAlgorithm
Variables
- static
fields_dict : ClassVar[Dict[str, marshmallow.fields.Field]]
Methods
create
def create(self, role: Union[str, Role], **kwargs: Any) ‑> Any:
Create an instance representing the role specified.
execute
def execute( self, pod_identifiers: List[str], username: Optional[str] = None, bitfounthub: Optional[BitfountHub] = None, ms_config: Optional[MessageServiceConfig] = None, message_service: Optional[_MessageService] = None, pod_public_key_paths: Optional[Mapping[str, Path]] = None, identity_verification_method: IdentityVerificationMethod = IdentityVerificationMethod.OIDC_DEVICE_CODE, private_key_or_file: Optional[Union[RSAPrivateKey, Path]] = None, idp_url: Optional[str] = None, require_all_pods: bool = False, aggregator: Optional[_BaseAggregatorFactory] = None, project_id: Optional[str] = None,) ‑> List[pd.DataFrame]:
Execute ResultsOnly compatible algorithm.
Syntactic sugar to allow the modeller to call .execute(...) on ResultsOnly compatible algorithms.
modeller
def modeller( self, **kwargs: Any,) ‑> bitfount.federated.algorithms.sql_query._ModellerSide:
Returns the modeller side of the SqlQuery algorithm.
worker
def worker(self, **kwargs: Any) ‑> bitfount.federated.algorithms.sql_query._WorkerSide:
Returns the worker side of the SqlQuery algorithm.
TIMMFineTuning
class TIMMFineTuning( model_id: str, schema: Optional[BitfountSchema] = None, datastructure: Optional[DataStructure] = None, image_column_name: Optional[str] = None, target_column_name: Optional[str] = None, labels: Optional[List[str]] = None, args: Optional[TIMMTrainingConfig] = None, batch_transformations: Optional[Union[List[Union[str, Dict[str, Any]]], Dict[Literal['train', 'validation'], List[Union[str, Dict[str, Any]]]]]] = None, return_weights: bool = False, save_path: Optional[Union[str, os.PathLike]] = None,):
HuggingFace TIMM Fine Tuning Algorithm.
Arguments
**kwargs
: Additional keyword arguments passed to the Worker side.
args
: The training configuration.
batch_transformations
: The batch transformations to be applied to the batches. Can be a list of strings or a list of dictionaries, which will be applied to both training and validation, or a dictionary with keys "train" and "validation" mapped to a list of strings or a list of dictionaries, specifying the batch transformations to be applied at each individual step. They are only applied if datastructure is not passed. Defaults to apply DEFAULT_IMAGE_TRANSFORMATIONS to both training and validation.
datastructure
: The datastructure relating to the dataset to be trained on. Defaults to None.
image_column_name
: The column name of the image column used in training. Defaults to None.
labels
: The labels of the target column. Defaults to None.
model_id
: The Hugging Face model ID.
return_weights
: Whether to return the weights of the model.
save_path
: The path to save the model to.
schema
: The schema of the dataset to be trained on. Defaults to None.
target_column_name
: The column name of the target column. Defaults to None.
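The two accepted shapes of batch_transformations (a single list for both steps, or a dictionary keyed by "train" and "validation") can be normalised into one form. The helper below is a sketch of that idea only; normalise_batch_transformations is a hypothetical name, and PLACEHOLDER_DEFAULTS stands in for DEFAULT_IMAGE_TRANSFORMATIONS, whose real contents are not shown here.

```python
from typing import Any, Dict, List, Optional, Union

Transformation = Union[str, Dict[str, Any]]

# Placeholder only; not the real contents of DEFAULT_IMAGE_TRANSFORMATIONS.
PLACEHOLDER_DEFAULTS: List[Transformation] = ["ToTensor"]


def normalise_batch_transformations(
    batch_transformations: Optional[
        Union[List[Transformation], Dict[str, List[Transformation]]]
    ],
) -> Dict[str, List[Transformation]]:
    """Return a dict with explicit 'train' and 'validation' lists."""
    if batch_transformations is None:
        batch_transformations = PLACEHOLDER_DEFAULTS
    if isinstance(batch_transformations, dict):
        missing = {"train", "validation"} - batch_transformations.keys()
        if missing:
            raise ValueError(f"Missing keys: {sorted(missing)}")
        return {
            "train": list(batch_transformations["train"]),
            "validation": list(batch_transformations["validation"]),
        }
    # A plain list applies to both training and validation.
    return {
        "train": list(batch_transformations),
        "validation": list(batch_transformations),
    }


shared = normalise_batch_transformations(["Resize"])
split = normalise_batch_transformations(
    {"train": ["Resize", {"Flip": {"p": 0.5}}], "validation": ["Resize"]}
)
print(shared, split)
```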
Attributes
class_name
: The name of the algorithm class.
fields_dict
: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type (e.g. fields_dict = {"class_name": fields.Str()}).
nested_fields
: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes (e.g. nested_fields = {"datastructure": datastructure.registry}).
Ancestors
- BaseAlgorithmFactory
- abc.ABC
- bitfount.federated.roles._RolesMixIn
- bitfount.types._BaseSerializableObjectMixIn
Variables
- static
fields_dict : ClassVar[Dict[str, marshmallow.fields.Field]]
- static
nested_fields : ClassVar[Dict[str, Mapping[str, Any]]]
Methods
create
def create(self, role: Union[str, Role], **kwargs: Any) ‑> Any:
Create an instance representing the role specified.
modeller
def modeller( self, **kwargs: Any,) ‑> bitfount.federated.algorithms.hugging_face_algorithms.timm_fine_tuning._ModellerSide:
Returns the modeller side of the TIMMFineTuning algorithm.
worker
def worker( self, **kwargs: Any,) ‑> bitfount.federated.algorithms.hugging_face_algorithms.timm_fine_tuning._WorkerSide:
Returns the worker side of the TIMMFineTuning algorithm.
TIMMInference
class TIMMInference( model_id: str, image_column_name: str, num_classes: Optional[int] = None, batch_transformations: Optional[List[Dict[str, Dict[str, Any]]]] = None, batch_size: int = 1, checkpoint_path: Optional[Union[str, os.PathLike]] = None, class_outputs: Optional[List[str]] = None,):
HuggingFace TIMM Inference Algorithm.
Arguments
checkpoint_path
: The path to a checkpoint file local to the Pod. Defaults to None.
class_outputs
: A list of explicit class outputs to use as labels. Defaults to None.
image_column_name
: The column name of the image paths.
model_id
: The model id to use from the Hugging Face Hub.
num_classes
: The number of classes in the model. Defaults to None.
Attributes
checkpoint_path
: The path to a checkpoint file local to the Pod. Defaults to None.
class_name
: The name of the algorithm class.
class_outputs
: A list of explicit class outputs to use as labels. Defaults to None.
fields_dict
: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type (e.g. fields_dict = {"class_name": fields.Str()}).
image_column_name
: The column name of the image paths.
model_id
: The model id to use from the Hugging Face Hub.
nested_fields
: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes (e.g. nested_fields = {"datastructure": datastructure.registry}).
num_classes
: The number of classes in the model. Defaults to None.
Ancestors
- BaseAlgorithmFactory
- abc.ABC
- bitfount.federated.roles._RolesMixIn
- bitfount.types._BaseSerializableObjectMixIn
Variables
- static
fields_dict : ClassVar[Dict[str, marshmallow.fields.Field]]
Methods
create
def create(self, role: Union[str, Role], **kwargs: Any) ‑> Any:
Create an instance representing the role specified.
modeller
def modeller( self, **kwargs: Any,) ‑> bitfount.federated.algorithms.hugging_face_algorithms.timm_inference._ModellerSide:
Returns the modeller side of the TIMMInference algorithm.
worker
def worker( self, **kwargs: Any,) ‑> bitfount.federated.algorithms.hugging_face_algorithms.timm_inference._WorkerSide:
Returns the worker side of the TIMMInference algorithm.
_BaseModelAlgorithmFactory
class _BaseModelAlgorithmFactory( *, model: _DistributedModelTypeOrReference, pretrained_file: Optional[Union[str, os.PathLike]] = None, project_id: Optional[str] = None,):
Base factory for algorithms involving an underlying model.
Arguments
model
: The model for the federated algorithm.
pretrained_file
: A file path or a string containing a pre-trained model. Defaults to None.
Attributes
model
: The model for the federated algorithm.
pretrained_file
: A file path or a string containing a pre-trained model. Defaults to None.
Ancestors
- BaseAlgorithmFactory
- abc.ABC
- bitfount.federated.roles._RolesMixIn
- bitfount.types._BaseSerializableObjectMixIn
Variables
- static
fields_dict : ClassVar[Dict[str, marshmallow.fields.Field]]
- static
nested_fields : ClassVar[Dict[str, Mapping[str, Any]]]
model_schema : Dict[str, Any]
- Returns underlying model Schema.