algorithms
Algorithms for remote processing of data.
Federated algorithm plugins can also be imported from this package.
Module
Submodules
- bitfount.federated.algorithms.base - Base classes for all algorithms.
- bitfount.federated.algorithms.column_avg - Column averaging algorithm.
- bitfount.federated.algorithms.compute_intersection_rsa - RSA Blinding Private Set intersection.
- bitfount.federated.algorithms.csv_report_algorithm - Algorithm for outputting results to CSV on the pod-side.
- bitfount.federated.algorithms.hugging_face_algorithms - Algorithms for remote Hugging Face models.
- bitfount.federated.algorithms.model_algorithms - Algorithms for remote/federated model training on data.
- bitfount.federated.algorithms.private_sql_query - Private SQL query algorithm.
- bitfount.federated.algorithms.sql_query - SQL query algorithm.
Classes
BaseAlgorithmFactory
class BaseAlgorithmFactory(**kwargs: Any):
Base algorithm factory from which all other algorithms must inherit.
Attributes
class_name
: The name of the algorithm class.
Subclasses
- ColumnAverage
- ComputeIntersectionRSA
- CSVReportAlgorithm
- HuggingFaceImageClassificationInference
- HuggingFaceImageSegmentationInference
- HuggingFacePerplexityEvaluation
- HuggingFaceTextClassificationInference
- HuggingFaceTextGenerationInference
- HuggingFaceZeroShotImageClassificationInference
- TIMMFineTuning
- TIMMInference
- bitfount.federated.algorithms.model_algorithms.base._BaseModelAlgorithmFactory
- PrivateSqlQuery
- SqlQuery
Variables
- static fields_dict : ClassVar[Dict[str, marshmallow.fields.Field]]
- static nested_fields : ClassVar[Dict[str, Mapping[str, Any]]]
CSVReportAlgorithm
class CSVReportAlgorithm( save_path: Optional[Union[str, os.PathLike]] = None, original_cols: Optional[List[str]] = None, filter: Optional[List[ColumnFilter]] = None, **kwargs: Any,):
Algorithm for generating the CSV results reports.
Arguments
save_path
: The folder path where the CSV report should be saved. The CSV report will have the same name as the task ID.
original_cols
: The tabular columns from the datasource to include in the report. If not specified, all tabular columns from the datasource are included.
filter
: A list of ColumnFilter instances on which to filter the data. Defaults to None. If supplied, columns will be added to the output CSV indicating the records that match the specified criteria. If more than one ColumnFilter is given, an additional column will be added to the output CSV indicating the datapoints that match all given criteria (as well as the individual matches).
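The filter behaviour described above can be sketched in plain Python. This is only an illustration of the idea, not Bitfount's implementation: `SketchFilter`, `add_match_columns`, the column labels, and the `matches_all_criteria` column name are all hypothetical and do not reflect the real `ColumnFilter` class or the report's actual column names.

```python
import csv
import io
import operator
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class SketchFilter:
    """Hypothetical stand-in for a column filter (not Bitfount's ColumnFilter)."""

    column: str
    op: Callable[[Any, Any], bool]
    value: Any

    @property
    def label(self) -> str:
        # Name of the per-filter match column added to the report.
        return f"{self.column} {self.op.__name__} {self.value}"


def add_match_columns(
    rows: List[Dict[str, Any]], filters: List[SketchFilter]
) -> List[Dict[str, Any]]:
    """Add one match column per filter, plus a combined column if several are given."""
    out = []
    for row in rows:
        new_row = dict(row)
        matches = [f.op(row[f.column], f.value) for f in filters]
        for f, matched in zip(filters, matches):
            new_row[f.label] = matched
        if len(filters) > 1:
            # Assumed column name for "matches every filter".
            new_row["matches_all_criteria"] = all(matches)
        out.append(new_row)
    return out


rows = [
    {"age": 70, "bmi": 31.0},
    {"age": 45, "bmi": 33.5},
    {"age": 81, "bmi": 22.4},
]
filters = [
    SketchFilter("age", operator.ge, 65),
    SketchFilter("bmi", operator.gt, 30),
]
report = add_match_columns(rows, filters)

# Write the report to an in-memory CSV to show the resulting layout.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(report[0].keys()))
writer.writeheader()
writer.writerows(report)
print(buffer.getvalue())
```

With a single filter, only the per-filter column would be added; the combined column appears only when more than one filter is supplied.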
Ancestors
- BaseAlgorithmFactory
- abc.ABC
- bitfount.federated.roles._RolesMixIn
- bitfount.types._BaseSerializableObjectMixIn
Variables
- static fields_dict : ClassVar[T_FIELDS_DICT]
Methods
modeller
def modeller( self, **kwargs: Any,) ‑> bitfount.federated.algorithms.csv_report_algorithm._ModellerSide:
Modeller-side of the algorithm.
worker
def worker( self, **kwargs: Any,) ‑> bitfount.federated.algorithms.csv_report_algorithm._WorkerSide:
Worker-side of the algorithm.
ColumnAverage
class ColumnAverage(*, field: str, table_name: str):
Simple algorithm for taking the arithmetic mean of a column in a table.
Arguments
field
: The name of the column to take the mean of.
table_name
: The name of the table on which the column average will be computed.
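As a rough illustration of what the two arguments select, the computation amounts to an arithmetic mean over one column of one table. The sketch below uses plain Python dictionaries in place of a real datasource; `column_average` and the `tables` layout are assumptions made for the example, not part of the Bitfount API.

```python
from statistics import fmean
from typing import Dict, List, Mapping


def column_average(
    tables: Mapping[str, List[Dict[str, float]]], table_name: str, field: str
) -> float:
    """Arithmetic mean of `field` across all rows of `table_name`."""
    values = [row[field] for row in tables[table_name]]
    return fmean(values)


# Hypothetical in-memory "datasource" with a single table.
tables = {
    "patients": [
        {"age": 30.0},
        {"age": 40.0},
        {"age": 50.0},
    ]
}
average_age = column_average(tables, table_name="patients", field="age")
print(average_age)
```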
Attributes
class_name
: The name of the algorithm class.
field
: The name of the column to take the mean of.
fields_dict
: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type (e.g. fields_dict = {"class_name": fields.Str()}).
nested_fields
: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes (e.g. nested_fields = {"datastructure": datastructure.registry}).
table_name
: The name of the table on which the column average will be computed.
Ancestors
- BaseAlgorithmFactory
- abc.ABC
- bitfount.federated.roles._RolesMixIn
- bitfount.types._BaseSerializableObjectMixIn
Variables
- static fields_dict : ClassVar[Dict[str, marshmallow.fields.Field]]
Methods
create
def create(self, role: Union[str, Role], **kwargs: Any) ‑> Any:
Create an instance representing the role specified.
modeller
def modeller( self, **kwargs: Any,) ‑> bitfount.federated.algorithms.column_avg._ModellerSide:
Returns the modeller side of the ColumnAverage algorithm.
worker
def worker(self, **kwargs: Any) ‑> bitfount.federated.algorithms.column_avg._WorkerSide:
Returns the worker side of the ColumnAverage algorithm.
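The relationship between create, modeller and worker can be pictured as a small role-dispatching factory. The sketch below is an assumption about the pattern, not a copy of Bitfount's code: `SketchRole`, `SketchColumnAverage`, and the two side classes are hypothetical names, and the real `Role` type and side classes may behave differently.

```python
from enum import Enum
from typing import Any, Union


class SketchRole(Enum):
    """Hypothetical role enum used only for this example."""

    MODELLER = "modeller"
    WORKER = "worker"


class SketchModellerSide:
    """Hypothetical modeller side: would receive and aggregate results."""


class SketchWorkerSide:
    """Hypothetical worker side: would run next to the data."""

    def __init__(self, field: str, table_name: str) -> None:
        self.field = field
        self.table_name = table_name


class SketchColumnAverage:
    """Factory that builds the object matching the requested role."""

    def __init__(self, *, field: str, table_name: str) -> None:
        self.field = field
        self.table_name = table_name

    def modeller(self, **kwargs: Any) -> SketchModellerSide:
        return SketchModellerSide()

    def worker(self, **kwargs: Any) -> SketchWorkerSide:
        return SketchWorkerSide(field=self.field, table_name=self.table_name)

    def create(self, role: Union[str, SketchRole], **kwargs: Any) -> Any:
        # Accept either the enum member or its string value.
        resolved = SketchRole(role) if isinstance(role, str) else role
        if resolved is SketchRole.MODELLER:
            return self.modeller(**kwargs)
        return self.worker(**kwargs)


factory = SketchColumnAverage(field="age", table_name="patients")
worker_side = factory.create("worker")
modeller_side = factory.create(SketchRole.MODELLER)
```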
ComputeIntersectionRSA
class ComputeIntersectionRSA( datasource_columns: Optional[List[str]] = None, datasource_table: Optional[str] = None, pod_columns: Optional[List[str]] = None, pod_table: Optional[str] = None,):
Algorithm for computing the private set intersection with RSA blinding.
This algorithm does not work with iterable datasources such as multi-table databases.
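For readers unfamiliar with the technique, the general idea of private set intersection with RSA blinding can be sketched as follows. This is a toy illustration under stated assumptions, not Bitfount's implementation: the key is built from two small Mersenne primes and is far too weak for real use, and every function name here is hypothetical.

```python
import hashlib
import math
import secrets
from typing import List, Set, Tuple

# Toy RSA key held by the pod. Both factors are known Mersenne primes.
# INSECURE: for illustration only.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent


def hash_to_group(item: str) -> int:
    """Map an item to an integer modulo N."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % N


def fingerprint(signature: int) -> str:
    """Second hash, so raw signatures are never compared directly."""
    return hashlib.sha256(str(signature).encode("ascii")).hexdigest()


def pod_fingerprints(pod_items: List[str]) -> Set[str]:
    """Pod side: sign its own hashed items and publish their fingerprints."""
    return {fingerprint(pow(hash_to_group(y), D, N)) for y in pod_items}


def pod_sign(blinded: List[int]) -> List[int]:
    """Pod side: sign blinded values without learning the underlying items."""
    return [pow(b, D, N) for b in blinded]


def blind(items: List[str]) -> Tuple[List[int], List[int]]:
    """Modeller side: hide each hashed item behind a random factor r^E."""
    factors, blinded = [], []
    for x in items:
        while True:
            r = secrets.randbelow(N - 2) + 2
            if math.gcd(r, N) == 1:
                break
        factors.append(r)
        blinded.append(hash_to_group(x) * pow(r, E, N) % N)
    return factors, blinded


def unblind_and_match(
    items: List[str], factors: List[int], signed: List[int], pod_prints: Set[str]
) -> List[str]:
    """Modeller side: remove r, then compare fingerprints with the pod's."""
    matches = []
    for x, r, s in zip(items, factors, signed):
        unblinded = s * pow(r, -1, N) % N  # equals hash_to_group(x)^D mod N
        if fingerprint(unblinded) in pod_prints:
            matches.append(x)
    return matches


modeller_items = ["alice", "bob", "carol"]
pod_items = ["bob", "carol", "dave"]

factors, blinded = blind(modeller_items)
signed = pod_sign(blinded)
intersection = unblind_and_match(
    modeller_items, factors, signed, pod_fingerprints(pod_items)
)
print(intersection)
```

The pod only ever sees blinded values, and the modeller only sees fingerprints of signed pod items, so neither side learns the other's non-matching records in this simplified model.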
Arguments
datasource_columns
: The modeller's columns from their datasource on which the private set intersection will be computed as a list of strings. Defaults to None.
datasource_table
: The modeller's table from their datasource, if the datasource is multitable, on which the private set intersection will be computed as a string. Defaults to None.
pod_columns
: The pod's columns from their datasource on which the private set intersection will be computed as a list of strings. Defaults to None.
pod_table
: The pod's table from their datasource, if the datasource is multitable, on which the private set intersection will be computed as a string. Defaults to None.
Attributes
class_name
: The name of the algorithm class.
fields_dict
: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type (e.g. fields_dict = {"class_name": fields.Str()}).
nested_fields
: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes (e.g. nested_fields = {"datastructure": datastructure.registry}).
pod_columns
: The pod's columns from their datasource on which the private set intersection will be computed as a list of strings. Defaults to None.
pod_table
: The pod's table from their datasource, if the datasource is multitable, on which the private set intersection will be computed as a string. Defaults to None.
Ancestors
- BaseAlgorithmFactory
- bitfount.federated.mixins._PSIAlgorithmsMixIn
- abc.ABC
- bitfount.federated.roles._RolesMixIn
- bitfount.types._BaseSerializableObjectMixIn
- bitfount.federated.types._DataLessAlgorithm
Variables
- static fields_dict : ClassVar[Dict[str, marshmallow.fields.Field]]
Methods
create
def create(self, role: Union[str, Role], **kwargs: Any) ‑> Any:
Create an instance representing the role specified.
execute
def execute( self: _PSICompatibleAlgoFactory_, pod_identifiers: List[str], datasource: BaseSource, datasource_columns: Optional[List[str]] = None, datasource_table: Optional[str] = None, pod_columns: Optional[List[str]] = None, pod_table: Optional[str] = None, username: Optional[str] = None, bitfounthub: Optional[BitfountHub] = None, ms_config: Optional[MessageServiceConfig] = None, message_service: Optional[_MessageService] = None, pod_public_key_paths: Optional[Mapping[str, Path]] = None, identity_verification_method: IdentityVerificationMethod = IdentityVerificationMethod.OIDC_DEVICE_CODE, private_key_or_file: Optional[Union[RSAPrivateKey, Path]] = None, idp_url: Optional[str] = None, require_all_pods: bool = False, project_id: Optional[str] = None,) ‑> List[pd.DataFrame]:
Execute PSI compatible algorithm.
Syntactic sugar to allow the modeller to call .intersect(...) on PrivateSetIntersection compatible algorithms.
Arguments
pod_identifiers
: The pod identifier for the Private Set Intersection, as a list.
datasource
: The modeller datasource on which the Private Set Intersection will be performed.
datasource_columns
: The modeller's columns from their datasource on which the private set intersection will be computed as a list of strings. Defaults to None.
datasource_table
: The modeller's table from their datasource, if the datasource is multitable, on which the private set intersection will be computed as a string. Defaults to None.
pod_columns
: The pod's columns from their datasource on which the private set intersection will be computed as a list of strings. Defaults to None.
pod_table
: The pod's table from their datasource, if the datasource is multitable, on which the private set intersection will be computed as a string. Defaults to None.
username
: The modeller's username. Defaults to None.
bitfounthub
: The Bitfount hub instance. Defaults to None.
ms_config
: Configuration for the message service. Defaults to None.
message_service
: The message service to use. Defaults to None.
pod_public_key_paths
: The path for the pod public key. Used when authentication with the pod is done with the public key. Defaults to None.
identity_verification_method
: The identity verification method to use. Defaults to IdentityVerificationMethod.OIDC_DEVICE_CODE.
private_key_or_file
: The private key or path to the private key. Defaults to None.
idp_url
: The identity provider URL. Defaults to None.
require_all_pods
: Whether all pods are needed for the algorithm. Defaults to False.
Returns
The records from the modeller dataset which were found in the intersection.
Raises
PSIMultiplePodsError
: If execute is called on multiple pods.
modeller
def modeller( self, **kwargs: Any,) ‑> bitfount.federated.algorithms.compute_intersection_rsa._ModellerSide:
Returns the modeller side of the ComputeIntersectionRSA algorithm.
worker
def worker( self, **kwargs: Any,) ‑> bitfount.federated.algorithms.compute_intersection_rsa._WorkerSide:
Returns the worker side of the ComputeIntersectionRSA algorithm.
FederatedModelTraining
class FederatedModelTraining( *, model: _DistributedModelTypeOrReference, modeller_checkpointing: bool = True, checkpoint_filename: Optional[str] = None, pretrained_file: Optional[Union[str, os.PathLike]] = None, project_id: Optional[str] = None,):
Algorithm for training a model remotely and returning its updated parameters.
This algorithm is designed to be compatible with the FederatedAveraging protocol.
Arguments
model
: The model to train on remote data.
pretrained_file
: A file path or a string containing a pre-trained model. Defaults to None.
Attributes
checkpoint_filename
: The filename for the last checkpoint. Defaults to the task id and the last iteration number, i.e. {taskid}-iteration-{iteration_number}.pt.
class_name
: The name of the algorithm class.
fields_dict
: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type (e.g. fields_dict = {"class_name": fields.Str()}).
model
: The model to train on remote data.
modeller_checkpointing
: Whether to save the last checkpoint on the modeller side. Defaults to True.
nested_fields
: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes (e.g. nested_fields = {"datastructure": datastructure.registry}).
pretrained_file
: A file path or a string containing a pre-trained model. Defaults to None.
Ancestors
- bitfount.federated.algorithms.model_algorithms.base._BaseModelAlgorithmFactory
- BaseAlgorithmFactory
- abc.ABC
- bitfount.federated.roles._RolesMixIn
- bitfount.types._BaseSerializableObjectMixIn
Variables
- static nested_fields : ClassVar[Dict[str, Mapping[str, Any]]]
Methods
create
def create(self, role: Union[str, Role], **kwargs: Any) ‑> Any:
Create an instance representing the role specified.
modeller
def modeller( self, **kwargs: Any,) ‑> bitfount.federated.algorithms.model_algorithms.federated_training._ModellerSide:
Returns the modeller side of the FederatedModelTraining algorithm.
worker
def worker( self, hub: BitfountHub, **kwargs: Any,) ‑> bitfount.federated.algorithms.model_algorithms.federated_training._WorkerSide:
Returns the worker side of the FederatedModelTraining algorithm.
Arguments
hub
: BitfountHub object to use for communication with the hub.
HuggingFaceImageClassificationInference
class HuggingFaceImageClassificationInference( model_id: str, image_column_name: str, seed: int = 42, apply_softmax_to_predictions: bool = True, batch_size: int = 1, top_k: int = 5,):
Inference for pre-trained Hugging Face image classification models.
Arguments
batch_size
: The batch size for inference. Defaults to 1.
image_column_name
: The image column on which the inference should be done.
model_id
: The model id to use for image classification inference. The model id is of a pretrained model hosted inside a model repo on huggingface.co. Accepts resnet models.
seed
: Sets the seed of the algorithm. For reproducible behavior it defaults to 42.
top_k
: The number of top labels that will be returned by the pipeline. If the provided number is higher than the number of labels available in the model configuration, it will default to the number of labels. Defaults to 5.
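The interaction between top_k and the number of available labels can be illustrated with a small, dependency-free sketch. It is not the Hugging Face pipeline or Bitfount code: `softmax` and `top_k_labels` are hypothetical helpers, and the optional softmax step is an assumption about what the `apply_softmax_to_predictions` flag in the signature controls.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_labels(
    logits: Sequence[float],
    labels: Sequence[str],
    top_k: int = 5,
    apply_softmax: bool = True,
) -> List[Tuple[str, float]]:
    """Return the k highest-scoring labels, clamping k to the label count."""
    scores = softmax(logits) if apply_softmax else list(logits)
    k = min(top_k, len(labels))  # more requested than available: use all labels
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


labels = ["cat", "dog", "bird"]
logits = [2.0, 1.0, 0.1]

# top_k=5 exceeds the three available labels, so three results come back.
predictions = top_k_labels(logits, labels, top_k=5)
print(predictions)
```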
Attributes
batch_size
: The batch size for inference. Defaults to 1.
class_name
: The name of the algorithm class.
fields_dict
: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type (e.g. fields_dict = {"class_name": fields.Str()}).
image_column_name
: The image column on which the inference should be done.
model_id
: The model id to use for image classification inference. The model id is of a pretrained model hosted inside a model repo on huggingface.co. Accepts resnet models.
nested_fields
: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes (e.g. nested_fields = {"datastructure": datastructure.registry}).
seed
: Sets the seed of the algorithm. For reproducible behavior it defaults to 42.
top_k
: The number of top labels that will be returned by the pipeline. If the provided number is higher than the number of labels available in the model configuration, it will default to the number of labels. Defaults to 5.
Ancestors
- BaseAlgorithmFactory
- abc.ABC
- bitfount.federated.roles._RolesMixIn
- bitfount.types._BaseSerializableObjectMixIn
Variables
- static fields_dict : ClassVar[T_FIELDS_DICT]
Methods
create
def create(self, role: Union[str, Role], **kwargs: Any) ‑> Any:
Create an instance representing the role specified.
modeller
def modeller( self, **kwargs: Any,) ‑> bitfount.federated.algorithms.hugging_face_algorithms.hugging_face_image_classification._ModellerSide:
Returns the modeller side of the HuggingFaceImageClassificationInference algorithm.
worker
def worker( self, **kwargs: Any,) ‑> bitfount.federated.algorithms.hugging_face_algorithms.hugging_face_image_classification._WorkerSide:
Returns the worker side of the HuggingFaceImageClassificationInference algorithm.
HuggingFaceImageSegmentationInference
class HuggingFaceImageSegmentationInference( model_id: str, image_column_name: str, alpha: float = 0.3, batch_size: int = 1, dataframe_output: bool = False, mask_threshold: float = 0.5, overlap_mask_area_threshold: float = 0.5, save_path: Union[str, os.PathLike] = PosixPath('.'), seed: int = 42, subtask: Optional[_Subtask] = None, threshold: float = 0.9,):
Inference for pre-trained Hugging Face image segmentation models.
Perform segmentation (detect masks & classes) in the image(s) passed as inputs.
Arguments
alpha
: The alpha for the mask overlay.
batch_size
: The batch size for inference. Defaults to 1.
dataframe_output
: Whether to output the prediction results in a dataframe format. Defaults to False.
image_column_name
: The image column on which the inference should be done.
mask_threshold
: Threshold to use when turning the predicted masks into binary values. Defaults to 0.5.
model_id
: The model id to use for image segmentation inference. The model id is of a pretrained model hosted inside a model repo on huggingface.co. Accepts resnet models.
overlap_mask_area_threshold
: Mask overlap threshold to eliminate small, disconnected segments. Defaults to 0.5.
save_path
: The folder path where the images with masks drawn on them should be saved. Defaults to the current working directory.
seed
: Sets the seed of the algorithm. For reproducible behavior it defaults to 42.
subtask
: Segmentation task to be performed: choose one of semantic, instance, or panoptic, depending on model capabilities. If not set, the pipeline will attempt to resolve in the following order: panoptic, instance, semantic.
threshold
: Probability threshold to filter out predicted masks. Defaults to 0.9.
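The two thresholds act at different levels: threshold filters whole predicted segments by their score, while mask_threshold turns per-pixel probabilities into a binary mask. The sketch below shows that distinction with plain lists. It is a simplified illustration, not the pipeline's actual post-processing; the helper names, the segment dictionary layout, and the choice of strict versus non-strict comparison are assumptions.

```python
from typing import Any, Dict, List


def binarize_mask(
    probabilities: List[List[float]], mask_threshold: float = 0.5
) -> List[List[int]]:
    """Turn per-pixel probabilities into 0/1 values (assumed strict comparison)."""
    return [[1 if p > mask_threshold else 0 for p in row] for row in probabilities]


def filter_segments(
    segments: List[Dict[str, Any]], threshold: float = 0.9
) -> List[Dict[str, Any]]:
    """Keep only segments whose score reaches the probability threshold."""
    return [s for s in segments if s["score"] >= threshold]


# Hypothetical model output: two candidate segments with soft masks.
segments = [
    {"label": "cat", "score": 0.97, "mask": [[0.9, 0.4], [0.6, 0.1]]},
    {"label": "sofa", "score": 0.55, "mask": [[0.2, 0.8], [0.3, 0.7]]},
]

kept = filter_segments(segments, threshold=0.9)
binary_masks = {s["label"]: binarize_mask(s["mask"], mask_threshold=0.5) for s in kept}
print(binary_masks)
```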
Attributes
alpha
: The alpha for the mask overlay.
batch_size
: The batch size for inference. Defaults to 1.
class_name
: The name of the algorithm class.
dataframe_output
: Whether to output the prediction results in a dataframe format. Defaults to False.
fields_dict
: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type (e.g. fields_dict = {"class_name": fields.Str()}).
image_column_name
: The image column on which the inference should be done.
mask_threshold
: Threshold to use when turning the predicted masks into binary values. Defaults to 0.5.
model_id
: The model id to use for image segmentation inference. The model id is of a pretrained model hosted inside a model repo on huggingface.co. Accepts resnet models.
nested_fields
: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes (e.g. nested_fields = {"datastructure": datastructure.registry}).
overlap_mask_area_threshold
: Mask overlap threshold to eliminate small, disconnected segments. Defaults to 0.5.
save_path
: The folder path where the images with masks drawn on them should be saved. Defaults to the current working directory.
seed
: Sets the seed of the algorithm. For reproducible behavior it defaults to 42.
subtask
: Segmentation task to be performed: choose one of semantic, instance, or panoptic, depending on model capabilities. If not set, the pipeline will attempt to resolve in the following order: panoptic, instance, semantic.
threshold
: Probability threshold to filter out predicted masks. Defaults to 0.9.
Ancestors
- BaseAlgorithmFactory
- abc.ABC
- bitfount.federated.roles._RolesMixIn
- bitfount.types._BaseSerializableObjectMixIn
Variables
- static fields_dict : ClassVar[Dict[str, marshmallow.fields.Field]]
Methods
create
def create(self, role: Union[str, Role], **kwargs: Any) ‑> Any:
Create an instance representing the role specified.
modeller
def modeller( self, **kwargs: Any,) ‑> bitfount.federated.algorithms.hugging_face_algorithms.hugging_face_image_segmentation._ModellerSide:
Returns the modeller side of the HuggingFaceImageSegmentationInference algorithm.
worker
def worker( self, **kwargs: Any,) ‑> bitfount.federated.algorithms.hugging_face_algorithms.hugging_face_image_segmentation._WorkerSide:
Returns the worker side of the HuggingFaceImageSegmentationInference algorithm.
HuggingFacePerplexityEvaluation
class HuggingFacePerplexityEvaluation( model_id: str, text_column_name: str, stride: int = 512, seed: int = 42,):
Hugging Face Perplexity Algorithm.
Arguments
model_id
: The model id to use for evaluating its perplexity. The model id is of a pretrained model hosted inside a model repo on huggingface.co. Accepts models with a causal language modeling head.
seed
: Sets the seed of the algorithm. For reproducible behaviour it defaults to 42.
stride
: Sets the stride of the algorithm. Defaults to 512.
text_column_name
: The single column to query against. Should contain text for generation.
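To make the role of stride more concrete, the sketch below shows one common way perplexity is evaluated over long text: a sliding window advances by stride tokens, only the tokens not scored by a previous window are counted, and perplexity is the exponential of the mean negative log-likelihood. Whether this algorithm follows exactly this scheme is an assumption; `strided_windows`, `perplexity`, and the `max_length` parameter are hypothetical names, and no model is actually run.

```python
import math
from typing import List, Sequence, Tuple


def strided_windows(
    seq_len: int, max_length: int, stride: int
) -> List[Tuple[int, int, int]]:
    """Return (begin, end, newly_scored_tokens) for each sliding window."""
    windows = []
    prev_end = 0
    for begin in range(0, seq_len, stride):
        end = min(begin + max_length, seq_len)
        # Only tokens beyond the previous window's end are scored here.
        windows.append((begin, end, end - prev_end))
        prev_end = end
        if end == seq_len:
            break
    return windows


def perplexity(token_log_probs: Sequence[float]) -> float:
    """exp of the mean negative log-likelihood over all scored tokens."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# A 10-token sequence, a model context of 4 tokens, and a stride of 2.
windows = strided_windows(seq_len=10, max_length=4, stride=2)
print(windows)

# If a model assigned probability 0.25 to every token, perplexity would be about 4.
uniform_ppl = perplexity([math.log(0.25)] * 8)
print(uniform_ppl)
```

A smaller stride gives each scored token more preceding context at the cost of more forward passes.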
Attributes
class_name
: The name of the algorithm class.
fields_dict
: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type (e.g. fields_dict = {"class_name": fields.Str()}).
model_id
: The model id to use for evaluation. The model id is of a pretrained model hosted inside a model repo on huggingface.co. Accepts models with a causal language modeling head.
nested_fields
: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes (e.g. nested_fields = {"datastructure": datastructure.registry}).
seed
: Sets the seed of the algorithm. For reproducible behaviour it defaults to 42.
stride
: Sets the stride of the algorithm. Defaults to 512.
text_column_name
: The single column to query against. Should contain text for generation.
Ancestors
- BaseAlgorithmFactory
- abc.ABC
- bitfount.federated.roles._RolesMixIn
- bitfount.types._BaseSerializableObjectMixIn
Variables
- static fields_dict : ClassVar[Dict[str, marshmallow.fields.Field]]
Methods
create
def create(self, role: Union[str, Role], **kwargs: Any) ‑> Any:
Create an instance representing the role specified.
modeller
def modeller( self, **kwargs: Any,) ‑> bitfount.federated.algorithms.hugging_face_algorithms.hugging_face_perplexity._ModellerSide:
Returns the modeller side of the HuggingFacePerplexityEvaluation algorithm.
worker
def worker( self, **kwargs: Any,) ‑> bitfount.federated.algorithms.hugging_face_algorithms.hugging_face_perplexity._WorkerSide:
Returns the worker side of the HuggingFacePerplexityEvaluation algorithm.
HuggingFaceTextClassificationInference
class HuggingFaceTextClassificationInference( model_id: str, target_column_name: