hugging_face_image_text_generation

Hugging Face Image-Text Generation Algorithm.

Classes

HuggingFaceImageTextGenerationInference

class HuggingFaceImageTextGenerationInference(
    datastructure: DataStructure,
    model_id: str,
    max_new_tokens: int = 2000,
    seed: int = 42,
    batch_size: int = 1,
    torch_dtype: "Literal['bfloat16', 'float16', 'float32', 'float64']" = 'bfloat16',
    prompt_template: Optional[str] = None,
    access_token: Optional[str] = None,
):

Inference for pre-trained Hugging Face image-text-to-text generation models.

This algorithm processes images together with text context using vision-language models such as MedGemma. It takes an image and its context (e.g., clinical notes) as input and generates a text response based on both the visual content and the context.

The datastructure should include two columns: an image column and a context column. If a prompt_template is provided, the context is inserted into the template to create the final prompt. Otherwise, the context is used as-is.
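The prompt-construction rule above can be sketched in plain Python. This is a minimal illustration, not Bitfount's implementation: the helper name `build_prompt` is hypothetical, and substituting with `str.replace` (rather than `str.format`) is an assumption, chosen so that any other literal braces in a template are left untouched.

```python
from typing import Optional


def build_prompt(context: str, prompt_template: Optional[str] = None) -> str:
    """Return the prompt for one row (hypothetical helper, not the library API)."""
    if prompt_template is None:
        # No template: the context column value is used as the prompt as-is.
        return context
    # Insert the context wherever the {context} placeholder appears.
    return prompt_template.replace("{context}", context)


template = "Describe this image given the notes: {context}"
print(build_prompt("Patient reports blurred vision.", template))
# Describe this image given the notes: Patient reports blurred vision.
print(build_prompt("Patient reports blurred vision."))
# Patient reports blurred vision.
```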

Arguments

**kwargs: Additional keyword arguments.
batch_size: The batch size for inference. Defaults to 1.
datastructure: The datastructure to use for the algorithm.
max_new_tokens: The maximum number of new tokens to generate. Defaults to 2000.
model_id: The ID of a pretrained model hosted in a model repository on huggingface.co, used for image-text generation inference. Accepts models compatible with the "image-text-to-text" pipeline task (e.g., google/medgemma-1.5-4b-it).
prompt_template: Optional template string for formatting the context column value into a prompt. Use {context} as a placeholder for where the context will be inserted. For example: "Describe this image given the notes: {context}". Defaults to None (context column value used as the prompt directly).
seed: The random seed used by the algorithm, for reproducible behavior. Defaults to 42.
torch_dtype: The torch dtype to use for the model. Defaults to "bfloat16".
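As a rough illustration of how batch_size and max_new_tokens fit together during inference, the sketch below splits rows into batches of at most batch_size and forwards max_new_tokens to a generation callable. The row shape, the helper names `iter_batches` and `run_inference`, and the stub `fake_generate` are all hypothetical; in the real algorithm a Hugging Face vision-language model would produce the text.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

# Hypothetical row shape: (image bytes, context text).
Row = Tuple[bytes, str]


def iter_batches(rows: Sequence[Row], batch_size: int = 1) -> Iterator[Sequence[Row]]:
    """Yield consecutive slices of at most batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(rows), batch_size):
        yield rows[start : start + batch_size]


def run_inference(
    rows: Sequence[Row],
    generate: Callable[[List[Row], int], List[str]],
    batch_size: int = 1,
    max_new_tokens: int = 2000,
) -> List[str]:
    """Return one generated string per input row, in input order."""
    outputs: List[str] = []
    for batch in iter_batches(rows, batch_size):
        # max_new_tokens is forwarded to the generation callable, which is
        # expected to cap each response at that many new tokens.
        outputs.extend(generate(list(batch), max_new_tokens))
    return outputs


# Stub standing in for a vision-language model; records each batch size.
calls: List[int] = []


def fake_generate(batch: List[Row], max_new_tokens: int) -> List[str]:
    calls.append(len(batch))
    return [f"report for: {context}" for _image, context in batch]


rows = [(b"", f"note {i}") for i in range(5)]
outputs = run_inference(rows, fake_generate, batch_size=2)
print(calls)       # [2, 2, 1]
print(outputs[0])  # report for: note 0
```

Keeping the generation step behind a callable makes the batching logic easy to check without downloading a model.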

Attributes

batch_size: The batch size for inference.
class_name: The name of the algorithm class.
fields_dict: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type. (e.g. fields_dict = {"class_name": fields.Str()}).
max_new_tokens: The maximum number of new tokens to generate.
model_id: The model id to use for image-text generation inference.
nested_fields: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes. (e.g. nested_fields = {"datastructure": datastructure.registry})
prompt_template: Optional template string to format the context into a prompt.
seed: The random seed used by the algorithm.
torch_dtype: The torch dtype to use for the model.
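The fields_dict and nested_fields descriptions suggest that a nested attribute such as datastructure is serialized together with its class name and rebuilt by looking that name up in a registry. The stdlib-only sketch below shows that lookup pattern under that assumption; it does not use marshmallow, and the stand-in `DataStructure` class, its `table` field, and `load_nested` are hypothetical.

```python
from typing import Any, Dict


class DataStructure:
    """Hypothetical stand-in; not the bitfount DataStructure."""

    def __init__(self, table: str) -> None:
        self.table = table

    def to_dict(self) -> Dict[str, Any]:
        # Record the class name so the loader knows which class to rebuild.
        return {"class_name": type(self).__name__, "table": self.table}


def load_nested(data: Dict[str, Any], registry: Dict[str, type]) -> Any:
    """Rebuild a nested object by looking up its class_name in a registry."""
    cls = registry[data["class_name"]]
    kwargs = {key: value for key, value in data.items() if key != "class_name"}
    return cls(**kwargs)


# Analogous in shape to nested_fields = {"datastructure": datastructure.registry}.
nested_fields = {"datastructure": {"DataStructure": DataStructure}}

serialized = {"datastructure": DataStructure(table="images").to_dict()}
restored = load_nested(serialized["datastructure"], nested_fields["datastructure"])
print(type(restored).__name__, restored.table)  # DataStructure images
```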

Raises

ValueError: If prompt_template is provided without a {context} placeholder.
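The check behind this ValueError could look like the following. This is a hypothetical sketch: the function name and error message are assumptions, and whether the real check treats an empty string as a provided template is not stated here (this sketch does, so an empty string is rejected).

```python
from typing import Optional


def validate_prompt_template(prompt_template: Optional[str]) -> None:
    """Raise ValueError if a template is given without a {context} placeholder."""
    if prompt_template is not None and "{context}" not in prompt_template:
        raise ValueError("prompt_template must contain a {context} placeholder")


validate_prompt_template(None)                      # OK: no template given
validate_prompt_template("Findings: {context}")     # OK: placeholder present
try:
    validate_prompt_template("Describe this image")  # no placeholder
except ValueError as err:
    print(err)  # prompt_template must contain a {context} placeholder
```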

Variables

static fields_dict : ClassVar[T_FIELDS_DICT]

Methods

create

def create(self, role: Union[str, Role], **kwargs: Any) -> Any:

Create an instance representing the role specified.
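A minimal sketch of role-based dispatch, assuming create simply routes to modeller() or worker() depending on the role. The Role enum below, with values "modeller" and "worker", is a hypothetical stand-in for the Role type in the signature; accepting strings case-insensitively is also an assumption, and the string return values are placeholders for the real modeller-side and worker-side objects.

```python
from enum import Enum
from typing import Any, Union


class Role(Enum):
    """Hypothetical stand-in for the Role type accepted by create()."""

    MODELLER = "modeller"
    WORKER = "worker"


class RoleDispatchSketch:
    def modeller(self, **kwargs: Any) -> str:
        return "modeller side"  # placeholder for the modeller-side object

    def worker(self, **kwargs: Any) -> str:
        return "worker side"  # placeholder for the worker-side object

    def create(self, role: Union[str, Role], **kwargs: Any) -> Any:
        # Accept either a Role member or its string value (case-insensitive).
        if isinstance(role, str):
            role = Role(role.lower())  # raises ValueError for unknown roles
        if role is Role.MODELLER:
            return self.modeller(**kwargs)
        if role is Role.WORKER:
            return self.worker(**kwargs)
        raise ValueError(f"Unsupported role: {role!r}")


algo = RoleDispatchSketch()
print(algo.create("modeller"))   # modeller side
print(algo.create(Role.WORKER))  # worker side
```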

modeller

def modeller(self, *, context: ProtocolContext, **kwargs: Any) -> bitfount.federated.algorithms.hugging_face_algorithms.base._HFModellerSide:

Returns the modeller side of the algorithm.

worker

def worker(self, *, context: ProtocolContext, **kwargs: Any) -> bitfount.federated.algorithms.hugging_face_algorithms.hugging_face_image_text_generation._WorkerSide:

Returns the worker side of the algorithm.
