hugging_face_image_text_generation

Hugging Face Image-Text Generation Algorithm.

Classes

HuggingFaceImageTextGenerationInference

class HuggingFaceImageTextGenerationInference(datastructure: DataStructure, model_id: str, max_new_tokens: int = 2000, seed: int = 42, batch_size: int = 1, torch_dtype: "Literal['bfloat16', 'float16', 'float32', 'float64']" = 'bfloat16', prompt_template: Optional[str] = None, access_token: Optional[str] = None):

Inference for pre-trained Hugging Face image-text-to-text generation models.

This algorithm processes images with text context using vision-language models like MedGemma. It takes an image and a context (e.g., clinical notes) as input and generates text responses based on the visual content and the context.

The datastructure should include two columns: an image column and a context column. If a prompt_template is provided, the context is inserted into the template to create the final prompt. Otherwise, the context is used as-is.
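The prompt construction described above can be sketched as follows. This is an illustrative sketch only, not the library's implementation: the helper name `build_prompt` is hypothetical, and plain string replacement is an assumption about how the `{context}` placeholder is filled.

```python
from typing import Optional

# Placeholder name taken from the documentation of prompt_template.
PLACEHOLDER = "{context}"


def build_prompt(context: str, prompt_template: Optional[str] = None) -> str:
    """Return the final prompt for one row (hypothetical helper).

    With no template, the context column value is used as-is.
    With a template, the context replaces the {context} placeholder.
    """
    if prompt_template is None:
        return context
    # Assumption: simple substitution; the real algorithm may differ.
    return prompt_template.replace(PLACEHOLDER, context)


if __name__ == "__main__":
    template = "Describe this image given the notes: {context}"
    print(build_prompt("Patient reports chest pain.", template))
    print(build_prompt("Patient reports chest pain."))
```

Using `str.replace` rather than `str.format` in this sketch means that other braces in a template (for example, literal JSON) do not cause formatting errors.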

Arguments

  • **kwargs: Additional keyword arguments.
  • batch_size: The batch size for inference. Defaults to 1.
  • datastructure: The datastructure to use for the algorithm.
  • max_new_tokens: The maximum number of new tokens to generate. Defaults to 2000.
  • model_id: The model ID to use for image-text generation inference. This is the ID of a pretrained model hosted in a model repository on huggingface.co. Accepts models compatible with the "image-text-to-text" pipeline task (e.g., google/medgemma-1.5-4b-it).
  • prompt_template: Optional template string for formatting the context column value into a prompt. Use {context} as a placeholder for where the context will be inserted. For example: "Describe this image given the notes: {context}". Defaults to None (context column value used as the prompt directly).
  • seed: The random seed for the algorithm. Defaults to 42 for reproducible behavior.
  • torch_dtype: The torch dtype to use for the model. Defaults to "bfloat16".

Attributes

  • batch_size: The batch size for inference.
  • class_name: The name of the algorithm class.
  • fields_dict: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type. (e.g. fields_dict = {"class_name": fields.Str()}).
  • max_new_tokens: The maximum number of new tokens to generate.
  • model_id: The model id to use for image-text generation inference.
  • nested_fields: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes. (e.g. nested_fields = {"datastructure": datastructure.registry})
  • prompt_template: Optional template string to format the context into a prompt.
  • seed: Sets the seed of the algorithm.
  • torch_dtype: The torch dtype to use for the model.

Raises

  • ValueError: If prompt_template is provided without a {context} placeholder.
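The check behind this error might look like the sketch below. The function name `validate_prompt_template` and the error message are hypothetical. Only the behavior (a `ValueError` when a template lacks `{context}`) comes from the documentation above.

```python
from typing import Optional


def validate_prompt_template(prompt_template: Optional[str]) -> None:
    """Raise ValueError if a template is given without {context}.

    Hypothetical helper mirroring the documented constructor check.
    A template of None is valid: the context is then used directly.
    """
    if prompt_template is not None and "{context}" not in prompt_template:
        raise ValueError(
            "prompt_template must contain a '{context}' placeholder."
        )


if __name__ == "__main__":
    validate_prompt_template("Describe this image given the notes: {context}")
    try:
        validate_prompt_template("Describe this image.")
    except ValueError as err:
        print(f"Rejected: {err}")
```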

Variables

  • static fields_dict : ClassVar[T_FIELDS_DICT]

Methods


create

def create(self, role: Union[str, Role], **kwargs: Any) -> Any:

Create an instance representing the role specified.

modeller

def modeller(self, *, context: ProtocolContext, **kwargs: Any) -> bitfount.federated.algorithms.hugging_face_algorithms.base._HFModellerSide:

Returns the modeller side of the algorithm.

worker

def worker(self, *, context: ProtocolContext, **kwargs: Any) -> bitfount.federated.algorithms.hugging_face_algorithms.hugging_face_image_text_generation._WorkerSide:

Returns the worker side of the algorithm.