hugging_face_image_text_generation
Hugging Face Image-Text Generation Algorithm.
Classes
HuggingFaceImageTextGenerationInference
class HuggingFaceImageTextGenerationInference(
    datastructure: DataStructure,
    model_id: str,
    max_new_tokens: int = 2000,
    seed: int = 42,
    batch_size: int = 1,
    torch_dtype: "Literal['bfloat16', 'float16', 'float32', 'float64']" = 'bfloat16',
    prompt_template: Optional[str] = None,
    access_token: Optional[str] = None,
):

Inference for pre-trained Hugging Face image-text-to-text generation models.
This algorithm processes images with text context using vision-language models like MedGemma. It takes an image and a context (e.g., clinical notes) as input and generates text responses based on the visual content and the context.
The datastructure should include two columns: an image column and a context column. If a prompt_template is provided, the context is inserted into the template to create the final prompt. Otherwise, the context is used as-is.
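The prompt construction described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: `build_prompt` is a hypothetical helper, and the use of `str.replace` for substitution is an assumption (the library may substitute the placeholder differently).

```python
from typing import Optional


def build_prompt(context: str, prompt_template: Optional[str] = None) -> str:
    """Hypothetical helper: build the prompt for one row of the context column."""
    if prompt_template is None:
        # No template: the context column value is used as the prompt as-is.
        return context
    # Assumption: plain substitution of the {context} placeholder.
    return prompt_template.replace("{context}", context)


template = "Describe this image given the notes: {context}"
print(build_prompt("Patient reports chest pain.", template))
print(build_prompt("Patient reports chest pain."))
```

The first call wraps the context in the template text; the second returns the context unchanged.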
Arguments
- **kwargs: Additional keyword arguments.
- batch_size: The batch size for inference. Defaults to 1.
- datastructure: The datastructure to use for the algorithm.
- max_new_tokens: The maximum number of new tokens to generate. Defaults to 2000.
- model_id: The model id to use for image-text generation inference. The model id is of a pretrained model hosted inside a model repo on huggingface.co. Accepts models compatible with the "image-text-to-text" pipeline task (e.g., google/medgemma-1.5-4b-it).
- prompt_template: Optional template string for formatting the context column value into a prompt. Use {context} as a placeholder for where the context will be inserted. For example: "Describe this image given the notes: {context}". Defaults to None (context column value used as the prompt directly).
- seed: Sets the seed of the algorithm. For reproducible behavior it defaults to 42.
- torch_dtype: The torch dtype to use for the model. Defaults to "bfloat16".
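To show how the documented arguments fit together, the sketch below gathers them into a keyword dictionary using the defaults listed above. The prompt text is illustrative only, and the commented-out constructor call assumes a DataStructure instance named `datastructure` that has been built elsewhere; constructing it is not covered on this page.

```python
# Keyword arguments mirroring the documented parameters and their defaults.
inference_kwargs = {
    "model_id": "google/medgemma-1.5-4b-it",  # example model id from this page
    "max_new_tokens": 2000,
    "seed": 42,
    "batch_size": 1,
    "torch_dtype": "bfloat16",
    "prompt_template": "Describe this image given the notes: {context}",
}

# Documented choices for torch_dtype, taken from the class signature.
ALLOWED_DTYPES = ("bfloat16", "float16", "float32", "float64")
assert inference_kwargs["torch_dtype"] in ALLOWED_DTYPES

# With a DataStructure instance available as `datastructure` (not built here),
# the algorithm would be constructed along these lines:
# algorithm = HuggingFaceImageTextGenerationInference(
#     datastructure=datastructure, **inference_kwargs
# )
```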
Attributes
- batch_size: The batch size for inference.
- class_name: The name of the algorithm class.
- fields_dict: A dictionary mapping all attributes that will be serialized in the class to their marshmallow field type (e.g. fields_dict = {"class_name": fields.Str()}).
- max_new_tokens: The maximum number of new tokens to generate.
- model_id: The model id to use for image-text generation inference.
- nested_fields: A dictionary mapping all nested attributes to a registry that contains class names mapped to the respective classes (e.g. nested_fields = {"datastructure": datastructure.registry}).
- prompt_template: Optional template string to format the context into a prompt.
- seed: Sets the seed of the algorithm.
- torch_dtype: The torch dtype to use for the model.
Raises
ValueError: If prompt_template is provided without a {context} placeholder.
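The documented check can be approximated as follows. This is a sketch under assumptions: `check_prompt_template` is a hypothetical name, and the error message is illustrative rather than the library's actual wording.

```python
from typing import Optional


def check_prompt_template(prompt_template: Optional[str]) -> None:
    """Hypothetical validator mirroring the documented ValueError condition."""
    # None is allowed: the context column value is then used as the prompt directly.
    if prompt_template is not None and "{context}" not in prompt_template:
        raise ValueError("prompt_template must contain a '{context}' placeholder")


check_prompt_template(None)                    # accepted
check_prompt_template("Summarise: {context}")  # accepted

try:
    check_prompt_template("Summarise the image.")  # placeholder missing
except ValueError as err:
    print(f"Rejected: {err}")
```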
Variables
- static fields_dict : ClassVar[T_FIELDS_DICT]
Methods
create
def create(self, role: Union[str, Role], **kwargs: Any) -> Any:

Create an instance representing the role specified.
modeller
def modeller(self, *, context: ProtocolContext, **kwargs: Any) -> bitfount.federated.algorithms.hugging_face_algorithms.base._HFModellerSide:

Returns the modeller side of the algorithm.
worker
def worker(self, *, context: ProtocolContext, **kwargs: Any) -> bitfount.federated.algorithms.hugging_face_algorithms.hugging_face_image_text_generation._WorkerSide:

Returns the worker side of the algorithm.
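The relationship between create, modeller and worker can be pictured as a role-based dispatch. The sketch below is a self-contained illustration, not the library's code: the `Role` enum and its string values are hypothetical stand-ins, `SketchAlgorithm` is an invented class, and the return values are placeholders for the real modeller-side and worker-side objects.

```python
from enum import Enum
from typing import Any, Union


class Role(Enum):
    """Hypothetical stand-in for the library's Role type."""

    MODELLER = "modeller"
    WORKER = "worker"


class SketchAlgorithm:
    """Illustrative class showing one way create() could dispatch on role."""

    def modeller(self, **kwargs: Any) -> str:
        return "modeller side"  # placeholder for the modeller-side object

    def worker(self, **kwargs: Any) -> str:
        return "worker side"  # placeholder for the worker-side object

    def create(self, role: Union[str, Role], **kwargs: Any) -> Any:
        # Role(...) accepts either a Role member or its string value.
        resolved = Role(role)
        if resolved is Role.MODELLER:
            return self.modeller(**kwargs)
        return self.worker(**kwargs)


algorithm = SketchAlgorithm()
print(algorithm.create("worker"))
print(algorithm.create(Role.MODELLER))
```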