Processor

TransformersQAProcessor

class fastnn.processors.nlp.question_answering.TransformersQAProcessor(model_name_or_path='distilbert-base-cased-distilled-squad')

Question Answering data processor. Use this class to generate tensor inputs from human-legible text/string data. It works with the majority of BERT-architecture transformer models from Hugging Face that have a span-based, extractive question answering head.

Usage:

>>> processor = TransformersQAProcessor(model_name_or_path="distilbert-base-cased-distilled-squad")
>>> processor.process(query=["string"], context=["string"])

Parameters:

 model_name_or_path - String defining the HF question answering model/tokenizer name
process(self, query, context, max_seq_length=512, doc_stride=128, max_query_length=64)

Generate torch Dataset object from query/context string pairs using the specified tokenizer from HF. This provides clear tensor input representations for compatible models.

Returns a tuple of the Dataset and its matching List[SquadFeatures]

  • query - List of query strings, must be same length as context
  • context - List of context strings, must be same length as query
  • max_seq_length - Maximum context token length. Check the model config for the maximum sequence length the model was trained with
  • doc_stride - Number of tokens to stride when splitting the context into chunks of size max_seq_length
  • max_query_length - Maximum token length for queries
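
For example (a minimal sketch; the unpacking assumes the Dataset-then-features return order described above, and the query/context strings are illustrative):

>>> from fastnn.processors.nlp.question_answering import TransformersQAProcessor
>>> processor = TransformersQAProcessor(model_name_or_path="distilbert-base-cased-distilled-squad")
>>> dataset, features = processor.process(
...     query=["When was the Eiffel Tower completed?"],
...     context=["The Eiffel Tower was completed in 1889 for the World's Fair in Paris."],
...     max_seq_length=384,
...     doc_stride=128,
...     max_query_length=64,
... )
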
process_batch(self, query, context, mini_batch_size=8, max_seq_length=512, doc_stride=128, max_query_length=64, use_gpu=False)

Generate torch DataLoader object from query/context string pairs using the specified tokenizer from HF. This provides clear tensor input representations for compatible models in easy-to-use batches.

Returns a tuple of (List[SquadExample], List[SquadFeatures], DataLoader)

  • query - List of query strings, must be same length as context
  • context - List of context strings, must be same length as query
  • mini_batch_size - Batch size for inference
  • max_seq_length - Maximum context token length. Check the model config for the maximum sequence length the model was trained with
  • doc_stride - Number of tokens to stride when splitting the context into chunks of size max_seq_length
  • max_query_length - Maximum token length for queries
  • use_gpu - Bool for using GPU or CPU. If set to True but no GPU devices are available, it will fall back to the CPU
process_output(self, outputs, examples, features, n_best_size=5, max_answer_length=10, do_lower_case=False, verbose_logging=False, version_2_with_negative=False, null_score_diff_threshold=0.0)
process_output_batch(self, outputs, examples, features, n_best_size=5, max_answer_length=64, do_lower_case=False, verbose_logging=False, version_2_with_negative=False, null_score_diff_threshold=0.0)

Process the output of a Transformers QA model into human-legible results.

  • outputs - List of batch output tensors from a model's forward pass
  • examples - List of SquadExample objects for each original context/query pair used as input. These are returned from the built-in process() or process_batch() methods
  • features - List of SquadFeatures objects for each context/query pair over the original doc_stride lengths. These are also returned from the built-in process() or process_batch() methods
  • n_best_size - Number of top results to return
  • max_answer_length - Maximum token length for returned answers
  • do_lower_case - Set to True if using an uncased QA model
  • verbose_logging - Set to True to enable verbose prediction logging
  • version_2_with_negative - Set to True if using a QA model trained on SQuAD 2.0
  • null_score_diff_threshold - Threshold for predicting null (no answer) with a SQuAD 2.0 model. Default is 0.0; raise this if you want fewer null answers
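
A minimal end-to-end sketch tying process_batch() and process_output_batch() together. The forward pass is a placeholder: `model` below stands in for whichever compatible span-extraction QA module you pair with this processor, and the exact tensor layout of each batch depends on the tokenizer:

>>> import torch
>>> from fastnn.processors.nlp.question_answering import TransformersQAProcessor
>>> processor = TransformersQAProcessor(model_name_or_path="distilbert-base-cased-distilled-squad")
>>> examples, features, dataloader = processor.process_batch(
...     query=["Who developed the theory of general relativity?"],
...     context=["Albert Einstein published the theory of general relativity in 1915."],
...     mini_batch_size=8,
...     use_gpu=False,
... )
>>> outputs = []
>>> with torch.no_grad():
...     for batch in dataloader:
...         outputs.append(model(*batch))  # `model` is a placeholder, not defined here
>>> answers = processor.process_output_batch(
...     outputs=outputs,
...     examples=examples,
...     features=features,
...     n_best_size=5,
...     max_answer_length=64,
... )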

TransformersTokenTaggingProcessor

class fastnn.processors.nlp.token_tagging.TransformersTokenTaggingProcessor(model_name_or_path='dbmdz/bert-large-cased-finetuned-conll03-english', label_strings=['O', 'B-MISC', 'I-MISC', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC'])

Token Tagging data processor. Use this class to generate tensor inputs from human-legible text/string data. It works with the majority of BERT-architecture transformer models from Hugging Face that have a token-level classification head.

Usage:

>>> processor = TransformersTokenTaggingProcessor(model_name_or_path="dbmdz/bert-large-cased-finetuned-conll03-english")
>>> processor.process(text=["string"])

Parameters:

 model_name_or_path - String defining the HF token tagging model/tokenizer name
 label_strings - List of label strings whose indices map to the model's output label ids
process(self, text, max_seq_length=512)

Generate torch Dataset object from text strings using the specified tokenizer from HF. This provides clear tensor input representations for compatible models.

Returns a Dataset

  • text - List of text strings
  • max_seq_length - Maximum token length. Check the model config for the maximum sequence length the model was trained with
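
For example (a brief sketch using the class defaults; the input text is illustrative):

>>> from fastnn.processors.nlp.token_tagging import TransformersTokenTaggingProcessor
>>> processor = TransformersTokenTaggingProcessor(model_name_or_path="dbmdz/bert-large-cased-finetuned-conll03-english")
>>> dataset = processor.process(
...     text=["George Washington was the first president of the United States."],
...     max_seq_length=256,
... )
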
process_batch(self, text, mini_batch_size=8, max_seq_length=512, use_gpu=False)

Generate torch DataLoader object from text strings using the specified tokenizer from HF. This provides clear tensor input representations for compatible models in easy-to-use batches.

Returns a DataLoader

  • text - List of text strings
  • mini_batch_size - Batch size for inference
  • max_seq_length - Maximum token length. Check the model config for the maximum sequence length the model was trained with
  • use_gpu - Bool for using GPU or CPU. If set to True but no GPU devices are available, it will fall back to the CPU
process_output(self, outputs)
process_output_batch(self, outputs)

Process the output of a Transformers token classification (NER) model into human-legible results.

  • outputs - List of batch output tensors from a model's forward pass
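
A minimal sketch of the batch workflow; as in the QA example above, `model` is a placeholder for a compatible token classification module:

>>> import torch
>>> from fastnn.processors.nlp.token_tagging import TransformersTokenTaggingProcessor
>>> processor = TransformersTokenTaggingProcessor(model_name_or_path="dbmdz/bert-large-cased-finetuned-conll03-english")
>>> dataloader = processor.process_batch(
...     text=["Hugging Face Inc. is based in New York City."],
...     mini_batch_size=8,
... )
>>> outputs = []
>>> with torch.no_grad():
...     for batch in dataloader:
...         outputs.append(model(*batch))  # `model` is a placeholder, not defined here
>>> results = processor.process_output_batch(outputs=outputs)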

ObjectDetectionProcessor

class fastnn.processors.cv.object_detection.ObjectDetectionProcessor(label_strings)

Object detection processor for handling image files or 3xHxW-formatted images on input, and boxes, scores, and labels on output. Since most resizing and padding transforms are done by the object detection models in PyTorch, datasets and dataloaders will generate batches of images as lists.

Usage:

>>> processor = ObjectDetectionProcessor(label_strings=["label_string"])
>>> processor.process(dir_path="path/to/images")

Parameters:

 label_strings - List of label strings whose indices map to the model's output label ids
draw_bounding_boxes(self, image, boxes, labels=None, colors=None, width=1, font='arial.ttf', font_size=10)

Added and modified from TorchVision utils. Draws bounding boxes on the given image. The values of the input image should be uint8, between 0 and 255.

  • image - Tensor of shape (C x H x W)
  • boxes - Tensor of size (N, 4) containing bounding boxes in (xmin, ymin, xmax, ymax) format. The boxes are absolute coordinates with respect to the image, i.e. 0 <= xmin < xmax < W and 0 <= ymin < ymax < H
  • labels - List[str] containing the labels of the bounding boxes
  • colors - List containing the colors of the bounding boxes; each color can be a str or a Tuple[int, int, int]
  • width - Width of the bounding box lines
  • font - A filename containing a TrueType font. If the file is not found under this filename, the loader may also search in other directories, such as the fonts/ directory on Windows or /Library/Fonts/, /System/Library/Fonts/ and ~/Library/Fonts/ on macOS
  • font_size - The requested font size in points
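
For example, a brief sketch with a dummy image (the label strings below are illustrative; if the default TrueType font is not available on your system, pass a font file you have):

>>> import torch
>>> from fastnn.processors.cv.object_detection import ObjectDetectionProcessor
>>> processor = ObjectDetectionProcessor(label_strings=["person", "dog"])
>>> image = torch.zeros((3, 200, 200), dtype=torch.uint8)  # dummy C x H x W uint8 image
>>> boxes = torch.tensor([[10, 10, 80, 120], [90, 40, 190, 180]])  # (N, 4) absolute coordinates
>>> boxed = processor.draw_bounding_boxes(
...     image=image,
...     boxes=boxes,
...     labels=["person", "dog"],
...     colors=["red", "blue"],
...     width=2,
... )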

process(self, dir_path, transforms=ConvertImageDtype())

Generate torch Dataset object from a directory of image files. This provides clear tensor input representations for compatible models.

Returns a Dataset

  • dir_path - String path to the directory of images you'd like to process
  • transforms - Transforms to apply to the loaded images (default: ConvertImageDtype())
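
For example (a brief sketch; ConvertImageDtype is assumed to be the torchvision transform named in the default signature, and the labels are illustrative):

>>> import torch
>>> from torchvision.transforms import ConvertImageDtype
>>> from fastnn.processors.cv.object_detection import ObjectDetectionProcessor
>>> processor = ObjectDetectionProcessor(label_strings=["__background__", "person", "car"])
>>> dataset = processor.process(dir_path="path/to/images", transforms=ConvertImageDtype(torch.float))
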
process_batch(self, dir_path, transforms=ConvertImageDtype(), mini_batch_size=8, use_gpu=False)

Generate torch DataLoader object from a data directory path. This provides clear tensor input representations for compatible models in easy-to-use batches.

Returns a DataLoader

  • dir_path - String path to the directory of images you'd like to process
  • transforms - Transforms to apply to the loaded images (default: ConvertImageDtype())
  • mini_batch_size - Batch size for inference
  • use_gpu - Bool for using GPU or CPU. If set to True but no GPU devices are available, it will fall back to the CPU
process_output(self)
process_output_batch(self, outputs, dataset)

Process the output of an object detection model into human-legible results. Expects outputs from FasterRCNNModule.

Returns batched results as a list of lists of tuples containing the boxed images in both tensor and NumPy format

  • outputs - List of batch output tensors from a model's forward pass
  • dataset - Corresponding dataset with the original images matched to the model outputs
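
A minimal end-to-end sketch. The forward pass is a placeholder: `model` stands in for FasterRCNNModule or another compatible detection module, each batch is passed as a list of image tensors, and the dataset handed to process_output_batch() is assumed to be the one backing the DataLoader:

>>> import torch
>>> from fastnn.processors.cv.object_detection import ObjectDetectionProcessor
>>> processor = ObjectDetectionProcessor(label_strings=["__background__", "person", "car"])  # illustrative labels
>>> dataloader = processor.process_batch(dir_path="path/to/images", mini_batch_size=4)
>>> outputs = []
>>> with torch.no_grad():
...     for batch in dataloader:
...         outputs.append(model(batch))  # `model` is a placeholder, not defined here
>>> results = processor.process_output_batch(outputs=outputs, dataset=dataloader.dataset)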