tasks.processors

Processors exist in Pythia to keep data processing pipelines across datasets as similar as possible while allowing code reuse.

Processors also help maintain proper abstractions, so that only what matters stays inside the dataset’s code. This keeps the dataset’s get_item logic clean and free of assumptions about data types. Because of their generic structure, processors can work on both images and text.

To create a new processor, follow these steps:

  1. Inherit from the BaseProcessor class.
  2. Implement the __call__ function, which takes in a dict and returns a dict with the same keys preprocessed, as well as any extra keys that need to be returned.
  3. Register the processor with the registry using @registry.register_processor('name'), where ‘name’ will be used to refer to your processor later.

In a processor’s config, you can use the preprocessor option to specify the kind of preprocessor you want in your dataset.

Let’s break down the processor config inside a dataset (VQA2.0) to understand the different moving parts.

Config:

task_attributes:
    vqa:
        datasets:
        - vqa2
        dataset_attributes:
            vqa2:
                processors:
                  text_processor:
                    type: vocab
                    params:
                      max_length: 14
                      vocab:
                        type: intersected
                        embedding_name: glove.6B.300d
                        vocab_file: vocabs/vocabulary_100k.txt
                  answer_processor:
                    type: vqa_answer
                    params:
                      num_answers: 10
                      vocab_file: vocabs/answers_vqa.txt
                      preprocessor:
                        type: simple_word
                        params: {}

BaseDataset will initialize the processors, and they will be available inside your dataset under the same attribute name as the key name; for example, text_processor will be available as self.text_processor inside your dataset. As with every module in Pythia, a processor accepts a ConfigNode with type and params attributes. params defines the custom parameters for each processor. By default, the processor initialization process will also initialize the preprocessor attribute, which can itself be a processor config. The preprocessor can then be accessed inside the processor’s functions.
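As a rough illustration of the attribute naming and preprocessor nesting described above, the sketch below mimics how a dataset could attach one processor per config key. ToyDataset, build_processor, PROCESSOR_TYPES and both toy processors are hypothetical stand-ins written for this example, not Pythia's API, and plain dicts are used in place of ConfigNode.

```python
# Hypothetical stand-ins; Pythia's BaseDataset and registry are not used here.

class LowercaseProcessor:
    """Toy processor: lowercases the "text" key."""

    def __init__(self, params, preprocessor=None):
        self.max_length = params.get("max_length")
        # The optional preprocessor is itself a processor built from a config.
        self.preprocessor = preprocessor

    def __call__(self, item):
        if self.preprocessor is not None:
            item = self.preprocessor(item)
        return {"text": item["text"].lower()}


class StripProcessor:
    """Toy preprocessor: trims surrounding whitespace."""

    def __init__(self, params, preprocessor=None):
        self.preprocessor = preprocessor

    def __call__(self, item):
        return {"text": item["text"].strip()}


# Maps the "type" value of a config to a processor class (assumed names).
PROCESSOR_TYPES = {"lowercase": LowercaseProcessor, "strip": StripProcessor}


def build_processor(config):
    """Build a processor from a {"type": ..., "params": ...} mapping."""
    params = dict(config.get("params", {}))
    pre_config = params.pop("preprocessor", None)
    preprocessor = build_processor(pre_config) if pre_config else None
    return PROCESSOR_TYPES[config["type"]](params, preprocessor)


class ToyDataset:
    def __init__(self, processors_config):
        # Each config key becomes an attribute with the same name.
        for name, config in processors_config.items():
            setattr(self, name, build_processor(config))


dataset = ToyDataset({
    "text_processor": {
        "type": "lowercase",
        "params": {
            "max_length": 14,
            "preprocessor": {"type": "strip", "params": {}},
        },
    }
})
processed = dataset.text_processor({"text": "  What Color Is It?  "})
```

Here the nested preprocessor config is built recursively, which is one plausible reading of "a processor config in itself"; the real initialization order may differ.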

Example:

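from pythia.common.registry import registry
from pythia.tasks.processors import BaseProcessor


# Register the processor so configs can refer to it by name
# ('my_processor' is an illustrative name).
@registry.register_processor('my_processor')
class MyProcessor(BaseProcessor):
    def __init__(self, config, *args, **kwargs):
        super().__init__(config, *args, **kwargs)

    def __call__(self, item, *args, **kwargs):
        # Split the text on spaces and strip whitespace from each token.
        text = item['text']
        text = [t.strip() for t in text.split(" ")]
        return {"text": text}

class pythia.tasks.processors.BBoxProcessor(config, *args, **kwargs)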

Generates bboxes in the proper format. Takes in a dict containing an “info” key, which is a list of dicts holding the following for each bounding box:

Example bbox input:

{
    "info": [
        {
            "bounding_box": {
                "top_left_x": 100,
                "top_left_y": 100,
                "width": 200,
                "height": 300
            }
        },
        ...
    ]
}

This will return a Sample in a dict under the key “bbox”, whose coordinates have a last dimension of 4 corresponding to “xyxy”. The sample will look like the following:

Example Sample:

Sample({
    "coordinates": torch.Size(n, 4),
    "width": List[number], # size n
    "height": List[number], # size n
    "bbox_types": List[str] # size n, either xyxy or xywh.
    # currently only supports xyxy.
})
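The conversion from the input format to “xyxy” can be sketched in plain Python. This is a minimal illustration, not the library's implementation: it assumes the bottom-right corner is the top-left corner plus width and height, returns nested lists instead of a tensor inside a Sample, and simply echoes each box's width and height. The name xywh_to_xyxy is hypothetical.

```python
def xywh_to_xyxy(item):
    """Convert each "bounding_box" in item["info"] to [x1, y1, x2, y2]."""
    coordinates, widths, heights = [], [], []
    for entry in item["info"]:
        box = entry["bounding_box"]
        x1, y1 = box["top_left_x"], box["top_left_y"]
        # Assumption: bottom-right corner = top-left corner + (width, height).
        coordinates.append([x1, y1, x1 + box["width"], y1 + box["height"]])
        widths.append(box["width"])
        heights.append(box["height"])
    return {
        "coordinates": coordinates,  # n rows of 4 values
        "width": widths,
        "height": heights,
        "bbox_types": ["xyxy"] * len(coordinates),
    }


sample = xywh_to_xyxy({
    "info": [
        {
            "bounding_box": {
                "top_left_x": 100,
                "top_left_y": 100,
                "width": 200,
                "height": 300,
            }
        }
    ]
})
```

For the example input above, the single box becomes [100, 100, 300, 400].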
class pythia.tasks.processors.BaseProcessor(config, *args, **kwargs)

Every processor in Pythia needs to inherit from this class for compatibility with Pythia. End users mainly need to implement the __call__ function.

Parameters: config (ConfigNode) – Config for this processor, containing type and params attributes if available.
class pythia.tasks.processors.CaptionProcessor(config, *args, **kwargs)

Processes a caption with start, end and pad tokens and returns raw string.

Parameters: config (ConfigNode) – Configuration for caption processor.
class pythia.tasks.processors.FastTextProcessor(config, *args, **kwargs)

FastText processor, similar to GloVe processor but returns FastText vectors.

Parameters: config (ConfigNode) – Configuration values for the processor.
class pythia.tasks.processors.GloVeProcessor(config, *args, **kwargs)

Inherits from VocabProcessor and returns GloVe vectors for each of the words. It first maps the words to indices using the vocab processor, and then looks up the GloVe vectors corresponding to those indices.

Parameters: config (ConfigNode) – Configuration parameters for GloVe, same as for VocabProcessor.
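The two-step lookup (word to index, then index to vector) can be illustrated with a toy vocabulary. The table below is made up for the example and uses 3-dimensional vectors; it is not real GloVe data, and words_to_vectors is a hypothetical helper rather than part of Pythia.

```python
# Toy vocabulary and embedding table; real GloVe vectors are loaded from
# disk and are not reproduced here.
vocab = {"<pad>": 0, "<unk>": 1, "cat": 2, "dog": 3}
embeddings = [
    [0.0, 0.0, 0.0],  # <pad>
    [0.1, 0.1, 0.1],  # <unk>
    [0.5, 0.2, 0.9],  # cat
    [0.4, 0.3, 0.8],  # dog
]


def words_to_vectors(words):
    """Map words to indices, then fetch the vector stored at each index."""
    indices = [vocab.get(word, vocab["<unk>"]) for word in words]
    vectors = [embeddings[index] for index in indices]
    return indices, vectors


# "zebra" is not in the toy vocabulary, so it falls back to <unk>.
indices, vectors = words_to_vectors(["cat", "zebra"])
```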
class pythia.tasks.processors.MultiHotAnswerFromVocabProcessor(config, *args, **kwargs)

compute_answers_scores(answers_indices)

Generate VQA based answer scores for answers_indices.

Parameters: answers_indices (torch.LongTensor) – tensor containing indices of the answers
Returns: tensor containing scores.
Return type: torch.FloatTensor
class pythia.tasks.processors.Processor(config, *args, **kwargs)

Wrapper class used by Pythia to initialize a processor based on the type passed in its configuration. It retrieves the processor class registered in the registry under the type key and initializes it with the params passed in the configuration. All functions and attributes of the initialized processor are directly available via this class.

Parameters: config (ConfigNode) – ConfigNode containing the type of the processor to be initialized and the params of that processor.
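The wrapper pattern described here (registry lookup by type, then transparent access to the wrapped object) might look roughly like the sketch below. Everything in it is a hypothetical stand-in: _PROCESSORS and register_processor imitate a registry, ProcessorWrapper imitates the wrapper, and plain dicts replace ConfigNode. Attribute forwarding via __getattr__ is one common way to achieve this and is an assumption about the approach.

```python
# Hypothetical stand-in for a processor registry; names are illustrative.
_PROCESSORS = {}


def register_processor(name):
    """Decorator that stores a class under the given name."""
    def decorator(cls):
        _PROCESSORS[name] = cls
        return cls
    return decorator


@register_processor("upper")
class UpperProcessor:
    def __init__(self, params):
        self.suffix = params.get("suffix", "")

    def __call__(self, item):
        return {"text": item["text"].upper() + self.suffix}


class ProcessorWrapper:
    """Looks up the class registered for config["type"] and wraps an instance."""

    def __init__(self, config):
        processor_class = _PROCESSORS[config["type"]]
        self.processor = processor_class(config.get("params", {}))

    def __call__(self, item, *args, **kwargs):
        return self.processor(item, *args, **kwargs)

    def __getattr__(self, name):
        # __getattr__ runs only when normal lookup fails, so attributes of
        # the wrapped processor (such as "suffix") are forwarded to it.
        if name == "processor":
            raise AttributeError(name)  # avoid recursion before init finishes
        return getattr(self.processor, name)


wrapper = ProcessorWrapper({"type": "upper", "params": {"suffix": "!"}})
result = wrapper({"text": "hello"})
```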
class pythia.tasks.processors.SimpleSentenceProcessor(*args, **kwargs)

Tokenizes a sentence and processes it.

tokenizer

Type of tokenizer to be used.

Type: function
class pythia.tasks.processors.SimpleWordProcessor(*args, **kwargs)

Tokenizes a word and processes it.

tokenizer

Type of tokenizer to be used.

Type: function
class pythia.tasks.processors.SoftCopyAnswerProcessor(config, *args, **kwargs)

Similar to the answer processor, but adds a soft copy dynamic answer space to it. Read https://arxiv.org/abs/1904.08920 for more information on soft copy and LoRRA.

Parameters: config (ConfigNode) – Configuration for soft copy processor.

get_true_vocab_size()

Actual vocab size, which only includes the size of the vocabulary file.

Returns: Actual size of vocabs.
Return type: int

get_vocab_size()

Size of vocab + size of dynamic soft-copy based answer space.

Returns: Size of vocab + size of dynamic soft-copy answer space.
Return type: int
class pythia.tasks.processors.VQAAnswerProcessor(config, *args, **kwargs)

Processor for generating scores for the answers passed, using the VQA accuracy formula. It uses the VocabDict class to represent the answer vocabulary, so the parameters must specify “vocab_file”. “num_answers” in the parameter config specifies the maximum number of answers possible. Takes in a dict containing “answers” or “answers_tokens”. If “answers” are passed, they are preprocessed to generate “answers_tokens”.

Parameters: config (ConfigNode) – Configuration for the processor

answer_vocab

Class representing answer vocabulary

Type: VocabDict
compute_answers_scores(answers_indices)

Generate VQA based answer scores for answers_indices.

Parameters: answers_indices (torch.LongTensor) – tensor containing indices of the answers
Returns: tensor containing scores.
Return type: torch.FloatTensor
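For intuition, the VQA accuracy formula is commonly stated as min(1, matches / 3), averaged over every subset of the human answers obtained by leaving one annotator out. The sketch below assumes that form and works on plain lists rather than tensors. It is not taken from the library's code; vqa_answer_scores, the unk_index default, and the rule that unknown answers get no score are all assumptions.

```python
def vqa_answer_scores(answers_indices, vocab_size, unk_index=0):
    """Leave-one-out VQA accuracy for every distinct answer index."""
    scores = [0.0] * vocab_size
    for answer in set(answers_indices):
        if answer == unk_index:
            continue  # assumption: unknown answers receive no score
        accuracies = []
        for held_out in range(len(answers_indices)):
            # Compare against all annotators except the held-out one.
            others = answers_indices[:held_out] + answers_indices[held_out + 1:]
            matches = others.count(answer)
            accuracies.append(min(1.0, matches / 3))
        scores[answer] = sum(accuracies) / len(accuracies)
    return scores


# Ten annotator answers: index 2 three times, index 5 once, index 4 six times.
scores = vqa_answer_scores([2, 2, 2, 5, 4, 4, 4, 4, 4, 4], vocab_size=6)
```

With these inputs, index 2 scores about 0.9, index 5 about 0.3, and index 4 scores 1.0, while indices that no annotator gave stay at 0.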
get_true_vocab_size()

The true vocab size can differ from the normal vocab size in some cases, such as soft copy, where a dynamic answer space is added.

Returns: True vocab size.
Return type: int

get_vocab_size()

Get the vocab size of the answer vocabulary. Can also include the soft copy dynamic answer space size.

Returns: size of the answer vocabulary
Return type: int
idx2word(idx)

Index to word according to the vocabulary.

Parameters: idx (int) – Index to be converted to the word.
Returns: Word corresponding to the index.
Return type: str

word2idx(word)

Convert a word to its index according to the vocabulary.

Parameters: word (str) – Word to be converted to index.
Returns: Index of the word.
Return type: int
class pythia.tasks.processors.VocabProcessor(config, *args, **kwargs)

Use VocabProcessor when you have a vocab file and want to convert words to indices. It expects the UNK token to be “<unk>” and pads sentences using the “<pad>” token. Config parameters can have a preprocessor property, which is used to preprocess the item passed, and a max_length property, which sets the maximum length of the sentence/tokens that can be converted to indices. If the sentence is shorter, it will be padded. The “vocab” parameters must be passed.

Key: vocab

Example Config:

task_attributes:
    vqa:
        vqa2:
            processors:
              text_processor:
                type: vocab
                params:
                  max_length: 14
                  vocab:
                    type: intersected
                    embedding_name: glove.6B.300d
                    vocab_file: vocabs/vocabulary_100k.txt

Parameters: config (ConfigNode) – node containing configuration parameters of the processor

vocab

Vocab class object which is an abstraction over the vocab file passed.

Type: Vocab
get_pad_index()

Get the index of the padding <pad> token in the vocabulary.

Returns: index of the padding token.
Return type: int
get_vocab_size()

Get the size of the vocabulary.

Returns: size of the vocabulary.
Return type: int
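The padding and unknown-token behaviour described for VocabProcessor can be sketched with a toy vocabulary. This is an illustration under stated assumptions, not the library's code: the word_to_index table and tokens_to_indices helper are made up, and truncating inputs longer than max_length is an assumed reading of "maximum length".

```python
PAD, UNK = "<pad>", "<unk>"

# Toy vocabulary; a real one would be read from the configured vocab_file.
word_to_index = {PAD: 0, UNK: 1, "what": 2, "color": 3, "is": 4}


def tokens_to_indices(tokens, max_length):
    """Map tokens to indices, truncating or padding to max_length."""
    indices = [
        word_to_index.get(token, word_to_index[UNK])
        for token in tokens[:max_length]  # assumption: longer inputs are cut
    ]
    padding = [word_to_index[PAD]] * (max_length - len(indices))
    return indices + padding


# "it" is out of vocabulary, and two <pad> entries fill up to max_length.
indices = tokens_to_indices(["what", "color", "is", "it"], max_length=6)
```

In this toy setup the pad index is 0 and the vocabulary size is 5, which is the kind of information get_pad_index() and get_vocab_size() expose.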