Quickstart

Authors: Amanpreet Singh

In this quickstart, we are going to train the LoRRA model on TextVQA. Follow the instructions at the bottom to train other models in Pythia.

Installation

  1. Clone Pythia repository
git clone https://github.com/facebookresearch/pythia ~/pythia
  2. Install dependencies and setup
cd ~/pythia
python setup.py develop
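
To confirm the setup worked, a quick import check along these lines should succeed (assuming the package installs under the module name pythia):

# A bare import with no errors indicates the setup succeeded
python -c "import pythia"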

Note

  1. If you face any issues with the setup, check the Troubleshooting/FAQ section below.
  2. You can also create and activate your own conda environment before running the above commands; see the sketch below.
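
A minimal sketch of creating a dedicated conda environment, assuming conda is installed (the environment name and Python version are illustrative choices, not requirements):

# Create and activate an isolated environment (name and Python version are illustrative)
conda create -n pythia python=3.6
conda activate pythia

# Then install Pythia inside the environment
cd ~/pythia
python setup.py develop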

Getting Data

Datasets currently supported in Pythia require two parts of data: features and ImDB. Features are pre-extracted object features from an object detector. ImDB is the image database for a dataset, which contains information such as questions and answers in the case of TextVQA.

For TextVQA, we need to download features for the OpenImages images it contains and the TextVQA 0.5 ImDB. We assume that all of the data is kept inside the data folder under the Pythia root folder. The table at the bottom shows the corresponding feature and ImDB links for the datasets supported in Pythia.

cd ~/pythia;
# Create data folder
mkdir -p data && cd data;

# Download and extract the features
wget https://dl.fbaipublicfiles.com/pythia/features/open_images.tar.gz
tar xf open_images.tar.gz

# Get vocabularies
wget http://dl.fbaipublicfiles.com/pythia/data/vocab.tar.gz
tar xf vocab.tar.gz

# Download detectron weights required by some models
wget http://dl.fbaipublicfiles.com/pythia/data/detectron_weights.tar.gz
tar xf detectron_weights.tar.gz

# Download and extract ImDB
mkdir -p imdb && cd imdb
wget https://dl.fbaipublicfiles.com/pythia/data/imdb/textvqa_0.5.tar.gz
tar xf textvqa_0.5.tar.gz
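
At this point the data folder should contain the extracted features, vocabularies, detectron weights, and the ImDB. A quick listing can confirm everything is in place (the exact directory names depend on the archives and may differ slightly):

cd ~/pythia
# Expect entries for the extracted features, vocabularies, detectron weights and the imdb folder
ls data
ls data/imdb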

Training

Once we have the data in place, we can start training by running the following command:

cd ~/pythia;
python tools/run.py --tasks vqa --datasets textvqa --model lorra --config \
configs/vqa/textvqa/lorra.yml
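
The same entry point is used for the other datasets and models listed in the Tasks and Datasets table below, once their corresponding features and ImDB are downloaded. For example, training a Pythia model on VQA 2.0 would look roughly like this (the dataset key comes from the table; the config path follows the same pattern as the TextVQA one above and may differ in your checkout):

cd ~/pythia;
# Train the Pythia model on VQA 2.0 (config path is illustrative)
python tools/run.py --tasks vqa --datasets vqa2 --model pythia --config \
configs/vqa/vqa2/pythia.yml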

Inference

For running inference or generating predictions for EvalAI, we can download a corresponding pretrained model and then run the following commands:

cd ~/pythia/data
mkdir -p models && cd models;
wget https://dl.fbaipublicfiles.com/pythia/pretrained_models/textvqa/lorra_best.pth
cd ../..
python tools/run.py --tasks vqa --datasets textvqa --model lorra --config \
configs/vqa/textvqa/lorra.yml --resume_file data/models/lorra_best.pth \
--evalai_inference 1 --run_type inference

For running inference on the val set, use --run_type val and keep the rest of the arguments the same, as shown below. Check more details in the pretrained models section.
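
Concretely, following that note, the val-set variant of the command above looks like this:

cd ~/pythia;
python tools/run.py --tasks vqa --datasets textvqa --model lorra --config \
configs/vqa/textvqa/lorra.yml --resume_file data/models/lorra_best.pth \
--evalai_inference 1 --run_type val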

These commands should be enough to get you started with training and performing inference using Pythia.

Troubleshooting/FAQs

  1. If setup.py causes any issues, install fastText first directly from source and then run python setup.py develop again. To install fastText, run the following commands:
git clone https://github.com/facebookresearch/fastText.git
cd fastText
pip install -e .
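
After fastText is installed, return to the Pythia root and rerun the setup step:

# With fastText installed from source, the Pythia setup should now complete
cd ~/pythia
python setup.py develop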

Tasks and Datasets

| Dataset      | Key           | Task       | ImDB Link                | Features Link            | Features checksum                | Notes                      |
|--------------|---------------|------------|--------------------------|--------------------------|----------------------------------|----------------------------|
| TextVQA      | textvqa       | vqa        | TextVQA 0.5 ImDB         | OpenImages               | b22e80997b2580edaf08d7e3a896e324 |                            |
| VQA 2.0      | vqa2          | vqa        | VQA 2.0 ImDB             | COCO                     | ab7947b04f3063c774b87dfbf4d0e981 |                            |
| VizWiz       | vizwiz        | vqa        | VizWiz ImDB              | VizWiz                   | 9a28d6a9892dda8519d03fba52fb899f |                            |
| VisualDialog | visdial       | dialog     | Coming soon!             | Coming soon!             | Coming soon!                     |                            |
| VisualGenome | visual_genome | vqa        | Automatically downloaded | Automatically downloaded | Coming soon!                     | Also supports scene graphs |
| CLEVR        | clevr         | vqa        | Automatically downloaded | Automatically downloaded |                                  |                            |
| MS COCO      | coco          | captioning | COCO Caption             | COCO                     | ab7947b04f3063c774b87dfbf4d0e981 |                            |

After downloading the features, verify the download by checking the md5sum with

echo "<checksum>  <dataset_name>.tar.gz" | md5sum -c -

Next steps

To dive deeper into the world of Pythia, you can move on to the following topics:

Citation

If you use Pythia in your work, please cite:

@inproceedings{Singh2019TowardsVM,
  title={Towards VQA Models That Can Read},
  author={Singh, Amanpreet and Natarajan, Vivek and Shah, Meet and Jiang, Yu and Chen, Xinlei and Batra, Dhruv and Parikh, Devi and Rohrbach, Marcus},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2019}
}

and

@inproceedings{singh2019pythia,
  title={Pythia-a platform for vision \& language research},
  author={Singh, Amanpreet and Natarajan, Vivek and Jiang, Yu and Chen, Xinlei and Shah, Meet and Rohrbach, Marcus and Batra, Dhruv and Parikh, Devi},
  booktitle={SysML Workshop, NeurIPS},
  volume={2018},
  year={2019}
}