tasks.base_dataset_builder
To add a new dataset to Pythia, you need to add a dataset builder for it. A new dataset builder must inherit from the BaseDatasetBuilder class and implement the _load and _build functions.
_build is used to build a dataset when it is not available, for example, by downloading the ImDBs for the dataset. In the future, we plan to add a _build to every dataset builder to ease the setup of Pythia.
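The following is a minimal sketch of a _build that only does work when the data is missing. Several details are hypothetical: the config key data_dir, the file name pattern imdb_<split>.npy, and the download_count counter. The "download" just writes placeholder bytes. A real builder would subclass BaseDatasetBuilder and fetch the actual ImDB.

```python
import os
import tempfile


class MyBuilder:
    """Hypothetical builder; in Pythia it would subclass BaseDatasetBuilder."""

    def __init__(self):
        self.download_count = 0  # how many times the "download" actually ran

    def _build(self, dataset_type, config, *args, **kwargs):
        # Hypothetical layout: one ImDB file per split under config["data_dir"].
        imdb_path = os.path.join(config["data_dir"], f"imdb_{dataset_type}.npy")
        if os.path.exists(imdb_path):
            return  # already available, nothing to build
        os.makedirs(config["data_dir"], exist_ok=True)
        # Placeholder for the real download: write dummy bytes to the target file.
        with open(imdb_path, "wb") as f:
            f.write(b"placeholder")
        self.download_count += 1


with tempfile.TemporaryDirectory() as tmp:
    builder = MyBuilder()
    config = {"data_dir": os.path.join(tmp, "my")}
    builder._build("train", config)
    builder._build("train", config)  # file now exists, so this call is a no-op
    print(builder.download_count)  # 1
```

Checking for the file first makes _build safe to call on every run, so setup code does not need to track whether the data was already fetched.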
_load is used to load a dataset from a specific path. _load needs to return an instance of a subclass of pythia.tasks.base_dataset.BaseDataset.
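The sketch below shows the shape of a _load that reads one split from a path and wraps it in a dataset subclass. It runs without Pythia installed, so StandInBaseDataset is a hypothetical stand-in for pythia.tasks.base_dataset.BaseDataset. The data_dir key and the per-split <split>.json layout are also assumptions for illustration.

```python
import json
import os
import tempfile


class StandInBaseDataset:
    """Hypothetical stand-in for pythia.tasks.base_dataset.BaseDataset."""

    def __init__(self, name, dataset_type):
        self._name = name
        self._dataset_type = dataset_type


class MyDataset(StandInBaseDataset):
    def __init__(self, dataset_type, samples):
        super().__init__("my", dataset_type)
        self.samples = samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[idx]


class MyBuilder:  # would subclass BaseDatasetBuilder in Pythia
    def _load(self, dataset_type, config, *args, **kwargs):
        # Hypothetical layout: one JSON list of samples per split.
        path = os.path.join(config["data_dir"], f"{dataset_type}.json")
        with open(path) as f:
            samples = json.load(f)
        # Return an instance of a dataset subclass, not a bare object.
        return MyDataset(dataset_type, samples)


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "val.json"), "w") as f:
        json.dump([{"question": "What color?", "answer": "red"}], f)
    dataset = MyBuilder()._load("val", {"data_dir": tmp})

print(len(dataset), dataset[0]["answer"])  # 1 red
```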
For a complete example, see VQA2DatasetBuilder.
Example:
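from torch.utils.data import Dataset
from pythia.tasks.base_dataset_builder import BaseDatasetBuilder
from pythia.common.registry import registry


@registry.register_builder("my")
class MyBuilder(BaseDatasetBuilder):
    def __init__(self):
        # Pass the dataset name to BaseDatasetBuilder
        super().__init__("my")

    def _load(self, dataset_type, config, *args, **kwargs):
        ...  # read the data for dataset_type from the paths in config
        # A real builder should return an instance of a subclass of
        # pythia.tasks.base_dataset.BaseDataset; Dataset() is a placeholder.
        return Dataset()

    def _build(self, dataset_type, config, *args, **kwargs):
        ...  # e.g. download the ImDBs if they are not available yet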
class pythia.tasks.base_dataset_builder.BaseDatasetBuilder(dataset_name)

Base class for implementing dataset builders. See the overview at the top of this page for more information. A child class needs to implement _build and _load.

Parameters: dataset_name (str) – Name of the dataset, passed from the child class.
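The split between public entry points and protected hooks is a template-method design. The sketch below mirrors the contract documented on this page using a hypothetical SketchBaseDatasetBuilder:

- build calls _build.
- load calls _load, then calls init_processors and try_fast_read on the result.

This is not Pythia's source. Raising NotImplementedError from the unimplemented hooks is an assumption. FakeDataset is likewise a hypothetical stand-in for a BaseDataset subclass.

```python
class SketchBaseDatasetBuilder:
    """Hypothetical stand-in mirroring the documented BaseDatasetBuilder contract."""

    def __init__(self, dataset_name):
        # Name of the dataset, passed from the child class.
        self.dataset_name = dataset_name

    # Public entry points: not meant to be overridden by child classes.
    def build(self, dataset_type, config, *args, **kwargs):
        self._build(dataset_type, config, *args, **kwargs)

    def load(self, dataset_type, config, *args, **kwargs):
        dataset = self._load(dataset_type, config, *args, **kwargs)
        # Post-load steps, in the order the documentation lists them.
        dataset.init_processors()
        dataset.try_fast_read()
        return dataset

    # Hooks that a child class must implement.
    def _build(self, dataset_type, config, *args, **kwargs):
        raise NotImplementedError("child builder must implement _build")

    def _load(self, dataset_type, config, *args, **kwargs):
        raise NotImplementedError("child builder must implement _load")


class FakeDataset:
    """Hypothetical dataset that records the post-load hooks it receives."""

    def __init__(self):
        self.calls = []

    def init_processors(self):
        self.calls.append("init_processors")

    def try_fast_read(self):
        self.calls.append("try_fast_read")


class MyBuilder(SketchBaseDatasetBuilder):
    def __init__(self):
        super().__init__("my")

    def _build(self, dataset_type, config, *args, **kwargs):
        pass  # nothing to download in this toy example

    def _load(self, dataset_type, config, *args, **kwargs):
        return FakeDataset()


dataset = MyBuilder().load("train", config={})
print(dataset.calls)  # ['init_processors', 'try_fast_read']
```

Keeping build and load fixed in the base class means every builder gets the same post-load steps. This is why the documentation warns against overriding them.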
_build(dataset_type, config, *args, **kwargs)

This is used to build a dataset for the first time. Implement this method in your child dataset builder class.

Parameters:
- dataset_type (str) – Type of dataset: train, val, or test.
- config (ConfigNode) – Configuration of this dataset, loaded from the config.
_load(dataset_type, config, *args, **kwargs)

This is used to prepare the dataset and load it from a path. Override this method in your child dataset builder class.

Parameters:
- dataset_type (str) – Type of dataset: train, val, or test.
- config (ConfigNode) – Configuration of this dataset, loaded from the config.

Returns: Dataset containing the data to be trained on.
Return type: BaseDataset
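One way a _load might use its two parameters is to check dataset_type against the documented splits and then look up a per-split file in config. The helper resolve_imdb_path and the config keys data_root_dir and imdb_files are hypothetical. A plain dict stands in for ConfigNode.

```python
import os

VALID_DATASET_TYPES = ("train", "val", "test")


def resolve_imdb_path(dataset_type, config):
    """Hypothetical helper a _load could call to find one split's file."""
    if dataset_type not in VALID_DATASET_TYPES:
        raise ValueError(
            f"dataset_type must be one of {VALID_DATASET_TYPES}, got {dataset_type!r}"
        )
    # Hypothetical config keys: a root directory plus one file name per split.
    return os.path.join(config["data_root_dir"], config["imdb_files"][dataset_type])


config = {
    "data_root_dir": "/data/my",
    "imdb_files": {
        "train": "imdb_train.npy",
        "val": "imdb_val.npy",
        "test": "imdb_test.npy",
    },
}
print(resolve_imdb_path("val", config))  # /data/my/imdb_val.npy on POSIX systems
```

Failing early on an unknown split gives a clearer error than a missing-key or missing-file error later in loading.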
build(dataset_type, config, *args, **kwargs)

Similar to load, this is used by Pythia to build a dataset for the first time when it is not available. It internally calls the _build function, so override that function in your child class.

Parameters:
- dataset_type (str) – Type of dataset: train, val, or test.
- config (ConfigNode) – Configuration of this dataset, loaded from the config.

Warning
DO NOT OVERRIDE in the child class. Override _build instead.
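Calling code might pair build and load per split, building first so that load finds the data on disk. The sketch below shows that ordering with a hypothetical prepare_datasets driver and a RecordingBuilder that only logs calls. It illustrates the documented roles of the two methods and is not Pythia's actual loading code.

```python
SPLITS = ("train", "val", "test")


def prepare_datasets(builder, config, splits=SPLITS):
    """Hypothetical driver: build each split if needed, then load it."""
    datasets = {}
    for dataset_type in splits:
        builder.build(dataset_type, config)  # makes sure the data exists
        datasets[dataset_type] = builder.load(dataset_type, config)
    return datasets


class RecordingBuilder:
    """Hypothetical builder that only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def build(self, dataset_type, config, *args, **kwargs):
        self.calls.append(("build", dataset_type))

    def load(self, dataset_type, config, *args, **kwargs):
        self.calls.append(("load", dataset_type))
        return f"{dataset_type}-dataset"  # placeholder for a real dataset


builder = RecordingBuilder()
datasets = prepare_datasets(builder, config={})
print(builder.calls[:2])  # [('build', 'train'), ('load', 'train')]
```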
load(dataset_type, config, *args, **kwargs)

Main load function used by Pythia. It internally calls the _load function, then calls init_processors and try_fast_read on the dataset returned from _load.

Parameters:
- dataset_type (str) – Type of dataset: train, val, or test.
- config (ConfigNode) – Configuration of this dataset, loaded from the config.

Returns: Dataset containing the data to be trained on.
Return type: BaseDataset

Warning
DO NOT OVERRIDE in the child class. Override _load instead.