vulcanai.datasets package

vulcanai.datasets.fashion module

class vulcanai.datasets.fashion.FashionData(root, train=True, transform=None, target_transform=None, download=False)

Bases: torch.utils.data.dataset.Dataset

Fashion-MNIST dataset, stored in the MNIST format (http://yann.lecun.com/exdb/mnist/).

Parameters:

root (string): Root directory of the dataset, where processed/training.pt and processed/test.pt exist.
train (bool, optional): If True, creates the dataset from training.pt, otherwise from test.pt.
download (bool, optional): If True, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transform (callable, optional): A function/transform that takes in a PIL image and returns a transformed version, e.g. transforms.RandomCrop.
target_transform (callable, optional): A function/transform that takes in the target and transforms it.
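The relationship between root, train, and the processed files can be sketched as follows. This is only an illustration built from the folder and file names documented for this class; the helper expected_data_file is hypothetical and is not part of vulcanai.

```python
import os

# Names taken from the class attributes documented for FashionData.
PROCESSED_FOLDER = "processed"
TRAINING_FILE = "training.pt"
TEST_FILE = "test.pt"


def expected_data_file(root, train=True):
    """Hypothetical helper: the file the dataset is expected to read from."""
    file_name = TRAINING_FILE if train else TEST_FILE
    return os.path.join(root, PROCESSED_FOLDER, file_name)


# "data/fashion" is an arbitrary example root directory.
print(expected_data_file("data/fashion", train=True))
print(expected_data_file("data/fashion", train=False))
```

With download=False, these are the files that presumably need to exist already under root.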

__init__(root, train=True, transform=None, target_transform=None, download=False)

Initialize self. See help(type(self)) for accurate signature.

download()

Download the Fashion-MNIST data if it doesn't already exist in processed_folder.

processed_folder = 'processed'
raw_folder = 'raw'
test_file = 'test.pt'
training_file = 'training.pt'
urls = ['http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz', 'http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz', 'http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz', 'http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz']
vulcanai.datasets.fashion.get_int(b)
vulcanai.datasets.fashion.parse_byte(b)
vulcanai.datasets.fashion.read_image_file(path)
vulcanai.datasets.fashion.read_label_file(path)
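The four helpers above are undocumented, but the files listed in urls use the IDX layout of the MNIST family: a big-endian header (magic number 2049 plus an item count for label files; magic number 2051 plus count, rows, and columns for image files) followed by one unsigned byte per value. The sketch below decodes that layout from an in-memory buffer using only the standard library. The function names parse_idx_labels and parse_idx_images are assumptions made for illustration and may not match how vulcanai's own helpers behave.

```python
import struct


def parse_idx_labels(raw):
    """Decode an uncompressed IDX1 label buffer into a list of ints."""
    magic, count = struct.unpack(">II", raw[:8])
    if magic != 2049:
        raise ValueError("not an IDX label buffer: magic={}".format(magic))
    labels = list(raw[8:8 + count])
    if len(labels) != count:
        raise ValueError("truncated label data")
    return labels


def parse_idx_images(raw):
    """Decode an uncompressed IDX3 image buffer into nested lists of pixels."""
    magic, count, rows, cols = struct.unpack(">IIII", raw[:16])
    if magic != 2051:
        raise ValueError("not an IDX image buffer: magic={}".format(magic))
    if len(raw) < 16 + count * rows * cols:
        raise ValueError("truncated image data")
    images = []
    offset = 16
    for _ in range(count):
        # Each image is stored row by row, one byte per pixel.
        image = [list(raw[offset + r * cols:offset + (r + 1) * cols])
                 for r in range(rows)]
        images.append(image)
        offset += rows * cols
    return images


# Tiny hand-built buffers standing in for the real (gzip-compressed) files.
label_buffer = struct.pack(">II", 2049, 3) + bytes([9, 0, 3])
image_buffer = struct.pack(">IIII", 2051, 1, 2, 2) + bytes([0, 255, 128, 7])
print(parse_idx_labels(label_buffer))
print(parse_idx_images(image_buffer))
```

The downloaded files are gzip-compressed, so a real reader would decompress them (for example with the standard gzip module) before decoding.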

vulcanai.datasets.multidataset module

Defines the MultiDataset Class

class vulcanai.datasets.multidataset.MultiDataset(dataset_tuples)

Bases: torch.utils.data.dataset.Dataset

Define a dataset for multi input networks.

Takes in a list of datasets, and whether or not their input_data and target data should be output.

Parameters:
dataset_tuples : list of tuples
Each tuple is (Dataset, use_data_boolean, use_target_boolean): the Dataset at index 0, a boolean for whether to include its input_data at index 1, and a boolean for whether to include its target data at index 2. Only one target can be specified across all incoming datasets.
Returns:
multi_dataset : torch.utils.data.Dataset
__init__(dataset_tuples)

Initialize a dataset for multi input networks.
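The dataset_tuples convention can be illustrated without torch. The sketch below uses plain lists as stand-ins for Dataset objects and a hypothetical helper, count_selected, that checks the tuple shape and the one-target rule described above. Raising ValueError when more than one target is selected is an assumption; the behavior of MultiDataset itself in that case is not documented here.

```python
def count_selected(dataset_tuples):
    """Hypothetical check of (dataset, use_data, use_target) tuples.

    Returns how many datasets contribute input data and how many
    contribute the target.
    """
    n_inputs = 0
    n_targets = 0
    for entry in dataset_tuples:
        if len(entry) != 3:
            raise ValueError("expected (dataset, use_data, use_target)")
        _dataset, use_data, use_target = entry
        n_inputs += bool(use_data)
        n_targets += bool(use_target)
    if n_targets > 1:
        # Assumption: more than one target is treated as an error.
        raise ValueError("only one dataset may supply the target")
    return n_inputs, n_targets


# Plain lists of (input, target) pairs stand in for real Dataset objects.
images = [("img0", 0), ("img1", 1)]
tabular = [("row0", 0), ("row1", 1)]

# Both datasets feed the network; only the first one supplies the target.
dataset_tuples = [(images, True, True), (tabular, True, False)]
print(count_selected(dataset_tuples))
```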

vulcanai.datasets.tabulardataset module

vulcanai.datasets.utils module

This file contains utility methods that may be useful to several dataset classes. check_split_ratio, stratify, rationed_split, and randomshuffler were all copied from torchtext because torchtext is not yet packaged for Anaconda and is therefore not yet a reasonable dependency. See https://github.com/pytorch/text/blob/master/torchtext/data/dataset.py

vulcanai.datasets.utils.check_split_ratio(split_ratio)

Check that the split ratio argument is not malformed

Parameters:

split_ratio: desired split ratio, either a list of length 2 or 3, depending on whether a validation set is desired.
Returns:
split ratio as tuple
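A minimal sketch of this kind of check is shown below. It is modeled loosely on the torchtext function of the same name and is not vulcanai's code: the name check_split_ratio_sketch, the normalization of ratios that do not sum to 1, the (train, test, validation) ordering, and the exact error handling are all assumptions.

```python
def check_split_ratio_sketch(split_ratio):
    """Validate a 2- or 3-element ratio list; return (train, test, validation)."""
    if not isinstance(split_ratio, (list, tuple)) or len(split_ratio) not in (2, 3):
        raise ValueError("split_ratio must be a list of length 2 or 3")
    if any(ratio < 0 for ratio in split_ratio):
        raise ValueError("ratios must be non-negative")
    total = float(sum(split_ratio))
    if total <= 0:
        raise ValueError("ratios must not all be zero")
    # Assumption: relative sizes are rescaled so that they sum to 1.
    normalized = [ratio / total for ratio in split_ratio]
    if len(normalized) == 2:
        # No validation set requested: its share is zero.
        normalized.append(0.0)
    return tuple(normalized)


print(check_split_ratio_sketch([0.5, 0.25, 0.25]))
print(check_split_ratio_sketch([3, 1]))
```

Returning a fixed-length tuple means the caller can always unpack three values, whether or not a validation split was requested.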
vulcanai.datasets.utils.clean_dataframe(df)

Goes through the dataframe df and ensures that all nonsensical values are encoded as NaNs.

vulcanai.datasets.utils.rationed_split(df, train_ratio, test_ratio, validation_ratio)

Function to split a dataset given ratios. Assumes the ratios given are valid (checked using check_split_ratio).

Parameters:
df: Dataframe
The dataframe you want to split
train_ratio: float
proportion of the dataset that will go to the train split, between 0 and 1
test_ratio: float
proportion of the dataset that will go to the test split, between 0 and 1
validation_ratio: float
proportion of the dataset that will go to the validation split, between 0 and 1
Returns:
indices: tuple of lists of indices, one list per split.
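A ratio-based index split can be sketched as follows. This is an illustration in the spirit of the torchtext routine the module was copied from, not vulcanai's implementation: it takes a row count instead of a dataframe, the seed parameter and the name rationed_split_sketch are hypothetical, and the (train, test, validation) return order is an assumption based on the parameter order.

```python
import random


def rationed_split_sketch(n_rows, train_ratio, test_ratio, validation_ratio, seed=0):
    """Shuffle row indices and cut them into train, test, and validation lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)

    train_len = int(round(train_ratio * n_rows))
    if not validation_ratio:
        # No validation split: give the remainder to test to avoid rounding gaps.
        test_len = n_rows - train_len
    else:
        test_len = int(round(test_ratio * n_rows))

    return (
        indices[:train_len],                      # train
        indices[train_len:train_len + test_len],  # test
        indices[train_len + test_len:],           # validation (the remainder)
    )


train_idx, test_idx, val_idx = rationed_split_sketch(10, 0.7, 0.2, 0.1)
print(len(train_idx), len(test_idx), len(val_idx))
```

Because the validation split takes whatever remains after the first two cuts, every row index lands in exactly one split even when rounding shifts the sizes slightly.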