vulcanai.datasets package
vulcanai.datasets.fashion module
- class vulcanai.datasets.fashion.FashionData(root, train=True, transform=None, target_transform=None, download=False)
Bases: torch.utils.data.dataset.Dataset
Fashion-MNIST dataset, distributed in the same file format as MNIST (http://yann.lecun.com/exdb/mnist/).
- Parameters:
root (string): Root directory of dataset where processed/training.pt and processed/test.pt exist.
train (bool, optional): If True, creates dataset from training.pt, otherwise from test.pt.
download (bool, optional): If True, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transform (callable, optional): A function/transform that takes in a PIL image and returns a transformed version, e.g. transforms.RandomCrop.
target_transform (callable, optional): A function/transform that takes in the target and transforms it.
- __init__(root, train=True, transform=None, target_transform=None, download=False)
Initialize self. See help(type(self)) for accurate signature.
- download()
Download the Fashion-MNIST data (see urls) if it doesn’t already exist in processed_folder.
- processed_folder = 'processed'
- raw_folder = 'raw'
- test_file = 'test.pt'
- training_file = 'training.pt'
- urls = [
'http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz',
'http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz',
'http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz',
'http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz'
]
- vulcanai.datasets.fashion.get_int(b)
- vulcanai.datasets.fashion.parse_byte(b)
- vulcanai.datasets.fashion.read_image_file(path)
- vulcanai.datasets.fashion.read_label_file(path)
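The helper functions above are listed without descriptions. As a rough illustration of what reading these files involves, the following sketch parses the IDX layout used by the Fashion-MNIST archives: a big-endian magic number and dimension sizes, followed by raw unsigned bytes. The names sketch_get_int, sketch_read_labels and sketch_read_images are hypothetical and work on in-memory bytes rather than file paths; they are not the vulcanai implementations, whose exact behavior is not documented here.

```python
import struct


def sketch_get_int(b):
    """Interpret a bytes object as a big-endian unsigned integer."""
    return int.from_bytes(b, byteorder="big")


def sketch_read_labels(data):
    """Parse an IDX1 label buffer into a list of ints."""
    assert sketch_get_int(data[:4]) == 2049  # magic number for label files
    count = sketch_get_int(data[4:8])
    labels = list(data[8:8 + count])
    assert len(labels) == count
    return labels


def sketch_read_images(data):
    """Parse an IDX3 image buffer into a list of images (rows of pixel ints)."""
    assert sketch_get_int(data[:4]) == 2051  # magic number for image files
    count = sketch_get_int(data[4:8])
    rows = sketch_get_int(data[8:12])
    cols = sketch_get_int(data[12:16])
    pixels = data[16:]
    assert len(pixels) == count * rows * cols
    images = []
    for i in range(count):
        start = i * rows * cols
        image = [
            list(pixels[start + r * cols:start + (r + 1) * cols])
            for r in range(rows)
        ]
        images.append(image)
    return images


# Tiny hand-built buffers standing in for the decompressed .gz files.
label_bytes = struct.pack(">II", 2049, 3) + bytes([9, 0, 3])
image_bytes = struct.pack(">IIII", 2051, 1, 2, 2) + bytes([0, 255, 128, 64])

labels = sketch_read_labels(label_bytes)
images = sketch_read_images(image_bytes)
```

In a real loader the buffers would come from the decompressed files named in urls, and the pixel lists would typically be converted to tensors before being saved to training.pt and test.pt.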
vulcanai.datasets.multidataset module
Defines the MultiDataset class.
- class vulcanai.datasets.multidataset.MultiDataset(dataset_tuples)
Bases: torch.utils.data.dataset.Dataset
Define a dataset for multi input networks.
Takes in a list of datasets, and whether or not their input_data and target data should be output.
- Parameters:
- dataset_tuples : list of tuples
- Each tuple is (Dataset, use_data_boolean, use_target_boolean): the Dataset at index 0, a boolean at index 1 indicating whether to include its input data, and a boolean at index 2 indicating whether to include its target data. Only one target can be specified across all incoming datasets.
- Returns:
- multi_dataset : torch.utils.data.Dataset
- __init__(dataset_tuples)
Initialize a dataset for multi input networks.
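To make the dataset_tuples convention concrete, here is a minimal pure-Python sketch of one plausible reading of it. ToyMultiDataset is a hypothetical stand-in, not vulcanai's MultiDataset: it assumes every wrapped dataset yields (data, target) pairs and has the same length, and it returns the selected inputs as a list alongside the single selected target. The real class subclasses torch.utils.data.Dataset and may structure its output differently.

```python
class ToyMultiDataset:
    """Hypothetical stand-in for a multi-input dataset (not vulcanai's code)."""

    def __init__(self, dataset_tuples):
        # At most one dataset may contribute the target.
        target_flags = [use_target for _, _, use_target in dataset_tuples]
        if sum(target_flags) > 1:
            raise ValueError("Only one dataset may supply the target.")
        # Assumption: all wrapped datasets are index-aligned and equally long.
        lengths = {len(ds) for ds, _, _ in dataset_tuples}
        if len(lengths) != 1:
            raise ValueError("All datasets must have the same length.")
        self.dataset_tuples = dataset_tuples
        self._length = lengths.pop()

    def __len__(self):
        return self._length

    def __getitem__(self, idx):
        inputs, target = [], None
        for ds, use_data, use_target in self.dataset_tuples:
            data, label = ds[idx]
            if use_data:
                inputs.append(data)
            if use_target:
                target = label
        return inputs, target


# Two toy "datasets": lists of (data, target) pairs.
images = [("img0", 0), ("img1", 1)]
tabular = [([1.0, 2.0], 0), ([3.0, 4.0], 1)]

# Use the data from both, but take the target only from the first.
multi = ToyMultiDataset([(images, True, True), (tabular, True, False)])
sample = multi[1]
```

Here sample pairs the inputs from both datasets at index 1 with the target taken from the image dataset only.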
vulcanai.datasets.tabulardataset module
vulcanai.datasets.utils module
This file contains utility methods that may be useful to several dataset classes. check_split_ratio, stratify, rationed_split and randomshuffler were all copied from torchtext, because torchtext is not yet packaged for Anaconda and is therefore not yet a reasonable dependency. See https://github.com/pytorch/text/blob/master/torchtext/data/dataset.py
- vulcanai.datasets.utils.check_split_ratio(split_ratio)
Check that the split ratio argument is not malformed.
- Parameters:
- split_ratio: desired split ratio, either a list of length 2 or 3, depending on whether a validation set is desired.
- Returns:
- split ratio as tuple
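The validation described above can be sketched as follows. This is an illustrative approximation, not the vulcanai source: sketch_check_split_ratio is a hypothetical name, and the choices to normalize the ratios so they sum to 1, to pad a two-element list with a validation ratio of 0, and to order the result as (train, test, validation) are assumptions.

```python
def sketch_check_split_ratio(split_ratio):
    """Validate a 2- or 3-element split ratio and return it as a 3-tuple."""
    if not isinstance(split_ratio, (list, tuple)) or len(split_ratio) not in (2, 3):
        raise ValueError(
            "split_ratio must be a list of length 2 or 3, got {!r}".format(split_ratio)
        )
    if any(r < 0 for r in split_ratio) or sum(split_ratio) <= 0:
        raise ValueError("split ratios must be non-negative with a positive sum")

    # Assumption: rescale so the ratios sum to 1.
    total = float(sum(split_ratio))
    ratios = [r / total for r in split_ratio]

    # Assumption: a missing third entry means no validation split.
    if len(ratios) == 2:
        ratios.append(0.0)
    return tuple(ratios)


two_way = sketch_check_split_ratio([3, 1])
three_way = sketch_check_split_ratio([2, 1, 1])
```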
- vulcanai.datasets.utils.clean_dataframe(df)
Go through the dataframe and ensure that all nonsensical values are encoded as NaNs.
- vulcanai.datasets.utils.rationed_split(df, train_ratio, test_ratio, validation_ratio)
Split a dataset given ratios. Assumes the ratios given are valid (checked using check_split_ratio).
- Parameters:
- df: Dataframe
- The dataframe you want to split
- train_ratio: float
- proportion of the dataset that will go to the train split, between 0 and 1
- test_ratio: float
- proportion of the dataset that will go to the test split, between 0 and 1
- validation_ratio: float
- proportion of the dataset that will go to the validation split, between 0 and 1
- Returns:
- indices: tuple of lists of indices.
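A ratio-based index split of this kind might look like the sketch below. It is an assumption-laden illustration rather than the vulcanai code: sketch_rationed_split is a hypothetical name, it takes a row count instead of a dataframe to stay dependency-free, the seed parameter is added for reproducibility, and letting the last split absorb any rounding remainder is an assumed design choice.

```python
import random


def sketch_rationed_split(n_rows, train_ratio, test_ratio, validation_ratio, seed=0):
    """Return (train, test, validation) lists of shuffled row indices."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)

    train_len = int(round(train_ratio * n_rows))
    if not validation_ratio:
        # No validation split: the test split takes every remaining row.
        test_len = n_rows - train_len
    else:
        test_len = int(round(test_ratio * n_rows))

    train = indices[:train_len]
    test = indices[train_len:train_len + test_len]
    validation = indices[train_len + test_len:]  # absorbs rounding remainder
    return train, test, validation


train_idx, test_idx, val_idx = sketch_rationed_split(10, 0.6, 0.2, 0.2)
```

With a pandas dataframe, index lists like these could be passed to df.iloc to materialize each split.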