Here is the starting code:
import datasets
dataset = datasets.load_dataset('tanganke/nyuv2')
dataset = dataset.with_format('torch') # this will convert the items into `torch.Tensor` objects
# merge the train and val splits (we'll make our own split later)
dataset = datasets.concatenate_datasets([dataset["train"], dataset["val"]])
and
def make_dataloaders(dataset, batch_size, test_size):
    # TODO: resize images
    # TODO: test/train split
    # TODO: create torch.utils.data.DataLoader objects
    return train_loader, test_loader  # return the new DataLoaders

test_size = 0.2  # 20% of data will be for testing
batch_size = 8
train_loader, test_loader = make_dataloaders(
    dataset, batch_size, test_size
)
The first block doesn’t have any work for you to do. This downloads a copy of NYU-Depth V2 from HuggingFace and discards the silly train/val split that it comes with.
The second block is our task.
Our goal in this section is to create two DataLoader objects: one for the training set, and one for the test set. The PyTorch DataLoader class solves the problem of “How can we standardize the process of a model interacting with a dataset?” A DataLoader knows about a dataset, and it knows how to produce subsets (called batches) of that dataset whenever the model needs them. In this section we’ll do any preprocessing we need (like resizing images) and we’ll split our images into train/test sets. Finally, we’ll make the DataLoader objects and return them.
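To see what a DataLoader does in isolation, here is a minimal sketch using a toy in-memory dataset (the shapes below are made up; they stand in for real NYU-Depth V2 items):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# A toy stand-in dataset: 32 fake "images" and "depth maps"
images = torch.randn(32, 3, 8, 8)
depths = torch.randn(32, 1, 8, 8)
toy_dataset = TensorDataset(images, depths)

# The DataLoader groups items into batches and (optionally) shuffles them
loader = DataLoader(toy_dataset, batch_size=8, shuffle=True)

for image_batch, depth_batch in loader:
    print(image_batch.shape)  # torch.Size([8, 3, 8, 8])
    break
```

The model never sees the whole dataset at once; it just asks the loader for the next batch.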
The dataset object

The dataset object that we get from HuggingFace is essentially a dictionary. It has several fields. We care about two of them: dataset['image'] and dataset['depth'].
Our dataset object arrives from HuggingFace in HuggingFace’s preferred format, which works in our favor here: HuggingFace datasets have a built-in method called train_test_split. Check it out!
This is a bit tricky. I’d recommend approaching this by looking for sample code online. Consider a search like “huggingface transformers resize all images in dataset”. You could also try asking AI for sample code.
My solution worked like this: each element of dataset is a dictionary. Write a function that takes one such dictionary and extracts its image and depth fields. Resize those two images. Then return a new dictionary with image and depth as its keys. Finally, use dataset.map() to apply this function to each element of the dataset.
This looks something like this:
def resize_fn(batch):
    # "batch" is a dict with keys "image" and "depth".
    image_t = batch["image"]  # (3, H, W)
    depth_t = batch["depth"]  # (1, H, W)
    # TODO: resize image_t
    # TODO: resize depth_t
    return {
        "image": image_t,
        "depth": depth_t,
    }

# ...
train_dataset = train_dataset.map(resize_fn)
Please reach out if you are stuck on this!