DataLoader Hint

Here is the starting code:

import datasets

dataset = datasets.load_dataset('tanganke/nyuv2')
dataset = dataset.with_format('torch') # this will convert the items into `torch.Tensor` objects

# merge the train and val splits (we'll make our own split later)
dataset = datasets.concatenate_datasets([dataset["train"], dataset["val"]])

and


def make_dataloaders(dataset, batch_size, test_size):

    # TODO: resize images
    # TODO: test/train split
    # TODO: create torch.utils.data.DataLoader objects

    return train_loader, test_loader # return the new DataLoaders

test_size = 0.2 # 20% of data will be for testing
batch_size = 8
train_loader, test_loader = make_dataloaders(
    dataset, batch_size, test_size
)

The first block doesn’t have any work for you to do. It downloads a copy of NYU-Depth V2 from HuggingFace and merges the train/val split that it comes with into a single dataset, since we’ll be making our own split shortly.

The second block is our task.

PyTorch DataLoader

Our goal in this section is to create two DataLoader objects: one for the training set and one for the test set.

The PyTorch DataLoader class solves the problem of “How can we standardize the process of a model interacting with a dataset?” A DataLoader knows about a dataset, and it knows how to produce subsets (called batches) of that dataset whenever the model needs them.
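
To make that concrete, here’s a rough sketch of iterating over a DataLoader. The batch size and key names match the starting code above, and the sketch assumes every item has the same shape (which is exactly why we’ll resize first):

import torch

loader = torch.utils.data.DataLoader(dataset, batch_size=8, shuffle=True)

for batch in loader:
    # with the dataset in torch format, each batch arrives as a dict of stacked tensors
    images = batch["image"]   # shape (batch_size, 3, H, W)
    depths = batch["depth"]   # shape (batch_size, 1, H, W)
    # ... hand the batch to your model ...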

In this section we’ll do any preprocessing we need to do (like resizing images) and we’ll split our images into train/test sets. Finally, we’ll make DataLoader objects and return them.
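
Once the split and resizing are done, the last TODO reduces to constructing the two loaders, roughly like this (shuffling the training set but not the test set is my own convention, not a requirement):

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size, shuffle=False)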

Hint: the dataset object

The dataset object that we get from HuggingFace behaves a lot like a dictionary. It has several fields, and we care about two of them: dataset['image'] and dataset['depth'].
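
For example, indexing into it gives you a single example as a dict (the key list and shapes here are illustrative, not exact):

sample = dataset[0]
print(sample.keys())            # includes 'image' and 'depth'
print(sample["image"].shape)    # something like torch.Size([3, H, W])
print(sample["depth"].shape)    # something like torch.Size([1, H, W])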

Hint: test/train splits

Our dataset object arrives in HuggingFace’s own Dataset format, and that format has a built-in method called train_test_split. Check it out!
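
As a sketch, the split might look like this (train_test_split returns a dict-like object with 'train' and 'test' entries):

split = dataset.train_test_split(test_size=test_size)
train_dataset = split["train"]
test_dataset = split["test"]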

Hint: resizing

This is a bit tricky. I’d recommend approaching this by looking for sample code online. Consider a search like “huggingface transformers resize all images in dataset”. You could also try asking AI for sample code.

My solution worked like this:

Each element of dataset is a dictionary. Write a function that takes one such dictionary and extracts its image and depth fields. Resize those two images. Then return a new dictionary with image and depth as its keys. Finally, use dataset.map() to apply this function to each element of the dataset.

In code, that looks something like this:

def resize_fn(example):
    # "example" is a dict with keys "image" and "depth".
    image_t = example["image"]   # tensor of shape (3, H, W)
    depth_t = example["depth"]   # tensor of shape (1, H, W)

    # TODO: resize image_t
    # TODO: resize depth_t

    return {
        "image": image_t,
        "depth": depth_t
    }

# ...
train_dataset = train_dataset.map(resize_fn)
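
If you go the torchvision route, the finished function might look roughly like this. This is a sketch under my own assumptions: torchvision.transforms.functional.resize is just one way to do it, the (240, 320) target size is an arbitrary choice, and depending on the dtype of your depth tensor you may need to cast it to float first:

import torchvision.transforms.functional as TF

TARGET_SIZE = (240, 320)  # hypothetical target size; use whatever your model expects

def resize_fn(example):
    # resize takes a (C, H, W) tensor and returns a resized copy
    image_t = TF.resize(example["image"], TARGET_SIZE, antialias=True)
    depth_t = TF.resize(example["depth"], TARGET_SIZE, antialias=True)
    return {"image": image_t, "depth": depth_t}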

Please reach out if you’re stuck on this!