Added substation segementation dataset #2352

rijuld · 2024-10-17T15:01:05Z

No description provided.

adamjstewart · 2024-10-22T11:54:13Z

Hi @rijuld, thanks for the contribution! If you're new to creating PyTorch datasets, I highly recommend reading the following tutorials:

The only difference between datasets in torchvision and NonGeoDatasets in TorchGeo is that our __getitem__ returns a dictionary instead of a tuple. Other than that, they share all the same basic components.
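To make the tuple-versus-dictionary difference concrete, here is a minimal sketch. The class names, the `image` and `mask` keys, and the toy data are illustrative assumptions rather than TorchGeo code, and plain lists stand in for tensors so the example runs without torch.

```python
from typing import Any


class TupleStyleDataset:
    """torchvision-style: __getitem__ returns an (input, target) tuple."""

    def __init__(self) -> None:
        # Toy samples: two "images" with one "mask" each.
        self.images = [[0.1, 0.2], [0.3, 0.4]]
        self.masks = [[0, 1], [1, 0]]

    def __len__(self) -> int:
        return len(self.images)

    def __getitem__(self, index: int) -> Any:
        return self.images[index], self.masks[index]


class DictStyleDataset(TupleStyleDataset):
    """TorchGeo-style: same data, but returned as a dictionary."""

    def __getitem__(self, index: int) -> dict[str, Any]:
        image, mask = super().__getitem__(index)
        return {'image': image, 'mask': mask}


sample = DictStyleDataset()[0]
print(sorted(sample))  # the keys of the returned dictionary
```

Everything else (`__init__`, `__len__`, indexing) is shared between the two styles; only the return type of `__getitem__` changes.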

Most of your issues seem to be due to the use of args. I think you just need to remove this and explicitly list all parameters in the function signature. This will also simplify your testing code. Take a look at other existing datasets; we have about 75 examples to choose from. If you find one that is similar to your dataset, it shouldn't actually require that many changes to get it working.
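A rough sketch of what that change could look like. Both classes are hypothetical, and treating `args` as an `argparse.Namespace` is an assumption made only for illustration; the parameter names and defaults are not taken from the PR.

```python
from argparse import Namespace


class ArgsStyleDataset:
    """Hypothetical 'before': options are hidden inside an opaque args object."""

    def __init__(self, args: Namespace) -> None:
        self.root = args.root
        self.bands = args.bands


class ExplicitDataset:
    """Hypothetical 'after': every option is visible in the signature."""

    def __init__(
        self, root: str = 'data', bands: tuple[int, ...] = (1, 2, 3)
    ) -> None:
        self.root = root
        self.bands = bands


# Tests can now construct the dataset directly, without building a Namespace.
ds = ExplicitDataset(root='tests/data/substation', bands=(3, 2, 1))
```

With explicit parameters, type checkers and documentation tools can see each option, and a test only has to pass the values it cares about.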

rijuld · 2024-10-22T12:05:16Z

Hi @adamjstewart, thanks a ton for the feedback! I will go through this tutorial.

rijuld · 2025-01-09T15:07:12Z

@adamjstewart I'm unsure about how to test the configuration file locally. Additionally, would testing the conf file cover the datamodule as well, or would I need to write separate test cases for the datamodule to ensure adequate code coverage?

adamjstewart · 2025-01-11T18:32:52Z

I'm unsure about how to test the configuration file locally.

You can get the test name from CI and then run it like so:

> pytest tests/trainers/test_segmentation.py::TestSemanticSegmentationTask::test_trainer[True-substation]

Additionally, would testing the conf file cover the datamodule as well

Yes

adamjstewart · 2025-01-26T14:47:27Z

docs/api/datasets/non_geo_datasets.csv

@@ -53,6 +53,7 @@ Dataset,Task,Source,License,# Samples,# Classes,Size (px),Resolution (m),Bands
 `SSL4EO`_-S12,T,Sentinel-1/2,"CC-BY-4.0",1M,-,264x264,10,"SAR, MSI"
 `SSL4EO-L Benchmark`_,S,Lansat & CDL,"CC0-1.0",25K,134,264x264,30,MSI
 `SSL4EO-L Benchmark`_,S,Lansat & NLCD,"CC0-1.0",25K,17,264x264,30,MSI
+`Substation`_,S,OpenStreetMap & Sentinel-2, "CC-BY-SA 2.0", 27K, 2, 228x228, 10, MSI


Where did you find the license? I don't see anything on GitHub or in the paper.

adamjstewart · 2025-01-26T14:49:25Z

tests/datasets/test_substation.py

+        """Fixture for the Substation."""
+        root = os.path.join(os.getcwd(), 'tests', 'data', 'substation')
+
+        yield Substation(


We don't normally use yield when creating our fixtures. Any reason you're using a generator?

adamjstewart · 2025-01-26T14:50:31Z

tests/datasets/test_substation.py

+            num_of_timepoints=4,
+        )
+
+    @pytest.mark.parametrize(


Shouldn't you parametrize the fixture instead of the unit test? Then all other unit tests will also be parametrized.
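A minimal sketch of a parametrized fixture that uses a plain `return`. The `make_dataset` helper and the `'concat'` and `'median'` values are hypothetical stand-ins; they are not taken from the PR.

```python
import pytest


def make_dataset(aggregation: str) -> dict[str, str]:
    """Hypothetical stand-in for constructing the real dataset."""
    return {'timepoint_aggregation': aggregation}


# Parametrizing the fixture runs every test that requests it once per value.
@pytest.fixture(params=['concat', 'median'])
def dataset(request: pytest.FixtureRequest) -> dict[str, str]:
    # A plain return is enough when no teardown is needed.
    return make_dataset(request.param)


def test_aggregation(dataset: dict[str, str]) -> None:
    assert dataset['timepoint_aggregation'] in {'concat', 'median'}


def test_has_single_key(dataset: dict[str, str]) -> None:
    assert len(dataset) == 1
```

Both tests are collected twice, once per parameter, without either of them carrying its own `@pytest.mark.parametrize` decorator.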

adamjstewart · 2025-01-26T14:51:17Z

tests/datasets/test_substation.py

+from collections.abc import Generator
+from pathlib import Path
+from typing import Any
+from unittest.mock import MagicMock


We don't use unittest, we use pytest. Can you convert these to MonkeyPatch, which is built into pytest?
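For reference, a small sketch of the `monkeypatch` fixture replacing a function for the duration of one test. The `count_images` helper and the file names are hypothetical and only serve the illustration.

```python
import os

import pytest


def count_images(image_dir: str) -> int:
    """Hypothetical helper that depends on the filesystem."""
    return len(os.listdir(image_dir))


def test_count_images(monkeypatch: pytest.MonkeyPatch) -> None:
    # Replace os.listdir for the duration of this test only;
    # pytest undoes the patch automatically afterwards.
    monkeypatch.setattr(os, 'listdir', lambda path: ['a.npz', 'b.npz'])
    assert count_images('does/not/exist') == 2
```

Outside of a test function, the same object is available through `pytest.MonkeyPatch.context()`, which restores the original attribute when the `with` block exits.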

adamjstewart · 2025-01-26T14:53:39Z

torchgeo/datamodules/substation.py

+
+import torch
+from torch.utils.data import Subset, random_split
+from tqdm import tqdm


tqdm isn't a dependency, and we don't really want to add new deps unless we have to.

adamjstewart · 2025-01-26T14:59:39Z

torchgeo/datasets/substation.py

+    def __init__(
+        self,
+        root: Path,
+        bands: list[int],


Are tuples also allowed? In that case you would use Sequence instead of list.
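A short sketch of why `Sequence` is the more permissive annotation. The `select_bands` function and its data are hypothetical; only `collections.abc.Sequence` is a real API here.

```python
from collections.abc import Sequence


def select_bands(pixel: Sequence[float], bands: Sequence[int]) -> list[float]:
    """Pick the requested band indices from one pixel's band values."""
    return [pixel[b] for b in bands]


pixel = [0.1, 0.2, 0.3, 0.4]
from_list = select_bands(pixel, [3, 2, 1])
from_tuple = select_bands(pixel, (3, 2, 1))
```

With `bands: list[int]`, a type checker would reject the tuple call even though it behaves identically at runtime.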

adamjstewart · 2025-01-26T15:00:46Z

torchgeo/datasets/substation.py

+        self.load_image_filenames()
+
+    def load_image_filenames(self) -> None:
+        """Load image filenames from the image directory."""
+        self.image_filenames = os.listdir(self.image_dir)


Suggested change

-        self.load_image_filenames()
-
-    def load_image_filenames(self) -> None:
-        """Load image filenames from the image directory."""
-        self.image_filenames = os.listdir(self.image_dir)
+        self.image_filenames = sorted(os.listdir(self.image_dir))

No need for a helper function for a single line of code. Also, sorting this list makes the dataset reproducible.
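A quick sketch of the reproducibility point, using a temporary directory. The file names are made up; the relevant fact is that `os.listdir` returns entries in arbitrary order, so the index-to-file mapping can differ between machines unless the list is sorted.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as image_dir:
    # Create files in a deliberately non-alphabetical order.
    for name in ['image_2.npz', 'image_0.npz', 'image_1.npz']:
        open(os.path.join(image_dir, name), 'w').close()

    unsorted_names = os.listdir(image_dir)  # order is arbitrary
    image_filenames = sorted(unsorted_names)  # order is deterministic
```

Note that `sorted` orders strings lexicographically, so `image_10.npz` would come before `image_2.npz`. That is still deterministic, which is what matters for reproducibility.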

adamjstewart · 2025-01-26T15:01:34Z

torchgeo/datasets/substation.py

+            timepoint_aggregation: How to aggregate multiple timepoints.
+            num_of_timepoints: Number of timepoints to use for each image.
+            download: Whether to download the dataset if it is not found.
+            checksum: Whether to verify the dataset after downloading.


These are in a different order than the parameters. Also, can you add a transforms parameter to match all of our other datasets? It should be used at the end of __getitem__.
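A minimal sketch of the pattern being asked for, with transforms applied as the last step of `__getitem__`. `ToyDataset`, `halve`, and the sample contents are hypothetical, and lists stand in for tensors so the example has no dependencies.

```python
from collections.abc import Callable
from typing import Any, Optional

Sample = dict[str, Any]


class ToyDataset:
    """Hypothetical dataset with an optional transforms callable."""

    def __init__(
        self, transforms: Optional[Callable[[Sample], Sample]] = None
    ) -> None:
        self.transforms = transforms
        self.images = [[1.0, 2.0], [3.0, 4.0]]

    def __len__(self) -> int:
        return len(self.images)

    def __getitem__(self, index: int) -> Sample:
        sample: Sample = {'image': self.images[index]}
        # Apply the user-supplied transforms last, once the sample is assembled.
        if self.transforms is not None:
            sample = self.transforms(sample)
        return sample


def halve(sample: Sample) -> Sample:
    """Example transform: returns a new sample with halved image values."""
    return {**sample, 'image': [v / 2 for v in sample['image']]}


ds = ToyDataset(transforms=halve)
```

Because the transform receives and returns the whole sample dictionary, it can modify the image and the mask together.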

adamjstewart · 2025-01-26T15:02:20Z

torchgeo/datasets/substation.py

+            print(prediction.shape)
+            ncols = 3
+
+        print(mask.shape, image.shape)


Can remove debugging print statements

adamjstewart · 2025-01-26T15:03:09Z

torchgeo/datasets/substation.py

+            self.url_for_images,
+            self.root,
+            filename=self.filename_images,
+            md5='INSERT_IMAGES_MD5_HASH' if self.checksum else None,


TODO: add MD5s
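One way to obtain the missing values is to hash the downloaded archives locally. This is a standard-library sketch, not TorchGeo's own checksum utility; the function name `md5_of_file` and the chunk size are assumptions.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        # Read fixed-size chunks so large archives never sit fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()
```

The resulting hex string is what would replace a placeholder such as 'INSERT_IMAGES_MD5_HASH'.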

Added substation segementation dataset

7dff61c

github-actions bot added the documentation, datasets, and testing labels Oct 17, 2024

adamjstewart added this to the 0.7.0 milestone Oct 17, 2024

rijuld added 2 commits October 21, 2024 15:42

resolved bugs

10637af

a

2cb0842

rijuld force-pushed the main branch from e73392c to 2cb0842 Compare October 21, 2024 19:51

rijuld added 4 commits October 21, 2024 15:52

Resolved error

608f76a

fixed ruff errors

288e8b1

fixed mypy errors for substation seg py file

2e9bf83

removed more errors

78c494d

nilsleh reviewed Oct 24, 2024

torchgeo/datasets/substation_seg.py (outdated, resolved)

rijuld added 15 commits October 24, 2024 10:11

resolved ruff errors and mypy errors

75ca32c

fixed length and data size along with ruff and mypy errors

e2326cc

resolved float error

9832db4

organized imports

ef79cd7

changed to float

83f2eb4

resolved mypy errors

69f5815

resolved further tests

898e6b3

sorted imports

d14eca6

more test coverage

d6ae700

ruff format

8892f0d

increased test code coverage

3f135b4

added formatting

9a05811

removed transformations so that I can add them in data module

4e65b04

increased underline length

9a9d555

corrected csv row length

3e12e7e

rijuld and others added 12 commits January 8, 2025 16:40

Merge branch 'main' into main

7626f28

changed the datatype of bands to list[int] form int

d35b435

changed bands datatype from datamodule

173a915

changed num of bands variables

cfe800d

Added substation in datamodules.rst and resolved datasets.rst length …

ebcc36f

…of ^

added substation datamodule in init

f1fcdf0

chanded the data type of normalizing factor to Any

f00bcd2

[just for testing]

280e32a

[for testing]

743113e

Added parent class

85bb9c9

removed patch size

de5b337

removed unwanted key

3285346

rijuld and others added 7 commits January 19, 2025 03:42

resolved errors and tested data module using conf file

d1f062f

resolved some ruff issues

aebe183

Merge branch 'main' into main

a01c3b4

fixed another ruff error

5de36d4

fixed ruff issue

7c8c71a

added more test coverage for extract and verify

6c2b1cb

organized imports

d4bf9fb

adamjstewart reviewed Jan 19, 2025

torchgeo/datasets/substation.py (outdated, resolved)

rijuld and others added 6 commits January 19, 2025 09:26

added more tests for dataset

9a050bd

added identity for init values

8c918a8

ruff format

39668ca

removed pytest command from test file

b3af64a

ruff format

8355860

Merge branch 'main' into main

5091e16

rijuld requested a review from adamjstewart January 22, 2025 11:57

adamjstewart requested changes Jan 26, 2025
