Adds grains entry point #1082

Merged
ns-rse merged 7 commits into main from ns-rse/742-grains-entry-point
Jan 31, 2025

@ns-rse (Collaborator) commented Jan 28, 2025

Closes #742

  • Adds processing.process_grains()
  • Adds run_modules.grains()

Together these allow the topostats grains entry point to run. It loads *.topostats files (v0.2), extracts the flattened image and re-runs grains.

If any processing artefacts from previous runs are present in the .topostats files they are removed. Output is written to the specified directory so that comparisons will be possible.
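
As a rough illustration of the clean-up step (not the code in this PR), stale results can be dropped from a loaded dictionary with `pop()`. The function name and the entries in `DOWNSTREAM_KEYS` are hypothetical placeholders; only `image` and `pixel_to_nm_scaling` are named in this description.

```python
from typing import Any

# Hypothetical names for artefacts written by earlier runs; the real keys
# stored in a .topostats file may differ.
DOWNSTREAM_KEYS = ("grain_masks", "grain_stats")


def remove_previous_artefacts(data: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of ``data`` without results from earlier processing runs."""
    cleaned = dict(data)  # shallow copy so the caller's dictionary is untouched
    for key in DOWNSTREAM_KEYS:
        # pop() with a default does not raise if the key is absent
        cleaned.pop(key, None)
    return cleaned


loaded = {
    "image": [[0.0, 1.0], [1.0, 0.0]],
    "pixel_to_nm_scaling": 0.5,
    "grain_masks": {"above": [[0, 1], [1, 0]]},
}
cleaned = remove_previous_artefacts(loaded)
print(sorted(cleaned))
```

Passing a default to `pop()` means the same function works on files that have never been through grain detection.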

Added tests and checked that some things in the previous topostats filters step work too. Some code is in place for subsequent modules, which will be added in turn and may need refining.

While adding the topostats grains entry point I was confused as to why .topostats files had image, which contained the flattened image, while all the processing stages after Filters used image_flattened. After checking with @SylviaWhittle we have opted to make things consistent across the file output and the processing.

Updates tests in light of these changes. Previously the result of loadscans.get_data() left img_dict as a dictionary of the data, but to align with other scan types we actually want a nested dictionary with the filenames as keys and the data as values (whether that is a single scan, as from most raw data, or the dictionary that .topostats files hold).
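
A minimal sketch of the nested shape described above, assuming the outer key is the file name without its suffix (an assumption on my part; the helper `nest_by_filename` is hypothetical and not part of TopoStats):

```python
from pathlib import Path
from typing import Any


def nest_by_filename(path: Path, data: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Wrap ``data`` in an outer dictionary keyed by the file name without its suffix."""
    return {path.stem: data}


# Inner dictionary standing in for what a .topostats file holds.
scan = {"image": [[0.0, 1.0]], "pixel_to_nm_scaling": 0.5}
img_dict = nest_by_filename(Path("data/minicircle_small.topostats"), scan)
print(list(img_dict))
```

With this shape, code that iterates over `img_dict.items()` can treat `.topostats` input the same way as any other scan type.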

For future discussion

I found that because AFMReader returns a tuple I had to add additional logic (it baffled me for a while until I realised this!)

This raises (again, as I've asked a similar question before) the disconnect between a topostats object, both internal to TopoStats and as stored in the HDF5 file format, and the value returned by AFMReader. As can be seen in AFMReader/topostats.py (https://github.com/AFM-SPM/AFMReader/blob/022dcf286914c23a30da42e4ea401aa577b0b193/AFMReader/topostats.py#L55), for convenience image and pixel_to_nm_scaling are extracted from data and returned as part of a tuple along with the data from which they were extracted. This might be convenient for users but I see no reason why we shouldn't return just data; users can then access these values via data["image"] and data["pixel_to_nm_scaling"]. This would mean the result of loading a .topostats object matches the internal representation and we can remove some logic and wrangling that has been introduced in this PR to sort that out.
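
To make the two return shapes concrete, here is a small sketch using stand-in functions. Neither `load_as_tuple` nor `load_as_dict` is the AFMReader API, and the order of the tuple elements is an assumption.

```python
from typing import Any

Image = list[list[float]]


def load_as_tuple(data: dict[str, Any]) -> tuple[Image, float, dict[str, Any]]:
    """Stand-in for the current behaviour: values pulled out and returned beside ``data``."""
    return data["image"], data["pixel_to_nm_scaling"], data


def load_as_dict(data: dict[str, Any]) -> dict[str, Any]:
    """Stand-in for the proposed behaviour: return ``data`` alone."""
    return data


stored = {"image": [[0.0, 1.0]], "pixel_to_nm_scaling": 0.5}

# Current shape: callers must unpack and discard the duplicated values.
_, _, from_tuple = load_as_tuple(stored)

# Proposed shape: callers index into the dictionary directly.
from_dict = load_as_dict(stored)
image = from_dict["image"]
scaling = from_dict["pixel_to_nm_scaling"]
```

Both paths end up with the same dictionary; the second simply avoids the unpacking step that the extra logic in this PR has to handle.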


Before submitting a Pull Request please check the following.

  • Existing tests pass.
  • Pre-commit checks pass.
  • New functions/methods have typehints and docstrings.
  • New functions/methods have tests which check the intended behaviour is correct. I felt testing that the
    pop() method worked seemed excessive. Perhaps when I add in steps for grainstats and others I'll check the
    correct items are removed.

ns-rse and others added 6 commits January 15, 2025 14:45

Closes #741

Adds the "swiss-army knife" component to run just filtering on files.

This involved modifying how the `.topostats` files are loaded and extracted because of the nesting (see #1068).

Tests currently fail because `tests/resources/test_image/minicircle_small.topostats` is version `0.1` and therefore
doesn't work with the refactored structure (surprised #1068 passed all tests actually!).

A separate commit will be made for updating this test file and the associated tweaking of tests.

The `tests/resources/test_image/minicircle_small.topostats` was version `0.1` and failed the updates and tests that now
work with `0.2`. I've therefore updated this test file and tweaked the associated tests to work with these files.

All tests pass locally (watch them fail on CI!).

This raises (again, as I've asked a similar question before) the disconnect between a `topostats` object, both internal
to TopoStats and as stored in the HDF5 file format, and the value returned by AFMReader. As can be seen
[here](https://github.com/AFM-SPM/AFMReader/blob/022dcf286914c23a30da42e4ea401aa577b0b193/AFMReader/topostats.py#L55)
for convenience the `image`, `pixel_to_nm_scaling` are extracted from `data` and returned as part of a tuple along with
the `data` from which they were extracted. This _might_ be convenient for users but I see no reason why we shouldn't return
just `data` and then users can access these values via `data["image"]` and `data["pixel_to_nm_scaling"]`. This would
mean the result of loading a `.topostats` object matches the internal representation and we can remove some logic and
wrangling that has been introduced in this PR to sort that out.
@SylviaWhittle (Collaborator) commented

Might be incorrect, but should the WIP DO NOT USE flag be removed from the grains entry in the help output?

[image] & [image]

@SylviaWhittle (Collaborator) commented Jan 31, 2025

I've successfully used the grains program to re-process a pre-flattened .topostats image, resulting in the expected grains being found.

topostats -c config.yaml grains
[Fri, 31 Jan 2025 13:47:21] [INFO    ] [topostats] The YAML configuration file is valid.
[Fri, 31 Jan 2025 13:47:21] [INFO    ] [topostats] The YAML plotting configuration file is valid.
[Fri, 31 Jan 2025 13:47:21] [ERROR   ] [topostats] Splining enabled but Filters disabled. Please check your configuration file.
[Fri, 31 Jan 2025 13:47:21] [ERROR   ] [topostats] [processing.py] [1444] Splining enabled but Filters disabled. Please check your configuration file.
[Fri, 31 Jan 2025 13:47:21] [INFO    ] [topostats] Configuration file loaded from      : config.yaml
[Fri, 31 Jan 2025 13:47:21] [INFO    ] [topostats] Scanning for images in              : data
[Fri, 31 Jan 2025 13:47:21] [INFO    ] [topostats] Output directory                    : output
[Fri, 31 Jan 2025 13:47:21] [INFO    ] [topostats] Looking for images with extension   : .topostats
[Fri, 31 Jan 2025 13:47:21] [INFO    ] [topostats] Images with extension .topostats in data : 1
[Fri, 31 Jan 2025 13:47:21] [INFO    ] [topostats] Thresholding method (Filtering)     : std_dev
[Fri, 31 Jan 2025 13:47:21] [INFO    ] [topostats] Thresholding method (Grains)        : std_dev
[Fri, 31 Jan 2025 13:47:21] [INFO    ] [topostats] Extracting image from data/minicircle_small_orignal.topostats
13:47:21 | INFO |topostats.py:topostats:load_topostats:38 | Loading image from : data/minicircle_small_orignal.topostats
13:47:21 | INFO |topostats.py:topostats:load_topostats:46 | [minicircle_small_orignal] TopoStats file version : 0.2
Processing images from data, results are under output:   0%|                                     | 0/1 [00:00<?, ?it/s]
[Fri, 31 Jan 2025 13:47:24] [INFO    ] [topostats] Processing : minicircle_small
[Fri, 31 Jan 2025 13:47:24] [INFO    ] [topostats] [minicircle_small] : *** Grain Finding ***
[Fri, 31 Jan 2025 13:47:24] [INFO    ] [topostats] [minicircle_small] : Grains found for direction above : 3
[Fri, 31 Jan 2025 13:47:24] [INFO    ] [topostats] [minicircle_small] : Plotting Grain Finding Images
[Fri, 31 Jan 2025 13:47:25] [INFO    ] [topostats] [minicircle_small] : Grain Finding stage completed successfully.
[Fri, 31 Jan 2025 13:47:25] [INFO    ] [topostats] [minicircle_small] : Saving image to .topostats file
Processing images from data, results are under output: 100%|█████████████████████████████| 1/1 [00:04<00:00,  4.24s/it]
[Fri, 31 Jan 2025 13:47:25] [INFO    ] [topostats] [minicircle_small] Grain detection completed (NB - Filtering was *not* re-run).
Processing images from data, results are under output: 100%|█████████████████████████████| 1/1 [00:04<00:00,  4.24s/it]


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


  _______      _____      __ __       _____     ______    _______      _____      _______    ______
/\_______)\   ) ___ (    /_/\__/\    ) ___ (   / ____/\ /\_______)\   /\___/\   /\_______)\ / ____/\
\(___  __\/  / /\_/\ \   ) ) ) ) )  / /\_/\ \  ) ) __\/ \(___  __\/  / / _ \ \  \(___  __\/ ) ) __\/
  / / /     / /_/ (_\ \ /_/ /_/ /  / /_/ (_\ \  \ \ \     / / /      \ \(_)/ /    / / /      \ \ \
 ( ( (      \ \ )_/ / / \ \ \_\/   \ \ )_/ / /  _\ \ \   ( ( (       / / _ \ \   ( ( (       _\ \ \
  \ \ \      \ \/_\/ /   )_) )      \ \/_\/ /  )____) )   \ \ \     ( (_( )_) )   \ \ \     )____) )
  /_/_/       )_____(    \_\/        )_____(   \____\/    /_/_/      \/_/ \_\/    /_/_/     \____\/


[Fri, 31 Jan 2025 13:47:25] [INFO    ] [topostats]

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ COMPLETE ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  TopoStats Version           : 2.3.1.dev93+g51269f793.d20250124
  Base Directory              : data
  File Extension              : .topostats
  Files Found                 : 1
  Successfully Processed^1    : 1 (100.0%)
  All statistics              : output/all_statistics.csv
  Distribution Plots          : Disabled. Enable in config 'summary_stats/run' if needed.

  Configuration               : output/config.yaml

  Email                       : topostats@sheffield.ac.uk
  Documentation               : https://afm-spm.github.io/topostats/
  Source Code                 : https://github.com/AFM-SPM/TopoStats/
  Bug Reports/Feature Request : https://github.com/AFM-SPM/TopoStats/issues/new/choose
  Citation File Format        : https://github.com/AFM-SPM/TopoStats/blob/main/CITATION.cff

  ^1 Successful processing of an image is detection of grains and calculation of at least
     grain statistics. If these have been disabled the percentage will be 0.

  If you encounter bugs/issues or have feature requests please report them at the above URL
  or email us.

  If you have found TopoStats useful please consider citing it. A Citation File Format is
  linked above and available from the Source Code page.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

using a barebones config:

# Config file generated 2025-01-21 12:04:47
# # For more information on configuration and how to use it:
# https://afm-spm.github.io/TopoStats/main/configuration.html
base_dir: ./data # Directory in which to search for data files
output_dir: ./output # Directory to output results to
log_level: info # Verbosity of output. Options: warning, error, info, debug
cores: 1 # Number of CPU cores to utilise for processing multiple files simultaneously.
file_ext: .topostats # File extension of the data files.
loading:
  channel: Height # Channel to pull data from in the data files.
  extract: all # Array to extract when loading .topostats files.
filter:
  run: false # Options : true, false
grains:
  run: true
plotting:
  run: true # Options : true, false
  image_set: all # Options : all, core

Result: [image]

@SylviaWhittle (Collaborator) left a comment

Works well for me (tested locally).

Just a couple of documentation things (the comment and the one here):

Remove the WIP tag from the output of both the topostats -h command and the topostats grains -h command (I think?).

topostats/run_modules.py (outdated, resolved)
@ns-rse (Collaborator, Author) commented Jan 31, 2025

Brilliant, thanks for testing @SylviaWhittle.

> Might be incorrect, but should the WIP DO NOT USE flag be removed from the grains entry in the help output?

Good spot, and not at all incorrect. Thanks for picking that up.

Corrected along with the comment.

The linting error in pre-commit isn't from this Pull Request; I wrote the documentation separately and it was merged the other day. Not sure why or how it's crept into main without being picked up [1], will address separately.

Footnotes

  1. I see now, it's because I edited it as a "suggestion" rather than in my editor, so there was no automatic line wrapping and I didn't wait for reapproval. Incoming fix shortly.

@ns-rse added the Grains (Issues pertaining to the Grains class) and refactor (Refactoring of code) labels Jan 31, 2025
@ns-rse merged commit e9cad80 into main Jan 31, 2025
10 of 11 checks passed
@ns-rse deleted the ns-rse/742-grains-entry-point branch January 31, 2025 20:27
Linked issues: Grains entry point
3 participants