MeanStdScaling is a bad name #87

Closed
glennmoy opened this issue Apr 26, 2021 · 8 comments · Fixed by #107

Comments

@glennmoy
Member

As noted by @eperim:

"Standardised" data is pretty common nomenclature, while "MeanStdScaling" sounds like it could be anything (and the data is not even scaled by the mean: it is shifted by the mean, and only the std is used for scaling).

@bencottier
Contributor

bencottier commented May 4, 2021

I think it's fine! "Standardised" sounds to me like it could be many things 😅 With MeanStdScaling I know immediately what it means among all the common ways of scaling data in ML (and I thought so before we wrote this Transform).

I do think "Standardised" is better than "Normalised" (which I considered originally) because while that has "Normal" in the name, it can refer to scaling by any kind of norm.
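For concreteness, the operation being named in this thread (shift by the mean, then scale by the standard deviation, as the opening comment describes) can be sketched in a few lines of Python. This is only an illustration, not the package's implementation: the function name `standardise` is hypothetical, and using the population standard deviation is an assumption (other implementations may use the sample version).

```python
from statistics import fmean, pstdev


def standardise(xs):
    """Shift by the mean, then scale by the standard deviation."""
    mu = fmean(xs)
    # Assumption: population standard deviation (divides by n, not n - 1).
    sigma = pstdev(xs, mu)
    if sigma == 0:
        raise ValueError("cannot standardise constant data (std is zero)")
    return [(x - mu) / sigma for x in xs]


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # mean 5, population std 2
z = standardise(data)
print(z)
```

The result has mean 0 and standard deviation 1, which is the property the term "standardised" usually refers to.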

@eperim

eperim commented May 4, 2021

I disagree. "Standardisation" is quite common nomenclature. There's even a special tag for it in Stats Exchange, and you can find the term frequently in glossaries, multiple blog posts, teaching materials, and in other packages (e.g. sklearn). These are all from the first page of a Google search.

> With MeanStdScaling I know immediately what it means among all the common ways of scaling data in ML

Naturally you do, because you chose the name, but is it common nomenclature that you expect people in general to be familiar with?

@bencottier
Contributor

> "Standardisation" is quite common nomenclature.

Thanks for gathering those links, that makes me favour "Standardisation" more.

> Naturally you do, because you chose the name, but is it common nomenclature that you expect people in general to be familiar with?

I copied the name from FeatureEngineering, but yes I had the chance to rename in the original PR and decided to keep it. I think more in terms of "how to identify what this literally does with a relatively short name?" than the nomenclature. You might even disagree that "MeanStdScaling" is best for that purpose, but that's my reasoning.

I commented to show there's a difference of opinion, but I don't feel strongly about this. Nomenclature is important and it's very reasonable to rename it.

@molet
Member

molet commented Jun 18, 2021

I would also vote for "Standardization" instead of "MeanStdScaling".
Also, would it be possible to add "Normalization" as well to the package?
I.e. ( x - min(x) ) / ( max(x) - min(x) )
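The formula above can be sketched in Python as follows. This is a minimal illustration rather than a proposed API: the name `min_max_normalise` is hypothetical, and raising an error for constant input (where max(x) equals min(x)) is an assumption about how the zero-range case might be handled.

```python
def min_max_normalise(xs):
    """Rescale values to [0, 1] via (x - min(x)) / (max(x) - min(x))."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        # Assumption: constant data is rejected rather than mapped to a fixed value.
        raise ValueError("cannot normalise constant data (range is zero)")
    return [(x - lo) / (hi - lo) for x in xs]


scaled = min_max_normalise([10.0, 15.0, 20.0, 30.0])
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```

Unlike standardisation, this maps the smallest value to 0 and the largest to 1, so the output depends only on the extremes of the data.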

@xiaodaigh

MeanStd is more precise

@juliohm

juliohm commented Oct 26, 2021

Who is maintaining the package? There are many low-hanging fruit issues like this one that could be easily fixed, and many other issues that could be closed. Can I take responsibility for some of these as an external collaborator?

I've finished my first Transform here and as I mentioned in another issue, we could certainly benefit from some refactoring of the API.

@juliohm

juliohm commented Oct 26, 2021

After experimenting with the API a bit more, I realized that it is too rigid and unlikely to change anytime soon due to the internal usage of the package at Invenia. I will start my own package to experiment with some ideas, and hopefully things can be merged in the future.

@mzgubic
Contributor

mzgubic commented Feb 18, 2022

How does StandardScaler/ing sound?
