This repository has been archived by the owner on Nov 23, 2024. It is now read-only.

Default value 2**20 is not recognized, so the parameter is wrongly annotated as required #26

Open
jofaul opened this issue Jul 1, 2022 · 1 comment
Labels
bug 🪲 Something isn't working @optional Related to the @optional annotation @required Related to the @required annotation wrong annotation An annotation was generated automatically but is incorrect

Comments

@jofaul
Contributor

jofaul commented Jul 1, 2022

URL Hash

#/sklearn/sklearn.feature_extraction._hash/FeatureHasher/__init__/n_features

Actual Annotation Type

@required

Actual Annotation Inputs

{
    "target": "sklearn/sklearn.feature_extraction._hash/FeatureHasher/__init__/n_features",
    "authors": [
        "$autogen$"
    ]
}

Expected Annotation Type

@optional

Expected Annotation Inputs

2**20

Minimal API Data (optional)

Minimal API Data for `sklearn/sklearn.feature_extraction._hash/FeatureHasher/__init__/n_features`
{
    "schemaVersion": 1,
    "distribution": "scikit-learn",
    "package": "sklearn",
    "version": "1.1.1",
    "modules": [
        {
            "id": "sklearn/sklearn.feature_extraction",
            "name": "sklearn.feature_extraction",
            "imports": [],
            "from_imports": [
                {
                    "module": "sklearn.feature_extraction",
                    "declaration": "text",
                    "alias": null
                },
                {
                    "module": "sklearn.feature_extraction._dict_vectorizer",
                    "declaration": "DictVectorizer",
                    "alias": null
                },
                {
                    "module": "sklearn.feature_extraction._hash",
                    "declaration": "FeatureHasher",
                    "alias": null
                },
                {
                    "module": "sklearn.feature_extraction.image",
                    "declaration": "grid_to_graph",
                    "alias": null
                },
                {
                    "module": "sklearn.feature_extraction.image",
                    "declaration": "img_to_graph",
                    "alias": null
                }
            ],
            "classes": [
                "sklearn/sklearn.feature_extraction._hash/FeatureHasher"
            ],
            "functions": []
        }
    ],
    "classes": [
        {
            "id": "sklearn/sklearn.feature_extraction._hash/FeatureHasher",
            "name": "FeatureHasher",
            "qname": "sklearn.feature_extraction._hash.FeatureHasher",
            "decorators": [],
            "superclasses": [
                "TransformerMixin",
                "BaseEstimator"
            ],
            "methods": [
                "sklearn/sklearn.feature_extraction._hash/FeatureHasher/__init__"
            ],
            "is_public": true,
            "reexported_by": [
                "sklearn/sklearn.feature_extraction"
            ],
            "description": "Implements feature hashing, aka the hashing trick.\n\nThis class turns sequences of symbolic feature names (strings) into\nscipy.sparse matrices, using a hash function to compute the matrix column\ncorresponding to a name. The hash function employed is the signed 32-bit\nversion of Murmurhash3.\n\nFeature names of type byte string are used as-is. Unicode strings are\nconverted to UTF-8 first, but no Unicode normalization is done.\nFeature values must be (finite) numbers.\n\nThis class is a low-memory alternative to DictVectorizer and\nCountVectorizer, intended for large-scale (online) learning and situations\nwhere memory is tight, e.g. when running prediction code on embedded\ndevices.\n\nRead more in the :ref:`User Guide <feature_hashing>`.\n\n.. versionadded:: 0.13",
            "docstring": "Implements feature hashing, aka the hashing trick.\n\n    This class turns sequences of symbolic feature names (strings) into\n    scipy.sparse matrices, using a hash function to compute the matrix column\n    corresponding to a name. The hash function employed is the signed 32-bit\n    version of Murmurhash3.\n\n    Feature names of type byte string are used as-is. Unicode strings are\n    converted to UTF-8 first, but no Unicode normalization is done.\n    Feature values must be (finite) numbers.\n\n    This class is a low-memory alternative to DictVectorizer and\n    CountVectorizer, intended for large-scale (online) learning and situations\n    where memory is tight, e.g. when running prediction code on embedded\n    devices.\n\n    Read more in the :ref:`User Guide <feature_hashing>`.\n\n    .. versionadded:: 0.13\n\n    Parameters\n    ----------\n    n_features : int, default=2**20\n        The number of features (columns) in the output matrices. Small numbers\n        of features are likely to cause hash collisions, but large numbers\n        will cause larger coefficient dimensions in linear learners.\n    input_type : str, default='dict'\n        Choose a string from {'dict', 'pair', 'string'}.\n        Either \"dict\" (the default) to accept dictionaries over\n        (feature_name, value); \"pair\" to accept pairs of (feature_name, value);\n        or \"string\" to accept single strings.\n        feature_name should be a string, while value should be a number.\n        In the case of \"string\", a value of 1 is implied.\n        The feature_name is hashed to find the appropriate column for the\n        feature. The value's sign might be flipped in the output (but see\n        non_negative, below).\n    dtype : numpy dtype, default=np.float64\n        The type of feature values. Passed to scipy.sparse matrix constructors\n        as the dtype argument. Do not set this to bool, np.boolean or any\n        unsigned integer type.\n    alternate_sign : bool, default=True\n        When True, an alternating sign is added to the features as to\n        approximately conserve the inner product in the hashed space even for\n        small n_features. This approach is similar to sparse random projection.\n\n        .. versionchanged:: 0.19\n            ``alternate_sign`` replaces the now deprecated ``non_negative``\n            parameter.\n\n    See Also\n    --------\n    DictVectorizer : Vectorizes string-valued features using a hash table.\n    sklearn.preprocessing.OneHotEncoder : Handles nominal/categorical features.\n\n    Examples\n    --------\n    >>> from sklearn.feature_extraction import FeatureHasher\n    >>> h = FeatureHasher(n_features=10)\n    >>> D = [{'dog': 1, 'cat':2, 'elephant':4},{'dog': 2, 'run': 5}]\n    >>> f = h.transform(D)\n    >>> f.toarray()\n    array([[ 0.,  0., -4., -1.,  0.,  0.,  0.,  0.,  0.,  2.],\n           [ 0.,  0.,  0., -2., -5.,  0.,  0.,  0.,  0.,  0.]])\n    "
        }
    ],
    "functions": [
        {
            "id": "sklearn/sklearn.feature_extraction._hash/FeatureHasher/__init__",
            "name": "__init__",
            "qname": "sklearn.feature_extraction._hash.FeatureHasher.__init__",
            "decorators": [],
            "parameters": [
                {
                    "id": "sklearn/sklearn.feature_extraction._hash/FeatureHasher/__init__/n_features",
                    "name": "n_features",
                    "qname": "sklearn.feature_extraction._hash.FeatureHasher.__init__.n_features",
                    "default_value": "2**20",
                    "assigned_by": "POSITION_OR_NAME",
                    "is_public": true,
                    "docstring": {
                        "type": "int, default=2**20",
                        "description": "The number of features (columns) in the output matrices. Small numbers\nof features are likely to cause hash collisions, but large numbers\nwill cause larger coefficient dimensions in linear learners."
                    },
                    "type": {}
                }
            ],
            "results": [],
            "is_public": true,
            "reexported_by": [],
            "description": "",
            "docstring": ""
        }
    ]
}
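The API data stores the default as the source string `"2**20"` (the `default_value` field above), not as a number. For comparison, Python's own `ast.literal_eval` rejects `"2**20"` with a `ValueError`, because it does not fold the `**` operator. The sketch below is a hypothetical helper, `evaluate_default`, and not the annotation generator's actual code: it tries `ast.literal_eval` first and, assuming only simple numeric constant expressions need support, falls back to folding `+`, `-`, `*` and `**` over numeric literals so that a default like `2**20` can be turned into an @optional value.

```python
import ast
import operator

# Binary and unary operators this sketch is willing to fold; anything else is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Pow: operator.pow,  # note: no guard against very large exponents
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def evaluate_default(source: str):
    """Hypothetical helper: turn a default-value string such as "2**20" into a
    Python value, or raise ValueError if it is not a literal or a simple
    numeric constant expression."""
    try:
        # Plain literals ("10", "'dict'", "None", "True") are handled here.
        return ast.literal_eval(source)
    except ValueError:
        pass  # literal_eval rejects operators such as **, so fall back below.

    def fold(node):
        # Numeric literal (bools are excluded even though they subclass int).
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](fold(node.left), fold(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](fold(node.operand))
        # Names like "hash_vector_size" or attributes like "np.float64" end up here.
        raise ValueError(f"not a constant numeric expression: {source!r}")

    return fold(ast.parse(source, mode="eval").body)


print(evaluate_default("2**20"))  # 1048576
print(evaluate_default("2**18"))  # 262144
```

Variable names such as `hash_vector_size` (seen in the usage store) still raise `ValueError` here, which is the intended behavior: only defaults that can be evaluated statically are treated as constants.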

Minimal Usage Store (optional)

Minimal Usage Store for `sklearn/sklearn.feature_extraction._hash/FeatureHasher/__init__/n_features`
{
    "schemaVersion": 1,
    "module_counts": {
        "sklearn/sklearn.feature_extraction": 693
    },
    "class_counts": {
        "sklearn/sklearn.feature_extraction._hash/FeatureHasher": 26
    },
    "function_counts": {
        "sklearn/sklearn.feature_extraction._hash/FeatureHasher/__init__": 22
    },
    "parameter_counts": {
        "sklearn/sklearn.feature_extraction._hash/FeatureHasher/__init__/n_features": 11
    },
    "value_counts": {
        "sklearn/sklearn.feature_extraction._hash/FeatureHasher/__init__/n_features": {
            "6": 1,
            "8": 2,
            "12": 2,
            "20": 1,
            "100": 1,
            "hash_vector_size": 2,
            "m": 1,
            "2**18": 1,
            "2**20": 11
        }
    }
}
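The usage store is consistent with the expected annotation. A required parameter would be passed in every call, but only 11 of the 22 recorded calls to `FeatureHasher.__init__` pass `n_features`. The value counts also sum to 22, which suggests that the `"2**20"` bucket records the calls that omit the argument and fall back to the default; this is an assumption about how the usage store counts values, not something the issue states. A quick check of that arithmetic, using the numbers copied from the store above:

```python
# Counts copied from the Minimal Usage Store above.
function_calls = 22  # calls of FeatureHasher.__init__
explicit_uses = 11   # calls that pass n_features explicitly
value_counts = {
    "6": 1, "8": 2, "12": 2, "20": 1, "100": 1,
    "hash_vector_size": 2, "m": 1, "2**18": 1, "2**20": 11,
}

# Calls that rely on the default value.
implicit_uses = function_calls - explicit_uses
explicit_values = {v: c for v, c in value_counts.items() if v != "2**20"}

# Under the assumption above, every non-default entry is an explicit argument
# and the "2**20" bucket is exactly the set of calls that omit n_features.
assert sum(value_counts.values()) == function_calls
assert sum(explicit_values.values()) == explicit_uses
assert value_counts["2**20"] == implicit_uses

share_default = implicit_uses / function_calls
print(f"{share_default:.0%} of calls rely on the default 2**20")  # 50% ...
```

If these checks hold, half of the observed calls rely on the default, which fits annotating the parameter as @optional with default 2**20 rather than @required.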

Suggested Solution (optional)

No response

Additional Context (optional)

[Screenshot not preserved]

@jofaul jofaul added bug 🪲 Something isn't working wrong annotation An annotation was generated automatically but is incorrect @required Related to the @required annotation @optional Related to the @optional annotation labels Jul 1, 2022
@Aclrian
Contributor

Aclrian commented Jul 1, 2022

It should also work for boundaries. See the closed issue Safe-DS/API-Editor#869 for details.

@lars-reimann lars-reimann transferred this issue from Safe-DS/API-Editor Mar 19, 2023
Projects
Status: Backlog
Development

No branches or pull requests

2 participants