Default value 2**20 is not recognized, wrongly annotated as required #26

jofaul · 2022-07-01T11:45:53Z

URL Hash

#/sklearn/sklearn.feature_extraction._hash/FeatureHasher/__init__/n_features

Actual Annotation Type

@required

Actual Annotation Inputs

{
    "target": "sklearn/sklearn.feature_extraction._hash/FeatureHasher/__init__/n_features",
    "authors": [
        "$autogen$"
    ]
}

Expected Annotation Type

@optional

Expected Annotation Inputs

2**20
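
For illustration only, here is a minimal sketch of how a default such as `2**20` could be recognized without calling `eval`. It assumes the generator receives the `default_value` string from the API data and that `None` means the parameter has no default. The names `evaluate_constant` and `suggest_annotation` are hypothetical and are not taken from the API-Editor code base.

```python
import ast
import operator

# Binary operators accepted inside a constant default expression.
_ALLOWED_BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Pow: operator.pow,
}


def evaluate_constant(node):
    """Evaluate a restricted constant expression; raise ValueError otherwise."""
    if isinstance(node, ast.Constant) and isinstance(
        node.value, (int, float, str, bool, type(None))
    ):
        return node.value
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -evaluate_constant(node.operand)
    if isinstance(node, ast.BinOp) and type(node.op) in _ALLOWED_BINOPS:
        left = evaluate_constant(node.left)
        right = evaluate_constant(node.right)
        return _ALLOWED_BINOPS[type(node.op)](left, right)
    raise ValueError("not a constant expression")


def suggest_annotation(default_value):
    """Hypothetical helper: map a default_value string to (annotation, value)."""
    if default_value is None:
        # Assumption: no default at all means the parameter is required.
        return ("@required", None)
    try:
        tree = ast.parse(default_value, mode="eval")
        value = evaluate_constant(tree.body)
    except (SyntaxError, ValueError, TypeError):
        # A default exists but cannot be evaluated (e.g. np.float64):
        # keep the raw string instead of falling back to @required.
        value = default_value
    return ("@optional", value)


print(suggest_annotation("2**20"))
```

The key design choice in this sketch is that the mere presence of a default is enough to rule out `@required`, whether or not the expression can be evaluated. A real implementation would probably also want to cap exponent sizes before evaluating `**`.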

Minimal API Data (optional)

Minimal API Data for `sklearn/sklearn.feature_extraction._hash/FeatureHasher/__init__/n_features`

{
    "schemaVersion": 1,
    "distribution": "scikit-learn",
    "package": "sklearn",
    "version": "1.1.1",
    "modules": [
        {
            "id": "sklearn/sklearn.feature_extraction",
            "name": "sklearn.feature_extraction",
            "imports": [],
            "from_imports": [
                {
                    "module": "sklearn.feature_extraction",
                    "declaration": "text",
                    "alias": null
                },
                {
                    "module": "sklearn.feature_extraction._dict_vectorizer",
                    "declaration": "DictVectorizer",
                    "alias": null
                },
                {
                    "module": "sklearn.feature_extraction._hash",
                    "declaration": "FeatureHasher",
                    "alias": null
                },
                {
                    "module": "sklearn.feature_extraction.image",
                    "declaration": "grid_to_graph",
                    "alias": null
                },
                {
                    "module": "sklearn.feature_extraction.image",
                    "declaration": "img_to_graph",
                    "alias": null
                }
            ],
            "classes": [
                "sklearn/sklearn.feature_extraction._hash/FeatureHasher"
            ],
            "functions": []
        }
    ],
    "classes": [
        {
            "id": "sklearn/sklearn.feature_extraction._hash/FeatureHasher",
            "name": "FeatureHasher",
            "qname": "sklearn.feature_extraction._hash.FeatureHasher",
            "decorators": [],
            "superclasses": [
                "TransformerMixin",
                "BaseEstimator"
            ],
            "methods": [
                "sklearn/sklearn.feature_extraction._hash/FeatureHasher/__init__"
            ],
            "is_public": true,
            "reexported_by": [
                "sklearn/sklearn.feature_extraction"
            ],
            "description": "Implements feature hashing, aka the hashing trick.\n\nThis class turns sequences of symbolic feature names (strings) into\nscipy.sparse matrices, using a hash function to compute the matrix column\ncorresponding to a name. The hash function employed is the signed 32-bit\nversion of Murmurhash3.\n\nFeature names of type byte string are used as-is. Unicode strings are\nconverted to UTF-8 first, but no Unicode normalization is done.\nFeature values must be (finite) numbers.\n\nThis class is a low-memory alternative to DictVectorizer and\nCountVectorizer, intended for large-scale (online) learning and situations\nwhere memory is tight, e.g. when running prediction code on embedded\ndevices.\n\nRead more in the :ref:`User Guide <feature_hashing>`.\n\n.. versionadded:: 0.13",
            "docstring": "Implements feature hashing, aka the hashing trick.\n\n    This class turns sequences of symbolic feature names (strings) into\n    scipy.sparse matrices, using a hash function to compute the matrix column\n    corresponding to a name. The hash function employed is the signed 32-bit\n    version of Murmurhash3.\n\n    Feature names of type byte string are used as-is. Unicode strings are\n    converted to UTF-8 first, but no Unicode normalization is done.\n    Feature values must be (finite) numbers.\n\n    This class is a low-memory alternative to DictVectorizer and\n    CountVectorizer, intended for large-scale (online) learning and situations\n    where memory is tight, e.g. when running prediction code on embedded\n    devices.\n\n    Read more in the :ref:`User Guide <feature_hashing>`.\n\n    .. versionadded:: 0.13\n\n    Parameters\n    ----------\n    n_features : int, default=2**20\n        The number of features (columns) in the output matrices. Small numbers\n        of features are likely to cause hash collisions, but large numbers\n        will cause larger coefficient dimensions in linear learners.\n    input_type : str, default='dict'\n        Choose a string from {'dict', 'pair', 'string'}.\n        Either \"dict\" (the default) to accept dictionaries over\n        (feature_name, value); \"pair\" to accept pairs of (feature_name, value);\n        or \"string\" to accept single strings.\n        feature_name should be a string, while value should be a number.\n        In the case of \"string\", a value of 1 is implied.\n        The feature_name is hashed to find the appropriate column for the\n        feature. The value's sign might be flipped in the output (but see\n        non_negative, below).\n    dtype : numpy dtype, default=np.float64\n        The type of feature values. Passed to scipy.sparse matrix constructors\n        as the dtype argument. Do not set this to bool, np.boolean or any\n        unsigned integer type.\n    alternate_sign : bool, default=True\n        When True, an alternating sign is added to the features as to\n        approximately conserve the inner product in the hashed space even for\n        small n_features. This approach is similar to sparse random projection.\n\n        .. versionchanged:: 0.19\n            ``alternate_sign`` replaces the now deprecated ``non_negative``\n            parameter.\n\n    See Also\n    --------\n    DictVectorizer : Vectorizes string-valued features using a hash table.\n    sklearn.preprocessing.OneHotEncoder : Handles nominal/categorical features.\n\n    Examples\n    --------\n    >>> from sklearn.feature_extraction import FeatureHasher\n    >>> h = FeatureHasher(n_features=10)\n    >>> D = [{'dog': 1, 'cat':2, 'elephant':4},{'dog': 2, 'run': 5}]\n    >>> f = h.transform(D)\n    >>> f.toarray()\n    array([[ 0.,  0., -4., -1.,  0.,  0.,  0.,  0.,  0.,  2.],\n           [ 0.,  0.,  0., -2., -5.,  0.,  0.,  0.,  0.,  0.]])\n    "
        }
    ],
    "functions": [
        {
            "id": "sklearn/sklearn.feature_extraction._hash/FeatureHasher/__init__",
            "name": "__init__",
            "qname": "sklearn.feature_extraction._hash.FeatureHasher.__init__",
            "decorators": [],
            "parameters": [
                {
                    "id": "sklearn/sklearn.feature_extraction._hash/FeatureHasher/__init__/n_features",
                    "name": "n_features",
                    "qname": "sklearn.feature_extraction._hash.FeatureHasher.__init__.n_features",
                    "default_value": "2**20",
                    "assigned_by": "POSITION_OR_NAME",
                    "is_public": true,
                    "docstring": {
                        "type": "int, default=2**20",
                        "description": "The number of features (columns) in the output matrices. Small numbers\nof features are likely to cause hash collisions, but large numbers\nwill cause larger coefficient dimensions in linear learners."
                    },
                    "type": {}
                }
            ],
            "results": [],
            "is_public": true,
            "reexported_by": [],
            "description": "",
            "docstring": ""
        }
    ]
}

Minimal Usage Store (optional)

Minimal Usage Store for `sklearn/sklearn.feature_extraction._hash/FeatureHasher/__init__/n_features`

{
    "schemaVersion": 1,
    "module_counts": {
        "sklearn/sklearn.feature_extraction": 693
    },
    "class_counts": {
        "sklearn/sklearn.feature_extraction._hash/FeatureHasher": 26
    },
    "function_counts": {
        "sklearn/sklearn.feature_extraction._hash/FeatureHasher/__init__": 22
    },
    "parameter_counts": {
        "sklearn/sklearn.feature_extraction._hash/FeatureHasher/__init__/n_features": 11
    },
    "value_counts": {
        "sklearn/sklearn.feature_extraction._hash/FeatureHasher/__init__/n_features": {
            "6": 1,
            "8": 2,
            "12": 2,
            "20": 1,
            "100": 1,
            "hash_vector_size": 2,
            "m": 1,
            "2**18": 1,
            "2**20": 11
        }
    }
}
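
As a quick sanity check, the counts in this usage store can be cross-checked against each other. The sketch below copies `value_counts` from the data above. Reading the 11 occurrences of `2**20` as calls that rely on the default is an assumption, not something the usage store states.

```python
# value_counts for n_features, copied from the usage store above.
value_counts = {
    "6": 1,
    "8": 2,
    "12": 2,
    "20": 1,
    "100": 1,
    "hash_vector_size": 2,
    "m": 1,
    "2**18": 1,
    "2**20": 11,
}

function_count = 22   # calls to FeatureHasher.__init__
parameter_count = 11  # calls counted for n_features

total_values = sum(value_counts.values())
other_than_default = total_values - value_counts["2**20"]

# Every call is accounted for by exactly one recorded value.
print(total_values == function_count)
# Values other than 2**20 match the parameter count, which is
# consistent with (but does not prove) 2**20 being the implicit default.
print(other_than_default == parameter_count)
```

If that reading is right, half of the recorded calls depend on the default, which is further evidence that `@optional` is the appropriate annotation.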

Additional Context (optional)


Aclrian · 2022-07-01T13:06:21Z

The fix should also cover boundary annotations, where an upper bound can likewise be written with two asterisks. See the closed issue Safe-DS/API-Editor#869 for details.

jofaul added the labels bug 🪲, wrong annotation, @required, and @optional on Jul 1, 2022

Aclrian mentioned this issue Jul 1, 2022

Missing annotation for keyword 'in the range' and upper boundary with 2 asterisks Safe-DS/API-Editor#869

Closed

lars-reimann transferred this issue from Safe-DS/API-Editor Mar 19, 2023
