Hardened unify data API calls to take in lists. Updated CHANGELOG.md

interpretml · May 22, 2019 · bcc4d52
1 parent 3eef5ca

Showing 2 changed files with 32 additions and 0 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -4,6 +4,16 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and the versioning is mostly derived from [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [v0.1.3] - 2019-05-21
+### Added
+- Model fit can now support lists of lists as instance data.
+- Model fit can now support lists for label data.
+### Fixed
+- Various internal C++ fixes.
+### Changed
+- Removed hypothesis as public test dependency.
+- C++ logging introduced (no public access)
+
 ## [v0.1.2] - 2019-05-17
 ### Added
 - EBM can now disable early stopping with run length set to -1.
@@ -30,6 +40,7 @@ and the versioning is mostly derived from [Semantic Versioning](https://semver.o
 - Libraries are statically linked where possible.
 - Code now conforms to Python Black and its associated flake8.
 
+[v0.1.3]: https://github.com/microsoft/interpret/releases/tag/v0.1.3
 [v0.1.2]: https://github.com/microsoft/interpret/releases/tag/v0.1.2
 [v0.1.1]: https://github.com/microsoft/interpret/releases/tag/v0.1.1
 [v0.1.0]: https://github.com/microsoft/interpret/releases/tag/v0.1.0
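
The two "Added" entries above say that a model fit can now receive plain Python lists. A minimal sketch of what normalizing a list of lists might look like, assuming NumPy performs the conversion: `describe_list_data` is a hypothetical helper written for illustration, not part of the interpret API.

```python
import numpy as np


def describe_list_data(data, feature_names=None):
    """Hypothetical helper (not interpret's API): coerce a list of lists to a
    2-D array and derive default feature names and per-column unique counts."""
    arr = np.array(data)
    if arr.ndim != 2:
        raise ValueError("expected a list of equal-length rows")

    if feature_names is None:
        # Default names follow the "feature_<index>" pattern.
        feature_names = ["feature_" + str(i) for i in range(arr.shape[1])]

    # Number of distinct values in each column.
    unique_counts = [len(set(arr[:, j].tolist())) for j in range(arr.shape[1])]
    return arr, feature_names, unique_counts


X = [[1, 0.5], [2, 0.5], [3, 0.7]]
arr, names, counts = describe_list_data(X)
print(arr.shape, names, counts)
```

Note that `np.array` picks one dtype for the whole array, so a list mixing integers and floats becomes a float array.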
diff --git a/src/python/interpret/utils/all.py b/src/python/interpret/utils/all.py
@@ -195,6 +195,8 @@ def unify_vector(data):
         new_data = data.values
     elif isinstance(data, np.ndarray):
         new_data = data
+    elif isinstance(data, list):
+        new_data = np.array(data)
     elif isinstance(data, NDFrame) and data.shape[1] == 1:
         new_data = data.iloc[:, 0].values
     else:
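
The hunk above adds a `list` branch to `unify_vector`. A simplified sketch of the same idea, covering only the `np.ndarray` and `list` branches: `to_label_vector` is a hypothetical stand-in, and the error raised in the fallback case is an assumption, since the original `else` body is not shown here.

```python
import numpy as np


def to_label_vector(labels):
    """Hypothetical stand-in: return labels as a NumPy array."""
    if isinstance(labels, np.ndarray):
        # Arrays pass through unchanged (no copy is made).
        return labels
    if isinstance(labels, list):
        # Plain Python lists are converted to an array.
        return np.array(labels)
    # Assumed fallback; the real function's behavior is not shown in the diff.
    raise TypeError("unsupported label container: " + type(labels).__name__)


y = to_label_vector([0, 1, 1, 0])
print(type(y).__name__, y.shape)
```

One caveat with this approach: a list that mixes numbers and strings, such as `[1, "a"]`, is converted to an array of strings, so callers may want to keep label lists homogeneous.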
@@ -219,6 +221,7 @@ def unify_data(data, labels=None, feature_names=None, feature_types=None):
     Returns:
 
     """
+    # TODO: Clean up code to have less duplication.
     if isinstance(data, NDFrame):
         new_data = data.to_numpy()
 
@@ -235,6 +238,24 @@ def unify_data(data, labels=None, feature_names=None, feature_types=None):
             ]
         else:
             new_feature_types = feature_types
+    elif isinstance(data, list):
+        new_data = np.array(data)
+
+        if feature_names is None:
+            new_feature_names = ["feature_" + str(i) for i in range(new_data.shape[1])]
+        else:
+            new_feature_names = feature_names
+
+        if feature_types is None:
+            unique_counts = np.apply_along_axis(lambda a: len(set(a)), axis=0, arr=new_data)
+            new_feature_types = [
+                _assign_feature_type(feature_type, unique_counts[index])
+                for index, feature_type in enumerate(
+                    [new_data.dtype] * len(new_feature_names)
+                )
+            ]
+        else:
+            new_feature_types = feature_types
     elif isinstance(data, np.ndarray):
         new_data = data