-
Notifications
You must be signed in to change notification settings - Fork 14.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SIP-78] Proposal for Advanced Types #17852
Comments
Thank you for this detailed proposal! I think this includes a LOT of fantastic ideas/potential. I may be misunderstanding some intent/approach here, but I'd love to see these business types (and validators, etc) contributed to the core codebase, rather than as some form of configuration, since I can see how a library of these business types might grow/evolve and add value to all sorts of users/orgs. I feel like changes to config should be limited to enabling/disabling/overriding/extending these business types, but the default should be to define them in a way that adds value for everyone, unless it's something super obscure/proprietary. Thinking ahead further on that, it may be worth considering (at some point) making this a plugin-based architecture, if we expect there to be a proliferation of business types that Superset maintainers might want to add/remove/contribute. The only reason I consider this worth mentioning here and now is that we may have a similar problem to consider with things like formatters (e.g. supporting all the currencies of the world) which could use some organization/scalability efforts in a similar way. One tiny nit you're probably way ahead of - the Business Type selector in the Dataset modal should probably be a Select component, and display those enabled business types (presumably populated by My biggest question is how all of this fits into any known/proposed/speculated plans for Superset's semantic layer, to make sure it intersects well with any potentially related efforts (e.g. how we build drill-down/across/through). I'll solicit some feedback on this front, and hopefully more folks will chime in on the thread. Again, this is great work... thank you for all of the hard work and consideration that's already gone into this! |
This is great! A couple comments:
|
Thanks for the feedback @betodealmeida @rusackas I really appreciate the interest! So first addressing the idea of having a plugin system, we are 100% on board and had that in mind as well. We were just unsure if introducing this idea might be a separate SIP, extra feature or something long those lines. This could also address the idea of having some of these validators / types / etc be pre-canned and but also extensible as you both have mentioned, I think this is certainly a good idea. in in addition I was unaware about the changes to at this time I do not have a ton of insight into how this fits into known/proposed/speculated plans for Superset's semantic layer but am willing to take on any feedback in regards to this. Thanks again for the feedback! |
What is proposed here seems to be an extension of data types. I'd probably call it |
For sure, I'm fine with the name change. I think the original name was in reference to business logic, however |
While this will definitely be a very useful feature, we need to be careful with the implementation here. From a user experience point of view, I think it'd be a better experience if we simply combine the Data Type and Advanced Data Type column into one dropdown and allow users to coerce column types to any standard and advanced types (similar to Tableau). We should also make it very clear when values are displayed as originals and when are they translated, otherwise it could cause confusions. For example, it seems currently the filter search and dropdown values can be a mix of raw values and the translated values, which feels fragile and could produce unexpected results depending on the data types. However, I do agree current implementation in the POC is the most straightforward and we can always iterate as long as this is behind a feature flag. |
@ktmud I agree that there is a lot of complexity in this feature which needs to be iterated on to make sure we get it fully right (@zhaoyongjie raised similar concerns). For that reason we'll keep the API and general functionality open to breaking changes until the full set of features has been completed. Which reminds me, I'll open a PR today to migrate the feature flag docs to the official documentation and add a few lines warning about the possibility of breaking changes during development: https://github.com/apache/superset/blob/master/RESOURCES/FEATURE_FLAGS.md |
Closing as done/implemented, since it's already merged. If there's more to do on this on the immediate horizon, just say the word, and we can re-open it if that makes sense. |
[SIP-78] Proposal for Advanced Types
Motivation
Our users are leveraging Superset to explore and visualize network security logs. These logs often contain IP, CIDR range, port, hostname, and/or protocol fields, among others. While such fields might be stored as simple INTEGERS or VARCHARS in the backend database, these fields also have a “advanced” type which, if leveraged, can provide additional context and improve the user experience.
User-Friendly Display
A field’s advanced type can influence how that field is interacted with by the users. Consider, for example, an IP field. For performance reasons, we store IP values as INTEGERs in our DW e.g. the IP “127.0.0.1” would be stored as its integer value 2130706433. However, the string representation is substantially more human-readable and, as such, we wish to display IP fields in dot-decimal notation in visualizations and filters. This numeric format is fairly standard (link)and having this display functionality would be valuable in superset in case the underlying DB does not support this conversion functionality out of the box. As another example, a Port field stored as an integer (e.g., 443) could be displayed as a protocol name where such mapping is known (e.g., HTTPS).
Applicable Operators for filters
Certain operators (e.g.,
LIKE
,=
,<
,IS NULL
,IN
, etc.) may be valid for a database type, but may not be applicable to a advanced type. For example, an organization might decide that theLIKE
operator is not relevant to Port fields. Ideally, we want to be able to configure a list of applicable operators that will appear for a field of a certain advanced type.Input Validation
Advanced type can be associated with (configurable) input validation and type conversion rules. For example, an input validator for a Port field could validate that the filter string is a parsable number between 0 and 65536, or that it contains known port names such as HTTPS, SSH, FTP.
Human Readable input values
As with displaying the human readable format of a value to the user (i.e the dot notation of an IP) users should be able to enter the more readable form of a value and have it be applied properly to a query. This would mean a user could create a filter such as
{col: "ip_col", op: "IN", val: [“1.1.1.1”, “1.1.1.2”]}
and have the query be executed asip_col IN [16842752, 16842753]
. The functionality for a user to simply enter{col: "ip_col", op: "IN", val: [16842752, 16842753]}
should also be preserved.Behavior specified in the config
The code to specify the behavior of each advanced type should be defined outside of the superset codebase (i.e the config). This way those who wish to implement a advanced type may do so freely without having to make any code changes to superset.
Proof of concept
Figure 1: a video displaying a proof of concept for this proposal. a version of the modified filters popover is shown. The user inputs both the human readable and numeric values for ports and the conversion is shown. Also it is shown that the user cannot save an invalid value,
abc
in this case. The user then saves and submits a valid filter and runs the query. the query is then displayed.Demo.Port.mp4
Proposed Changes
We propose that the concept of an “advanced type” be introduced to dataset columns in Superset and that this advanced type be leveraged to allow display formats, applicable operators, and validation rules to be added to fields of a particular advanced type. Advanced types would be a purely optional attribute which could be specified on columns; if a advanced type is not specified for a particular column, there would be no changes to how a column is displayed, what operators are available, or what validation is performed on filter inputs.
Adding the advanced type to dataset
Introduce the concept of a advanced type on dataset columns
Figure 2: Edit Datasource modal with Advanced Type field
Allow applicable operator sets to be defined for each advanced type
The Explorer UI has an ad-hoc filter which lets the user enter a condition on a column, a value and an operator (>, =, LIKE, IS NULL, etc).
Currently all operators are available to the user. We propose returning a list of applicable operators in the response payload. The list of applicable operators should be configurable for each custom advanced type. For maximum flexibility, the applicable operators should be tied to the advanced type.
Figure 3: Current list of operators (left) vs customized list of operators (right)
Allow input validation and value conversion on advanced type
Currently any input string can be entered into an ad-hoc filter. We propose exposing a server REST endpoint to validate input strings and also return the "normalized" representation. The normalized representation should be accessible to the user so they can verify the conversion. This could be displayed via a tooltip in order to minimize UI changes to the control.
Example inputs
An input validator for a PORT could validate that the string is a parsable number between 0 and 65536 or that it contains known port names such as HTTPS, SSH, FTP. An input validator for a DOMAIN could validate that the string is parsable as a domain and contains only alphanumeric characters, periods (.), and dashes (-).
It should be possible to add additional input validators via a configuration like config.py.
Figure 3: Error message for a filter value that fails validation for a field of type Port
Allow custom behavior for each operator on a advanced type
Ad-hoc filters currently work as would be expected on the backend: a filter is translated directly into an SQL statement that matches the filter, so the filter
{col: "ip_col", op: "IN", val: "1.1.1.1/16"}
is translated to the statementip_col IN '1.1.1.1/16'
. Based on the idea of advanced types, we want the value'1.1.1.1/16'
to first be converted to a decimal range (16842752 - 16908287) as covered above, but just doing the conversion is not enough for this range type as we would get something like this in the best case:ip_col IN [16842752, 16842753, ..., 16908287]
. We would more likely want this:ip_col >= 16842752 AND ip_col <= 16908287
. In order to achieve this, we propose making it possible to specify custom behavior for each defined operator associated with an advanced type. This would allow a lot of flexibility for each type.Allow for custom display values
Columns with advanced types should (optionally) be displayed in their human readable format (i.e dot decimal notation for IPs even though they are stored as integers). In order to accomplish this, we should generate a list of user-friendly values based on the target column and append it to the result set or replace the original column with these new values. This could be accomplished in the
filter
insuperset/views/core.py
. We should optimally clean up some technical debt in the backend code by porting it to the v1 API, and while doing so we can introduce the additional functionality needed for this feature (two birds with one stone).New or Changed Public Interfaces
Changes to the config
In order to implement advanced logic for advanced types we propose adding a collection of AdvancedType objects. These will define the available advanced types and their functionalities. The proposed class will be as follows are as follows:
description:
verbose_name:
Applicatable_types
GenericDataType
types that this advanced type applies to. Using the earlier example of IPs stored as ints, that advanced type would only function on integer value columnstype_translation:
filter_translation:
Model changes
BaseColumn model
Change SqlaTable's get_sql_query function inside the filter loop so that if a advanced type is present on a column, the operator logic is overridden by an advanced type’s operator logic (via callback defined in advanced_type_filter_translations). This callback’s response will then be appended to the where_and list to produce the query.
Addition of the v1/advanced_types/ endpoints
In order to use any of this logic in the front end we will need to define a set of API endpoints for advanced types. Under this section we would define the following endpoints:
[GET] v1/advanced_type/convert
[GET] v1/advanced_type/types
UI changes
The AdhocFilterEditPopoverSimpleTabContent will need to be modified. If an advanced type is present on a selected column, we use the
v1/advanced_type/convert
endpoint to retrieve: an input’s valid/invalid status, the list of operators to display, and the currently selected comparator’s converted value. Thevalid_operators
list will need to override the current operators list defined by theisOperatorRevelant
function. A tooltip component will also need to be added to the component to display the returned converted value. No behavioral changes should be present if a column without a advanced type is selected.Migration Plan and Compatibility
A database migration is required. This will only effect the BaseColumn model. We will need to add a varchar field to the columns table.
in order to maintain compatibility we will insure the advanced type column in nullable as well as hiding changes behind the ENABLE_ADVANCED_TYPES feature flag.
Rejected Alternatives
no alternatives have been strongly considered at this time
Note
Please note that the API endpoints are subject to change during the development process. This was the first logical attempt
The text was updated successfully, but these errors were encountered: