Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow registering events to timelines using regexes #4375

Merged
merged 2 commits into from
Aug 20, 2018

Conversation

tadeboro
Copy link
Contributor

@tadeboro tadeboro commented Jul 29, 2018

Up until now, in order to register event type to the timeline, we needed to list all of the event types in settings yaml.

This commit adds another option: specify event type to group/level mapping using regular expressions, which should make mapping definitions much shorter for providers with many different event types
that follow a certain pattern.

Changes were made in two areas: in database querying and in event to category assignment.

When creating database condition for events, we now use three sources:

  1. a set of event types that should be included in result,
  2. a set of event types that should not be present in result and
  3. a set of regular expressions that event types can match.

An event is part of the result if its type is in include set or matches any of the regular expressions and is not part of the exclude set. For example, take this fragment of settings file:

:ems:
  :ems_dummy:
    :event_handling:
      :event_groups:
        :addition:
          :critical:
          - !ruby/regexp /^dummy_add_.+$/
        :update:
          :critical:
          - dummy_123_update
          - !ruby/regexp /^dummy_.+_update$/
          :detail:
          - dummy_abc_update
          - dummy_def_update
          - !ruby/regexp /^dummy_detail_.+_update$/

This translates into following three sources for :update category, assuming user does not want to display details:

  • include_set: dummy_123_update
  • exclude_set: dummy_abc_update, dummy_def_update
  • regexes: ^dummy_.+_updates$

In this case, events of type dummy_abc_update will not be displayed, since they match details and are thus put in exclude set. On the other hand, if user does want to see the details, things are a bit
different:

  • include_set: dummy_123_update, dummy_abc_update, dummy_def_update
  • exclude_set:
  • regexes: ^dummy_.+_updates$, ^dummy_detail_.+_update$

All this is done in order to preserve the existing behavior.

Category assignment is done in two phases. First, we try to assign category only looking at the full name of event. If this yields no category, we try to assign one by matching against regular expressions. If we continue with categories from example above, we would get those mappings:

  • dummy_123_update is mapped to :update. It also matches :addition category, but since explicit names have higher priority, that regular expression is not checked at all.
  • dummy_nil_update is mapped to :update because it matches one of the :critical expressions.
  • dummy_add_event is mapped to :addition.

This pull depends on ManageIQ/manageiq#17772.

@miq-bot add_label wip

/cc @miha-plesko @gberginc

@miq-bot miq-bot changed the title Allow registering events to timelines using regexes [WIP] Allow registering events to timelines using regexes Jul 29, 2018
@miq-bot miq-bot added the wip label Jul 29, 2018
@tadeboro tadeboro force-pushed the regex-timelines branch 2 times, most recently from 4c2c2c6 to 3bfbed0 Compare July 29, 2018 20:34
@miha-plesko
Copy link
Contributor

Copy link
Contributor

@miha-plesko miha-plesko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks great, only two small suggestions

params2 = where_clause.slice(1, where_clause.length - 1)
params = params.concat(params2)
@report.where_clause = [cond, *params]
condition = "(" + conditions.reduce { |a, c| a + ") and (" + c } + ")"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A matter of preference, but I prefer meaningful argument names to a, b, c as well as string interpolation to suming up strings. For example:

.reduce { |cond, res| "(#{cond}) and #{res}" }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think giving variables a longer names would be counter-productive here (like naming iterator variable in C for loop iterator), since their whole scope is a single line. Besides, using a for accumulator seems to be a standard practice in such short reduce statements.

As for the implementation of reduce operation, I can use interpolation. I have no preference whatsoever.

to_dt]
end
cond << where_clause[0]
tl_add_event_type_conditions(conditions, parameters)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could rename method to tl_add_event_type_conditions! to emphasise it modifies arguments inplace

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not feel the need to add ! at the end of the method name, since there is no alternative method that would not modify its arguments. But I can add it if that would make things better.

@miq-bot
Copy link
Member

miq-bot commented Jul 30, 2018

This pull request is not mergeable. Please rebase and repush.

@tadeboro
Copy link
Contributor Author

@miq-bot remove_label wip

@miq-bot miq-bot changed the title [WIP] Allow registering events to timelines using regexes Allow registering events to timelines using regexes Aug 10, 2018
@miq-bot miq-bot removed the wip label Aug 10, 2018
@@ -102,42 +102,54 @@ def tl_build_timeline_report_options
end

temp_clause = @tl_record.event_where_clause(@tl_options.evt_type)
conditions = [temp_clause[0], "timestamp >= ?", "timestamp <= ?"]
parameters = temp_clause[1..-1] + [from_dt, to_dt]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor, but instead of temp_clause[0] and temp_clause[1], for readability let's give them names, and set them with decomposition, so maybe...

conditions, *parameters = @tl_record.event_where_clause(@tl_options.evt_type)
conditions += ["timestamp >= ?", "timestamp <= ?"]
parameters += [from_dt, to_dt]

params2 = where_clause.slice(1, where_clause.length - 1)
params = params.concat(params2)
@report.where_clause = [cond, *params]
condition = "(" + conditions.reduce { |a, c| "#{a}) and (#{c}" } + ")"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't like the SQL building...there has to be a way to leverage ActiveRecord relation chains.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though I do recognize that the original code also did some level of building SQL as well, which is also not good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will dig deeper and determine what report does with that SQL where clause.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked and report is just an MiqReport instance... MiqReport#where_clause is a string column that must get used later. I'm fine with this for now, considering your PR gets the SQL building down to basically just 2 lines. I still think there;s gotta be a way to build it using ActveRecord though :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting closer... 😄

query = @report.db.constantize.where("timestamp >= ?", from_dt).where("timestamp <= ?", to_dt)
query.arel.where_sql
# => "WHERE (timestamp >= '2018-08-05 15:55:43.491733') AND (timestamp <= '2018-08-10 15:55:43.492248')"

def tl_add_event_type_inclusions(conditions, parameters)
expressions = []
@tl_options.get_set(:regexes).each do |regex|
expressions << "event_type ~ ?"
Copy link
Member

@Fryguy Fryguy Aug 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes that all Ruby regexps can be SQL regexps, which I don't believe is true. But maybe that's acceptable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PostgreSQL's support is quite powerful. I will compare the capabilities of the Ruby's vs. PostgreSQL's regular expressions and report back. If nothing else, we can document the limitations somewhere and inform settings authors what can and what cannot be used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just checked this and regular expressions that PostgreSQL accepts are a subset of those that Ruby accepts. Maybe simply documenting that regular expressions in settings.yml must be valid PostgreSQL expressions would be enough?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two things...

  1. If possible, we may be able to ensure this via specs. That is, we can create a spec in core that goes through all event_groups definitions, and 1-by-1 and makes a query. I would expect it to blow up for invalid regexes. Then we can inform the developer that they've built a regex that cannot be used via SQL. So somethig like...

    describe EventStream do
      describe ".event_groups" do
        it "regexes must be usable in SQL queries" do
          ...
          strings, regexes = group_data[lvl].partition { |typ| typ.kind_of?(String) }
          ...
          regexes.each do |r|
            expect { EventStream.where("event_type ~ ?", r.source).to_a }.to_not raise_error
          end
        end
      end
    end
  2. In writing the above, I found that you cannot just embed a regex directly as a parameter, so the code as is will likely not work. That is, this code does not work:

    EventStream.where("event_type ~ ?", /\w+/).first
    # => TypeError: can't quote Regexp

    Instead, ActiveRecord says you have to give a String. I found that calling .source gives the guts of the Regexp but ignores the modifiers. I think ignoring the modifiers might be ok for a first pass, but we may have to write our own Ruby Regexp to PostgreSQL regexp converter to deal with the modifiers. In particular, case insensitivity might be important (though PostgreSQL can also handle that with an alternate operator": ~*). So, anyway, this works:

    EventStream.where("event_type ~ ?", /^\w+/.source).first
    #  EventStream Load (0.7ms)  SELECT  "event_streams".* FROM "event_streams" WHERE (event_type ~ '^\w+') ORDER BY "event_streams"."id" ASC LIMIT $1  [["LIMIT", 1]]
    #  EventStream Inst Including Associations (68.0ms - 1rows)
    # => #<MiqEvent:0x00007f83bbbb8118 id: 21000000000001, event_type: "login_failed", ... >

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where should I put the specs: in the core repo where the EventStream lives or in the UI repo?

As for the regexp modifiers, Ruby currently supports four, but the only one that would be of any use in our case is i that controls case sensitivity. And adding support for this option is quite easy, we simply need to use proper PostgreSQL operator as you already stated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the "check the event_groups spec" should be in the core repo. Ultimately it should be in the provider repos so that provider authors get a failure when they add it, but I'm not sure how to do that. Maybe we can have the spec in core, but run it in the provider repos somehow?

parameters << includes
end

condition = expressions.reduce { |a, e| a + ") or (" + e }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor, but isn't this just expressions.join(") or (")?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for the above reduce with ") and ("

end

condition = expressions.reduce { |a, e| a + ") or (" + e }
conditions << "(" + condition + ")" unless condition.nil?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer interpolation... conditions << "(#{condition})" if condition

}

group_levels.each do |lvl|
parts = group_data[lvl]&.partition { |typ| typ.kind_of?(String) }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Give the parts names, so that the later parts[0], parts[1] are readable...

strings, regexes = group_data[lvl]&.partition { |typ| typ.kind_of?(String) }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought a, b = nil would break, but I was wrong.

end

def get_set(name)
categories.values.reduce([]) { |a, v| a.push(*v[name]) }
Copy link
Member

@Fryguy Fryguy Aug 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer categories.values.flat_map { |v| v["name"] }

end&.last.try(:[], :display_name)

bucket_name ||= mri.rpt_options[:categories].find do |_, options|
options[:regexes].find { |regex| regex.match(event.event_type) }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in the other PR, prefer detect.

@tadeboro tadeboro force-pushed the regex-timelines branch 3 times, most recently from b44f2a3 to f6a6c75 Compare August 13, 2018 11:45
def tl_add_event_type_inclusions(conditions, parameters)
expressions = []
@tl_options.get_set(:regexes).each do |regex|
expressions << "event_type ~ ?"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two things...

  1. If possible, we may be able to ensure this via specs. That is, we can create a spec in core that goes through all event_groups definitions, and 1-by-1 and makes a query. I would expect it to blow up for invalid regexes. Then we can inform the developer that they've built a regex that cannot be used via SQL. So somethig like...

    describe EventStream do
      describe ".event_groups" do
        it "regexes must be usable in SQL queries" do
          ...
          strings, regexes = group_data[lvl].partition { |typ| typ.kind_of?(String) }
          ...
          regexes.each do |r|
            expect { EventStream.where("event_type ~ ?", r.source).to_a }.to_not raise_error
          end
        end
      end
    end
  2. In writing the above, I found that you cannot just embed a regex directly as a parameter, so the code as is will likely not work. That is, this code does not work:

    EventStream.where("event_type ~ ?", /\w+/).first
    # => TypeError: can't quote Regexp

    Instead, ActiveRecord says you have to give a String. I found that calling .source gives the guts of the Regexp but ignores the modifiers. I think ignoring the modifiers might be ok for a first pass, but we may have to write our own Ruby Regexp to PostgreSQL regexp converter to deal with the modifiers. In particular, case insensitivity might be important (though PostgreSQL can also handle that with an alternate operator": ~*). So, anyway, this works:

    EventStream.where("event_type ~ ?", /^\w+/.source).first
    #  EventStream Load (0.7ms)  SELECT  "event_streams".* FROM "event_streams" WHERE (event_type ~ '^\w+') ORDER BY "event_streams"."id" ASC LIMIT $1  [["LIMIT", 1]]
    #  EventStream Inst Including Associations (68.0ms - 1rows)
    # => #<MiqEvent:0x00007f83bbbb8118 id: 21000000000001, event_type: "login_failed", ... >

@tadeboro
Copy link
Contributor Author

I created new pull request ManageIQ/manageiq#17867 that adds tests for event type mapping. Tests should catch most of the issues that provider authors introduce into codebase. Unfortunately, those tests are blacklisted in provider repos, so the provider authors will need to manually run those tests for now.

@tadeboro
Copy link
Contributor Author

As for the regular expression options, I added support for the case insensitive expressions, since those may be useful for provider authors and they do not introduce much complexity. Support for case sensitivity option is already present in event type mapping tests.

@Fryguy
Copy link
Member

Fryguy commented Aug 16, 2018

LGTM, need someone from UI team to review.

@skateman
Copy link
Member

skateman commented Aug 17, 2018

@tadeboro as far as I see you're doing some refactoring before adding the regexp support. Could you please split this up into at least two separate commits so I can untangle it a little bit easier? Thanks...

Tadej Borovšak added 2 commits August 17, 2018 10:02
We construct filter in two steps:

 1. we collect the conditions and
 2. join them using and operator.

This reduces the amount of string manipulation and conditional
statements considerably.
Up until now, in order to register event type to the timeline, we
needed to list all of the event types in settings yaml.

This commit adds another option: specify event type to group/level
mapping using regular expressions, which should make mapping
definitions much shorter for providers with many different event types
that follow a certain pattern.

Changes were made in to areas: in database querying and in event to
category assignment.

When creating database condition for events, we now use three sources:

 1. a set of event types that should be included in result,
 2. a set of event types that should not be present in result and
 3. a set of regular expressions that event types can match.

An event is part of the result if its type is in include set or
matches any of the regular expressions and is not part of the exclude
set. For example, take this fragment of settings file:

    :ems:
      :ems_dummy:
        :event_handling:
          :event_groups:
            :addition:
              :critical:
              - !ruby/regexp /^dummy_add_.+$/
            :update:
              :critical:
              - dummy_123_update
              - !ruby/regexp /^dummy_.+_update$/i
              :detail:
              - dummy_abc_update
              - dummy_def_update
              - !ruby/regexp /^dummy_detail_.+_update$/

This translates into following three sources for :update category,
assuming user selected :critical level in UI:

  * include_set: dummy_123_update
  * exclude_set: dummy_abc_update, dummy_def_update
  * regexes: ^dummy_.+_updates$

In this case, events of type dummy_abc_update will not be displayed,
since they match details and are thus put in exclude set. On the
other hand, if the user selected :critical and :detail levels, things
are a bit different:

  * include_set: dummy_123_update, dummy_abc_update, dummy_def_update
  * exclude_set:
  * regexes: ^dummy_.+_updates$, ^dummy_detail_.+_update$

Regular expressions should be accepted by the PostreSQL DBMS and
should not contain Ruby-specific functionality. The only exception to
this rule is case sensitivity, which can make regular expressions a lot
simpler in certain situations.

Category assignment is done in two phases. First, we try to assign
category only looking at the full name of event. If this yields no
category, we try to assign one by matching against regular
expressions. If we continue with categories from example above, we
would get those mappings:

  * dummy_123_update is mapped to :update. It also matches :addition
    category, but since explicit names have higher priority, that
    regular expression is not checked at all.
  * dummy_nil_update is mapped to :update because it matches one of
    the :critical expressions.
  * dummy_add_event is mapped to :addition.

All this is done in order to give fully spelled-out event type
mappings higher priority than regular expression ones and to be
consistent with how EventStream classifies events into groups and
levels.
@miq-bot
Copy link
Member

miq-bot commented Aug 17, 2018

Some comments on commits xlab-si/manageiq-ui-classic@5e54ac2~...d961141

spec/lib/report_formater/timeline_spec.rb

  • ⚠️ - 238 - Detected allow_any_instance_of. This RSpec method is highly discouraged, please only use when absolutely necessary.
  • ⚠️ - 246 - Detected allow_any_instance_of. This RSpec method is highly discouraged, please only use when absolutely necessary.
  • ⚠️ - 253 - Detected allow_any_instance_of. This RSpec method is highly discouraged, please only use when absolutely necessary.

@miq-bot
Copy link
Member

miq-bot commented Aug 17, 2018

Checked commits xlab-si/manageiq-ui-classic@5e54ac2~...d961141 with ruby 2.3.3, rubocop 0.52.1, haml-lint 0.20.0, and yamllint 1.10.0
5 files checked, 0 offenses detected
Everything looks fine. 🍪

@tadeboro
Copy link
Contributor Author

@skateman Any refactoring was just a side-effect of trying to get the thing working;) But you are right, the changes indeed do look like they should be in two different commits.

@skateman
Copy link
Member

@tadeboro can we schedule a meeting where you show me this in action?

Copy link
Member

@skateman skateman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Seal of Approval

@mzazrivec mzazrivec self-assigned this Aug 20, 2018
@mzazrivec mzazrivec added this to the Sprint 93 Ending Aug 27, 2018 milestone Aug 20, 2018
@mzazrivec mzazrivec merged commit 8c82b71 into ManageIQ:master Aug 20, 2018
@tadeboro tadeboro deleted the regex-timelines branch August 29, 2018 13:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants