-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Irqstat input plugin #2494
Irqstat input plugin #2494
Conversation
4968c00
to
0593ac4
Compare
Well, kinda. The proposed output in #2469 is very different from the output implemented here. To me the output here would not be very useful as you would have to know what each interrupt is on every host. A device might be IRQ X on host A, and IRQ Y on host B. This is the reason for including the |
I was considering making the IRQ values integers for that reason and I think it possible to just convert the strings to int before adding it to the map. I was mostly trying to mimic the behavior produced by the collectd IRQ plugin. |
@calerogers they should be integers |
bdb349d
to
e42b752
Compare
@sparrc I committed the change to convert the IRQ values to ints. @phemmer I didn't notice you put in a feature request around this with more in-depth plans on how to go about structuring the plugin, so sorry if this seems like it came out of nowhere. I have been kind of working on this (along with learning golang) on and off for last month when I've had free time. |
The schema proposed by @phemmer sees more useful to me, @calerogers do you want to switch over to this format? |
@danielnelson I agree the schema presented in #2469 is more useful, especially the individual IRQs as separate series and the IRQ values split up by CPU and total as fields. After testing this a bit I think there could be issues parsing the type and devices due to the fact that there is not a standardized delimiter between each column in the /proc/interrupts and /proc/softirqs files. It is an arbitrary amount of spaces seemingly based on the length of the values in the columns. I wouldn't be surprised if this was a reason the collectd IRQ plugin does not parse these. @phemmer did you have an idea around reliably parsing the varying degree of types and devices on different systems? Perhaps instead of trying to separate type and device, make one string |
Parsing should be very easy.
|
Yes doing that would be easy and what you have outlined was how I was going about it, but that is assuming the
Admittedly, I am not sure how varied |
The plugin reports IRQ metrics to telegraf based on IRQ instead of CPU. It now parses `/proc/softirqs` as a separate measurement. It also collects additional metrics including IRQ type, device and IRQ value total.
The latest commit updates the irqstat plugin to collect metrics based on the specification in #2469 |
032e704
to
8443251
Compare
1087fda
to
23e42e6
Compare
@phemmer any thoughts on the current code? |
Functionally I think it's sound. Mostly the stuff that was jumping out at me was code hygeine. But I'll leave that to @danielnelson, at least for the first round, as I tend to get a little pedantic. |
@danielnelson any feedback you have on the plugin would be welcomed. I definitely want to continue work on merging this in. Thanks! |
plugins/inputs/irqstat/irqstat.go
Outdated
for _, file := range files { | ||
data, err := ioutil.ReadFile(file) | ||
if err != nil { | ||
log.Fatal(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use acc.AddError here
plugins/inputs/irqstat/irqstat.go
Outdated
var irqval, irqtotal int64 | ||
var irqtype, irqdevice string | ||
fields := strings.Fields(scanner.Text()) | ||
ff := fields[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to check the length of fields to ensure this doesn't cause a panic.
plugins/inputs/irqstat/irqstat.go
Outdated
cpucount = len(fields) | ||
} | ||
|
||
if ff[len(ff)-1:] == ":" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use strings.HasSuffix
. Instead of doing the rest of the code in a block, use continue
:
plugins/inputs/irqstat/irqstat.go
Outdated
|
||
if ff[len(ff)-1:] == ":" { | ||
fields = fields[1:len(fields)] | ||
irqid := ff[:len(ff)-1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use strings.TrimRight
plugins/inputs/irqstat/irqstat.go
Outdated
_, err := strconv.ParseInt(irqid, 10, 64) | ||
if err == nil { | ||
irqtype = fields[cpucount] | ||
irqdevice = strings.Join(fields[cpucount+1:], " ") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could panic if the expected fields are missing. Since we sliced off the first value, cpucount no longer represents what it is named, I think it needs to be renamed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if I agree. In my mind cpucount still represents exactly what it is, which is how many CPUs are present on a system, regardless whether we sliced off the first value or not. The count of how many CPUs is how one determines the position of IRQ type and device depending on whether the IRQ parses as a number or not.
I think moving the first value slicing down to after where we set the irqtype and irqdevice and instead doing irqtype = fields[cpucount+1]
and increment by 1 the other places where fields[cpucount]
is referenced as well. So like something like this:
if err == nil && len(fields) > cpucount {
irqtype = fields[cpucount+1]
irqdevice = strings.Join(fields[cpucount+2:], " ")
} else if len(fields) > cpucount {
irqtype = strings.Join(fields[cpucount+1:], " ")
}
fields = fields[1:len(fields)]
for i := 0; i < cpucount; i++ {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right about cpucount
, just make sure you do not index past where you have tested for.
plugins/inputs/irqstat/irqstat.go
Outdated
if i < len(fields) { | ||
irqval, err = strconv.ParseInt(fields[i], 10, 64) | ||
if err != nil { | ||
log.Fatal(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error should either be returned from Gather, or use acc.AddError
plugins/inputs/irqstat/irqstat.go
Outdated
irq.Tags["type"] = irqtype | ||
irq.Tags["device"] = irqdevice | ||
irq.Fields["total"] = irqtotal | ||
if len(include) == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Combine the conditions if len(include) == 0 || if stringInSlice(...) {
or have a function that handles the inclusion logic includeIRQ(IRQ) bool {
. We know if we want to include it earlier in the function so it could be nice to skip the work.
1: 7 3 IO-APIC-edge i8042 | ||
NMI: 0 0 Non-maskable interrupts | ||
LOC: 2338608687 2334309625 Local timer interrupts | ||
MIS: 0` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add some softirqs entries.
|
||
got := parseInterrupts(interruptStr, include) | ||
if len(got) == 0 { | ||
t.Fatalf("want %+v, got %+v", parsed, got) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shorten up the assertions using testify: require(t, 0, len(got))
. Do this throughout the function.
plugins/inputs/irqstat/irqstat.go
Outdated
ID string | ||
Fields map[string]interface{} | ||
Tags map[string]string | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This type should use the domain appropriate names for an IRQ (device, type, total) or it should be removed and Accumulator should be passed into the parseInterrupts function. You might think this will make testing difficult, but you can use testutil.Accumulator for that.
Side note: this type is essentially an internal type called Metric, but unfortunately Accumulator doesn't accept it.
cfc37a3
to
84100b1
Compare
@danielnelson thank you for reviewing the code and for the feedback. Not sure if I should reply to each individual comment of yours but the latest commit I just pushed should address them. |
plugins/inputs/irqstat/irqstat.go
Outdated
const sampleConfig = ` | ||
## A list of IRQs to include for metric ingestion, if not specified | ||
## will default to collecting all IRQs. | ||
# include = ["0", "1", "30", "NET_RX"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to use the telegraf tagpass
/tagdrop
feature. Just to keep the number of ways to do something down to a minimum.
plugins/inputs/irqstat/irqstat.go
Outdated
for _, file := range files { | ||
data, err := ioutil.ReadFile(file) | ||
if err != nil { | ||
acc.AddError(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably do acc.AddError(fmt.Errorf("reading %s: %s", file, err))
. So that we know which file the error is for.
It also seems like we need a continue
here, to skip to the next file.
Ditto for line 107
plugins/inputs/irqstat/irqstat.go
Outdated
if i < len(fields) { | ||
irqval, err = strconv.ParseInt(fields[i], 10, 64) | ||
if err != nil { | ||
return irqs, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should add context to the error, so that we know what failed to parse.
Maybe something like fmt.Errorf("unable to parse %q from %q: %s", fields[i], scanner.Text(), err)
.
plugins/inputs/irqstat/irqstat.go
Outdated
} | ||
|
||
func parseInterrupts(irqdata string, include []string) ([]IRQ, error) { | ||
var err error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't look like this variable ever gets used. The only err
is on line 59, and that's a new variable that's defined within scope of the for
loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This error is returned at the end of the parseInterrupts function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but you're not using it. You might as well just return irqs, nil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you mean now. Thanks. Updated the code and re-committed.
@phemmer I agree with just using tagpass / tagdrop (didn't know this feature existed) instead of keeping a different list, I call this out in |
d04978c
to
8f8d16d
Compare
@danielnelson and @phemmer are there any other things that should be addressed before this can be merged in? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about the delay, I haven't forgotten I'm just very busy.
These files are fairly tricky to parse, but I still feel like the complexity of the parse loop is too high and this makes it difficult to understand. I hope we can simplify it with some of these suggestions.
TASKLET: 205 0` | ||
|
||
parsed := []IRQ{ | ||
IRQ{ID: "0", Type: "IO-APIC-edge", Device: "timer", Values: map[string]interface{}{"CPU0": int64(134), "CPU1": int64(0), "total": int64(134)}}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you reformat these lines so they are closer to 78 chars.
plugins/inputs/irqstat/irqstat.go
Outdated
ID string | ||
Type string | ||
Device string | ||
Values map[string]interface{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this always use int64 as the value type? I think what I would rather see here is a []int64
for per cpu values and a Total int64
plugins/inputs/irqstat/irqstat.go
Outdated
} | ||
|
||
func init() { | ||
inputs.Add("irqstat", func() telegraf.Input { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is something I should have raised earlier, but I'm unsure about this name. Maybe it should be "interrupts", what do you think @phemmer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No argument from me on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like consensus has been made, but since I was pinged directly: I think I would favor "interrupts". Name should be reflective of what it provides, not where it gets the data. Though one could argue that "irqstat" is short for "IRQ statistics", which is reflective of what it provides.
plugins/inputs/irqstat/irqstat.go
Outdated
_, err := strconv.ParseInt(irqid, 10, 64) | ||
if err == nil && len(fields) > cpucount { | ||
irqtype = fields[cpucount+1] | ||
irqdevice = strings.Join(fields[cpucount+2:], " ") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to check that this is a valid index.
plugins/inputs/irqstat/irqstat.go
Outdated
var irqval, irqtotal int64 | ||
var irqtype, irqdevice string | ||
fields := strings.Fields(scanner.Text()) | ||
if len(fields) > 0 && fields[0] == "CPU0" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should scan the first line outside of the loop. This will reduce the complexity of this loop and prevent any strange effects if this matches somehow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you referring to the for scanner.Scan()
loop here? If so, how do you think I should go about parsing just the first line before actually initiating the Scan() function on the scanner?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no strong opinion on this matter, but you can call scanner.Scan()
once outside the loop.
plugins/inputs/irqstat/irqstat.go
Outdated
irq := NewIRQ(irqid) | ||
_, err := strconv.ParseInt(irqid, 10, 64) | ||
if err == nil && len(fields) > cpucount { | ||
irqtype = fields[cpucount+1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These can all be directly assigned to the IRQ struct
plugins/inputs/irqstat/irqstat.go
Outdated
irqtype = strings.Join(fields[cpucount+1:], " ") | ||
} | ||
fields = fields[1:len(fields)] | ||
for i := 0; i < cpucount; i++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do this loop before parsing type and device, this way the file is parsed left to right.
plugins/inputs/irqstat/irqstat.go
Outdated
for _, irq := range irqs { | ||
tags := map[string]string{"irq": irq.ID, "type": irq.Type, "device": irq.Device} | ||
fields := irq.Values | ||
if file == "/proc/softirqs" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should avoid this, write a function that takes a path, have it parse the file and return the IRQ struct, then add the fields with the appropriate measurement name. Something sort of like this:
irqs := gather("/proc/interrupts")
for irq := range irqs {
fields := fields(irq)
tags := tags(irq)
acc.AddFields("interrupts", fields, tags)
}
irqs = gather("/proc/interrupts")
for irq := range irqs {
fields := fields(irq)
tags := tags(irq)
acc.AddFields("soft_interrupts", fields, tags)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danielnelson I am slightly confused on this comment. Which part of this should be avoided? The current function parseInterrupts()
takes in the content of each file passed to it in the first for loop as a string (irqdata
, instead of taking in a path string) and returns the IRQ struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part in particular makes me feel like this loop is not structured very well if file == "/proc/softirqs" {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danielnelson ok, my only concern is that this will cause a duplication of several lines instead using the for loop over a list of file names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats okay, the pseudocode above could be improved on as well with a function AddIRQs(measurement, irqs)
37addcb
to
ad9ad8f
Compare
Reformatting interrupts_test.go. Updating IRQ struct
ad9ad8f
to
cddcd40
Compare
@danielnelson I tried to address your comments in this last commit. I decided not to do a function since there we were only two file to parse. Let me know if there is anything else. Thanks for the help and guidance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add this to the CHANGELOG too
if err != nil { | ||
acc.AddError(fmt.Errorf("Reading %s: %s", "/proc/interrupts", err)) | ||
} | ||
irqdata := string(data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If ReadFile errors we need to skip processing the file. To avoid a lot of nesting you will probably want the function... The function could return an empty []IRQ so nothing will be ranged over.
Thanks! |
Required for all PRs: