-
Notifications
You must be signed in to change notification settings - Fork 616
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
windows: changed method for retrieving Windows network statistics in case of awsvpc network mode #3425
Conversation
|
||
// We would be using GetIfTable2Ex and FreeMibTable Win32 APIs from IP Helper. | ||
moduleIPHelper := windows.NewLazySystemDLL("iphlpapi.dll") | ||
procGetIfTable2Ex := moduleIPHelper.NewProc("GetIfTable2Ex") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we retrieving the entire table and not the specific interface row?
https://learn.microsoft.com/en-us/windows/win32/api/netioapi/nf-netioapi-getifentry2ex?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah we have been searching for this particular method all along. Unfortunately, we were not aware about the same.
I'll make the changes to use GetIfEntry2Ex
API.
agent/stats/task_windows.go
Outdated
cancel() | ||
return nil, errors.Wrapf(err, "failed to initialise stats task container") | ||
} | ||
ifaceLUID[index] = ifEntry.InterfaceLUID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still dont quite understand. Cant we just use ConvertInterfaceIndexToLuid
for this and then be done? We dont need to do an entire mib table and filter it right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the thing is that we do not store interface index for the task network adapters. There can be two cases here-
- We query the interface index first using
net.Interfaces
and then useConvertInterfaceIndexToLuid
in the stats module itself. This leads to two Win32 calls. Now,net.Interfaces
would in turn callGetAdaptersAddresses
which fetches all the addresses associated with the adapters. I feel that a single call toGetIfTable2Ex
might be more performant than these two calls. - We store the ENI's interface index along with Link name during the initial task attachment workflow. Now, the issue here is that if the interface index becomes stale for some reason say, instance reboot, we would not have any mechanism to identify or refresh the same.
@@ -87,6 +86,7 @@ func newTestWatcher(ctx context.Context, primaryMAC string, state dockerstate.Ta | |||
notificationChannel := make(chan int) | |||
err := eniMonitor.Start(notificationChannel) | |||
if err != nil { | |||
cancel() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine but we dont even use ctx until the &ENIWatcher below .If we did not early alloc it we would avoid this case entirely
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically move ln 84 to 92
// GetMIBIfEntryFromDeviceName return the MIB_IF_ROW2 object for the queried device. | ||
// This method would internally invoke the GetIfTable2Ex Windows API to get complete interface list | ||
// from which the queried device would be filtered. | ||
func (utils *networkUtils) GetMIBIfEntryFromDeviceName(device string) (*MibIfRow2, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look at my other comment using the LUID. I think we can remove this entirely.
} | ||
|
||
// getIfEntry2Ex invokes GetIfEntry2Ex Win32 API to retrieve the MIB_IF_ROW2 for the interface with given LUID. | ||
func getIfEntry2Ex(ifRow *MibIfRow2) (*MibIfRow2, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we add this extra function just for unit testing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we did :)
} | ||
|
||
// TestGetMIBIfEntryFromLUID tests the GetMIBIfEntryFromLUID method. | ||
func TestGetMIBIfEntryFromLUID(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we went to all the trouble here lets add the negative test if we return error from the call too.
agent/eni/watcher/watcher_windows.go
Outdated
@@ -65,6 +65,7 @@ func newWatcher(ctx context.Context, | |||
notificationChannel := make(chan int) | |||
err := eniMonitor.Start(notificationChannel) | |||
if err != nil { | |||
cancel() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same, move ln 62 to 72 and then we dont have to handle this case.
devices := make([]string, len(taskENIs)) | ||
ifaceLUID := make([]uint64, len(taskENIs)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can have multiple eni's per task?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is the case with trunk ENI I think.
@@ -108,14 +85,16 @@ func TestTaskStatsCollection(t *testing.T) { | |||
TaskMetadata: &TaskMetadata{ | |||
TaskArn: taskId, | |||
ContainerPID: containerPID, | |||
DeviceName: []string{"Ethernet 3"}, | |||
DeviceName: []string{deviceName}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we using DeviceName? Isnt the ACS api based on mac and then we find index? Once we find Index we can get LUID. And from there we are good to go for basically all win32 api's. Why use name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well that is the current workflow which is shared with the Linux. We store the stats corresponding to each device name. Reference:
amazon-ecs-agent/agent/stats/task.go
Line 151 in 67240c1
networkStats, err := taskStat.retrieveNetworkStatistics() |
Regarding fetching interface index, I have mentioned the same in earlier comment.
Presently, we have setter methods present in the networkutils and watcher package instead of having mock methods for the same. This is inconsistent with rest of the agent package and goes against Go guidelines.
Currently, we are running a Powershell method (Get-NetAdapterStatistics) to retrieve the network statistics in case of awsvpc network mode. However, this can cause CPU spikes on smaller instance types. Therefore, we are switching the method to use native Win32 API for retrieving the same. As part of this change, we have added two new APIs to existing NetworkUtils interface applicable for Windows. - ConvertInterfaceAliasToLUID: Returns the LUID corresponding to the given interface alias. - GetMIBIfEntryFromLUID: Returns the MIB_IF_ROW for the specified LUID. Windows API reference: https://learn.microsoft.com/en-us/windows/win32/api/netioapi/nf-netioapi-convertinterfacealiastoluid https://learn.microsoft.com/en-us/windows/win32/api/netioapi/nf-netioapi-getifentry2ex
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Summary
Previously, we were running a Powershell method (Get-NetAdapterStatistics) to retrieve the network statistics when the task is using
awsvpc network mode
. However, this can cause CPU spikes on smaller instance types.Another method to obtain the network statistics would be access them using Win32 API. This method is significantly more performant than using Powershell cmdlet. Therefore, we are switching the agent implementation to use native Win32 API for retrieving the same.
Implementation
When we initialise the stats module for a given task, we will query and store the
interface LUID
of the task ENI. During the subsequent calls to retrieve the network statistics, we will query the statistics specific to the interface using it's LUID.In order to accomplish the same, we are introducing two new APIs in
NetworkUtils
interface would be the preferred way to access Windows networking APIs in the agent. The two APIs are-ConvertInterfaceAliasToLuid
Win32 API to convert interface alias into LUID.GetIfEntry2Ex
Win32 API to obtain the specific interface which has given LUID.Reference:
Note: As part of this PR, refactoring has been done to improve the code quality of existing Windows workflow.
Setter
method fromnetworkUtils
packageSetter
method fromwatcher
packagenetworkUtils
package and used the same inwatcher
tests instead of object injection.Testing
Tested the new agent using a custom AMI.
Tested it out on smaller instance types for any periodic spikes.
New tests cover the changes:
Yes
Description for the changelog
Licensing
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.