Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix improve transport setup logic issue #818

Merged
merged 1 commit into from
Jun 30, 2021
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
54 changes: 24 additions & 30 deletions pkg/transport/manager.go
Original file line number Diff line number Diff line change
Expand Up @@ -192,7 +192,7 @@ func (tm *Manager) initTransports(ctx context.Context) {
tpID = entry.Entry.ID
)
isInitiator := tm.n.LocalPK() == entry.Entry.Edges[0]
if _, err := tm.saveTransport(remote, isInitiator, tpType, entry.Entry.Label); err != nil {
if _, err := tm.saveTransport(ctx, remote, isInitiator, tpType, entry.Entry.Label); err != nil {
Copy link
Contributor

@i-hate-nicknames i-hate-nicknames Jun 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if I understand correctly, you moved dialing part from SaveTransport to saveTransport. The naming is probably not the best, but SaveTransport is called via rpc and api, and is essentially "I want to add new transport to there".
saveTransport is actually "save this transport and run it". Here it is saved when a transport entry is obtained from transport discovery. Original implementation includes measures to prevent both ends of transport of dialing each other (initially by using PK value and recently by using initiator logic). Currently, it seems that both ends can get into dialing state easily. I'm not sure if this won't introduce weird connectivity issues when a visor comes online with a number of transports for him in TD, while other ends of them are also online.
Can you please elaborate on this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The situation that you explain is same as that I have a visor with a served transport (for example a VPN as dmsg), and get offline, and get back online after two hours for example. For now in initTransport, that called when visor get back online, the Serve run again, but without dialing discovery, that means transport(s) added to visor both in online or offline status.

tm.Logger.Warnf("INIT: failed to init tp: type(%s) remote(%s) tpID(%s)", tpType, remote, tpID)
} else {
tm.Logger.Debugf("Successfully initialized TP %v", *entry.Entry)
Expand Down Expand Up @@ -304,39 +304,14 @@ func (tm *Manager) SaveTransport(ctx context.Context, remote cipher.PubKey, tpTy
}

for {
mTp, err := tm.saveTransport(remote, true, tpType, label)
if err != nil {
return nil, fmt.Errorf("save transport: %w", err)
}
mTp, err := tm.saveTransport(ctx, remote, true, tpType, label)

tm.Logger.Debugf("Dialing transport to %v via %v", mTp.Remote(), mTp.netName)

if err = mTp.Dial(ctx); err != nil {
tm.Logger.Debugf("Error dialing transport to %v via %v: %v", mTp.Remote(), mTp.netName, err)
// This occurs when an old tp is returned by 'tm.saveTransport', meaning a tp of the same transport ID was
// just deleted (and has not yet fully closed). Hence, we should close and delete the old tp and try again.
if err != nil {
if err == ErrNotServing {
if closeErr := mTp.Close(); closeErr != nil {
tm.Logger.WithError(err).Warn("Closing mTp returns non-nil error.")
}
tm.deleteTransport(mTp.Entry.ID)
continue
}

// This occurs when the tp type is STCP and the requested remote PK is not associated with an IP address in
// the STCP table. There is no point in retrying as a connection would be impossible, so we just return an
// error.
if isSTCPTableError(remote, err) {
if closeErr := mTp.Close(); closeErr != nil {
tm.Logger.WithError(err).Warn("Closing mTp returns non-nil error.")
}
tm.deleteTransport(mTp.Entry.ID)
return nil, err
}

tm.Logger.WithError(err).Warn("Underlying transport connection is not established, will retry later.")
return nil, fmt.Errorf("save transport: %w", err)
}

return mTp, nil
}
}
Expand All @@ -347,7 +322,7 @@ func isSTCPTableError(remotePK cipher.PubKey, err error) bool {
return err.Error() == fmt.Sprintf("pk table: entry of %s does not exist", remotePK.String())
}

func (tm *Manager) saveTransport(remote cipher.PubKey, initiator bool, netName string, label Label) (*ManagedTransport, error) {
func (tm *Manager) saveTransport(ctx context.Context, remote cipher.PubKey, initiator bool, netName string, label Label) (*ManagedTransport, error) {
tm.mx.Lock()
defer tm.mx.Unlock()
if !snet.IsKnownNetwork(netName) {
Expand Down Expand Up @@ -388,6 +363,25 @@ func (tm *Manager) saveTransport(remote cipher.PubKey, initiator bool, netName s
}
}
}

tm.Logger.Debugf("Dialing transport to %v via %v", mTp.Remote(), mTp.netName)
if err := mTp.Dial(ctx); err != nil {
tm.Logger.Debugf("Error dialing transport to %v via %v: %v", mTp.Remote(), mTp.netName, err)
// The first occurs when an old tp is returned by 'tm.saveTransport', meaning a tp of the same transport ID was
// just deleted (and has not yet fully closed). Hence, we should close and delete the old tp and try again.
// The second occurs when the tp type is STCP and the requested remote PK is not associated with an IP address in
// the STCP table. There is no point in retrying as a connection would be impossible, so we just return an
// error.
if err == ErrNotServing || isSTCPTableError(remote, err) {
if closeErr := mTp.Close(); closeErr != nil {
tm.Logger.WithError(err).Warn("Closing mTp returns non-nil error.")
}
tm.deleteTransport(mTp.Entry.ID)
}
tm.Logger.WithError(err).Warn("Underlying transport connection is not established.")
return nil, err
}

go func() {
mTp.Serve(tm.readCh)
tm.deleteTransport(mTp.Entry.ID)
Expand Down