Skip to content
This repository has been archived by the owner on Dec 20, 2024. It is now read-only.

optimize: reduce the waiting time for starting dfget server #1088

Merged
merged 1 commit into from
Nov 25, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions dfget/core/uploader/peer_server_executor.go
Original file line number Diff line number Diff line change
Expand Up @@ -148,6 +148,10 @@ func (pe *peerServerExecutor) checkPeerServerExist(cfg *config.Config, port int)
if port <= 0 {
port = getPortFromMeta(cfg.RV.MetaPath)
}
if port <= 0 {
// port 0 is invalid
return 0
}

// check the peer server whether is available
result, err := checkServer(cfg.RV.LocalIP, port, cfg.RV.DataDir, taskFileName, int(cfg.TotalLimit))
Expand Down
64 changes: 44 additions & 20 deletions dfget/core/uploader/uploader.go
Original file line number Diff line number Diff line change
Expand Up @@ -120,27 +120,51 @@ func launch(cfg *config.Config, p2pPtr *unsafe.Pointer) error {
return fmt.Errorf("start peer server error and retried at most %d times", retryCount)
}

func waitForStartup(result chan error, p2pPtr *unsafe.Pointer) error {
select {
case err := <-result:
tmp := loadSrvPtr(p2pPtr)
if err == nil {
logrus.Infof("reuse exist server on port:%d", tmp.port)
tmp.setFinished()
}
return err
case <-time.After(100 * time.Millisecond):
// The peer server go routine will block and serve if it starts successfully.
// So we have to wait a moment and check again whether the peer server is
// started.
tmp := loadSrvPtr(p2pPtr)
if tmp == nil {
return fmt.Errorf("initialize peer server error")
}
if !uploaderAPI.PingServer(tmp.host, tmp.port) {
return fmt.Errorf("can't ping port:%d", tmp.port)
// waitForStartup It's a goal to start 'dfget server' process and make it working
// within 300ms, such as in the case of downloading very small files, especially
// in parallel.
// The ticker which has a 5ms period can test the server whether is working
// successfully as soon as possible.
// Actually, it costs about 70ms for 'dfget client' to start a `dfget server`
// process if everything goes right without any failure. So the remaining time
// for retrying to launch server internal is about 230ms. And '233' is just
// right the smallest number which is greater than 230, a prime, and not a
// multiple of '5'.
// And there is only one situation which should be retried again: the address
// already in use. The remaining time is enough for it to retry 10 times to find
// another available address in majority of cases.
func waitForStartup(result chan error, p2pPtr *unsafe.Pointer) (err error) {
ticker := time.NewTicker(5 * time.Millisecond)
defer ticker.Stop()
timeout := time.After(233 * time.Millisecond)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why to use this value 233 * time.Millisecond. How about declaring it a const value?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a multiple of 5, is a prime number. And it's enough for retry to start a http server. 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, it's weird for 233ms. Maybe other contributors will feel the same way. Could you please add a comment for it? In addition, how about declaring it a const value as @allencloud said?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is 233 ms, not others?

It's weird for us because of unfamiliarity, but it's really the one. There are many things in the world that are unexpected. I have already commented on this function to describe why this number jumps out by itself. It jumps into my brain, and I have to write it down.

why not declare it as a const value

I think that maybe it's unnecessary to declare it as a const value. The code here can explain its meaning by a very simple and clear way by itself: it's the timeout of waiting for starting server. Just like the variable ticker with 5ms above. It will lose its meaning when it appears in any other places. And the important thing is that it's very hard to come up with a name of the constant value which can explain itself without context.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, Man!


for {
select {
case <-ticker.C:
tmp := loadSrvPtr(p2pPtr)
if tmp != nil && uploaderAPI.PingServer(tmp.host, tmp.port) {
return nil
}
case err = <-result:
tmp := loadSrvPtr(p2pPtr)
if err == nil {
logrus.Infof("reuse exist server on port:%d", tmp.port)
tmp.setFinished()
}
return err
case <-timeout:
// The peer server go routine will block and serve if it starts successfully.
// So we have to wait a moment and check again whether the peer server is
// started.
tmp := loadSrvPtr(p2pPtr)
if tmp == nil {
return fmt.Errorf("initialize peer server error")
}
if !uploaderAPI.PingServer(tmp.host, tmp.port) {
return fmt.Errorf("can't ping port:%d", tmp.port)
}
return nil
}
return nil
}
}

Expand Down