CASSGO-45 Return error instead of panic when host address is invalid #1858

Open
wants to merge 1 commit into
base: trunk

Conversation

tengu-alt
Contributor

Fix for #1370.

ConnectAddress() was refactored to return an error instead of panicking.

@tengu-alt tengu-alt force-pushed the refactor-connect-address-method branch from 6f8773a to 8aae39b Compare January 14, 2025 08:25
Contributor

@joao-r-reis joao-r-reis left a comment

Left some comments. The host selection / load balancing policies need some changes: we shouldn't assume the address is always valid in those implementations.

if hi.host.ConnectAddress().String() == host {
connAddr, err := hi.host.ConnectAddress()
if err != nil {
t.Error(err)
Contributor

maybe this should be t.Fatal so the behavior of this test is consistent after the change?

Contributor Author

Agree, fixed

control.go Outdated
@@ -261,16 +261,21 @@ func (c *controlConn) connect(hosts []*HostInfo) error {
var conn *Conn
var err error
for _, host := range hosts {
connAddr, err := host.ConnectAddress()
if err != nil {
c.session.logger.Printf("gocql: %v\n", err)
Contributor

Usually we add a message for this error before we add the actual error string so something like:
c.session.logger.Printf("gocql: unable to use host for control connection, skipping it: %v\n", err)
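A minimal sketch of the pattern being asked for here: skip a host whose connect address cannot be determined and log a message that puts the context first and the underlying error last. The `host` type, `connectAddress` and `dialable` are hypothetical stand-ins for illustration, not gocql code.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// host is a simplified, hypothetical stand-in for gocql's HostInfo.
type host struct {
	addr net.IP
}

// connectAddress returns an error instead of panicking when no usable
// address is known (empty or unspecified, e.g. 0.0.0.0).
func (h host) connectAddress() (net.IP, error) {
	if len(h.addr) == 0 || h.addr.IsUnspecified() {
		return nil, errors.New("no valid connect address for host")
	}
	return h.addr, nil
}

// dialable returns the addresses worth dialing and one log line per
// skipped host. Each log line states the context before the error.
func dialable(hosts []host) (addrs []string, logs []string) {
	for _, h := range hosts {
		addr, err := h.connectAddress()
		if err != nil {
			logs = append(logs, fmt.Sprintf(
				"gocql: unable to use host for control connection, skipping it: %v", err))
			continue
		}
		addrs = append(addrs, addr.String())
	}
	return addrs, logs
}

func main() {
	hosts := []host{{addr: net.ParseIP("10.0.0.1")}, {}, {addr: net.IPv4zero}}
	addrs, logs := dialable(hosts)
	fmt.Println(addrs)
	for _, line := range logs {
		fmt.Println(line)
	}
}
```

In the real loop the `continue` would move on to the next candidate host instead of collecting log lines.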

Contributor Author

Fixed

control.go Outdated
@@ -423,16 +432,21 @@ func (c *controlConn) attemptReconnectToAnyOfHosts(hosts []*HostInfo) (*Conn, er
var conn *Conn
var err error
for _, host := range hosts {
connAddr, err := host.ConnectAddress()
if err != nil {
c.session.logger.Printf("gocql: %v\n", err)
Contributor

same as comment above

control_test.go Outdated
t.Errorf("expected ip %v got %v for addr %q", test.ip, host.ConnectAddress(), test.addr)
connAddr, err := host.ConnectAddress()
if err != nil {
t.Errorf("%d: %v", i, err)
Contributor

t.Errorf("could not get connect address of host %q to compare with expected: %v", test.addr, err)

filters.go Outdated
@@ -72,10 +72,13 @@ func WhiteListHostFilter(hosts ...string) HostFilter {

m := make(map[string]bool, len(hostInfos))
for _, host := range hostInfos {
m[host.ConnectAddress().String()] = true
connAddr, _ := host.ConnectAddress()
Contributor

If err != nil, I don't think the host should be whitelisted; the loop should just continue.
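To illustrate why discarding the error matters in the filter, here is a small sketch with hypothetical `host` and `whitelist` names (not the gocql implementation). Calling `String()` on an empty `net.IP` yields `"<nil>"`, so ignoring the error would silently whitelist that key.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// host is a hypothetical stand-in for gocql's HostInfo.
type host struct{ addr net.IP }

func (h host) connectAddress() (net.IP, error) {
	if len(h.addr) == 0 || h.addr.IsUnspecified() {
		return nil, errors.New("no valid connect address for host")
	}
	return h.addr, nil
}

// whitelist builds the lookup set used by the filter, skipping hosts
// whose connect address cannot be determined.
func whitelist(hosts []host) map[string]bool {
	m := make(map[string]bool, len(hosts))
	for _, h := range hosts {
		addr, err := h.connectAddress()
		if err != nil {
			// Ignoring err here would store the key "<nil>".
			continue
		}
		m[addr.String()] = true
	}
	return m
}

func main() {
	m := whitelist([]host{{addr: net.ParseIP("10.0.0.1")}, {}})
	fmt.Println(len(m), m["10.0.0.1"], m["<nil>"]) // prints: 1 true false
}
```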

policies.go Outdated
@@ -862,10 +870,11 @@ func (d *dcAwareRR) AddHost(host *HostInfo) {
}

func (d *dcAwareRR) RemoveHost(host *HostInfo) {
connAddr, _ := host.ConnectAddress()
Contributor

Just return if unable to get the connect address, but we also need to add a check for this in AddHost so that hosts with invalid connect addresses are not added.
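The AddHost/RemoveHost pairing described above could look roughly like this. `hostPool` and `validAddr` are hypothetical, heavily simplified names; the real policies track far more state.

```go
package main

import (
	"fmt"
	"net"
)

// hostPool is a hypothetical, much simplified host list keyed by address.
type hostPool struct {
	hosts map[string]bool
}

func validAddr(ip net.IP) bool {
	return len(ip) > 0 && !ip.IsUnspecified()
}

// AddHost refuses hosts without a valid connect address, so the pool
// never holds an entry that RemoveHost could not look up later.
func (p *hostPool) AddHost(addr net.IP) bool {
	if !validAddr(addr) {
		return false
	}
	if p.hosts == nil {
		p.hosts = make(map[string]bool)
	}
	p.hosts[addr.String()] = true
	return true
}

// RemoveHost returns early for an invalid address: such a host was
// never added in the first place.
func (p *hostPool) RemoveHost(addr net.IP) {
	if !validAddr(addr) {
		return
	}
	delete(p.hosts, addr.String())
}

func main() {
	var p hostPool
	fmt.Println(p.AddHost(nil))                     // false
	fmt.Println(p.AddHost(net.ParseIP("10.0.0.1"))) // true
	p.RemoveHost(nil)                               // no-op
	fmt.Println(len(p.hosts))                       // 1
}
```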

policies.go Outdated
@@ -970,7 +979,8 @@ func (d *rackAwareRR) AddHost(host *HostInfo) {

func (d *rackAwareRR) RemoveHost(host *HostInfo) {
dist := d.HostTier(host)
d.hosts[dist].remove(host.ConnectAddress())
connAddr, _ := host.ConnectAddress()
d.hosts[dist].remove(connAddr)
Contributor

Same as above: check in AddHost so we don't add hosts with invalid addresses, and then we can just return here when unable to get connAddr.

Contributor Author

done

session.go Outdated
@@ -394,7 +394,8 @@ func (s *Session) reconnectDownedHosts(intv time.Duration) {
if gocqlDebug {
buf := bytes.NewBufferString("Session.ring:")
for _, h := range hosts {
buf.WriteString("[" + h.ConnectAddress().String() + ":" + h.State().String() + "]")
connAddr, _ := h.ConnectAddress()
Contributor

We shouldn't ignore these errors. It's better to handle them than to assume that connAddr is always valid; just continue if unable to get connAddr.

session.go Outdated
@@ -847,11 +848,12 @@ func (qm *queryMetrics) hostMetrics(host *HostInfo) *hostMetrics {
// hostMetricsLocked gets or creates host metrics for given host.
// It must be called only while holding qm.l lock.
func (qm *queryMetrics) hostMetricsLocked(host *HostInfo) *hostMetrics {
metrics, exists := qm.m[host.ConnectAddress().String()]
connAddr, _ := host.ConnectAddress()
metrics, exists := qm.m[connAddr.String()]
Contributor

We definitely should not be creating metrics objects for hosts with invalid connect addresses since they won't be used anyway.

Contributor Author

Agree, fixed

token.go Outdated
buf.WriteString(sep)
sep = ","
buf.WriteString("\n\t[")
buf.WriteString(strconv.Itoa(i))
buf.WriteString("]")
buf.WriteString(th.token.String())
buf.WriteString(":")
buf.WriteString(th.host.ConnectAddress().String())
buf.WriteString(connAddr.String())
Contributor

Will this print an empty string if err != nil? Printing an empty string for the address part could be fine, but we could also print nil instead.

Contributor Author

I have checked: it will print nil.
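This behaviour comes from the standard library: `net.IP.String()` returns the literal text `<nil>` when the slice is empty, so no special casing is needed in the debug output. A short check, with `addrLabel` as a hypothetical helper name:

```go
package main

import (
	"fmt"
	"net"
)

// addrLabel formats an address for debug output.
func addrLabel(ip net.IP) string {
	return ip.String()
}

func main() {
	var missing net.IP // what a caller holds when the lookup failed
	fmt.Println(addrLabel(missing))                 // <nil>
	fmt.Println(addrLabel(net.ParseIP("10.0.0.1"))) // 10.0.0.1
}
```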

@joao-r-reis
Contributor

Also can you create a JIRA for this?

@tengu-alt
Contributor Author

> Also can you create a JIRA for this?

Of course, currently working on refactoring.

@tengu-alt tengu-alt force-pushed the refactor-connect-address-method branch from 8aae39b to c884b90 Compare January 16, 2025 09:43
@tengu-alt tengu-alt changed the title ConnectAddress() was refactored CASSGO-45 - ConnectAddress() was refactored Jan 16, 2025
@tengu-alt tengu-alt changed the title CASSGO-45 - ConnectAddress() was refactored CASSGO-45 ConnectAddress() was refactored Jan 16, 2025
@joao-r-reis
Contributor

I just realized that we are probably going about this the wrong way. Instead of changing ConnectAddress() to return an error, we should probably ensure that we never create Host objects with invalid addresses in the first place. I think that would result in a clearer API and less intrusive changes for users.

Sorry for not realizing this earlier, I know you already worked a bit on this approach but I really think we should change the approach here.

The first step is to find all occurrences of direct HostInfo struct initialization and replace them with a call to newHostInfo(addr net.IP, port int) (*HostInfo, error), so that the IP validation happens in this function instead of in host.ConnectAddress(). The only place that will require more changes is func (r *ringDescriber) getClusterPeerInfo(localHost *HostInfo) ([]*HostInfo, error), because there the object is created without an IP address, which is only set after reading the columns from system.peers. A possible approach is to move the logic from h.connectAddressLocked() to a function outside of the hostinfo type, use it to compute the connect address and then use the result to create the host info object afterwards. After this change, I think h.ConnectAddress() can always read the value from the h.connectAddress field, and h.SetConnectAddress should be changed to return an error when the provided address is invalid (this only matters to users who call this function, because the driver itself doesn't use it).
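A rough sketch of this proposal, using a reduced `HostInfo` with only the fields needed here. The validation rule in `validIPAddr` (non-empty and not unspecified) and the error text are assumptions for illustration, as is the `error` return on `SetConnectAddress`, which is the change being proposed rather than the existing signature.

```go
package main

import (
	"fmt"
	"net"
	"sync"
)

// HostInfo is a reduced, hypothetical version of the gocql type.
type HostInfo struct {
	mu             sync.RWMutex
	connectAddress net.IP
	port           int
}

// validIPAddr is an assumed validation rule for this sketch.
func validIPAddr(addr net.IP) bool {
	return len(addr) > 0 && !addr.IsUnspecified()
}

// newHostInfo is the single place where validation happens on creation.
func newHostInfo(addr net.IP, port int) (*HostInfo, error) {
	if !validIPAddr(addr) {
		return nil, fmt.Errorf("invalid host address: %v", addr)
	}
	return &HostInfo{connectAddress: addr, port: port}, nil
}

// ConnectAddress can now simply read the field: it is valid by construction.
func (h *HostInfo) ConnectAddress() net.IP {
	h.mu.RLock()
	defer h.mu.RUnlock()
	return h.connectAddress
}

// SetConnectAddress rejects invalid input and leaves the old value intact.
func (h *HostInfo) SetConnectAddress(addr net.IP) error {
	if !validIPAddr(addr) {
		return fmt.Errorf("invalid host address: %v", addr)
	}
	h.mu.Lock()
	defer h.mu.Unlock()
	h.connectAddress = addr
	return nil
}

func main() {
	if _, err := newHostInfo(nil, 9042); err != nil {
		fmt.Println("rejected:", err)
	}
	h, err := newHostInfo(net.ParseIP("10.0.0.1"), 9042)
	if err != nil {
		panic(err)
	}
	fmt.Println(h.ConnectAddress())
	fmt.Println(h.SetConnectAddress(net.IPv4zero) != nil)
}
```

With this shape, callers of `ConnectAddress()` keep the original one-value signature, which is the "less intrusive" aspect mentioned above.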

@tengu-alt
Contributor Author

Understood, I will work on it.

@tengu-alt
Contributor Author

tengu-alt commented Jan 29, 2025

> A possible approach is to move the logic from h.connectAddressLocked() to a function outside of the hostinfo type, use it to compute the connect address and then use the result to create the host info object afterwards.

I don't clearly understand how we can use the logic from h.connectAddressLocked() if it computes connectAddress from the IPs of the host, which still need to be filled in. As I see it, those IPs are filled in inside the hostInfoFromMap() method, which requires a *HostInfo.

I think it is better to create the raw structure here as it is now and add the validation inside hostInfoFromMap(), something like this:

	ip, port := s.cfg.translateAddressPort(host.ConnectAddress(), host.port)
	if !validIpAddr(ip) {
		return nil, errors.New("invalid connect address")
	}
	host.connectAddress = ip
	host.port = port

or validate it inside of getClusterPeerInfo()

@tengu-alt tengu-alt changed the title CASSGO-45 ConnectAddress() was refactored WIP CASSGO-45 ConnectAddress() was refactored Jan 29, 2025
@joao-r-reis
Contributor

> I don't clearly understand how we can use the logic from the h.connectAddressLocked() if it computes connectAddress according to the IP's from the host which needs to be filled. As I see those IP's are filled inside of the hostInfoFromMap() method which requires *HostInfo.

For example, you would pass those IPs as parameters of the function instead.
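One way to read this suggestion is a free function that receives the candidate addresses in priority order and returns the first valid one. The name `connectAddressFrom`, the validation rule and the order of candidates shown in `main` are assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// validIPAddr is an assumed validation rule for this sketch.
func validIPAddr(addr net.IP) bool {
	return len(addr) > 0 && !addr.IsUnspecified()
}

// connectAddressFrom returns the first valid candidate. It needs no
// HostInfo receiver, so it can run before the host object exists.
func connectAddressFrom(candidates ...net.IP) (net.IP, error) {
	for _, ip := range candidates {
		if validIPAddr(ip) {
			return ip, nil
		}
	}
	return nil, errors.New("no valid connect address among candidates")
}

func main() {
	// Hypothetical values, as they might be read from a system.peers row.
	var rpcAddress net.IP           // column missing
	broadcast := net.IPv4zero       // unspecified, so not usable
	peer := net.ParseIP("10.0.0.7") // usable

	addr, err := connectAddressFrom(rpcAddress, broadcast, peer)
	fmt.Println(addr, err)
}
```

The result can then be handed to a validating constructor, so the host object is only created once a usable address is known.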

> I think it is better to create raw structure here as it is now, and add the validation inside of the hostInfoFromMap()
> something like this:

This can also work. As long as we guarantee that we don't have HostInfo objects going around with invalid IPs, I'm OK with it. I suggested the newHostInfo function, and replacing direct struct initialization with calls to it, because that would be a guaranteed way to ensure HostInfo objects always have valid IP addresses in the future.

@tengu-alt tengu-alt force-pushed the refactor-connect-address-method branch from c884b90 to ffe9660 Compare January 30, 2025 15:29
@tengu-alt
Contributor Author

@joao-r-reis I've made an update. I decided to add validation into hostInfoFromMap() for cases when we can't provide connectAddress for the constructor func (I have found another place inside of func (r *ringDescriber) getLocalHostInfo() (*HostInfo, error))

Contributor

@joao-r-reis joao-r-reis left a comment

Added a comment about existing HostInfo struct initialization references in the codebase

host := &HostInfo{}
host.hostname = addr.String()
host.port = port
if !validIpAddr(addr) {
Contributor

move this check to the top

@@ -584,6 +598,9 @@ func (s *Session) hostInfoFromMap(row map[string]interface{}, host *HostInfo) (*
}

ip, port := s.cfg.translateAddressPort(host.ConnectAddress(), host.port)
if !validIpAddr(ip) {
return nil, errors.New("invalid connect address")
Contributor

hostAddr := host.ConnectAddress()
return nil, fmt.Errorf("invalid connect address after translating %v:%v", hostAddr.String(), host.port)

return addr
}
panic(fmt.Sprintf("no valid connect address for host: %v. Is your cluster configured correctly?", h))
addr, _ := h.connectAddressLocked()
Contributor

There's no longer any case where connectAddressLocked() should return a non-nil err, so it can be changed to just return the address (basically, remove the validation).

@@ -1697,7 +1697,11 @@ func (c *Conn) awaitSchemaAgreement(ctx context.Context) (err error) {
}

for _, row := range rows {
host, err := c.session.hostInfoFromMap(row, &HostInfo{connectAddress: c.host.ConnectAddress(), port: c.session.cfg.Port})
h, err := newHostInfo(c.host.connectAddress, c.session.cfg.Port)
Contributor

c.host.ConnectAddress()

@@ -172,7 +176,12 @@ func hostInfo(addr string, defaultPort int) ([]*HostInfo, error) {
}

for _, ip := range ips {
hosts = append(hosts, &HostInfo{hostname: host, connectAddress: ip, port: port})
h, err := newHostInfo(ip, port)
Contributor

My suggestion to use newHostInfo was meant to standardize every creation of a host info object on this function, so that no host info object can exist with an invalid connect address. As it stands in this PR, there are still a few places where the host info struct is created and initialized directly.

E.g. in hostInfoFromIter the host info can be created with a nil connect address but it is provided immediately to hostInfoFromMap which will fix it. This is fine with me but we should make it a bit more consistent.

As I see it, there are currently 3 different ways to create a host info object:

  1. create an "empty" host info object with a nil connect address + default port and call hostInfoFromMap to "fill" it
  2. create a host info object from a contact point
  3. update existing host info object with new values from system.peers/system.local (also done with hostInfoFromMap) - this is usually done on a host info created by 2)

For 1) I think we can add a new method session.newHostInfoFromMap(addr net.IP, port int, row map[string]interface{}) that just wraps around hostInfoFromMap:

func (s *Session) newHostInfoFromMap(addr net.IP, port int, row map[string]interface{}) (*HostInfo, error) {
	return s.hostInfoFromMap(row, &HostInfo{connectAddress: addr, port: port})
}

For 2) we can use the new newHostInfo function you created.

For 3) we can use hostInfoFromMap or we can use session.newHostInfoFromMap just like in 1)

In summary, the HostInfo struct will be created in only 2 places: session.newHostInfoFromMap() and newHostInfo(), and both ensure that the returned object has a valid address. Any other host info struct initialization in the codebase should be replaced by a call to either session.newHostInfoFromMap() or newHostInfo().
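As a sketch of how the wrapper and the validation inside hostInfoFromMap could fit together: the version below drops the Session receiver, reads only an `rpc_address` column, and falls back to it when the supplied connect address is not valid. All of that is a simplifying assumption, and `peerRow` is a hypothetical helper that fakes a row read from system.peers.

```go
package main

import (
	"fmt"
	"net"
)

// HostInfo is a reduced, hypothetical version of the gocql type.
type HostInfo struct {
	connectAddress net.IP
	rpcAddress     net.IP
	port           int
}

func validIPAddr(addr net.IP) bool {
	return len(addr) > 0 && !addr.IsUnspecified()
}

// hostInfoFromMap fills host from row and rejects the result when no
// valid connect address can be derived.
func hostInfoFromMap(row map[string]interface{}, host *HostInfo) (*HostInfo, error) {
	if ip, ok := row["rpc_address"].(net.IP); ok {
		host.rpcAddress = ip
	}
	if !validIPAddr(host.connectAddress) {
		host.connectAddress = host.rpcAddress
	}
	if !validIPAddr(host.connectAddress) {
		return nil, fmt.Errorf("invalid connect address for host on port %d", host.port)
	}
	return host, nil
}

// newHostInfoFromMap is the only place that builds the struct literal.
func newHostInfoFromMap(addr net.IP, port int, row map[string]interface{}) (*HostInfo, error) {
	return hostInfoFromMap(row, &HostInfo{connectAddress: addr, port: port})
}

// peerRow builds an illustrative row; an empty string yields a nil IP.
func peerRow(rpc string) map[string]interface{} {
	return map[string]interface{}{"rpc_address": net.ParseIP(rpc)}
}

func main() {
	h, err := newHostInfoFromMap(nil, 9042, peerRow("10.0.0.9"))
	if err != nil {
		panic(err)
	}
	fmt.Println(h.connectAddress)

	_, err = newHostInfoFromMap(nil, 9042, peerRow(""))
	fmt.Println(err)
}
```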

@joao-r-reis joao-r-reis changed the title WIP CASSGO-45 ConnectAddress() was refactored CASSGO-45 Do not create hosts from invalid ip addresses Jan 31, 2025
@joao-r-reis joao-r-reis changed the title CASSGO-45 Do not create hosts from invalid ip addresses CASSGO-45 Return error instead of panic when host address is invalid Jan 31, 2025
2 participants