roachtest: cdc/kafka-auth failed #118525

Closed
cockroach-teamcity opened this issue Jan 31, 2024 · 24 comments · Fixed by #119077
Labels
A-cdc Change Data Capture A-testing Testing tools and infrastructure branch-master Failures and bugs on the master branch. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. P-2 Issues/test failures with a fix SLA of 3 months T-cdc

@cockroach-teamcity
Member

cockroach-teamcity commented Jan 31, 2024

roachtest.cdc/kafka-auth failed with artifacts on master @ ed3a25e3c9459cede2f80babbfc9d44a836b6c12:

(cdc.go:1079).runCDCKafkaAuth: create changefeed with insecure TLS transport and no auth: pq: kafka: client has run out of available brokers to talk to
(cluster.go:2293).Run: context canceled
test artifacts and logs in: /artifacts/cdc/kafka-auth/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=false
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

/cc @cockroachdb/cdc

This test on roachdash | Improve this report!

Jira issue: CRDB-35771

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-cdc labels Jan 31, 2024
@cockroach-teamcity cockroach-teamcity added this to the 24.1 milestone Jan 31, 2024
@blathers-crl blathers-crl bot added the A-cdc Change Data Capture label Jan 31, 2024
@wenyihu6
Contributor

The Kafka log file contains many failure messages like the following:

```
[2024-01-31 07:59:47,041] WARN [RequestSendThread controllerId=1001] Controller 1001's connection to broker teamcity-13762710-1706682693-17-n1cpu4-0001.c.cockroach-ephemeral.internal:9094 (id: 1001 rack: null) was unsuccessful (kafka.controller.RequestSendThread)
org.apache.kafka.common.errors.SslAuthenticationException: SSL handshake failed
Caused by: javax.net.ssl.SSLHandshakeException: No subject alternative DNS name matching teamcity-13762710-1706682693-17-n1cpu4-0001.c.cockroach-ephemeral.internal found.
	at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:131)
	at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:360)
	at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:303)
	at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:298)
	at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.checkServerCerts(CertificateMessage.java:1357)
	at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.onConsumeCertificate(CertificateMessage.java:1232)
	at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.consume(CertificateMessage.java:1175)
	at java.base/sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:392)
```
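In Go terms, the Java error above means hostname verification failed: the broker's certificate does not list the host's fully qualified name among its subject alternative names (SANs). The sketch below reproduces that check with Go's `crypto/x509` `Certificate.VerifyHostname`. It assumes a throwaway self-signed certificate whose only DNS SAN is `localhost`, mirroring the test certificate described in the eventual fix; `selfSignedCert` and the host name `broker-0001.example.internal` are hypothetical, not taken from roachtest.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// selfSignedCert (hypothetical helper) creates and parses a throwaway
// self-signed certificate whose DNS subject alternative names are dnsNames.
func selfSignedCert(dnsNames []string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "kafka-auth-test"},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(24 * time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
		DNSNames:     dnsNames,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	// Made-up FQDN standing in for the roachtest broker host.
	const fqdn = "broker-0001.example.internal"

	// Only "localhost" is listed as a SAN, like the original test certificate.
	cert, err := selfSignedCert([]string{"localhost"})
	if err != nil {
		panic(err)
	}
	// VerifyHostname returns nil on a match and an x509.HostnameError otherwise.
	fmt.Println("localhost ->", cert.VerifyHostname("localhost"))
	fmt.Println(fqdn, "->", cert.VerifyHostname(fqdn))
}
```

`VerifyHostname` succeeds for `localhost` and returns an `x509.HostnameError` for the FQDN, which is roughly the Go counterpart of the `No subject alternative DNS name matching ... found` message.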

@wenyihu6
Contributor

I was able to reproduce this fairly consistently on master. It is likely due to #117544.

@wenyihu6 wenyihu6 self-assigned this Jan 31, 2024
@wenyihu6
Contributor

Removing the release-blocker label since this seems to be a test issue: creating a changefeed into a Kafka sink with TLS and SASL enabled works with the cockroach binary, but fails in roachtest.

```
[email protected]:26257/demoapp/movr> CREATE TABLE auth_test_table(t1 INT);
CREATE TABLE

Time: 5ms total (execution 5ms / network 0ms)

[email protected]:26257/demoapp/movr> CREATE CHANGEFEED FOR TABLE auth_test_table INTO
                                -> "kafka://wenyitest.servicebus.windows.net:9093?tls_enabled=true&sasl_enabled=true&sasl_user=$ConnectionString&sasl_password=<redacted>&sasl_mechanism=PLAIN" WITH updated, format=json;
        job_id
----------------------
  939284689234853889
(1 row)

NOTICE: changefeed will emit to topic auth_test_table
Time: 396ms total (execution 396ms / network 0ms)
```

@wenyihu6 wenyihu6 removed the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label Jan 31, 2024
@wenyihu6
Contributor

This is likely the same issue as https://cockroachlabs.slack.com/archives/C065X5307U3/p1702915552046409, but it now surfaces after the sarama upgrade.

@wenyihu6 wenyihu6 added the P-2 Issues/test failures with a fix SLA of 3 months label Jan 31, 2024
@cockroach-teamcity
Member Author

roachtest.cdc/kafka-auth failed with artifacts on master @ cc4fdffa8532d16544c48ef036689763f737dc6b:

(cdc.go:1079).runCDCKafkaAuth: create changefeed with insecure TLS transport and no auth: pq: kafka: client has run out of available brokers to talk to
(cluster.go:2293).Run: context canceled
test artifacts and logs in: /artifacts/cdc/kafka-auth/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=false
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0

@cockroach-teamcity
Member Author

roachtest.cdc/kafka-auth failed with artifacts on master @ fce4d4723519bc4ca6e9ef5da0ae19960c84752c:

(cdc.go:1079).runCDCKafkaAuth: create changefeed with insecure TLS transport and no auth: pq: kafka: client has run out of available brokers to talk to
(cluster.go:2290).Run: context canceled
test artifacts and logs in: /artifacts/cdc/kafka-auth/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=false
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0

@cockroach-teamcity
Member Author

roachtest.cdc/kafka-auth failed with artifacts on master @ 15961a19faca0e2b66df2d01a547549523ca70c7:

(cdc.go:1079).runCDCKafkaAuth: create changefeed with insecure TLS transport and no auth: pq: kafka: client has run out of available brokers to talk to
(cluster.go:2290).Run: context canceled
test artifacts and logs in: /artifacts/cdc/kafka-auth/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=false
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0

@cockroach-teamcity
Member Author

roachtest.cdc/kafka-auth failed with artifacts on master @ 3c41c509a87cba7a1fd3f5cfdb0f6badb78e3704:

(cdc.go:1079).runCDCKafkaAuth: create changefeed with insecure TLS transport and no auth: pq: kafka: client has run out of available brokers to talk to
(cluster.go:2290).Run: context canceled
test artifacts and logs in: /artifacts/cdc/kafka-auth/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=false
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0

@cockroach-teamcity
Member Author

roachtest.cdc/kafka-auth failed with artifacts on master @ d272e9ef5589deff570efc023db6c70edfde311c:

(cdc.go:1079).runCDCKafkaAuth: create changefeed with insecure TLS transport and no auth: pq: kafka: client has run out of available brokers to talk to
(cluster.go:2290).Run: context canceled
test artifacts and logs in: /artifacts/cdc/kafka-auth/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0

@cockroach-teamcity
Member Author

roachtest.cdc/kafka-auth failed with artifacts on master @ 715628abd134abfd2c0d966f9b7220a6715cc299:

(cdc.go:1079).runCDCKafkaAuth: create changefeed with insecure TLS transport and no auth: pq: kafka: client has run out of available brokers to talk to
(cluster.go:2290).Run: context canceled
test artifacts and logs in: /artifacts/cdc/kafka-auth/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=false
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0

@cockroach-teamcity
Member Author

roachtest.cdc/kafka-auth failed with artifacts on master @ d7d442e4a3c9dca7e01c4c6f4f00e2f28faa4374:

(cdc.go:1079).runCDCKafkaAuth: create changefeed with insecure TLS transport and no auth: pq: kafka: client has run out of available brokers to talk to
(cluster.go:2290).Run: context canceled
test artifacts and logs in: /artifacts/cdc/kafka-auth/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=false
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0

@cockroach-teamcity
Member Author

roachtest.cdc/kafka-auth failed with artifacts on master @ 7042601857042a057b1d4676735576cfbd37f36a:

(cdc.go:1079).runCDCKafkaAuth: create changefeed with insecure TLS transport and no auth: pq: kafka: client has run out of available brokers to talk to
(cluster.go:2290).Run: context canceled
test artifacts and logs in: /artifacts/cdc/kafka-auth/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0

wenyihu6 added a commit to wenyihu6/cockroach that referenced this issue Feb 8, 2024
From Kafka 2.0 onwards, host name verification of servers is enabled by default:
`ssl.endpoint.identification.algorithm` defaults to `https`, which verifies that
the server host name matches the host name in the certificate. This patch fixes
the failure by prepending https to the sink connection URL.

Fixes: cockroachdb#118525
Release note: none
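For context on the `https` default: with `ssl.endpoint.identification.algorithm=https`, a Kafka TLS client checks the server's host name against the certificate, much as most TLS clients do. As an analogy only (this is neither Kafka nor sarama code), Go's `crypto/tls` client verifies `ServerName` by default. The sketch below starts a loopback TLS listener whose self-signed certificate is valid only for `localhost`, explicitly trusts that certificate, and shows that the handshake still fails when the client expects a different host name. `newServerCert`, `handshake`, and the host names are hypothetical, and the example assumes loopback networking is available.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newServerCert (hypothetical helper) builds a self-signed certificate valid
// only for dnsNames, returned both as a tls.Certificate for the server and as
// a parsed *x509.Certificate for the client's trust pool.
func newServerCert(dnsNames []string) (tls.Certificate, *x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return tls.Certificate{}, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "kafka-auth-test"},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(24 * time.Hour),
		KeyUsage:              x509.KeyUsageDigitalSignature | x509.KeyUsageCertSign,
		ExtKeyUsage:           []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
		BasicConstraintsValid: true,
		IsCA:                  true,
		DNSNames:              dnsNames,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return tls.Certificate{}, nil, err
	}
	leaf, err := x509.ParseCertificate(der)
	if err != nil {
		return tls.Certificate{}, nil, err
	}
	return tls.Certificate{Certificate: [][]byte{der}, PrivateKey: key, Leaf: leaf}, leaf, nil
}

// handshake starts a loopback TLS listener presenting a certificate for
// dnsNames, then dials it as a client that trusts that certificate and expects
// serverName. The returned error is the client's handshake result.
func handshake(serverName string, dnsNames []string) error {
	cert, leaf, err := newServerCert(dnsNames)
	if err != nil {
		return err
	}
	ln, err := tls.Listen("tcp", "127.0.0.1:0", &tls.Config{Certificates: []tls.Certificate{cert}})
	if err != nil {
		return err
	}
	defer ln.Close()

	go func() {
		conn, err := ln.Accept()
		if err != nil {
			return
		}
		defer conn.Close()
		// Drive the server side of the handshake; its result is not needed here.
		_ = conn.(*tls.Conn).Handshake()
	}()

	roots := x509.NewCertPool()
	roots.AddCert(leaf)
	// Hostname verification is on by default; setting InsecureSkipVerify would
	// disable it together with every other certificate check.
	conn, err := tls.Dial("tcp", ln.Addr().String(), &tls.Config{RootCAs: roots, ServerName: serverName})
	if err != nil {
		return err
	}
	conn.Close() // close errors are irrelevant to this demo
	return nil
}

func main() {
	fmt.Println("localhost:", handshake("localhost", []string{"localhost"}))
	fmt.Println("fqdn:     ", handshake("broker-0001.example.internal", []string{"localhost"}))
}
```

Trusting the certificate is not enough on its own: the second handshake only succeeds once the expected host name is in the certificate's SANs, which is the direction the eventual fix for this issue took.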
@cockroach-teamcity
Member Author

roachtest.cdc/kafka-auth failed with artifacts on master @ 353fded9fe270b3eee4c85480ac1b9ec819f23b0:

(cdc.go:1079).runCDCKafkaAuth: create changefeed with insecure TLS transport and no auth: pq: kafka: client has run out of available brokers to talk to
(cluster.go:2290).Run: context canceled
test artifacts and logs in: /artifacts/cdc/kafka-auth/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=false
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0

@cockroach-teamcity
Member Author

roachtest.cdc/kafka-auth failed with artifacts on master @ b2e31876366324c2ebe5c2ad8bbd644997e90864:

(cdc.go:1079).runCDCKafkaAuth: create changefeed with insecure TLS transport and no auth: pq: kafka: client has run out of available brokers to talk to
(cluster.go:2298).Run: context canceled
test artifacts and logs in: /artifacts/cdc/kafka-auth/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=false
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0

@cockroach-teamcity
Member Author

roachtest.cdc/kafka-auth failed with artifacts on master @ b2e31876366324c2ebe5c2ad8bbd644997e90864:

(cdc.go:1079).runCDCKafkaAuth: create changefeed with insecure TLS transport and no auth: pq: kafka: client has run out of available brokers to talk to
(cluster.go:2298).Run: context canceled
test artifacts and logs in: /artifacts/cdc/kafka-auth/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=false
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0

@cockroach-teamcity
Member Author

roachtest.cdc/kafka-auth failed with artifacts on master @ b2e31876366324c2ebe5c2ad8bbd644997e90864:

(cdc.go:1079).runCDCKafkaAuth: create changefeed with insecure TLS transport and no auth: pq: kafka: client has run out of available brokers to talk to
(cluster.go:2298).Run: context canceled
test artifacts and logs in: /artifacts/cdc/kafka-auth/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=false
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0

@cockroach-teamcity
Member Author

roachtest.cdc/kafka-auth failed with artifacts on master @ 814a375d4c0e79d875c42452725f05f6c27294e3:

(cdc.go:1079).runCDCKafkaAuth: create changefeed with insecure TLS transport and no auth: pq: kafka: client has run out of available brokers to talk to
(cluster.go:2298).Run: context canceled
test artifacts and logs in: /artifacts/cdc/kafka-auth/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=false
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0

@cockroach-teamcity
Member Author

roachtest.cdc/kafka-auth failed with artifacts on master @ 254dbd247fb8ed352a11439063b29f23a0767f28:

(cdc.go:1079).runCDCKafkaAuth: create changefeed with insecure TLS transport and no auth: pq: kafka: client has run out of available brokers to talk to
(cluster.go:2298).Run: context canceled
test artifacts and logs in: /artifacts/cdc/kafka-auth/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=false
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0

@cockroach-teamcity
Member Author

roachtest.cdc/kafka-auth failed with artifacts on master @ cc6ca026319024800395293b0fb18f05dd8eb50e:

(cdc.go:1079).runCDCKafkaAuth: create changefeed with insecure TLS transport and no auth: pq: kafka: client has run out of available brokers to talk to
(cluster.go:2298).Run: context canceled
test artifacts and logs in: /artifacts/cdc/kafka-auth/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=false
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0

wenyihu6 added a commit to wenyihu6/cockroach that referenced this issue Feb 15, 2024
From [kafka
2.0](https://kafka.apache.org/20/documentation.html#security_confighostname)
onwards, host name verification of servers is enabled by default.

This means that the "fake" certificate we generate and use for kafka-auth is no
longer valid because it is missing the `DNSNames` field. Since then, the
verification has been failing, but the error was never surfaced to us until the
sarama upgrade. This patch fixes the failure by adding the missing fields to the
certificate.

Test history

1. Kafka-auth was working as expected. In this test, we generate and pass "fake"
certificates for inter-broker communication within the Kafka cluster.
2. Some change in the Java environment or Kafka cluster
(https://kafka.apache.org/20/documentation.html#security_confighostname)
enabled hostname verification, which wasn't previously enforced. As a result,
the "fake" certificate we generated is no longer valid because it is missing
the `DNSNames` field. Since then, our Kafka server logs have consistently
contained an error message. However, sarama never surfaced this error during
Dial(), and kafka-auth only checks that the CREATE statement succeeds, not that
messages are emitted, so the test kept passing.
3. The sarama upgrade changed how Dial() works: it now exercises previously
untouched Kafka code paths and surfaces the error.

Overall, this issue stems from a test misconfiguration and is not directly
user-facing. The sarama upgrade may still surface similar errors for customers,
given the wide range of possible Kafka configurations. Even so, we don't think a
release note is necessary: customers with a similar misconfiguration would
already have encountered this error, since it surfaces as soon as anything beyond
Dial() is used, for example when emitting messages to a Kafka sink.

Fixes: cockroachdb#118525
Release note: none
wenyihu6 added a commit to wenyihu6/cockroach that referenced this issue Feb 15, 2024
@wenyihu6
Contributor

wenyihu6 commented Feb 15, 2024

Summary:
Test history

  1. Kafka-auth was working as expected. In this test, we generate and pass
    self-signed test certificates for inter-broker communication within the Kafka
    cluster.
  2. Some change in the Java environment or Kafka cluster
    (https://kafka.apache.org/20/documentation.html#security_confighostname)
    enabled hostname verification, which wasn't previously enforced. As a result,
    the certificate we generated is no longer valid because it is missing the
    DNSNames field. Since then, our Kafka server logs have consistently contained
    an error message. However, sarama never raised this error during Dial(), and
    kafka-auth only checks that the CREATE statement succeeds, not that messages
    are emitted, so the test kept passing.
  3. The sarama upgrade changed how Dial() works: it now exercises previously
    untouched Kafka code paths and surfaces the error.

@cockroach-teamcity
Member Author

roachtest.cdc/kafka-auth failed with artifacts on master @ 7d0697b632066ee78735fc57e8150222d5576d0d:

(cdc.go:1081).runCDCKafkaAuth: create changefeed with insecure TLS transport and no auth: pq: kafka: client has run out of available brokers to talk to
(cluster.go:2298).Run: context canceled
test artifacts and logs in: /artifacts/cdc/kafka-auth/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=false
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0

@cockroach-teamcity
Member Author

roachtest.cdc/kafka-auth failed with artifacts on master @ 0b7ae19e2b94b851ed8812914f57032aab699811:

(cdc.go:1081).runCDCKafkaAuth: create changefeed with insecure TLS transport and no auth: pq: kafka: client has run out of available brokers to talk to
(cluster.go:2298).Run: context canceled
test artifacts and logs in: /artifacts/cdc/kafka-auth/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=false
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0

@cockroach-teamcity
Member Author

roachtest.cdc/kafka-auth failed with artifacts on master @ e39dafe6d8c153301ff43ed2b3ed3e13af9ec72a:

(cdc.go:1081).runCDCKafkaAuth: create changefeed with insecure TLS transport and no auth: pq: kafka: client has run out of available brokers to talk to
(cluster.go:2298).Run: context canceled
test artifacts and logs in: /artifacts/cdc/kafka-auth/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=false
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0

@cockroach-teamcity
Member Author

roachtest.cdc/kafka-auth failed with artifacts on master @ e39dafe6d8c153301ff43ed2b3ed3e13af9ec72a:

(cdc.go:1081).runCDCKafkaAuth: create changefeed with insecure TLS transport and no auth: pq: kafka: client has run out of available brokers to talk to
(cluster.go:2298).Run: context canceled
test artifacts and logs in: /artifacts/cdc/kafka-auth/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0

craig bot pushed a commit that referenced this issue Feb 19, 2024
119077: roachtest/cdc: fix cdc/kafka-auth r=stevendanna a=wenyihu6

From [kafka
2.0](https://kafka.apache.org/20/documentation.html#security_confighostname)
onwards, host name verification of servers is enabled by default.

Previously, the self-signed test certificate we generated for kafka-auth only
included “localhost” in the list of subject alternative names. However, kafka
appears to make internal connections using the fully qualified domain name. As a
result, some inter-broker communication has been failing with a hostname
verification error for some time. However, the failure wasn’t surfaced to the
user until the sarama upgrade. This patch fixes the failure by adding the
proper hostname of the Kafka node to the certificate.

We don’t believe this represents a meaningful customer-facing issue. The
misconfiguration of the test kafka cluster would have surfaced even with older
sarama versions if the test had involved more than just connecting to the kafka
cluster.

Fixes: #118525
Release note: none

Co-authored-by: Wenyi Hu <[email protected]>
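A minimal sketch of generating a certificate along the lines of this fix, assuming ECDSA keys and PEM output; the actual roachtest helper is not shown in this thread and may differ. `certPEMForHosts`, `verifyPEMHost`, and the host names are hypothetical. Listing both `localhost` and the node's FQDN in the SANs lets either name pass hostname verification, while any unlisted name still fails.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// certPEMForHosts (hypothetical helper) returns a PEM-encoded self-signed
// certificate and PKCS#8 private key whose DNS SANs include every entry in hosts.
func certPEMForHosts(hosts []string) (certPEM, keyPEM []byte, err error) {
	if len(hosts) == 0 {
		return nil, nil, errors.New("at least one host name is required")
	}
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(time.Now().UnixNano()),
		Subject:      pkix.Name{CommonName: hosts[0]},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(24 * time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		// Allow the same certificate on either end of a TLS connection.
		ExtKeyUsage: []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth, x509.ExtKeyUsageClientAuth},
		// Go (1.15+) matches host names against these SANs only, ignoring CommonName.
		DNSNames: hosts,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, nil, err
	}
	keyDER, err := x509.MarshalPKCS8PrivateKey(key)
	if err != nil {
		return nil, nil, err
	}
	certPEM = pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})
	keyPEM = pem.EncodeToMemory(&pem.Block{Type: "PRIVATE KEY", Bytes: keyDER})
	return certPEM, keyPEM, nil
}

// verifyPEMHost parses the first PEM certificate and checks it against host.
func verifyPEMHost(certPEM []byte, host string) error {
	block, _ := pem.Decode(certPEM)
	if block == nil || block.Type != "CERTIFICATE" {
		return errors.New("no CERTIFICATE block found")
	}
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		return err
	}
	return cert.VerifyHostname(host)
}

func main() {
	const fqdn = "broker-0001.example.internal" // hypothetical Kafka node FQDN
	certPEM, _, err := certPEMForHosts([]string{"localhost", fqdn})
	if err != nil {
		panic(err)
	}
	for _, h := range []string{"localhost", fqdn, "other.example.internal"} {
		fmt.Printf("%-30s %v\n", h, verifyPEMHost(certPEM, h))
	}
}
```

The round trip through PEM is deliberate: checking the encoded output, rather than the in-memory template, confirms that the SANs actually made it into the certificate a broker would load.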
@craig craig bot closed this as completed in 7b70a3a Feb 19, 2024
wenyihu6 added a commit to wenyihu6/cockroach that referenced this issue Feb 21, 2024
@rharding6373 rharding6373 added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-testing Testing tools and infrastructure labels Mar 6, 2024
3 participants