Configuring a Kafka based Jaeger architecture #95

Closed
objectiser opened this issue Nov 7, 2018 · 9 comments

Comments

@objectiser
Contributor

Kafka support is currently being added to Jaeger in two places: as a storage plugin and as an ingester.

The aim of this approach is to have a collector configured with Kafka as its storage, so that it publishes spans to Kafka, and then ingesters that consume those messages and store the spans in a real storage backend (e.g. Elasticsearch or Cassandra).

We need to consider how such a configuration would be defined in the operator's CR.

Currently kafka is listed as a storage type - but an operator CR can only support a single storage type - so either:

  1. We treat this kafka based configuration as something else - i.e. the storage type is specified as the real storage used by the ingester, while the collector's use of kafka and the ingester itself are configured from a different part of the spec.

  2. There are two separate CRs - one defining the collector with Kafka storage, and the other defining the ingester with real storage. The issue with this approach is that only a subset of the components would be configured in each CR - so query would only be defined in the second CR (as it also uses the same real storage), and agent would potentially be defined in the first, as it uses the collector.

Although Kafka is not yet fully supported, we need to consider how its introduction may impact the spec structure.

@objectiser
Contributor Author

My preference would be to go with option 1 - treating this Kafka based configuration as a virtual collector - so the current config still applies to the collector and backend storage used by the ingester.

The additional part is a Kafka based configuration that connects the two parts - and if defined, it would result in the collector being configured to use Kafka storage, and the ingester being deployed.

@objectiser
Contributor Author

There is also the option to have multiple storage plugins configured within the collector - e.g. kafka and elasticsearch.

So possibly we need more than two strategy values - one for allInOne, plus three or more 'production' type strategies: one for simple collector/storage, one for collector/multistorage and one for a "distributed collector" using kafka.

The first two could potentially be collapsed into a single 'strategy', by simply allowing a comma separated list of storage types, with kafka options listed in the storage.options section.
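As a rough illustration of the comma separated idea, the sketch below splits a storage.type value such as "es,kafka" into its individual entries. The helper name parseStorageTypes and its trimming and lower-casing behaviour are assumptions made for this example, not code from the operator.

```go
package main

import (
	"fmt"
	"strings"
)

// parseStorageTypes splits a comma separated storage.type value such as
// "es,kafka" into trimmed, lower-cased entries, skipping empty ones.
// Hypothetical helper for illustration only.
func parseStorageTypes(value string) []string {
	var types []string
	for _, part := range strings.Split(value, ",") {
		t := strings.ToLower(strings.TrimSpace(part))
		if t != "" {
			types = append(types, t)
		}
	}
	return types
}

func main() {
	fmt.Println(parseStorageTypes("es,kafka"))
	fmt.Println(parseStorageTypes("cassandra"))
}
```

With this shape, a single storage type and a multi-storage collector would go through the same code path, which is what would allow the two strategies to collapse into one.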

@jpkrohling
Contributor

jpkrohling commented Nov 8, 2018

How about having a JaegerKafkaStorageSpec as a child of JaegerStorageSpec, like CassandraCreateSchema currently is?

This spec could then hold an Options object, related to the configuration of the backing storage.

apiVersion: io.jaegertracing/v1alpha1
kind: Jaeger
metadata:
  name: with-kafka
spec:
  strategy: all-in-one
  storage:
    type: kafka
    kafka:
      options:
        es:
          server-urls: http://elasticsearch:9200
          username: elastic
          password: changeme
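One possible shape for the nested spec is sketched below in Go. Only the type names JaegerStorageSpec and JaegerKafkaStorageSpec come from the suggestion above; the fields, the JSON tags and the parseStorage helper are assumptions, and the example decodes a JSON rendering of the storage section so that it needs only the standard library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// JaegerKafkaStorageSpec holds the options of the backing storage.
// The field layout is an assumption made for this sketch.
type JaegerKafkaStorageSpec struct {
	// Options is keyed by storage name (e.g. "es"), then by option name.
	Options map[string]map[string]string `json:"options,omitempty"`
}

// JaegerStorageSpec carries the storage type plus the Kafka child spec.
type JaegerStorageSpec struct {
	Type  string                 `json:"type"`
	Kafka JaegerKafkaStorageSpec `json:"kafka,omitempty"`
}

// parseStorage decodes a JSON rendering of the storage section.
func parseStorage(data []byte) (JaegerStorageSpec, error) {
	var spec JaegerStorageSpec
	err := json.Unmarshal(data, &spec)
	return spec, err
}

func main() {
	doc := []byte(`{"type":"kafka","kafka":{"options":{"es":{"server-urls":"http://elasticsearch:9200"}}}}`)
	spec, err := parseStorage(doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(spec.Type, spec.Kafka.Options["es"]["server-urls"])
}
```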

@objectiser
Contributor Author

I think we just need to look at the different backend configurations (as in ways the components are organised) and see if there is a natural way to structure the information in the CR to provide the appropriate flexibility, but also clarity. I'll put together some examples soon.

@objectiser
Contributor Author

Possible suggestions. First, for the use of kafka as a secondary storage plugin within the collector:

apiVersion: io.jaegertracing/v1alpha1
kind: Jaeger
metadata:
  name: with-es-and-kafka-storage
spec:
  strategy: production
  storage:
    type: es,kafka
    options:
      es:
        server-urls: http://elasticsearch:9200
        username: elastic
        password: changeme
      kafka:
        brokers: xyz
        topic: spans

and using the ingester approach:

apiVersion: io.jaegertracing/v1alpha1
kind: Jaeger
metadata:
  name: with-ingester
spec:
  strategy: production
  ingester:
    enabled: true
  storage:
    type: es
    options:
      es:
        server-urls: http://elasticsearch:9200
        username: elastic
        password: changeme
      kafka:
        brokers: xyz
        topic: spans

This means that, because the ingester has been enabled, the collector will use the kafka storage plugin (using the config from storage.options), and the 'enabled' ingester will use the actual storage type (i.e. elasticsearch in this case). So the change in deployment structure is triggered by ingester.enabled being true.

Note: the kafka options are optional if the defaults are appropriate - although it is likely that the kafka.brokers option would be required in practice.
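The rule described above can be sketched as a small decision function. The types JaegerSpec and DeploymentPlan and the function planDeployment are hypothetical names invented for this illustration; the only behaviour taken from the discussion is that ingester.enabled switches the collector to kafka while the ingester and query keep the real storage type.

```go
package main

import "fmt"

// JaegerSpec is a hypothetical, flattened view of the CR fields that matter here.
type JaegerSpec struct {
	StorageType     string // value of storage.type, e.g. "es"
	IngesterEnabled bool   // value of ingester.enabled
}

// DeploymentPlan records which storage each component would be wired to.
type DeploymentPlan struct {
	CollectorStorage string
	IngesterStorage  string // empty when no ingester is deployed
	QueryStorage     string
}

// planDeployment applies the rule from the discussion: enabling the ingester
// moves the collector onto kafka, and the ingester writes to the real storage.
func planDeployment(spec JaegerSpec) DeploymentPlan {
	plan := DeploymentPlan{
		CollectorStorage: spec.StorageType,
		QueryStorage:     spec.StorageType,
	}
	if spec.IngesterEnabled {
		plan.CollectorStorage = "kafka"
		plan.IngesterStorage = spec.StorageType
	}
	return plan
}

func main() {
	fmt.Printf("%+v\n", planDeployment(JaegerSpec{StorageType: "es", IngesterEnabled: true}))
	fmt.Printf("%+v\n", planDeployment(JaegerSpec{StorageType: "es"}))
}
```

Note that nothing in this sketch inspects storage.options.kafka: the presence or absence of kafka options does not change the plan, only the explicit flag does.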

@jpkrohling
Contributor

I like your suggestions. Just one thing to think about: what would happen if a user omits/forgets the ingester: enabled from the second example?

To me, it's still clear that the user intends to use the ingester there. In that case, the ingester option wouldn't be necessary.

@objectiser
Contributor Author

Two reasons why it wouldn't work - the current approach allows multiple storage configurations to be defined, with only the one matching storage.type being used. The Istio Helm chart relies on this to define configurations for several potential storage backends and then select one via the storage.type parameter.

The other reason is that the kafka storage/ingester options have default values, so technically no options need to be specified. We therefore shouldn't rely on someone specifying storage.options.kafka.xxx as a signal that an ingester should be used.

I'm not sure having 'enabled' elements is a bad thing - we might also want to support them under (for example) the query and collector components, so that a Jaeger instance can be deployed with only some of the components. For example, in a particular namespace we may only want to deploy query servers.
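A minimal sketch of per-component 'enabled' flags follows. Everything here is assumed for illustration: the function names, the use of *bool to tell "unset" from "false", and the defaults (query and collector on, ingester off when unset).

```go
package main

import "fmt"

// enabledOr returns the flag's value, or def when the flag was not set in the CR.
func enabledOr(flag *bool, def bool) bool {
	if flag == nil {
		return def
	}
	return *flag
}

// componentsToDeploy lists the components selected by the 'enabled' flags.
// Assumed defaults: query and collector are on, the ingester is off.
func componentsToDeploy(query, collector, ingester *bool) []string {
	var out []string
	if enabledOr(query, true) {
		out = append(out, "query")
	}
	if enabledOr(collector, true) {
		out = append(out, "collector")
	}
	if enabledOr(ingester, false) {
		out = append(out, "ingester")
	}
	return out
}

func main() {
	on, off := true, false
	// A namespace that should only run query servers.
	fmt.Println(componentsToDeploy(&on, &off, nil))
	// Nothing set explicitly: fall back to the assumed defaults.
	fmt.Println(componentsToDeploy(nil, nil, nil))
}
```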

@jpkrohling
Contributor

> Two reasons why it wouldn't work

Agree with both. I guess there's no easy way to detect when the user wants to use the ingester without the flag, then.

@objectiser
Contributor Author

objectiser commented Jan 31, 2019

Implemented in #168
