feat: add vector with defaults #93

Status: Open, wants to merge 5 commits into base: main
7 changes: 5 additions & 2 deletions charts/harmony-chart/Chart.lock
@@ -32,5 +32,8 @@ dependencies:
 - name: openfaas
   repository: https://openfaas.github.io/faas-netes
   version: 14.2.34
-digest: sha256:548b955e15c04e3eb4a270b4f6a869675d20becad8e75702cf945b214809a63e
-generated: "2024-10-07T15:49:37.65074391-05:00"
+- name: vector
+  repository: https://helm.vector.dev
+  version: 0.37.0
+digest: sha256:ef4a8b227d14b9a28f93fac701ab63a52374b05996d4fcc584cfbd54e5320d3d
+generated: "2024-11-15T11:17:36.416891511-05:00"
5 changes: 5 additions & 0 deletions charts/harmony-chart/Chart.yaml
@@ -74,3 +74,8 @@ dependencies:
version: "14.2.34"
repository: https://openfaas.github.io/faas-netes
condition: openfaas.enabled

- name: vector
version: 0.37.0
repository: https://helm.vector.dev
condition: vector.enabled
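
The dependency is gated on vector.enabled, and the chart defaults below ship with it disabled, so each deployment opts in from its own values file. A minimal override would be:

vector:
  enabled: true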
122 changes: 122 additions & 0 deletions charts/harmony-chart/values.yaml
@@ -364,3 +364,125 @@ velero:

openfaas:
enabled: false

vector:
enabled: false
role: "Agent"
podDisruptionBudget: # Optional, but recommended
enabled: true
minAvailable: 1
# Configures a PodMonitor CRD used by the Prometheus operator
podMonitor:
enabled: false
tolerations:
- operator: Exists
logLevel: "info"
customConfig:
data_dir: /vector-data-dir
api:
enabled: false
address: 0.0.0.0:8686
playground: false

sources:
kubernetes_tutor_logs:
type: kubernetes_logs
extra_namespace_label_selector: app.kubernetes.io/managed-by=tutor
kubernetes_global_logs:
type: kubernetes_logs
extra_namespace_label_selector: app.kubernetes.io/managed-by!=tutor

transforms:

# Filter out application and global logs whose message is empty, to prevent the Vector process from crashing when sending logs to CloudWatch
# More details in https://github.com/vectordotdev/vector/issues/15539
openedx_logs:
type: remap
inputs:
- kubernetes_tutor_logs
source: |-
if !includes(["lms", "cms", "cms-worker", "lms-worker", "lms-job", "cms-job"], .kubernetes.pod_labels."app.kubernetes.io/name") {
abort
}
if contains(string!(.message), "[tracking]") {
abort
}
.type = "application"
drop_on_abort: true
drop_on_error: true

# Group multiline logs for better observability
grouped_openedx_logs:
type: reduce
merge_strategies:
message: concat_newline
inputs:
- openedx_logs
starts_when:
type: "vrl"
source: |-
match(string!(.message), r'^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}.*')
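# Illustration of the reduce transform above (hypothetical log lines): only the
# first line below matches the timestamp pattern in starts_when, so all three
# lines are folded into one event whose message joins them with newlines
# (concat_newline):
#   2024-11-15 11:17:36,416 ERROR ... Traceback (most recent call last):
#     File "manage.py", line 10, in <module>
#   ValueError: something went wrong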
operation_openedx_logs:
type: remap
inputs:
- kubernetes_tutor_logs
source: |-
if includes(["lms", "cms", "cms-worker", "lms-worker", "lms-job", "cms-job"], .kubernetes.pod_labels."app.kubernetes.io/name") {
abort
}
.type = "application"
drop_on_abort: true
drop_on_error: true
global_logs:
type: filter
inputs:
- kubernetes_global_logs
condition: 'includes(["ingress-nginx"], .kubernetes.pod_labels."app.kubernetes.io/name")'
typed_global_logs:
type: remap
inputs:
- global_logs
source: |-
.type = "global"
drop_on_error: true
drop_on_abort: true
# Application logs (Open edX, ingress-nginx, cert-manager) can be sent to CloudWatch
# or to S3, depending on user needs.
application_logs:
type: remap
inputs:
- grouped_openedx_logs
- operation_openedx_logs
- typed_global_logs
source: |-
if is_empty(string!(.message)) {
log("Events with empty message are discarded", level: "info")
abort
}
# Extract tracking logs from Open edX applications
tracking_logs:
type: remap
inputs:
- kubernetes_tutor_logs
source: |-
[Review thread on the tracking_logs source]

Reviewer: How is this going to work with the existing Aspects Vector settings? Are the two going to be exclusive?

Contributor Author: Yes, the two need to be exclusive, both for resource consumption and for pipeline tracking. However, another filter and a sink can be added to handle the specifics of any namespace.

That said, keep in mind that Ralph is the recommended pipeline to use.

I can leave examples for it; see the sketch below.
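
For illustration only, and not part of this PR: a sink forwarding the extracted xAPI events to a Ralph LRS could use Vector's generic http sink. The endpoint, port, credentials, and input name below are assumptions:

  sinks:
    xapi_to_ralph:
      type: http
      inputs:
        - xapi_openedx_demo
      uri: http://ralph.openedx-demo:8100/xAPI/statements
      method: post
      auth:
        strategy: basic
        user: ralph
        password: set_me
      request:
        headers:
          X-Experience-API-Version: "1.0.3"
      encoding:
        codec: json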

Contributor Author: I've implemented a ClickHouse filter and sink example. Assuming there are multiple Open edX installations in the same cluster, a filter, a transform, and a sink must be implemented per namespace; see the sketch below.

Basically, what we do is:

  • Get all logs from a specific namespace (filter)
  • Get all xapi_tracking logs (transform)
  • Sink them to the specific ClickHouse instance/cluster (sink)
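
As a sketch of that pattern (hypothetical names; only the openedx_demo pipeline in this PR is real), a second installation in namespace openedx-stage would replicate the filter and the sink, while the remap transform stays identical to xapi_openedx_demo with its input pointed at the new filter:

  vector:
    customConfig:
      transforms:
        logs_openedx_stage:
          type: filter
          inputs:
            - kubernetes_tutor_logs
          condition: '.kubernetes.pod_namespace == "openedx-stage"'
      sinks:
        clickhouse_openedx_stage:
          type: clickhouse
          inputs:
            - xapi_openedx_stage
          endpoint: http://clickhouse-clickhouse.openedx-stage:8123
          database: 'openedx'
          table: 'xapi_events_all'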

Contributor: @Ian2012 Can you please give an example of a multiple-instances-per-cluster configuration setup in https://github.com/openedx/openedx-k8s-harmony/blob/main/values-example.yaml?

Contributor Author: I've updated values-example.yaml with the elements for one installation, filtered by namespace. It just needs to be replicated for every Open edX namespace with Aspects installed.

[End of review thread; the tracking_logs source continues below.]

parsed, err_regex = parse_regex(.message, r'^.* \[tracking\] [^{}]* (?P<tracking_message>\{.*\})$')
if err_regex != null {
abort
}
message = parsed.tracking_message
parsed_json, err_json = parse_json(parsed.tracking_message)
if err_json != null {
log("Unable to parse JSON from tracking log message: " + err_json, level: "info")
abort
}
time, err_timestamp = parse_timestamp(parsed_json.time, "%+")
if err_timestamp != null {
log("Unable to parse timestamp from tracking log message: " + err_timestamp, level: "info")
abort
}
.time = time
.message = message
.type = "tracking"

# Make sure to check out values-example.yaml to know how to sink logs to S3, CloudWatch, and other services
sinks: {}
102 changes: 102 additions & 0 deletions values-example.yaml
@@ -89,3 +89,105 @@ velero:

openfaas:
enabled: false

# ClickHouse Vector Sink

vector:
enabled: false
customConfig:
transforms:
# Events should be separated per namespace, and a different sink should be
# implemented for every namespace with Aspects
logs_openedx_demo:
type: filter
inputs:
- kubernetes_tutor_logs
condition: '.kubernetes.pod_namespace == "openedx_demo"' # Make sure to update the namespace

xapi_openedx_demo:
type: remap
inputs:
- logs_openedx_demo
drop_on_error: true
drop_on_abort: true
source: |-
parsed, err_regex = parse_regex(.message, r'^.* \[xapi_tracking\] [^{}]* (?P<tracking_message>\{.*\})$')
if err_regex != null {
abort
}
parsed_json, err_json = parse_json(parsed.tracking_message)
if err_json != null {
log("Unable to parse JSON from xapi tracking log message: " + err_json, level: "error")
abort
}
time, err_timestamp = parse_timestamp(parsed_json.time, "%+")
if err_timestamp != null {
log("Unable to parse timestamp from tracking log 'time' field: " + err_timestamp, level: "warn")
time, err_timestamp = parse_timestamp(parsed_json.timestamp, "%+")
if err_timestamp != null {
log("Unable to parse timestamp from tracking log 'timestamp' field: " + err_timestamp, level: "error")
abort
}
}
event_id = parsed_json.id
. = {"event_id": event_id, "emission_time": format_timestamp!(time,
format: "%+"), "event": encode_json(parsed_json)}

sinks:
# Example ClickHouse Sink
clickhouse_openedx_demo:
type: clickhouse
auth:
strategy: basic
user: 'ch_vector'
password: 'password'
encoding:
timestamp_format: unix
date_time_best_effort: true
inputs:
- xapi_openedx_demo
# http://{{ CLICKHOUSE_HOST }}.{{ CLICKHOUSE_NAMESPACE }}:{{ CLICKHOUSE_INTERNAL_HTTP_PORT }}
endpoint: http://clickhouse-clickhouse.openedx-harmony:8123
# ASPECTS_VECTOR_DATABASE
database: 'openedx'
table: 'xapi_events_all'
healthcheck: true

tracking_logs_to_s3:
type: aws_s3
inputs:
- tracking_logs
filename_append_uuid: true
filename_time_format: "log-%Y%m%d-%H"
# Helm tries to render the .type and .kubernetes variables, so we need to escape them to avoid errors.
# See: https://github.com/helm/helm/issues/2798
key_prefix: |
{{ `{{ .kubernetes.pod_namespace }}/{{ .type }}/{{ .kubernetes.container_name }}/date=%F/` }}
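# After Helm strips the backtick escape, the rendered values contain the literal
# Vector template {{ .kubernetes.pod_namespace }}/{{ .type }}/{{ .kubernetes.container_name }}/date=%F/,
# which Vector expands per event at write time.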
compression: gzip
encoding:
codec: text
bucket: "set_me"
auth:
access_key_id: "set_me"
secret_access_key: "set_me"
region: "set_me"
# When using AWS-compatible services like MinIO, set the endpoint and tweak SSL if necessary
# endpoint: "http://minio.{namespace}:9000"
# region: none
healthcheck:
enabled: false

logs_to_cloudwatch:
type: aws_cloudwatch_logs
inputs:
- application_logs
group_name: my-cluster
region: "set_me" # required by the aws_cloudwatch_logs sink
stream_name: |-
{{ `{{ .kubernetes.pod_namespace }}/{{ .kubernetes.container_name }}` }}
auth:
access_key_id: "set_me"
secret_access_key: "set_me"
encoding:
codec: json