Merge branch 'risingwavelabs:main' into main
agiron123 authored May 3, 2024
2 parents 6d58915 + 5a757cf commit 9522546
Showing 18 changed files with 362 additions and 65 deletions.
1 change: 1 addition & 0 deletions .wordlist.txt
@@ -201,3 +201,4 @@ EMQX
HiveMQ
MQTT
RabbitMQ
Standalone's
171 changes: 171 additions & 0 deletions cloud/cluster-create-byoc-cluster.md
@@ -0,0 +1,171 @@
---
id: cluster-create-byoc-cluster
title: Bring your own cloud
description: You can use the BYOC cluster types to create custom clouds.
slug: /create-byoc-cluster
---

The Bring Your Own Cloud (BYOC) plan gives you the flexibility to tailor your cloud infrastructure instead of depending on a hosted service. You keep the advantages of your chosen cloud provider, maintain full control over your environment, and adjust configurations to suit your needs. This guide outlines the services that RisingWave deploys in a BYOC environment and walks you through enabling BYOC step by step.

:::note
We currently support AWS and GCP as cloud platforms. Azure integration is in development and will be available soon.
:::

## Architecture overview

Before creating a BYOC deployment, familiarize yourself with the following architecture. In the BYOC environment, the entire data plane is deployed in the user's space. To manage the RisingWave clusters within this environment, we deploy two key services for operation delegation:

- **Agent Service**: This service manages Kubernetes (K8s) and cloud resources. It handles tasks such as managing RisingWave Pods, object storage buckets (for example, Google Cloud Storage (GCS) buckets), IAM roles/accounts associated with the RisingWave cluster, and network endpoints.

- **RWProxy**: This is a TCP proxy that routes SQL statements from the control plane to the appropriate RisingWave instances.

## Procedure

Follow the steps below to create your own cloud environment using RisingWave's BYOC plan.

1. Navigate to the [**Clusters**](https://cloud.risingwave.com/clusters/) page and click **Create cluster**.

2. On the right-side panel, choose **Enterprise** and enter your invitation code. If you do not have an invitation code, please contact our [support team](mailto:[email protected]) or [sales team](mailto:[email protected]) to obtain one.

3. Once you've redeemed the invitation code, select **BYOC** as the deployment type, then choose your cloud platform (AWS or GCP; see [Resource and permission](#resource-and-permission) for more details), region, and ID as necessary.

4. After configuring these settings, you'll see additional instructions on your screen. Follow these steps to establish your BYOC environment. Please be aware that the final command `rwc byoc apply --name xxx` may take 30 to 40 minutes to complete, and a progress bar will be shown to keep you updated. During this time, it's crucial to ensure a stable internet connection. If the command is interrupted or fails due to network instability, you can safely retry it.

:::tip
If you encounter any issues during this process, please contact our [support team](mailto:[email protected]).
:::

5. Click **Next** to continue the configuration of cluster size and nodes. To learn more about the nodes, see the [architecture of RisingWave](/docs/current/architecture).

6. Click **Next**, name your cluster, and execute the command that pops up to establish a BYOC cluster in your environment.

Once the cluster is successfully created, you can manage it through the portal just like hosted clusters.
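
Because step 4 notes that `rwc byoc apply` can be retried safely after an interruption, one option is to wrap it in a small retry loop. The following is a minimal sketch, not part of the `rwc` CLI: `retry_with_delay` is a hypothetical helper, and the attempt count and delay in the example are arbitrary assumptions.

```sh
# Sketch: run a command until it succeeds or the attempt limit is reached.
# retry_with_delay is a hypothetical helper, not something rwc provides.
# Usage: retry_with_delay <max_attempts> <delay_seconds> <command> [args...]
retry_with_delay() {
  max_attempts="$1"
  delay_seconds="$2"
  shift 2
  attempt=1
  while true; do
    # Run the wrapped command; stop as soon as it succeeds.
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed; retrying in ${delay_seconds}s" >&2
    attempt=$((attempt + 1))
    sleep "$delay_seconds"
  done
}

# Example (assumed values: 3 attempts, 60 seconds apart; replace xxx with the
# name shown in your on-screen instructions):
# retry_with_delay 3 60 rwc byoc apply --name xxx
```

The helper only retries on a non-zero exit status, so it does not replace watching the progress bar for a command that hangs.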

## Resource and permission

When you customize your cloud platform, refer to the following notes to see what we've set up for you and the permissions you need to enable.

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

<Tabs queryString="method">

<TabItem value="AWS" label="AWS">

- **Required service-linked role**

The role `AWSServiceRoleForAutoScaling` needs to be in place. If it is not ready yet, you need to create it manually. See [Create a service-linked role](https://docs.aws.amazon.com/autoscaling/ec2/userguide/autoscaling-service-linked-role.html#create-service-linked-role-manual) for detailed steps.

- **Required quota increase**

For optimal performance, the quota for managed node groups per cluster should be increased to 36 or more. See [Service quotas](https://docs.aws.amazon.com/eks/latest/userguide/service-quotas.html#sq-text) for more details.

- **Required permissions for BYOC environment creation/deletion**

We recommend using an IAM role/user with Administrator permissions for the AWS account to deploy the infrastructure.

- **Resources provisioned in BYOC environment**

We will set up the following resources in a BYOC environment:

- 1 VPC: including the VPC, its subnets, security groups, and IPs to host all BYOC resources.
- 1 EKS cluster: to host all service and RisingWave cluster workloads.
- 2 S3 buckets: for RisingWave cluster data and infra state data respectively.
- 2 internal network load balancers: to expose Agent Service and RWProxy.
- 1 external network load balancer (optional): to expose RWProxy to the Internet.
- A few IAM roles for EKS and K8s workloads. Each role is granted the least privilege it requires.

- **Required permission for deployed services**

- ec2:DescribeVpcEndpoints
- ec2:DescribeVpcEndpointServices
- ec2:DescribeSubnets
- s3:*
- aps:GetLabels
- aps:GetMetricMetadata
- aps:GetSeries
- aps:QueryMetrics
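
For the service-linked role requirement listed first in this tab, the check-and-create step can be scripted. This is a minimal sketch that assumes the AWS CLI is installed and authenticated with sufficient IAM permissions; `ensure_autoscaling_role` is a hypothetical helper name.

```sh
# Sketch: create AWSServiceRoleForAutoScaling only if it does not exist yet.
# Assumes the AWS CLI is installed and configured; the function name is hypothetical.
ensure_autoscaling_role() {
  if aws iam get-role --role-name AWSServiceRoleForAutoScaling >/dev/null 2>&1; then
    echo "AWSServiceRoleForAutoScaling already exists"
  else
    # Creates the service-linked role for EC2 Auto Scaling.
    aws iam create-service-linked-role --aws-service-name autoscaling.amazonaws.com
  fi
}

# Example:
# ensure_autoscaling_role
```

Note that `get-role` can also fail for reasons other than a missing role (for example, missing permissions), in which case the create call will report the underlying error.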

</TabItem>

<TabItem value="GCP" label="GCP">

- **Required APIs for BYOC environment creation/deletion**

You need to enable the following APIs to create or delete a BYOC environment:

- **Compute Engine API** for provisioning VPC resources.
- **Cloud DNS API** for setting up VPC Private Service Connect.
- **Kubernetes Engine API** for provisioning the GKE cluster in which the data plane is hosted.
- **Cloud Resource Manager API** for IAM provisioning.

- **Required permission for BYOC environment creation/deletion**

Before running the command-line interface to create or delete a BYOC environment, you need a Google IAM principal (a user or service account) with the following roles.

- [Kubernetes Engine Admin](https://cloud.google.com/iam/docs/understanding-roles#container.admin)
- [Compute Network Admin](https://cloud.google.com/iam/docs/understanding-roles#compute.networkAdmin)
- [Compute Security Admin](https://cloud.google.com/iam/docs/understanding-roles#compute.securityAdmin)
- [Storage Admin](https://cloud.google.com/iam/docs/understanding-roles#storage.admin)
- [Security Admin](https://cloud.google.com/iam/docs/understanding-roles#iam.securityAdmin)
- [Service Account Admin](https://cloud.google.com/iam/docs/understanding-roles#iam.serviceAccountAdmin)

:::note
These permissions are only required for creating or deleting a BYOC environment. Once the environment is up and running, only limited permissions are needed to operate the services.
:::

- **Resources provisioned in BYOC environment**

We will set up the following resources in a BYOC environment:

- 1 VPC: including the VPC, its subnets, firewalls, and IPs to host all BYOC resources.
- 1 GKE cluster: to host all service and RisingWave cluster workloads.
- 2 GCS buckets: for RisingWave cluster data and infra state data respectively.
- 2 internal network load balancers: to expose Agent Service and RWProxy.
- 1 external network load balancer (optional): to expose RWProxy to the Internet.
- A few IAM roles for GKE and K8s workloads. Each role is granted the least privilege it requires.

- **Required permission for deployed services**

We will provision a Google Service Account for the deployed services. The services require the following permissions:

- [Storage Admin](https://cloud.google.com/iam/docs/understanding-roles#storage.admin) to manage GCS objects and bucket access for RisingWave clusters.
- [Compute Network Admin](https://cloud.google.com/iam/docs/understanding-roles#compute.networkAdmin) to manage private links for the source/sink of RisingWave clusters.
- [Service Account Admin](https://cloud.google.com/iam/docs/understanding-roles#iam.serviceAccountAdmin) to manage the IAM service accounts of RisingWave clusters.
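
The four APIs listed at the top of this tab can be enabled from the command line. This is a minimal sketch that assumes the `gcloud` CLI is installed and authenticated; `enable_byoc_apis` is a hypothetical helper name and the project ID is a placeholder you supply.

```sh
# Sketch: enable the APIs required for BYOC environment creation/deletion.
# Assumes gcloud is installed and authenticated; the function name is hypothetical.
# Usage: enable_byoc_apis <project_id>
enable_byoc_apis() {
  project_id="$1"
  gcloud services enable \
    compute.googleapis.com \
    dns.googleapis.com \
    container.googleapis.com \
    cloudresourcemanager.googleapis.com \
    --project "$project_id"
}

# Example (placeholder project ID):
# enable_byoc_apis my-project-id
```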

</TabItem>

<TabItem value="Azure" label="Azure (coming soon)">

- **Required feature flags**

You need to enable the feature flag `EnableAPIServerVnetIntegrationPreview` for the subscription to deploy a BYOC environment. See [Feature flag](https://learn.microsoft.com/en-us/azure/aks/api-server-vnet-integration#register-the-enableapiservervnetintegrationpreview-feature-flag) for more details.

- **Required permission for BYOC environment creation/deletion**

We recommend utilizing a service principal or user with owner permissions of the Azure subscription to provision the infrastructure.

- **Resources provisioned in BYOC environment**

We will set up the following resources in a BYOC environment:

- 1 VPC: including the VPC, its subnets, firewalls, and IPs to host all BYOC resources.
- 1 AKS cluster: to host all service and RisingWave cluster workloads.
- 2 Azure storage accounts, each with one blob container in it: for RisingWave cluster data and infra state data respectively.
- 2 internal network load balancers: to expose Agent Service and RWProxy.
- 1 external network load balancer (optional): to expose RWProxy to the Internet.
- A few user-assigned identities for AKS workloads. Each identity is granted the least privilege it requires.

- **Required permission for deployed services**

- Role [Storage Blob Data Contributor](https://learn.microsoft.com/en-us/azure/role-based-access-control/built-in-roles/storage#storage-blob-data-contributor)
- Role [Network Contributor](https://learn.microsoft.com/en-us/azure/role-based-access-control/built-in-roles/networking#network-contributor)
- Role [Managed Identity Contributor](https://learn.microsoft.com/en-us/azure/role-based-access-control/built-in-roles/identity#managed-identity-contributor)
- Role [Role Based Access Control Administrator](https://learn.microsoft.com/en-us/azure/role-based-access-control/built-in-roles/general#role-based-access-control-administrator)
- Role [Monitoring Reader](https://learn.microsoft.com/en-us/azure/role-based-access-control/built-in-roles/monitor#monitoring-reader)
- A custom role with `Microsoft.Network/networkInterfaces/read` permission
- A custom role with `Microsoft.ManagedIdentity/userAssignedIdentities/federatedIdentityCredentials/*` permission
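
The feature flag listed first in this tab can be registered with the Azure CLI. This is a minimal sketch that assumes `az` is installed and logged in to the target subscription; both function names are hypothetical.

```sh
# Sketch: register the preview feature flag and check its registration state.
# Assumes the Azure CLI is installed and logged in; function names are hypothetical.
register_vnet_integration_flag() {
  az feature register \
    --namespace Microsoft.ContainerService \
    --name EnableAPIServerVnetIntegrationPreview
}

# Prints the current state of the flag (for example, Registering or Registered).
show_vnet_integration_flag_state() {
  az feature show \
    --namespace Microsoft.ContainerService \
    --name EnableAPIServerVnetIntegrationPreview \
    --query properties.state \
    --output tsv
}

# Once the state is Registered, refresh the resource provider registration:
# az provider register --namespace Microsoft.ContainerService
```

Registration is asynchronous, so the state may need to be checked several times before it reports `Registered`.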

</TabItem>

</Tabs>
38 changes: 37 additions & 1 deletion docs/get-started.md
@@ -14,7 +14,7 @@ This guide aims to provide a quick and easy way to get started with RisingWave.
## Step 1: Start RisingWave

:::info
The following options start RisingWave in the standalone mode. In this mode, data is stored in the file system and the metadata is stored in the embedded SQLite database.
The following options start RisingWave in the standalone mode. In this mode, data is stored in the file system and the metadata is stored in the embedded SQLite database. See [About RisingWave standalone mode](#about-risingwave-standalone-mode) for more details.

For extensive testing or single-machine deployment, consider [starting RisingWave via Docker Compose](/deploy/risingwave-docker-compose.md). For production environments, consider [RisingWave Cloud](/deploy/risingwave-cloud.md), our fully managed service, or [deployment on Kubernetes using the Operator](/deploy/risingwave-kubernetes.md) or [Helm Chart](/deploy/deploy-k8s-helm.md).
:::
@@ -143,6 +143,42 @@ SELECT * FROM average_exam_scores;
(2 rows)
```

## About RisingWave standalone mode

RisingWave standalone mode is a simplified deployment mode for RisingWave. It is designed to be minimal and easy to install and configure.

Unlike other deployment modes, for instance [Docker Compose](/deploy/risingwave-docker-compose.md) or [Kubernetes](/deploy/risingwave-kubernetes.md), RisingWave standalone mode starts the cluster as a single process. This means that services like `compactor`, `frontend`, `compute` and `meta` are all embedded in this process.

For state store, we will use the embedded `LocalFs` Object Store, eliminating the need for an external service like `minio` or `s3`; for meta store, we will use the embedded `SQLite` database, eliminating the need for an external service like `etcd`.

By default, the RisingWave standalone mode will store its data in `~/risingwave`, which includes both `Metadata` and `State Data`.

For a batteries-included setup, with `monitoring` tools and external services like `kafka` fully included, you can use [Docker Compose](/deploy/risingwave-docker-compose.md) instead. If you would like to set up these external services manually, you may check out RisingWave's [Docker Compose](https://github.com/risingwavelabs/risingwave/blob/main/docker/docker-compose.yml), and run these services using the same configurations.

## Configure RisingWave standalone mode

The instance of RisingWave standalone mode can run without any configuration. However, there are some options available to customize the instance.

The options that new users are most likely to need are the state store directory (`--state-store-directory`) and in-memory mode (`--in-memory`).

`--state-store-directory` specifies the new directory where the cluster's `Metadata` and `State Data` will reside. The default is to store it in the `~/risingwave` folder.

```sh
# Reconfigure RisingWave to be stored under 'projects' folder instead.
risingwave --state-store-directory ~/projects/risingwave
```

`--in-memory` runs an in-memory instance of RisingWave. Neither `Metadata` nor `State Data` is persisted.

```sh
risingwave --in-memory
```

You can view other options with:
```sh
risingwave single --help
```

## What's next?

Congratulations! You've successfully started RisingWave and conducted some initial data analysis. To explore further, you may want to:
4 changes: 0 additions & 4 deletions docs/guides/ingest-from-neon-cdc.md
@@ -130,7 +130,3 @@ CREATE TABLE orders (
);
```
After the table is created, you can view and transform the CDC data from Neon based on your needs.

:::note
RisingWave supports creating a single PostgreSQL source that allows you to read CDC data from multiple tables located in the same database. However, please note that this feature is currently under development for Neon. For further information, refer to the [PostgreSQL CDC](https://docs.risingwave.com/docs/current/ingest-from-postgres-cdc/) in the documentation.
:::
1 change: 1 addition & 0 deletions docs/guides/ingest-from-postgres-cdc.md
@@ -193,6 +193,7 @@ Unless specified otherwise, the fields listed are required. Note that the value
|schema.name| Optional. Name of the schema. By default, the value is `public`. |
|table.name| Name of the table that you want to ingest data from. |
|slot.name| Optional. The [replication slot](https://www.postgresql.org/docs/14/logicaldecoding-explanation.html#LOGICALDECODING-REPLICATION-SLOTS) for this PostgreSQL source. By default, a unique slot name will be randomly generated. Each source should have a unique slot name.|
|ssl.mode| Optional. The `ssl.mode` parameter determines the level of SSL/TLS encryption for secure communication with Postgres. It accepts three values: `disable`, `prefer`, and `require`. The default value is `prefer`. When set to `require`, it enforces TLS for establishing a connection. |
|publication.name| Optional. Name of the publication. By default, the value is `rw_publication`. For more information, see [Multiple CDC source tables](#multiple-cdc-source-tables). |
|publication.create.enable| Optional. By default, the value is `'true'`. If `publication.name` does not exist and this value is `'true'`, a `publication.name` will be created. If `publication.name` does not exist and this value is `'false'`, an error will be returned. |
|transactional| Optional. Specify whether you want to enable transactions for the CDC table that you are about to create. By default, the value is `'true'` for shared sources, and `'false'` otherwise. This feature is also supported for shared CDC sources for multi-table transactions. For details, see [Transaction within a CDC table](/concepts/transactions.md#transactions-within-a-cdc-table).|
6 changes: 6 additions & 0 deletions docs/guides/sink-to-clickhouse.md
@@ -205,6 +205,12 @@ In ClickHouse, the `Nested` data type doesn't support multiple levels of nesting

:::

:::note

Previously, when inserting data into a ClickHouse sink, an error would be reported if the values were "nan (not a number)", "inf (infinity)", or "-inf (-infinity)". However, we have made a change to this behavior. If the ClickHouse column is nullable, we will insert null values in such cases. If the column is not nullable, we will insert `0` instead.

:::

Please be aware that the range of specific values varies among ClickHouse types and RisingWave types. Refer to the table below for detailed information.

| ClickHouse type | RisingWave type | ClickHouse range | RisingWave range |
5 changes: 3 additions & 2 deletions docs/guides/sink-to-delta-lake.md
@@ -38,10 +38,11 @@ WITH (
| Parameter Names | Description |
| --------------- | ---------------------------------------------------------------------- |
| type | Required. Currently, only `append-only` is supported. |
| location | Required. The file path that the Delta Lake table is reading data from, as specified when creating the Delta Lake table. |
| s3.endpoint | Required. Endpoint of the S3. <ul><li>For MinIO object store backend, it should be `http://${MINIO_HOST}:${MINIO_PORT}`. </li><li>For AWS S3, refer to [S3](https://docs.aws.amazon.com/general/latest/gr/s3.html) </li></ul> |
| location | Required. The file path that the Delta Lake table is reading data from, as specified when creating the Delta Lake table. <ul><li>For AWS, start with `s3://` or `s3a://`;</li><li>For GCS, start with `gs://`; </li><li>For local files, start with `file://`.</li></ul>|
| s3.endpoint | Required. Endpoint of the S3. <ul><li>For MinIO object store backend, it should be `http://${MINIO_HOST}:${MINIO_PORT}`. </li><li>For AWS S3, refer to [S3](https://docs.aws.amazon.com/general/latest/gr/s3.html). </li></ul> |
| s3.access.key | Required. Access key of the S3 compatible object store.|
| s3.secret.key | Required. Secret key of the S3 compatible object store.|
| gcs.service.account | Required for GCS. Specifies the service account JSON file as a string.|

## Example

8 changes: 7 additions & 1 deletion docs/guides/sink-to-doris.md
@@ -102,4 +102,10 @@ In regards to `decimal` types, RisingWave will round to the nearest decimal plac
|ARRAY | ARRAY |
|No support | BYTEA |
|JSONB | JSONB |
|BIGINT | SERIAL |
|BIGINT | SERIAL |

:::note

Previously, when inserting data into an Apache Doris sink, an error would be reported if the values were "nan (not a number)", "inf (infinity)", or "-inf (-infinity)". However, we have made a change to the behavior. If a decimal value is out of bounds or represents "inf", "-inf", or "nan", we will insert null values.

:::
14 changes: 10 additions & 4 deletions docs/guides/sink-to-starrocks.md
@@ -2,14 +2,14 @@
id: sink-to-starrocks
title: Sink data from RisingWave to StarRocks
description: Sink data from RisingWave to StarRocks.
slug: /sink-to-starrocks
slug: /sink-to-starrocks
---

This guide describes how to sink data from RisingWave to StarRocks.

StarRocks is an open-source, massively parallel processing (MPP) database. For details on how to get started with StarRocks, see the [Quick start](https://docs.starrocks.io/docs/quick_start/) guide.

The StarRocks stream load does not support sinking `struct` and `json` types.
The StarRocks stream load does not support sinking `struct` type.

## Prerequisites

@@ -86,8 +86,14 @@ The following table shows the corresponding data type in RisingWave that should
| DATETIME | TIMESTAMP WITHOUT TIME ZONE |
| No support | TIMESTAMP WITH TIME ZONE (can be converted to timestamp in RisingWave and then sunk into StarRocks) |
| No support | INTERVAL |
| JSON | STRUCT |
| No support | STRUCT |
| ARRAY | ARRAY |
| No support | BYTEA |
| JSON | JSONB |
| BIGINT | SERIAL |
| BIGINT | SERIAL |

:::note

Previously, when inserting data into a StarRocks sink, an error would be reported if the values were "nan (not a number)", "inf (infinity)", or "-inf (-infinity)". However, we have made a change to the behavior. If a decimal value is out of bounds or represents "inf", "-inf", or "nan", we will insert null values.

:::