Merge pull request #23575: [website][adhoc] Fix spellcheck errors and typos

aromanenko-dev authored Oct 12, 2022
2 parents 786ba8b + e6e4d04 commit 1c1ecb2
Showing 28 changed files with 50 additions and 50 deletions.
2 changes: 1 addition & 1 deletion website/www/site/content/en/community/case-study.md
@@ -35,5 +35,5 @@ started!
Want to tell the world you are using Apache Beam? Just walk
through [this instruction](https://github.com/apache/beam/tree/master/website/ADD_LOGO.md) and make it happen!

-The Apache Beam PMC reserves the right to remove logos of companies that are not demeed to be in good standing in the
+The Apache Beam PMC reserves the right to remove logos of companies that are not deemed to be in good standing in the
community.
4 changes: 2 additions & 2 deletions website/www/site/content/en/documentation/basics.md
@@ -261,7 +261,7 @@ as summation, called a `CombineFn`, in which the output is significantly smaller
than the input. In this case the aggregation is called `CombinePerKey`.

In a real application, you might have millions of keys and/or windows; that is
-why this is still an "embarassingly parallel" computational pattern. In those
+why this is still an "embarrassingly parallel" computational pattern. In those
cases where you have fewer keys, you can add parallelism by adding a
supplementary key, splitting each of your problem's natural keys into many
sub-keys. After these sub-keys are aggregated, the results can be further
@@ -611,7 +611,7 @@ For more information about Splittable `DoFn`, see the following pages:

### What's next

-Take a look at our [other documention](/documentation/) such as the Beam
+Take a look at our [other documentation](/documentation/) such as the Beam
programming guide, pipeline execution information, and transform reference
catalogs.

@@ -67,5 +67,5 @@ strategies remain unchanged.
## Unbounded JOIN Bounded {#join-unbounded-bounded}

For this type of `JOIN` bounded input is treated as a side-input by the
-implementation. This means that window/trigger is inherented from upstreams.
+implementation. This means that window/trigger is inherited from upstreams.

@@ -253,7 +253,7 @@ them into JSON `TableRow` objects.

{{< paragraph class="language-py" >}}
To read from a BigQuery table using the Beam SDK for Python, apply a `ReadFromBigQuery`
-transfrom. `ReadFromBigQuery` returns a `PCollection` of dictionaries,
+transform. `ReadFromBigQuery` returns a `PCollection` of dictionaries,
where each element in the `PCollection` represents a single row in the table.
Integer values in the `TableRow` objects are encoded as strings to match
BigQuery's exported JSON format.
@@ -19,7 +19,7 @@ limitations under the License.

> **IMPORTANT!** Previous implementation of Hadoop Input Format IO, called `HadoopInputFormatIO`, is deprecated starting from *Apache Beam 2.10*. Please, use current `HadoopFormatIO` which supports both `InputFormat` and `OutputFormat`.
-A `HadoopFormatIO` is a transform for reading data from any source or writing data to any sink that implements Hadoop's `InputFormat` or `OurputFormat` accordingly. For example, Cassandra, Elasticsearch, HBase, Redis, Postgres, etc.
+A `HadoopFormatIO` is a transform for reading data from any source or writing data to any sink that implements Hadoop's `InputFormat` or `OutputFormat` accordingly. For example, Cassandra, Elasticsearch, HBase, Redis, Postgres, etc.

`HadoopFormatIO` allows you to connect to many data sources/sinks that do not yet have a Beam IO transform. However, `HadoopFormatIO` has to make several performance trade-offs in connecting to `InputFormat` or `OutputFormat`. So, if there is another Beam IO transform for connecting specifically to your data source/sink of choice, we recommend you use that one.

@@ -360,7 +360,7 @@ You will need to pass a Hadoop `Configuration` with parameters specifying how th
- `mapreduce.job.outputformat.class` - The `OutputFormat` class used to connect to your data sink of choice.
- `mapreduce.job.output.key.class` - The key class passed to the `OutputFormat` in `mapreduce.job.outputformat.class`.
- `mapreduce.job.output.value.class` - The value class passed to the `OutputFormat` in `mapreduce.job.outputformat.class`.
-- `mapreduce.job.reduces` - Number of reduce tasks. Value is equal to number of write tasks which will be genarated. This property is not required for `Write.PartitionedWriterBuilder#withoutPartitioning()` write.
+- `mapreduce.job.reduces` - Number of reduce tasks. Value is equal to number of write tasks which will be generated. This property is not required for `Write.PartitionedWriterBuilder#withoutPartitioning()` write.
- `mapreduce.job.partitioner.class` - Hadoop partitioner class which will be used for distributing of records among partitions. This property is not required for `Write.PartitionedWriterBuilder#withoutPartitioning()` write.

_Note_: All mentioned values have appropriate constants. E.g.: `HadoopFormatIO.OUTPUT_FORMAT_CLASS_ATTR`.
@@ -70,7 +70,7 @@ pipeline
`HCatalogIO` is built for Apache HCatalog versions 2 and up and will not work out of the box for older versions of HCatalog.
The following illustrates a workaround to work with Hive 1.1.

-Include the following Hive 1.2 jars in the über jar you build.
+Include the following Hive 1.2 jars in the uber jar you build.
The 1.2 jars provide the necessary methods for Beam while remain compatible with Hive 1.1.

```
@@ -478,7 +478,7 @@ data.apply(
- Example: `.withStagingBucketName("{gs,s3}://bucket/my/dir/")`

- `.withStorageIntegrationName()`
-- Accepts a name of a Snowflake storage integration object created according to Snowflake documentationt.
+- Accepts a name of a Snowflake storage integration object created according to Snowflake documentation.
- Example:
{{< highlight >}}
CREATE OR REPLACE STORAGE INTEGRATION "test_integration"
@@ -551,14 +551,14 @@ SnowflakeIO is not going to delete created CSV files from path under the “stag
- Example: `.withDebugMode(SnowflakeIO.StreamingLogLevel.INFO)`


-**Important noticse**:
+**Important notice**:
1. Streaming accepts only **key pair authentication**. For details, see: [Issue 21287](https://github.com/apache/beam/issues/21287).
2. The role parameter configured in `SnowflakeIO.DataSourceConfiguration` object is ignored for streaming writing. For details, see: [Issue 21365](https://github.com/apache/beam/issues/21365)

#### Flush time: duration & number of rows
Duration: streaming write will write periodically files on stage according to time duration specified in flush time limit (for example. every 1 minute).

-Number of rows: files staged for write will have number of rows specified in flush row limit unless the flush time limit will be reached (for example if the limit is 1000 rows and buffor collected 99 rows and the 1 minute flush time passes, the rows will be sent to SnowPipe for insertion).
+Number of rows: files staged for write will have number of rows specified in flush row limit unless the flush time limit will be reached (for example if the limit is 1000 rows and buffer collected 99 rows and the 1-minute flush time passes, the rows will be sent to SnowPipe for insertion).

Size of staged files will depend on the rows size and used compression (GZIP).

@@ -182,7 +182,7 @@ to support data de-duplication when failures are retried by a runner), use
`ParDo`, `GroupByKey`, and other available Beam transforms.
Many data services are optimized to write batches of elements at a time,
so it may make sense to group the elements into batches before writing.
-Persistant connectons can be initialized in a DoFn's `setUp` or `startBundle`
+Persistent connections can be initialized in a DoFn's `setUp` or `startBundle`
method rather than upon the receipt of every element as well.
It should also be noted that in a large-scale, distributed system work can
[fail and/or be retried](/documentation/runtime/model/), so it is preferable to
2 changes: 1 addition & 1 deletion website/www/site/content/en/documentation/io/testing.md
@@ -389,7 +389,7 @@ Guidelines for creating a Beam data store Kubernetes script:

#### Jenkins jobs {#jenkins-jobs}

-You can find examples of existing IOIT jenkins job definitions in [.test-infra/jenkins](https://github.com/apache/beam/tree/master/.test-infra/jenkins) directory. Look for files caled job_PerformanceTest_*.groovy. The most prominent examples are:
+You can find examples of existing IOIT jenkins job definitions in [.test-infra/jenkins](https://github.com/apache/beam/tree/master/.test-infra/jenkins) directory. Look for files called job_PerformanceTest_*.groovy. The most prominent examples are:
* [JDBC](https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_PerformanceTests_JDBC.groovy) IOIT job
* [MongoDB](https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_PerformanceTests_MongoDBIO_IT.groovy) IOIT job
* [File-based](https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_PerformanceTests_FileBasedIO_IT.groovy) IOIT jobs
@@ -91,7 +91,7 @@ user=user-2, score=10, window=[2019-05-26T13:29:03.367Z..2019-05-26T13:29:13.367

User #1 sees two events separated by 12 seconds. With standard sessions, the gap defaults to 10 seconds; both scores are in different sessions, so the scores aren't added.

-User #2 sees four events, seperated by two, seven, and three seconds, respectively. Since none of the gaps are greater than the default, the four events are in the same standard session and added together (18 points).
+User #2 sees four events, separated by two, seven, and three seconds, respectively. Since none of the gaps are greater than the default, the four events are in the same standard session and added together (18 points).

#### Dynamic sessions
The dynamic sessions specify a five-second gap, so they use the following windows and scores:
@@ -37,7 +37,7 @@ The `GroupIntoBatches`-transform uses state and timers under the hood to allow t

while abstracting away the implementation details from users.

-The `withShardedKey()` functionality increases parallellism by spreading one key over multiple threads.
+The `withShardedKey()` functionality increases parallelism by spreading one key over multiple threads.

The transforms are used in the following way in Java & Python:

@@ -37,7 +37,7 @@ To slowly update global window side inputs in pipelines with non-global windows:

1. Create the side input for downstream transforms. The side input should fit into memory.

-The global window side input triggers on processing time, so the main pipeline nondeterministically matches the side input to elements in event time.
+The global window side input triggers on processing time, so the main pipeline non-deterministically matches the side input to elements in event time.

For instance, the following code sample uses a `Map` to create a `DoFn`. The `Map` becomes a `View.asSingleton` side input that’s rebuilt on each counter tick. The side input updates every 5 seconds in order to demonstrate the workflow. In a real-world scenario, the side input would typically update every few hours or once per day.

@@ -52,7 +52,7 @@ To test a transform you've created, you can use the following pattern:
{{< paragraph class="language-py" >}}
[TestPipeline](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/testing/test_pipeline.py) is a class included in the Beam Python SDK specifically for testing transforms.
{{< /paragraph >}}
-For tests, use `TestPipeline` in place of `Pipeline` when you create the pipeline object. Unlike `Pipeline.create`, `TestPipeline.create` handles setting `PipelineOptions` interally.
+For tests, use `TestPipeline` in place of `Pipeline` when you create the pipeline object. Unlike `Pipeline.create`, `TestPipeline.create` handles setting `PipelineOptions` internally.

You create a `TestPipeline` as follows:

22 changes: 11 additions & 11 deletions website/www/site/content/en/documentation/programming-guide.md
@@ -276,7 +276,7 @@ public interface MyOptions extends PipelineOptions {
You can also specify a description, which appears when a user passes `--help` as
a command-line argument, and a default value.

-{{< paragraph class="language-java language-py langauge-go" >}}
+{{< paragraph class="language-java language-py language-go" >}}
You set the description and default value using annotations, as follows:
{{< /paragraph >}}

@@ -1840,7 +1840,7 @@ PCollection<String> merged = collections.apply(Flatten.<String>pCollections());
{{< /highlight >}}

{{< highlight typescript >}}
-// Flatten takem an array of PCollection objects, wrapped in beam.P(...)
+// Flatten taken an array of PCollection objects, wrapped in beam.P(...)
// Returns a single PCollection that contains a union of all of the elements in all input PCollections.
{{< code_sample "sdks/typescript/test/docs/programming_guide.ts" model_multiple_pcollections_flatten >}}
{{< /highlight >}}
@@ -1995,7 +1995,7 @@ value must be registered if used.</span>

Some other serializability factors you should keep in mind are:

-* <span class="language-java language-py">Transient</span><span class="langauage-go">Unexported</span>
+* <span class="language-java language-py">Transient</span><span class="language-go">Unexported</span>
fields in your function object are *not* transmitted to worker
instances, because they are not automatically serialized.
* Avoid loading a field with a large amount of data before serialization.
@@ -3088,7 +3088,7 @@ Beam will automatically infer the schema based on the fields and field tags of t
{{< paragraph class="language-typescript" >}}
In Typescript, JSON objects are used to represent schema'd data.
Unfortunately type information in Typescript is not propagated to the runtime layer,
-so it needs to be manually specified in some places (e.g. when using cross-langauge pipelines).
+so it needs to be manually specified in some places (e.g. when using cross-language pipelines).
{{< /paragraph >}}

{{< highlight java >}}
@@ -3733,7 +3733,7 @@ type Transaction struct{
{{< /highlight >}}

{{< paragraph class="language-go" >}}
-Unexported fields are ignored, and cannot be automatically infered as part of the schema.
+Unexported fields are ignored, and cannot be automatically inferred as part of the schema.
Fields of type func, channel, unsafe.Pointer, or uintptr will be ignored by inference.
Fields of interface types are ignored, unless a schema provider
is registered for them.
@@ -4308,7 +4308,7 @@ If there were no schema, then the applied `DoFn` would have to accept an element
since there is a schema, you could apply the following DoFn:

{{< highlight java >}}
-purchases.appy(ParDo.of(new DoFn<PurchasePojo, PurchasePojo>() {
+purchases.apply(ParDo.of(new DoFn<PurchasePojo, PurchasePojo>() {
@ProcessElement public void process(@Element PurchaseBean purchase) {
...
}
@@ -4649,7 +4649,7 @@ to register a new `Coder` for the target type.

{{< paragraph class="language-go" >}}
To set the default Coder for a Go type you use the function `beam.RegisterCoder` to register a encoder and decoder functions for the target type.
-However, built in types like `int`, `string`, `float64`, etc cannot have their coders overridde.
+However, built in types like `int`, `string`, `float64`, etc cannot have their coders override.
{{< /paragraph >}}

{{< paragraph class="language-java language-py" >}}
@@ -5416,7 +5416,7 @@ The following diagram shows data events for key X as they arrive in the
PCollection and are assigned to windows. To keep the diagram a bit simpler,
we'll assume that the events all arrive in the pipeline in order.

-![Diagram of data events for acculumating mode example](/images/trigger-accumulation.png)
+![Diagram of data events for accumulating mode example](/images/trigger-accumulation.png)

##### 9.4.1.1. Accumulating mode {#accumulating-mode}

@@ -5823,7 +5823,7 @@ to other nodes in the graph. A `DoFn` can declare multiple state variables.
<span class="language-typescript">

> **Note:** The Beam SDK for Typescript does not yet support a State and Timer API,
-but it is possible to use these features from cross-langauge pipelines (see below).
+but it is possible to use these features from cross-language pipelines (see below).

</span>

@@ -5884,7 +5884,7 @@ _ = (p | 'Read per user' >> ReadPerUser()
This is not supported yet, see https://github.com/apache/beam/issues/20510.
{{< /highlight >}}

-{{< highlight typscript >}}
+{{< highlight typescript >}}
{{< code_sample "sdks/typescript/test/docs/programming_guide.ts" stateful_dofn >}}
{{< /highlight >}}

@@ -7584,7 +7584,7 @@ that make it easier to invoke transforms from specific languages:
{{< code_sample "sdks/typescript/test/docs/programming_guide.ts" python_map >}}
```

-Cross-langauge transforms can also be defined in line, which can be useful
+Cross-language transforms can also be defined in line, which can be useful
for accessing features or libraries not available in the calling SDK

```
@@ -32,7 +32,7 @@ Individual capabilities have been grouped by their corresponding What / Where /

For more details on the What / Where / When / How breakdown of concepts, we recommend reading through the <a href="https://oreilly.com/ideas/the-world-beyond-batch-streaming-102">Streaming 102</a> post on O'Reilly Radar.

-Note that in the future, we intend to add additional tables beyond the current set, for things like runtime characterstics (e.g. at-least-once vs exactly-once), performance, etc.
+Note that in the future, we intend to add additional tables beyond the current set, for things like runtime characteristics (e.g. at-least-once vs exactly-once), performance, etc.

<!-- Summary table -->
{{< documentation/capability-matrix-single cap-data="capability-matrix" cap-style="cap-summary" cap-view="summary" cap-toggle-details=1 cap-display="block" >}}
4 changes: 2 additions & 2 deletions website/www/site/content/en/documentation/runners/flink.md
@@ -155,7 +155,7 @@ mvn exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
--flinkMaster=<flink master url> \
--filesToStage=target/word-count-beam-bundled-0.1.jar"
{{< /highlight >}}
-<!-- Span implictly ended -->
+<!-- Span implicitly ended -->

{{< paragraph class="language-java" >}}
If you have a Flink `JobManager` running on your local machine you can provide `localhost:8081` for
@@ -234,7 +234,7 @@ options = PipelineOptions([
with beam.Pipeline(options) as p:
...
{{< /highlight >}}
-<!-- Span implictly ended -->
+<!-- Span implicitly ended -->

{{< paragraph class="language-portable" >}}
To run on a separate [Flink cluster](https://ci.apache.org/projects/flink/flink-docs-release-1.10/getting-started/tutorials/local_setup.html):
2 changes: 1 addition & 1 deletion website/www/site/content/en/documentation/runners/jet.md
@@ -205,7 +205,7 @@ cluster runs. The word count job won't be able to read the data otherwise.
<td><code>codeJarPathname</code></td>
<td>Also a property needed only when using external Jet Clusters, specifies the location of a fat jar
containing all the code that needs to run on the cluster (so at least the pipeline and the runner code). The value
-is any string that is acceptad by <code>new java.io.File()</code> as a parameter.</td>
+is any string that is accepted by <code>new java.io.File()</code> as a parameter.</td>
<td>Has no default value.</td>
</tr>
<tr>
@@ -108,4 +108,4 @@ When executing your pipeline with the JStorm Runner, you should consider the fol

### Monitoring your job
You can monitor your job with the JStorm UI, which displays all JStorm system metrics and Beam metrics.
-For testing on local mode, you can retreive the Beam metrics with the metrics method of PipelineResult.
+For testing on local mode, you can retrieve the Beam metrics with the metrics method of PipelineResult.
2 changes: 1 addition & 1 deletion website/www/site/content/en/documentation/runners/samza.md
@@ -179,7 +179,7 @@ When executing your pipeline with the Samza Runner, you can use the following pi
</tr>
<tr>
<td><code>enableMetrics</code></td>
-<td>Enable/disable Beam metrics in Samza Runne.</td>
+<td>Enable/disable Beam metrics in Samza Runner.</td>
<td><code>true</code></td>
</tr>
<tr>
@@ -142,7 +142,7 @@ Here we've provided commands for running the example pipeline using
Gradle on a [Beam HEAD Git clone](https://github.com/apache/beam).
If you need a more stable environment, please
[setup a Java project](/get-started/quickstart-java/) that uses the latest
-releaesed Beam version and include the necessary dependencies.
+released Beam version and include the necessary dependencies.

### Run with Dataflow runner

@@ -205,7 +205,7 @@ export PYTHON_VERSION=<version>

> **Note** This output gets written to the local file system of a Python Docker
> container. To verify the output by writing to GCS, you need to specify a
-> publicly acessible
+> publicly accessible
> GCS path for the `output` option since portable DirectRunner is currently
> unable to correctly forward local credentials for accessing GCS.