Merge pull request #23575: [website][adhoc] Fix spellcheck errors and typos

aromanenko-dev authored Oct 12, 2022
2 parents 786ba8b + e6e4d04 commit 1c1ecb2
Showing 28 changed files with 50 additions and 50 deletions.
2 changes: 1 addition & 1 deletion website/www/site/content/en/community/case-study.md
@@ -35,5 +35,5 @@ started!
Want to tell the world you are using Apache Beam? Just walk
through [this instruction](https://github.com/apache/beam/tree/master/website/ADD_LOGO.md) and make it happen!

-The Apache Beam PMC reserves the right to remove logos of companies that are not demeed to be in good standing in the
+The Apache Beam PMC reserves the right to remove logos of companies that are not deemed to be in good standing in the
community.
4 changes: 2 additions & 2 deletions website/www/site/content/en/documentation/basics.md
@@ -261,7 +261,7 @@ as summation, called a `CombineFn`, in which the output is significantly smaller
than the input. In this case the aggregation is called `CombinePerKey`.

In a real application, you might have millions of keys and/or windows; that is
-why this is still an "embarassingly parallel" computational pattern. In those
+why this is still an "embarrassingly parallel" computational pattern. In those
cases where you have fewer keys, you can add parallelism by adding a
supplementary key, splitting each of your problem's natural keys into many
sub-keys. After these sub-keys are aggregated, the results can be further
@@ -611,7 +611,7 @@ For more information about Splittable `DoFn`, see the following pages:

### What's next

-Take a look at our [other documention](/documentation/) such as the Beam
+Take a look at our [other documentation](/documentation/) such as the Beam
programming guide, pipeline execution information, and transform reference
catalogs.

@@ -67,5 +67,5 @@ strategies remain unchanged.
## Unbounded JOIN Bounded {#join-unbounded-bounded}

For this type of `JOIN` bounded input is treated as a side-input by the
-implementation. This means that window/trigger is inherented from upstreams.
+implementation. This means that window/trigger is inherited from upstreams.

@@ -253,7 +253,7 @@ them into JSON `TableRow` objects.

{{< paragraph class="language-py" >}}
To read from a BigQuery table using the Beam SDK for Python, apply a `ReadFromBigQuery`
-transfrom. `ReadFromBigQuery` returns a `PCollection` of dictionaries,
+transform. `ReadFromBigQuery` returns a `PCollection` of dictionaries,
where each element in the `PCollection` represents a single row in the table.
Integer values in the `TableRow` objects are encoded as strings to match
BigQuery's exported JSON format.
@@ -19,7 +19,7 @@ limitations under the License.

> **IMPORTANT!** Previous implementation of Hadoop Input Format IO, called `HadoopInputFormatIO`, is deprecated starting from *Apache Beam 2.10*. Please, use current `HadoopFormatIO` which supports both `InputFormat` and `OutputFormat`.
-A `HadoopFormatIO` is a transform for reading data from any source or writing data to any sink that implements Hadoop's `InputFormat` or `OurputFormat` accordingly. For example, Cassandra, Elasticsearch, HBase, Redis, Postgres, etc.
+A `HadoopFormatIO` is a transform for reading data from any source or writing data to any sink that implements Hadoop's `InputFormat` or `OutputFormat` accordingly. For example, Cassandra, Elasticsearch, HBase, Redis, Postgres, etc.

`HadoopFormatIO` allows you to connect to many data sources/sinks that do not yet have a Beam IO transform. However, `HadoopFormatIO` has to make several performance trade-offs in connecting to `InputFormat` or `OutputFormat`. So, if there is another Beam IO transform for connecting specifically to your data source/sink of choice, we recommend you use that one.

@@ -360,7 +360,7 @@ You will need to pass a Hadoop `Configuration` with parameters specifying how th
- `mapreduce.job.outputformat.class` - The `OutputFormat` class used to connect to your data sink of choice.
- `mapreduce.job.output.key.class` - The key class passed to the `OutputFormat` in `mapreduce.job.outputformat.class`.
- `mapreduce.job.output.value.class` - The value class passed to the `OutputFormat` in `mapreduce.job.outputformat.class`.
-- `mapreduce.job.reduces` - Number of reduce tasks. Value is equal to number of write tasks which will be genarated. This property is not required for `Write.PartitionedWriterBuilder#withoutPartitioning()` write.
+- `mapreduce.job.reduces` - Number of reduce tasks. Value is equal to number of write tasks which will be generated. This property is not required for `Write.PartitionedWriterBuilder#withoutPartitioning()` write.
- `mapreduce.job.partitioner.class` - Hadoop partitioner class which will be used for distributing of records among partitions. This property is not required for `Write.PartitionedWriterBuilder#withoutPartitioning()` write.

_Note_: All mentioned values have appropriate constants. E.g.: `HadoopFormatIO.OUTPUT_FORMAT_CLASS_ATTR`.
@@ -70,7 +70,7 @@ pipeline
`HCatalogIO` is built for Apache HCatalog versions 2 and up and will not work out of the box for older versions of HCatalog.
The following illustrates a workaround to work with Hive 1.1.

-Include the following Hive 1.2 jars in the über jar you build.
+Include the following Hive 1.2 jars in the uber jar you build.
The 1.2 jars provide the necessary methods for Beam while remain compatible with Hive 1.1.

```
@@ -478,7 +478,7 @@ data.apply(
- Example: `.withStagingBucketName("{gs,s3}://bucket/my/dir/")`

- `.withStorageIntegrationName()`
-- Accepts a name of a Snowflake storage integration object created according to Snowflake documentationt.
+- Accepts a name of a Snowflake storage integration object created according to Snowflake documentation.
- Example:
{{< highlight >}}
CREATE OR REPLACE STORAGE INTEGRATION "test_integration"
@@ -551,14 +551,14 @@ SnowflakeIO is not going to delete created CSV files from path under the “stag
- Example: `.withDebugMode(SnowflakeIO.StreamingLogLevel.INFO)`


-**Important noticse**:
+**Important notice**:
1. Streaming accepts only **key pair authentication**. For details, see: [Issue 21287](https://github.com/apache/beam/issues/21287).
2. The role parameter configured in `SnowflakeIO.DataSourceConfiguration` object is ignored for streaming writing. For details, see: [Issue 21365](https://github.com/apache/beam/issues/21365)

#### Flush time: duration & number of rows
Duration: streaming write will write periodically files on stage according to time duration specified in flush time limit (for example. every 1 minute).

-Number of rows: files staged for write will have number of rows specified in flush row limit unless the flush time limit will be reached (for example if the limit is 1000 rows and buffor collected 99 rows and the 1 minute flush time passes, the rows will be sent to SnowPipe for insertion).
+Number of rows: files staged for write will have number of rows specified in flush row limit unless the flush time limit will be reached (for example if the limit is 1000 rows and buffer collected 99 rows and the 1-minute flush time passes, the rows will be sent to SnowPipe for insertion).

Size of staged files will depend on the rows size and used compression (GZIP).

@@ -182,7 +182,7 @@ to support data de-duplication when failures are retried by a runner), use
`ParDo`, `GroupByKey`, and other available Beam transforms.
Many data services are optimized to write batches of elements at a time,
so it may make sense to group the elements into batches before writing.
-Persistant connectons can be initialized in a DoFn's `setUp` or `startBundle`
+Persistent connections can be initialized in a DoFn's `setUp` or `startBundle`
method rather than upon the receipt of every element as well.
It should also be noted that in a large-scale, distributed system work can
[fail and/or be retried](/documentation/runtime/model/), so it is preferable to
2 changes: 1 addition & 1 deletion website/www/site/content/en/documentation/io/testing.md
@@ -389,7 +389,7 @@ Guidelines for creating a Beam data store Kubernetes script:

#### Jenkins jobs {#jenkins-jobs}

-You can find examples of existing IOIT jenkins job definitions in [.test-infra/jenkins](https://github.com/apache/beam/tree/master/.test-infra/jenkins) directory. Look for files caled job_PerformanceTest_*.groovy. The most prominent examples are:
+You can find examples of existing IOIT jenkins job definitions in [.test-infra/jenkins](https://github.com/apache/beam/tree/master/.test-infra/jenkins) directory. Look for files called job_PerformanceTest_*.groovy. The most prominent examples are:
* [JDBC](https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_PerformanceTests_JDBC.groovy) IOIT job
* [MongoDB](https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_PerformanceTests_MongoDBIO_IT.groovy) IOIT job
* [File-based](https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_PerformanceTests_FileBasedIO_IT.groovy) IOIT jobs
@@ -91,7 +91,7 @@ user=user-2, score=10, window=[2019-05-26T13:29:03.367Z..2019-05-26T13:29:13.367

User #1 sees two events separated by 12 seconds. With standard sessions, the gap defaults to 10 seconds; both scores are in different sessions, so the scores aren't added.

-User #2 sees four events, seperated by two, seven, and three seconds, respectively. Since none of the gaps are greater than the default, the four events are in the same standard session and added together (18 points).
+User #2 sees four events, separated by two, seven, and three seconds, respectively. Since none of the gaps are greater than the default, the four events are in the same standard session and added together (18 points).

#### Dynamic sessions
The dynamic sessions specify a five-second gap, so they use the following windows and scores:
@@ -37,7 +37,7 @@ The `GroupIntoBatches`-transform uses state and timers under the hood to allow t

while abstracting away the implementation details from users.

-The `withShardedKey()` functionality increases parallellism by spreading one key over multiple threads.
+The `withShardedKey()` functionality increases parallelism by spreading one key over multiple threads.

The transforms are used in the following way in Java & Python:

@@ -37,7 +37,7 @@ To slowly update global window side inputs in pipelines with non-global windows:

1. Create the side input for downstream transforms. The side input should fit into memory.

-The global window side input triggers on processing time, so the main pipeline nondeterministically matches the side input to elements in event time.
+The global window side input triggers on processing time, so the main pipeline non-deterministically matches the side input to elements in event time.

For instance, the following code sample uses a `Map` to create a `DoFn`. The `Map` becomes a `View.asSingleton` side input that’s rebuilt on each counter tick. The side input updates every 5 seconds in order to demonstrate the workflow. In a real-world scenario, the side input would typically update every few hours or once per day.

@@ -52,7 +52,7 @@ To test a transform you've created, you can use the following pattern:
{{< paragraph class="language-py" >}}
[TestPipeline](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/testing/test_pipeline.py) is a class included in the Beam Python SDK specifically for testing transforms.
{{< /paragraph >}}
-For tests, use `TestPipeline` in place of `Pipeline` when you create the pipeline object. Unlike `Pipeline.create`, `TestPipeline.create` handles setting `PipelineOptions` interally.
+For tests, use `TestPipeline` in place of `Pipeline` when you create the pipeline object. Unlike `Pipeline.create`, `TestPipeline.create` handles setting `PipelineOptions` internally.

You create a `TestPipeline` as follows:

22 changes: 11 additions & 11 deletions website/www/site/content/en/documentation/programming-guide.md
@@ -276,7 +276,7 @@ public interface MyOptions extends PipelineOptions {
You can also specify a description, which appears when a user passes `--help` as
a command-line argument, and a default value.

-{{< paragraph class="language-java language-py langauge-go" >}}
+{{< paragraph class="language-java language-py language-go" >}}
You set the description and default value using annotations, as follows:
{{< /paragraph >}}

@@ -1840,7 +1840,7 @@ PCollection<String> merged = collections.apply(Flatten.<String>pCollections());
{{< /highlight >}}

{{< highlight typescript >}}
-// Flatten takem an array of PCollection objects, wrapped in beam.P(...)
+// Flatten taken an array of PCollection objects, wrapped in beam.P(...)
// Returns a single PCollection that contains a union of all of the elements in all input PCollections.
{{< code_sample "sdks/typescript/test/docs/programming_guide.ts" model_multiple_pcollections_flatten >}}
{{< /highlight >}}
@@ -1995,7 +1995,7 @@ value must be registered if used.</span>

Some other serializability factors you should keep in mind are:

-* <span class="language-java language-py">Transient</span><span class="langauage-go">Unexported</span>
+* <span class="language-java language-py">Transient</span><span class="language-go">Unexported</span>
fields in your function object are *not* transmitted to worker
instances, because they are not automatically serialized.
* Avoid loading a field with a large amount of data before serialization.
@@ -3088,7 +3088,7 @@ Beam will automatically infer the schema based on the fields and field tags of t
{{< paragraph class="language-typescript" >}}
In Typescript, JSON objects are used to represent schema'd data.
Unfortunately type information in Typescript is not propagated to the runtime layer,
-so it needs to be manually specified in some places (e.g. when using cross-langauge pipelines).
+so it needs to be manually specified in some places (e.g. when using cross-language pipelines).
{{< /paragraph >}}

{{< highlight java >}}
@@ -3733,7 +3733,7 @@ type Transaction struct{
{{< /highlight >}}

{{< paragraph class="language-go" >}}
-Unexported fields are ignored, and cannot be automatically infered as part of the schema.
+Unexported fields are ignored, and cannot be automatically inferred as part of the schema.
Fields of type func, channel, unsafe.Pointer, or uintptr will be ignored by inference.
Fields of interface types are ignored, unless a schema provider
is registered for them.
@@ -4308,7 +4308,7 @@ If there were no schema, then the applied `DoFn` would have to accept an element
since there is a schema, you could apply the following DoFn:

{{< highlight java >}}
-purchases.appy(ParDo.of(new DoFn<PurchasePojo, PurchasePojo>() {
+purchases.apply(ParDo.of(new DoFn<PurchasePojo, PurchasePojo>() {
@ProcessElement public void process(@Element PurchaseBean purchase) {
...
}
@@ -4649,7 +4649,7 @@ to register a new `Coder` for the target type.

{{< paragraph class="language-go" >}}
To set the default Coder for a Go type you use the function `beam.RegisterCoder` to register a encoder and decoder functions for the target type.
-However, built in types like `int`, `string`, `float64`, etc cannot have their coders overridde.
+However, built in types like `int`, `string`, `float64`, etc cannot have their coders override.
{{< /paragraph >}}

{{< paragraph class="language-java language-py" >}}
@@ -5416,7 +5416,7 @@ The following diagram shows data events for key X as they arrive in the
PCollection and are assigned to windows. To keep the diagram a bit simpler,
we'll assume that the events all arrive in the pipeline in order.

-![Diagram of data events for acculumating mode example](/images/trigger-accumulation.png)
+![Diagram of data events for accumulating mode example](/images/trigger-accumulation.png)

##### 9.4.1.1. Accumulating mode {#accumulating-mode}

@@ -5823,7 +5823,7 @@ to other nodes in the graph. A `DoFn` can declare multiple state variables.
<span class="language-typescript">

> **Note:** The Beam SDK for Typescript does not yet support a State and Timer API,
-but it is possible to use these features from cross-langauge pipelines (see below).
+but it is possible to use these features from cross-language pipelines (see below).

</span>

@@ -5884,7 +5884,7 @@ _ = (p | 'Read per user' >> ReadPerUser()
This is not supported yet, see https://github.com/apache/beam/issues/20510.
{{< /highlight >}}

-{{< highlight typscript >}}
+{{< highlight typescript >}}
{{< code_sample "sdks/typescript/test/docs/programming_guide.ts" stateful_dofn >}}
{{< /highlight >}}

@@ -7584,7 +7584,7 @@ that make it easier to invoke transforms from specific languages:
{{< code_sample "sdks/typescript/test/docs/programming_guide.ts" python_map >}}
```

-Cross-langauge transforms can also be defined in line, which can be useful
+Cross-language transforms can also be defined in line, which can be useful
for accessing features or libraries not available in the calling SDK

```
@@ -32,7 +32,7 @@ Individual capabilities have been grouped by their corresponding What / Where /

For more details on the What / Where / When / How breakdown of concepts, we recommend reading through the <a href="https://oreilly.com/ideas/the-world-beyond-batch-streaming-102">Streaming 102</a> post on O'Reilly Radar.

-Note that in the future, we intend to add additional tables beyond the current set, for things like runtime characterstics (e.g. at-least-once vs exactly-once), performance, etc.
+Note that in the future, we intend to add additional tables beyond the current set, for things like runtime characteristics (e.g. at-least-once vs exactly-once), performance, etc.

<!-- Summary table -->
{{< documentation/capability-matrix-single cap-data="capability-matrix" cap-style="cap-summary" cap-view="summary" cap-toggle-details=1 cap-display="block" >}}
4 changes: 2 additions & 2 deletions website/www/site/content/en/documentation/runners/flink.md
@@ -155,7 +155,7 @@ mvn exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
--flinkMaster=<flink master url> \
--filesToStage=target/word-count-beam-bundled-0.1.jar"
{{< /highlight >}}
-<!-- Span implictly ended -->
+<!-- Span implicitly ended -->

{{< paragraph class="language-java" >}}
If you have a Flink `JobManager` running on your local machine you can provide `localhost:8081` for
@@ -234,7 +234,7 @@ options = PipelineOptions([
with beam.Pipeline(options) as p:
...
{{< /highlight >}}
-<!-- Span implictly ended -->
+<!-- Span implicitly ended -->

{{< paragraph class="language-portable" >}}
To run on a separate [Flink cluster](https://ci.apache.org/projects/flink/flink-docs-release-1.10/getting-started/tutorials/local_setup.html):
2 changes: 1 addition & 1 deletion website/www/site/content/en/documentation/runners/jet.md
@@ -205,7 +205,7 @@ cluster runs. The word count job won't be able to read the data otherwise.
<td><code>codeJarPathname</code></td>
<td>Also a property needed only when using external Jet Clusters, specifies the location of a fat jar
containing all the code that needs to run on the cluster (so at least the pipeline and the runner code). The value
-is any string that is acceptad by <code>new java.io.File()</code> as a parameter.</td>
+is any string that is accepted by <code>new java.io.File()</code> as a parameter.</td>
<td>Has no default value.</td>
</tr>
<tr>
@@ -108,4 +108,4 @@ When executing your pipeline with the JStorm Runner, you should consider the fol

### Monitoring your job
You can monitor your job with the JStorm UI, which displays all JStorm system metrics and Beam metrics.
-For testing on local mode, you can retreive the Beam metrics with the metrics method of PipelineResult.
+For testing on local mode, you can retrieve the Beam metrics with the metrics method of PipelineResult.
2 changes: 1 addition & 1 deletion website/www/site/content/en/documentation/runners/samza.md
@@ -179,7 +179,7 @@ When executing your pipeline with the Samza Runner, you can use the following pi
</tr>
<tr>
<td><code>enableMetrics</code></td>
-<td>Enable/disable Beam metrics in Samza Runne.</td>
+<td>Enable/disable Beam metrics in Samza Runner.</td>
<td><code>true</code></td>
</tr>
<tr>
@@ -142,7 +142,7 @@ Here we've provided commands for running the example pipeline using
Gradle on a [Beam HEAD Git clone](https://github.com/apache/beam).
If you need a more stable environment, please
[setup a Java project](/get-started/quickstart-java/) that uses the latest
-releaesed Beam version and include the necessary dependencies.
+released Beam version and include the necessary dependencies.

### Run with Dataflow runner

@@ -205,7 +205,7 @@ export PYTHON_VERSION=<version>

> **Note** This output gets written to the local file system of a Python Docker
> container. To verify the output by writing to GCS, you need to specify a
-> publicly acessible
+> publicly accessible
> GCS path for the `output` option since portable DirectRunner is currently
> unable to correctly forward local credentials for accessing GCS.