
Commit

Fixing coderabbitai PR comments
joelmataKPN committed Jul 1, 2024
1 parent c7f59d3 commit f6025a1
Showing 5 changed files with 12 additions and 13 deletions.
metadata-ingestion/docs/sources/abs/README.md (2 changes: 1 addition & 1 deletion)
@@ -37,4 +37,4 @@ We are working on using iterator-based JSON parsers to avoid reading in the enti

### Profiling

-Profiling is not available in the current release.
+Profiling is not available in the current release.
metadata-ingestion/docs/sources/abs/abs.md (8 changes: 4 additions & 4 deletions)
@@ -1,13 +1,13 @@

### Path Specs

-Path Specs (`path_specs`) is a list of Path Spec (`path_spec`) objects where each individual `path_spec` represents one or more datasets. Include path (`path_spec.include`) represents formatted path to the dataset. This path must end with `*.*` or `*.[ext]` to represent leaf level. If `*.[ext]` is provided then files with only specified extension type will be scanned. "`.[ext]`" can be any of [supported file types](#supported-file-types). Refer [example 1](#example-1---individual-file-as-dataset) below for more details.
+Path Specs (`path_specs`) is a list of Path Spec (`path_spec`) objects, where each individual `path_spec` represents one or more datasets. The include path (`path_spec.include`) represents a formatted path to the dataset. This path must end with `*.*` or `*.[ext]` to represent the leaf level. If `*.[ext]` is provided, then only files with the specified extension type will be scanned. "`.[ext]`" can be any of the [supported file types](#supported-file-types). Refer to [example 1](#example-1---individual-file-as-dataset) below for more details.

-All folder levels need to be specified in include path. You can use `/*/` to represent a folder level and avoid specifying exact folder name. To map folder as a dataset, use `{table}` placeholder to represent folder level for which dataset is to be created. For a partitioned dataset, you can use placeholder `{partition_key[i]}` to represent name of `i`th partition and `{partition[i]}` to represent value of `i`th partition. During ingestion, `i` will be used to match partition_key to partition. Refer [example 2 and 3](#example-2---folder-of-files-as-dataset-without-partitions) below for more details.
+All folder levels need to be specified in the include path. You can use `/*/` to represent a folder level and avoid specifying the exact folder name. To map a folder as a dataset, use the `{table}` placeholder to represent the folder level for which the dataset is to be created. For a partitioned dataset, you can use the placeholder `{partition_key[i]}` to represent the name of the `i`th partition and `{partition[i]}` to represent the value of the `i`th partition. During ingestion, `i` will be used to match the partition_key to the partition. Refer to [examples 2 and 3](#example-2---folder-of-files-as-dataset-without-partitions) below for more details.

-Exclude paths (`path_spec.exclude`) can be used to ignore paths that are not relevant to current `path_spec`. This path cannot have named variables ( `{}` ). Exclude path can have `**` to represent multiple folder levels. Refer [example 4](#example-4---folder-of-files-as-dataset-with-partitions-and-exclude-filter) below for more details.
+Exclude paths (`path_spec.exclude`) can be used to ignore paths that are not relevant to the current `path_spec`. This path cannot have named variables (`{}`). The exclude path can have `**` to represent multiple folder levels. Refer to [example 4](#example-4---folder-of-files-as-dataset-with-partitions-and-exclude-filter) below for more details.

-Refer [example 5](#example-5---advanced---either-individual-file-or-folder-of-files-as-dataset) if your container has more complex dataset representation.
+Refer to [example 5](#example-5---advanced---either-individual-file-or-folder-of-files-as-dataset) if your container has a more complex dataset representation.

**Additional points to note**
- Folder names should not contain {, }, *, / in their names.
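To make the placeholder mechanics described in the paragraphs above concrete, here is a hypothetical `path_spec`, written as a Python dict purely for illustration. Only the `include`/`exclude` fields and the `{table}`, `{partition_key[i]}`, and `{partition[i]}` placeholders come from the documentation; the storage account, container, folder names, and file extension are invented for this example.

```python
# Hypothetical path_spec for an Azure Blob Storage container (illustration only).
path_spec = {
    # {table} marks the folder that becomes the dataset; {partition_key[0]} and
    # {partition[0]} capture the first partition's name and value. The path ends
    # with *.parquet, so only files with that extension are scanned.
    "include": (
        "https://myaccount.blob.core.windows.net/my-container/data/"
        "{table}/{partition_key[0]}={partition[0]}/*.parquet"
    ),
    # Exclude patterns may use ** to span multiple folder levels, but no named variables.
    "exclude": ["**/_tmp/**"],
}

# A blob such as .../my-container/data/sales/year=2024/part-000.parquet would then
# map to the dataset "sales", with partition_key[0]="year" and partition[0]="2024".
```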
metadata-ingestion/docs/sources/s3/README.md (2 changes: 1 addition & 1 deletion)
@@ -1,5 +1,5 @@
This connector ingests AWS S3 datasets into DataHub. It allows mapping an individual file or a folder of files to a dataset in DataHub.
-To specify the group of files that form a dataset, use `path_specs` configuration in ingestion recipe. Refer section [Path Specs](https://datahubproject.io/docs/generated/ingestion/sources/s3/#path-specs) for more details.
+Refer to the section [Path Specs](https://datahubproject.io/docs/generated/ingestion/sources/s3/#path-specs) for more details.

:::tip
This connector can also be used to ingest local files.
@@ -1,5 +1,4 @@
-import logging
import logging
import os
import re
from typing import Iterable, Optional, Dict, List
@@ -84,7 +83,7 @@ def get_abs_properties(
    use_abs_blob_properties: Optional[bool] = False,
) -> Dict[str, str]:
    if azure_config is None:
-        raise ValueError("No container_client available.")
+        raise ValueError("Azure configuration is not provided. Cannot retrieve container client.")

    blob_service_client = azure_config.get_blob_service_client()
    container_client = blob_service_client.get_container_client(
@@ -196,7 +195,7 @@ def get_abs_tags(
) -> Optional[GlobalTagsClass]:
    # Todo add the service_client, when building out this get_abs_tags
    if azure_config is None:
-        raise ValueError("container_client not set. Cannot browse abs")
+        raise ValueError("Azure configuration is not provided. Cannot retrieve container client.")

    tags_to_add: List[str] = []
    blob_service_client = azure_config.get_blob_service_client()
@@ -241,7 +240,7 @@ def list_folders(
    container_name: str, prefix: str, azure_config: Optional[AzureConnectionConfig]
) -> Iterable[str]:
    if azure_config is None:
-        raise ValueError("azure_config not set. Cannot browse Azure Blob Storage")
+        raise ValueError("Azure configuration is not provided. Cannot retrieve container client.")

    abs_blob_service_client = azure_config.get_blob_service_client()
    container_client = abs_blob_service_client.get_container_client(container_name)
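All three hunks in this file converge on the same `None` guard and the same error message. The sketch below shows one hedged way to express that shared guard as a helper; the `require_azure_config` name and the simplified `AzureConnectionConfig` stand-in are assumptions for illustration, not code from this commit.

```python
from typing import Optional


class AzureConnectionConfig:
    """Simplified stand-in for DataHub's Azure connection config class."""

    def get_blob_service_client(self):
        raise NotImplementedError  # the real class returns a BlobServiceClient


def require_azure_config(
    azure_config: Optional[AzureConnectionConfig],
) -> AzureConnectionConfig:
    """Return the config, or raise the standardized error introduced in this commit."""
    if azure_config is None:
        raise ValueError(
            "Azure configuration is not provided. Cannot retrieve container client."
        )
    return azure_config


# get_abs_properties, get_abs_tags, and list_folders could then all start with:
#     azure_config = require_azure_config(azure_config)
#     blob_service_client = azure_config.get_blob_service_client()
```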
@@ -113,11 +113,11 @@ def get_bucket_name(path: str) -> str:
raise ValueError(f"Unable to get bucket name from path: {path}")

def get_sub_types(self) -> str:
if self.platform == "s3":
if self.platform == PLATFORM_S3:
return DatasetContainerSubTypes.S3_BUCKET
elif self.platform == "gcs":
elif self.platform == PLATFORM_GCS:
return DatasetContainerSubTypes.GCS_BUCKET
elif self.platform == "abs":
elif self.platform == PLATFORM_ABS:
return DatasetContainerSubTypes.ABS_CONTAINER
raise ValueError(f"Unable to sub type for platform: {self.platform}")

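The hunk above swaps the string literals `"s3"`, `"gcs"`, and `"abs"` for named constants. A self-contained sketch of that pattern follows; the constant values and the `DatasetContainerSubTypes` stand-in are assumptions inferred from the literals visible in the diff, not DataHub's actual definitions.

```python
from enum import Enum

# Presumed values of the constants referenced in the diff.
PLATFORM_S3 = "s3"
PLATFORM_GCS = "gcs"
PLATFORM_ABS = "abs"


class DatasetContainerSubTypes(str, Enum):
    """Stand-in enum; only the members used in the diff are shown."""

    S3_BUCKET = "S3 bucket"
    GCS_BUCKET = "GCS bucket"
    ABS_CONTAINER = "ABS container"


def get_sub_type(platform: str) -> str:
    # A dict lookup is equivalent to the if/elif chain in the diff and keeps the
    # platform-to-subtype mapping in one place.
    subtypes = {
        PLATFORM_S3: DatasetContainerSubTypes.S3_BUCKET,
        PLATFORM_GCS: DatasetContainerSubTypes.GCS_BUCKET,
        PLATFORM_ABS: DatasetContainerSubTypes.ABS_CONTAINER,
    }
    try:
        return subtypes[platform]
    except KeyError:
        raise ValueError(f"Unable to get sub type for platform: {platform}")
```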
