[druid] Superset cannot see realtime datasource with no segment #2764

Closed
3 tasks done
zhiyuanouyang opened this issue May 15, 2017 · 7 comments


@zhiyuanouyang

zhiyuanouyang commented May 15, 2017

Make sure these boxes are checked before submitting your issue - thank you!

  • I have checked the superset logs for python stacktraces and included it here as text if any
  • I have reproduced the issue with at least the latest released version of superset
  • I have checked the issue tracker for the same issue and I haven't found one similar

Superset version

newest

Expected results

I use the Druid Kafka indexing service to ingest realtime data into a Druid datasource. When I query http://localhost:8082/druid/v2/datasources, the datasource shows up, and queries I run from the command line return results. However, the datasource does not appear in Superset, even after I refresh the datasource metadata. I do not know how to solve this.

Actual results

Steps to reproduce

@mistercrunch
Member

Depending on your version of Druid, the "segment metadata" endpoint that Superset relies on may need a physical segment to exist before it reports any metadata. In these earlier versions of Druid, there appears to be no other way to get the schema and column type information we require. Normally, once the first segment-granularity time period closes (say, the first 24 hours), a segment is created and Superset will discover it.
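A minimal sketch of checking what the segment metadata query reports for a datasource. It assumes the broker listens on localhost:8082 (the port used in the report above) and posts a Druid native query with `"queryType": "segmentMetadata"` to the broker's `/druid/v2` endpoint. The helper names are hypothetical, and this is not necessarily how Superset itself issues the query. An empty result (no columns) would be consistent with the "no segment yet" situation described here.

```python
import json
import urllib.request

# Assumption: broker address taken from the original report.
BROKER_QUERY_URL = "http://localhost:8082/druid/v2"


def segment_metadata_query(datasource):
    """Build a Druid native segmentMetadata query for one datasource."""
    return {
        "queryType": "segmentMetadata",
        "dataSource": datasource,
        "merge": True,  # ask Druid to merge per-segment results into one row
    }


def column_types(response):
    """Map column name -> type from a segmentMetadata response (a JSON list)."""
    types = {}
    for result in response:
        for name, info in result.get("columns", {}).items():
            types[name] = info.get("type")
    return types


def fetch_column_types(datasource, url=BROKER_QUERY_URL):
    """POST the query to the broker and return the column types it reports."""
    body = json.dumps(segment_metadata_query(datasource)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return column_types(json.load(resp))
```

For example, `fetch_column_types("superset-kafka-realtime")` against a live broker returns an empty dict if Druid had no segment it could analyze.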

@mistercrunch mistercrunch changed the title Superset cannot see realtime datasource [druid] Superset cannot see realtime datasource with no segment May 16, 2017
@zhiyuanouyang
Author

zhiyuanouyang commented May 16, 2017

Thanks for your reply. However, in the Kafka indexing service spec I set the segment granularity to "MINUTE", and I still cannot see anything in Superset after several minutes. Here is my spec:

{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "superset-kafka-realtime",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "timestampSpec": {
          "column": "timestamp",
          "format": "auto"
        },
        "dimensionsSpec": {
          "dimensions": [
            "gctype"
          ]
        }
      }
    },
    "metricsSpec": [
      {
        "name" : "count",
        "type" : "longSum",
        "fieldName" : "count"
      }
    ],
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "MINUTE",
      "queryGranularity": "NONE"
    }
  },
  "tuningConfig": {
    "type": "kafka",
    "maxRowsPerSegment": 5000000
  },
  "ioConfig": {
    "topic": "test",
    "consumerProperties": {
      "bootstrap.servers": "localhost:9092"
    },
    "taskCount": 1,
    "replicas": 1,
    "taskDuration": "P30D"
  }
}

Is there any configuration that controls the granularity you mentioned? Say I want to see the data update every minute. I really appreciate your help!

@mistercrunch
Member

What is your Druid version? Do you have historical nodes set up? Does the segment metadata endpoint return column and type information?

Superset uses the segment metadata feature in Druid to provide the information it needs. The question of "how do I get version X of Druid to show segment metadata" really is a Druid question. The answer is probably to find a way for Druid to create the physical segment, or to upgrade Druid, though that's a question for the Druid community.

FYI, at Airbnb we have a framework that makes it easy to load data from our Kafka topics into Druid. It works alongside our logging framework, which uses Thrift for serialization and to enforce schemas. In our case, this framework creates the Superset datasources another way (without using the segment metadata endpoint), which gives us more control through a more specific script.

@nipunb

nipunb commented Nov 23, 2017

We are running into the same issue. The segment granularity is set to MINUTE. The druid/v2/datasources endpoint returns the datasource in question (kafka), but it doesn't show up in Superset. What is the segment metadata endpoint?

@nipunb

nipunb commented Nov 23, 2017

I checked the Druid metadata endpoint, druid/coordinator/v1/metadata/datasources. The datasource in question doesn't show up there: it appears in druid/v2/datasources but not in the metadata.
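The mismatch described above can be checked by diffing the two listings: the broker's `/druid/v2/datasources` and the coordinator's `/druid/coordinator/v1/metadata/datasources`, both of which return a JSON array of datasource names. This is a minimal sketch, assuming the broker on localhost:8082 (as in this thread) and the coordinator on its default port 8081. The function names are illustrative only.

```python
import json
import urllib.request

# Assumed addresses: broker port from this thread, coordinator default port.
BROKER = "http://localhost:8082"
COORDINATOR = "http://localhost:8081"


def get_json(url):
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def realtime_only(broker_names, coordinator_names):
    """Names the broker lists but the coordinator's metadata listing does not."""
    return sorted(set(broker_names) - set(coordinator_names))


def find_unpublished():
    """Query both endpoints and return the datasources missing from metadata."""
    queryable = get_json(BROKER + "/druid/v2/datasources")
    published = get_json(COORDINATOR + "/druid/coordinator/v1/metadata/datasources")
    return realtime_only(queryable, published)
```

Against a live cluster in the state reported here, `find_unpublished()` would be expected to include the Kafka-fed datasource until its first segment is published.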

@mistercrunch
Member

Which version of Druid?

@nipunb

nipunb commented Dec 1, 2017

It took ~1 hour, but it showed up after that.


3 participants