Add bulk indexing to Elasticsearch binding #412

Open
fraser-m opened this issue Sep 21, 2015 · 16 comments

Comments

@fraser-m

Hi,

I am looking to use YCSB to run some performance tests on an Elasticsearch cluster I have set up, to see whether there are any bottlenecks. I've got it installed; however, I have some questions. Sorry if this is the wrong place, but I can't see anywhere else to raise this!

The docs say you can set a custom config file, such as the following:
cluster.name=es.ycsb.cluster
node.local=true
path.data=$TEMP_DIR/esdata
discovery.zen.ping.multicast.enabled=false
index.mapping._id.indexed=true
index.gateway.type=none
gateway.type=none
index.number_of_shards=1
index.number_of_replicas=0
es.index.key=es.ycsb

Obviously, the node.local option will be false. However, are there any specific settings needed to point at the cluster if it is remote, or should I add another node to the cluster and run this from there?

Thanks

@fraser-m changed the title from "Potential Doc Issue- Benchmarking Elasticsearch cluster" to "Benchmarking Elasticsearch cluster" Sep 21, 2015
@busbey
Collaborator

busbey commented Sep 21, 2015

Ugh. Looks like the README never got updated for working against a cluster.

My apologies. If we can get this to work, it would be good to update the README so future folks don't have to read the source. I don't have an ES cluster handy, so this might take some back and forth. :)

It looks like you'll need:

  • elasticsearch.remote = true
  • elasticsearch.hosts.list set to a comma-separated list of nodes.

Everything else looks like sane defaults to me.
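
For illustration, here is a minimal Python sketch of how a comma-separated `elasticsearch.hosts.list` value could be split into host and port pairs. The function name `parse_hosts` and the fallback to port 9300 (Elasticsearch's default transport port) are assumptions made for this example; the binding itself is written in Java and may handle the value differently.

```python
def parse_hosts(value, default_port=9300):
    """Split 'host1:9300,host2' into [(host, port), ...].

    Hypothetical helper: entries without an explicit port get default_port.
    """
    hosts = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing commas and stray whitespace
        host, _, port = entry.partition(":")
        hosts.append((host, int(port) if port else default_port))
    return hosts


if __name__ == "__main__":
    # Host names below are placeholders, not real nodes.
    print(parse_hosts("es-node1:9300, es-node2"))
```

Listing more than one node gives the client somewhere else to connect if the first node is unreachable.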

@fraser-m
Author

Thanks for getting back to me busbey.

OK, so I've implemented the parts you suggested and it's still not quite there. A couple of things:

It seems to be picking up the cluster.name entry as the starting node (not sure if this is correct), and then it attempts to connect to the remote hosts on localhost:9300.

Is there anything else I should (or shouldn't) be doing here that you are aware of?

Thanks again for your assistance and response.

@busbey
Collaborator

busbey commented Sep 22, 2015

Could you post the command line you're using?

@fraser-m
Author

Hi,

The commands are as follows:

./bin/ycsb load elasticsearch -P workloads/workloada -P elasticcluster.data -s

./bin/ycsb run elasticsearch -P workloads/workloada -P elasticcluster.data -s

If you need anything else, just let me know.

Thanks again for your help.

@busbey
Collaborator

busbey commented Sep 23, 2015

I don't see the ES config values.

Presuming they're in elasticcluster.data, could you post it with any sensitive data redacted?

@fraser-m
Author

Hi,

config file below:

cluster.name=
node.local=false
path.data=$TEMP_DIR/esdata
discovery.zen.ping.multicast.enabled=false
index.mapping._id.indexed=true
index.gateway.type=none
gateway.type=none
index.number_of_shards=1
index.number_of_replicas=0
es.index.key=es.ycsb
elasticsearch.remote = true
elasticsearch.hosts.lists = <hostnames of the 3 hosts>
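
One thing worth checking in the file above: the key is spelled `elasticsearch.hosts.lists`, while the suggestion earlier in the thread was `elasticsearch.hosts.list`. If the binding looks the key up by its exact name and falls back to a default when it is missing, a misspelled key would be ignored silently. The Python sketch below illustrates that behaviour. The `localhost:9300` default and the helpers `load_properties` and `remote_hosts` are assumptions chosen to match the symptom reported above; they are not confirmed from the binding's source.

```python
DEFAULT_HOSTS = "localhost:9300"  # assumed default, matching the reported symptom


def load_properties(text):
    """Parse simple 'key = value' lines into a dict (hypothetical helper)."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def remote_hosts(props):
    """Exact-name lookup: any other spelling falls through to the default."""
    return props.get("elasticsearch.hosts.list", DEFAULT_HOSTS)


if __name__ == "__main__":
    # es1, es2, es3 are placeholder host names.
    misspelled = load_properties("elasticsearch.hosts.lists = es1,es2,es3")
    correct = load_properties("elasticsearch.hosts.list = es1,es2,es3")
    print(remote_hosts(misspelled))  # falls back to DEFAULT_HOSTS
    print(remote_hosts(correct))     # uses the configured hosts
```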

@saggarsunil
Contributor

There are a few changes that need to be made to YCSB for ES. The YCSB client was compiled against an old version of the ES Java client, and when I updated it to the latest, certain APIs failed (which I have fixed). I agree there is a lack of documentation and that a user needs to go through the code.

I am running an ES cluster and trying out YCSB on it.

My plan is to update the pom.xml, the README, and the relevant code (plus code comments).

Regards
Sunil Saggar

@busbey
Collaborator

busbey commented Oct 20, 2015

excellent news @saggarsunil. it would be best if you could add an additional module that works with the new ES client alongside the current one that works with the old client. See the hbase modules for an example.

@saggarsunil
Contributor

Sure. I will look into it and give it a try.

@risdenk
Collaborator

risdenk commented Jan 22, 2016

@saggarsunil were the ES changes you were referring to fixed with PR #552?

@saggarsunil
Contributor

Yes, it seems the changes are already merged. Great.

I have some more changes related to bulk indexing in my local GitHub repo.

I will go through the latest merged code and try to merge mine.

Are you looking for anything else?
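
As a rough illustration of the bulk indexing idea, documents can be buffered and handed off in batches rather than sent one request at a time. The Python sketch below is not the proposed change itself: `BulkBuffer`, `batch_size`, and the `send_batch` callback are hypothetical names, and the callback stands in for whatever bulk call the Elasticsearch client would make.

```python
class BulkBuffer:
    """Collect (key, document) pairs and flush them in fixed-size batches."""

    def __init__(self, send_batch, batch_size=1000):
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self._send_batch = send_batch  # stand-in for a real bulk request
        self._batch_size = batch_size
        self._pending = []

    def add(self, key, document):
        """Buffer one document and flush once the batch is full."""
        self._pending.append((key, document))
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self):
        """Send whatever is buffered; a no-op when the buffer is empty."""
        if self._pending:
            batch, self._pending = self._pending, []
            self._send_batch(batch)


if __name__ == "__main__":
    sizes = []
    buffer = BulkBuffer(lambda batch: sizes.append(len(batch)), batch_size=2)
    for i in range(5):
        buffer.add(f"user{i}", {"field0": str(i)})
    buffer.flush()  # send the final partial batch
    print(sizes)
```

The explicit `flush()` at the end matters: without it, a partial last batch would stay in the buffer and never be sent.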

@busbey
Collaborator

busbey commented Feb 1, 2016

Nothing else needed. Please try to make sure the commit for your PR has a message that begins with the module(s) impacted, i.e. "[elasticsearch]"

@busbey
Collaborator

busbey commented Feb 24, 2016

any status update on the bulk indexing for elastic search?

@saggarsunil
Contributor

Yes, in a couple of days. Testing the code.

Thanks
Sunil Saggar

@saggarsunil
Contributor

It may take more time than expected. I tried to test it, but it seems the
original code itself is broken, and I am juggling multiple assignments.
Please bear with me.

Thanks
Sunil Saggar

@busbey
Collaborator

busbey commented Mar 1, 2016

no worries, we all get busy. take your time, we'll have plenty more releases. 😄

@busbey changed the title from "Benchmarking Elasticsearch cluster" to "Add bulk indexing to Elasticsearch binding" Mar 1, 2016