-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding Apache Solr Support #574
Conversation
external zookeeper cluster and an appropriate collection has been created. | ||
Make sure to pass the following properties as parameters to 'bin/ycsb' script. | ||
|
||
cloud.mode=true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to clash with above, is the property cloud.mode
or solr.cloud
, The latter is prefered as it refers to the binding it belongs to in the name.
Can you update all the license headers to include 2016? |
|
||
if (cloudMode) { | ||
cloudClient.add(table, doc); | ||
cloudClient.commit(table); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explicit commits for each record will result in bad performance. Typically commits should be done within a certain amount of time or have the Solr server do auto soft/hard commits. This is different than a standard DB. It should be made configurable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the default in ycsb, to allow comparisons across systems, is to do a commit per record. you are correct that this is terrible for most systems.
other bindings make it configurable (hbase, cassandra, mongo, etc) and then just default to commit-per-transaction with instructions in their README for folks who can allow for client side batching.
* discussion of error codes. | ||
*/ | ||
@Override | ||
public Status update(String table, String key, HashMap<String, ByteIterator> values) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A thought here: Solr supports updating documents without doing a query/update manually. https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents and here how to do it with SolrJ http://yonik.com/solr/atomic-updates/
It would be helpful if, before this gets merged, you could squash all the commits into a single commit and then include in the commit message "[solr]" at the beginning. |
@@ -139,6 +139,11 @@ LICENSE file. | |||
<artifactId>s3-binding</artifactId> | |||
<version>${project.version}</version> | |||
</dependency> | |||
<dependency> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: whitespace
|
||
HashMap<String, ByteIterator> entry; | ||
|
||
for (SolrDocument hit : response.getResults()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: pull response.getResults() into a variable? multiple calls to getResults.
Looks good to me. Could you please squash the commits and make the commit message start with "[solr]"? |
Awesome, thanks! Commits have been squashed into 1 commit |
If folks want to take a final look, I'll be merging this in ~5-6 hours barring feedback. Thanks for all your work on this @ghaughian! |
Looks good to me. Thanks @ghaughian |
@ghaughian - There are a few comments from @madrob on PR #583 that should be addressed here. They include: |
@risdenk addressing those now; thanks. |
updating readme updating package info perfecting logic for http solr clients for all operations renamed properties, tested cloud mode and cleaned code removed dependency on dynamic field names, updated readme now enforcing checkstyle adding solr artifact removing test cases relying on external dependencies removed unused maven dependencies, added batch mode support, all try blocks now catch eplicit exceptions, Query/UpdateResponse status codes are handled more granularly, updated readme, added sample schema.xml file to support default field names in ycsb client, updated all license headers to 2016, using SolrClient object as primary client type regardless if Solr is running in Cloud or Stand-alone mode cleaned code and config files, now accepting a solr base url property, simplified sample schema.xml file, renamed class to SolrClient, now updating documents atomically, added batch support to delete method updated new line spacing of pom file comments removed sample schema file, updated readme with more indepth explanation on running/setting up the solr-binding removed some code lines no longer in use renamed zookeeper param name, now throwing caught exceptions where appropriate, debug messages are now being logged on stderr now returning an appropriate error if we receive an unexpected response from solr server, repeated calls to getResults is no longer now using singletonMap to store update params in, fixed typo and missing id field in sample config in README
[solr] Adding Apache Solr Support
[solr] Adding Apache Solr Support
[solr] Adding Apache Solr Support
Since ElasticSearch is supported I thought it would be beneficial to have Solr supported also. I'm sure there are people out there who would love to compare head-to-head these 2 search engine technologies.