[datastore] new excludeFromIndexes syntax should allow for a catch-all on object properties #2510
Comments
Thanks for the detailed issue. Anything is technically possible, but I believe our discussion is turning more towards "what would the average user want?" and I'm not sure enough to make the call on this. It would be nice to get an opinion from the Datastore team behind the |
Personally, I've come from using the Objectify datastore library for Java (https://github.com/objectify/objectify). I found its interface to be very intuitive, so much so that I've created my own Datastore library in JavaScript to try to emulate its behaviour as much as possible (Pebblebed). My understanding of the Datastore has always been that embedded objects are not going to be indexed, and that if you want something indexed it should be put in an indexed property on the base entity. I believe Objectify follows this same idea: it does allow embedded objects with indexed properties, but apparently creates synthetic properties in the top-level Entity object to do so. (https://github.com/objectify/objectify/wiki/Entities#embedded-object-native-representation) So from my user perspective, I see it as the norm to exclude an entire embedded object from being indexed, rather than having to exclude each and every property itself. But this is obviously just my experience with using the Datastore. I understand there might be others who expect to interact with it differently, so I'm interested to see what others think on this issue. All I'm hoping for is at least some way to exclude a property wholly from being indexed. |
I have requested for some input from the datastore team internally. |
Somewhat of a long answer, I'll break it down into (1) why did we decide to do this and (2) why it's different from objectify. Hope this clarifies a bit why the API is defined like it is. (1) As an example of how this is useful is looking at something like an Address. This type of information may belong in either an embedded entity, or in a stand alone entity. If you were designing this in an ORM system, you might want to write something like:
This will allow you to issue queries on part of the index (e.g. Note that it is possible to exclude the entire (2) |
@pcostell Thanks for the long and detailed reply. (just noticed you wrote I can totally understand the new direction you guys have gone with embedded entities and being able to index properties of those objects within the parent entity. It makes a lot of sense and it's opened my mind to some new possibilities with the Datastore. I must say though that the way it seems to have been implemented, having to include I use embedded entities to store POJO in Java, or in JavaScript's case just regular JSON objects, of large and complex structures. These are embedded objects because I know that the properties within them should never be indexed, but I would like access to them once I've queried for the entity later on. To have to now mark each and every property within these objects individually that I would not like indexed just doesn't seem intuitive in the least. Embedded objects have always been just that - objects which are embedded inside the entity... To suddenly start making (as default behaviour) properties within those objects act like top-level entity properties doesn't seem natural. I don't fully understand your reasoning with why EDIT: I think I do understand now what you were saying about excluding the entire I guess that makes some of my points moot. But I would still like a way to just store an entire POJO / JSON object without having to think about these things. Of course, I could in this case just do a I feel like having this behaviour of having to exclude instead of targeting directly what to include will create a real issue of people creating indexes where they really shouldn't be - which is something I try and watch out for as much as possible to keep my Datastore entities as lean and performant as possible. I can't imagine how many people have probably indexed entire embedded objects unknowingly already because of this behaviour. |
Just adding an additional opinion. Datastore is backed by BigTable. This means that it doesn't have traditional indexes. Instead, every index in Datastore manifests as a new row in BigTable with a specialised rowkey. This means that datastore tables with a large number of columns will translate into a massive number of simultaneous writes to BigTable (one row for each indexed column). Add some ancestor structure and write throughput drops significantly. Since our streaming architecture can burst to thousands of writes per second, we can't afford to index all columns. I know, we could just use BigTable, but that requires maintaining instances (which are far more expensive than just using datastore). Also BigTable doesn't guarantee writes like datastore. Anyway, before writing any data to datastore, we add excludeFromIndexes to all values.
Not a problem, but it would be nice to change the default to have no indexes. I believe a major reason that new users complain about performance issues when using datastore with Node is that they are indexing a large number of columns with some sort of ancestor structure. Just our team's experience. Happy to provide more data if needed. |
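A minimal sketch of the "add excludeFromIndexes to all values" step described above. It assumes the array form of entity data ({ name, value, excludeFromIndexes }) that appears in the examples later in this thread; the helper name toUnindexedData is hypothetical and not part of any Datastore client library. It only marks top-level properties, and whether that also covers the inner properties of embedded objects is exactly what this issue is debating.

```javascript
// Hypothetical helper (not a library API): turn a plain object into the
// array form of entity data, marking every top-level property as unindexed.
function toUnindexedData(obj) {
  return Object.keys(obj).map((name) => ({
    name,
    value: obj[name],
    excludeFromIndexes: true,
  }));
}

// Usage: build the entity the same way as the examples in this thread.
const unindexedEntity = {
  key: {}, // placeholder key, as in the thread's examples
  data: toUnindexedData({ title: 'hello', views: 3 }),
};

console.log(JSON.stringify(unindexedEntity.data, null, 2));
```

Any property you do want to query on would then have to be flipped back to indexed explicitly, which is the "opt in to indexes" default being asked for here.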
Just to agree with basically what @briangranatir is saying, and to follow up: has there been any progress or thoughts about this? I think there should be an option to have all properties unindexed as the default save behaviour - I'd much rather be explicit about what I'd like to query on than have performance issues and potentially a bloated Datastore with properties which should never have been indexed. This is exacerbated by the current behaviour of having to explicitly say what you would not like indexed, especially in deep child objects on your entities. Perhaps if there was a property on save introduced such as an |
Hi I like the idea of @lostpebble. By default embedded entities should have all their indexes set to false.
and an @lostpebble I didn't quite understand when you said "But validation (1500 byte length for each field) is still enacted on all the properties incase in the future". Does that mean the embedded entity is not indexed but the limitation on its properties still applies? |
@sebelga from what I can gather from what @pcostell wrote about this, the validation is still enacted on the inner properties of an entity you have marked as All in all, it's a little confusing and seems like they're planning beyond what is practical in most users' eyes. Though, if the time ever comes I might appreciate that I can easily now index those inner properties. The only solution I see for storing regular large JavaScript objects completely unindexed and easily is to implement some kind of serialisation to a single string value which you make |
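A rough sketch of the serialisation workaround mentioned above: store the whole object as one JSON string in a single unindexed property, and parse it back after reading. The helper names packAsJson and unpackJson are hypothetical, and the property shape ({ name, value, excludeFromIndexes }) is taken from the array-format examples later in this thread.

```javascript
// Hypothetical helpers (not a library API).
// Serialise an arbitrary object into one string-valued, unindexed property.
function packAsJson(name, value) {
  return {
    name,
    value: JSON.stringify(value),
    excludeFromIndexes: true,
  };
}

// Reverse step: parse the stored string back into an object after a read.
function unpackJson(property) {
  return JSON.parse(property.value);
}

const profileProperty = packAsJson('profile', {
  bio: 'a long piece of text',
  settings: { theme: 'dark', tags: ['a', 'b'] },
});

console.log(typeof profileProperty.value); // the stored value is a single string
console.log(unpackJson(profileProperty).settings.theme);
```

The trade-off is that nothing inside the string can be queried or indexed later without rewriting the entity, and values that JSON cannot represent (dates, keys, binary data) would need their own handling.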
@lostpebble thanks for the explanation. I am still confused if this new way of setting the excludeFromIndexes replaces the old one (where it was set on each entity property) or if both are working together. EDIT:
// This will correctly not index the property "title"
const entity = {
data: { title: 'hello' },
excludeFromIndexes: ['title'],
key: {}
}
// This will do the same
const entity = {
data: [{
name: 'title',
value: 'hello',
excludeFromIndexes: true,
}],
key: {}
}
// This will correctly not index the property "address"
const entity = {
data: { address: { street: 'abc', city: 'abc' } },
excludeFromIndexes: ['address'],
key: {}
}
// This will do the same
const entity = {
data: [{
name: 'address',
value: { street: 'abc', city: 'abc' },
excludeFromIndexes: true,
}],
key: {}
}
// This will correctly *not* index any of the values of the Array
const entity = {
data: [{
name: 'myArray',
value: ['1', '2', '3'],
excludeFromIndexes: true,
}],
key: {}
}
// Here we would expect the same behaviour but we get
// the error: "Exclude from indexes cannot be set on a list value"
const entity = {
data: { myArray: ['1', '2', '3'] },
excludeFromIndexes: ['myArray'],
key: {}
} |
@pcostell I am coming back to ask whether the inconsistency I mentioned above really is an inconsistency or whether it is like that by design, with respect to list values that can't be declared like embedded entities outside the "array format". Thanks 👍 |
This issue was moved to googleapis/nodejs-datastore#12. |
Playing around with this today (after seeing this issue #1916 has been resolved) and I'm noticing that you have to define each and every embedded object property that you would like unindexed, instead of being able to just define a single "catch all" excludeFromIndexes option.
This works (actually it doesn't really, see edit below...):
But this does not:
In the second example I'm still getting an error for stringThing and otherStringThing being longer than 1500 bytes. Is there no way to define that it catches all the properties in an embedded object?
Maybe something like this, if you'd like the catch all to have a more intentional syntax:
Though, I would hope that this syntax of propertyName.*, which seems to say "catch all properties of the object at propertyName", would also catch things that are not embedded objects but simply a long string at that propertyName. I would just like a way to define that data stored at a certain property of an entity should not be indexed at all - be it a string, embedded object or whatever.
EDIT:
Upon thinking about it more, why does the syntax in my second example not work? I think putting a * wildcard shouldn't be necessary actually. You've deliberately indicated you do not want that property indexed and that should mean the entire property, be it an embedded object and all its properties or just a long string. If you'd still like to index some properties of an embedded object, then you'd define those which you want unindexed and leave out the ones you want indexed.
This is confusing because upon looking at my entities in the datastore console it appears that my first example is actually wrong. I should have included the base testEmbeddedObject in my exclusion array too, so it would look like this now:
Otherwise, the datastore console still sees that "base" property as indexed even though there is no data on it.
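Until a catch-all exists, one possible workaround is to generate the exclusion list instead of writing it by hand. The sketch below is only an illustration: the helper expandExcludePaths is hypothetical, and it assumes that nested properties are addressed with dotted paths (the original snippets in this issue did not survive, so that form is an assumption). It emits the base property as well as every nested path, matching the observation above that the base testEmbeddedObject has to be listed too. Arrays are treated as a single leaf, since list handling is a separate question raised elsewhere in this thread.

```javascript
// Hypothetical helper (not a library API): list the base property path plus
// the dotted path of every nested property under it.
function expandExcludePaths(value, path) {
  const paths = [path];
  const isPlainObject =
    value !== null && typeof value === 'object' && !Array.isArray(value);
  if (isPlainObject) {
    for (const key of Object.keys(value)) {
      // Recurse so deeply nested objects are covered as well.
      paths.push(...expandExcludePaths(value[key], `${path}.${key}`));
    }
  }
  return paths;
}

const data = {
  testEmbeddedObject: {
    stringThing: 'some long string',
    otherStringThing: 'another long string',
    nested: { deep: 1 },
  },
};

const excludeFromIndexes = expandExcludePaths(
  data.testEmbeddedObject,
  'testEmbeddedObject'
);

console.log(excludeFromIndexes);
// [ 'testEmbeddedObject',
//   'testEmbeddedObject.stringThing',
//   'testEmbeddedObject.otherStringThing',
//   'testEmbeddedObject.nested',
//   'testEmbeddedObject.nested.deep' ]
```

This keeps the save call explicit about what is excluded, at the cost of recomputing the list whenever the object's shape changes.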