Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

APIv4 - Return id_field as part of Entity.get #20457

Merged
merged 1 commit into from
Jun 1, 2021

Conversation

colemanw
Copy link
Member

@colemanw colemanw commented Jun 1, 2021

Overview

Improves APIv4 metadata so it returns the name of the unique identifier field for each entity (usually but not always named "id").

Technical Details

All entities have a unique identifier field, usually named 'id' but some entities the field is named something else. e.g. Afform uses 'name' as the identifier.

This returns the name of the field as part of Entity.get, and it's also available directly from each API class e.g. Contact::getInfo().

All entities have a unique identifier field, usually named 'id'
but some entities the field is named something else.
e.g. Afform uses 'name' as the identifier.

This returns the name of the field as part of Entity.get,
and it's also available directly from each API class e.g. Contact::getInfo().
@civibot
Copy link

civibot bot commented Jun 1, 2021

(Standard links)

@civibot civibot bot added the master label Jun 1, 2021
@@ -132,8 +132,17 @@ public static function delete($checkPermissions = TRUE) {
* @return BasicReplaceAction
*/
public static function replace($checkPermissions = TRUE) {
return (new BasicReplaceAction(static::getEntityName(), __FUNCTION__))
return (new BasicReplaceAction(static::getEntityName(), __FUNCTION__, static::$idField))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a bug. Tests caught it when I changed the name of the id_field for this mock entity.

@colemanw
Copy link
Member Author

colemanw commented Jun 1, 2021

@totten with this PR you can get the id_field for any API entity directly from the API class. If you don't know the name of the class, CoreUtil can look it up for you.

CoreUtil::getApiClass('Contact')::getInfo()['id_field']; // returns 'id'
CoreUtil::getApiClass('Afform')::getInfo()['id_field']; // returns 'name'

@eileenmcnaughton
Copy link
Contributor

@colemanw I don't suppose this PR makes it better or worse but I do have some concerns about caching for the Entity::get data - it doesn't seem to be cached?

@colemanw
Copy link
Member Author

colemanw commented Jun 1, 2021

@colemanw I don't suppose this PR makes it better or worse but I do have some concerns about caching for the Entity::get data - it doesn't seem to be cached?

No this PR doesn't affect that.
You're right it isn't cached anywhere. Maybe it would be good to do so, or maybe it would be a waste of memory since it really isn't used for much. Afaik it's only used by the API Explorer and the Search Kit admin screen.

@eileenmcnaughton
Copy link
Contributor

@colemanw we added a call for it to the monolog extension because sometimes logging is called before the entity is available & it fatals - but it seems kinda expensive tbh

@totten
Copy link
Member

totten commented Jun 1, 2021

FWIW, caching was also my main concern about getInfo().

Granted, CustomValue::getInfo() is fairly light - but in all other variants, you have things like:

  • Parse doc blocks (AbstractEntity::getInfo())
  • Loop through all DAO fields and highlight FKs (EntityBridge::getInfo())
  • Loop through all traits and cleanup names (AbstractEntity::getInfo())

Additionally, note that - even with its fairly conservative usage right now - there are things like getBAOFromApiName($entityName) which rely on getInfo(). As a consumer, I would expect that getBAOFromApiName() would be pretty light (ie it wouldn't need to parse docblocks+fields to figure that out).

There is a parallel issue around caching the list of fields, although the status-quo is a bit different. It's already cached... Each Action instance has a method $this->entityFields() with a copy of the field list. This is useful for, say, looping through fields and applying validation-rules. But if the validation work is being done by a subscriber, then it can't access $apiRequest->entityFields() because it's protected.

To my eye, the entity-info and the entity-fields are very similar metadata and should follow the same lifecycle / coding-style / visibility.

I guess the first question about caching is more like... how long should the cache live?

  • (a) Cache should live for many PHP requests (e.g. Civi::cache('long') or Civi::cache('fields'))
  • (b) Cache should live for many PHP requests, but only if it's memcache/redis (e.g Civi::cache('short'))
  • (c) Cache should live for the duration of one PHP request (e.g. Civi::$static)
  • (d) Cache should live for the duration of one API call (e.g. $apiRequest->entityFields())
  • (e) Cache should be impromptu, driven by each consumer of the data
  • (f) Metadata should not be cached

Personally, my first pick would be (c) per-request/Civi::$static, but really anything from (a)-(d) seems OK as long as:

  • the data is accessible from listeners/subscribers
  • we can change our mind (ie there's a clear spot where that policy has been implemented)

@eileenmcnaughton
Copy link
Contributor

I think the performance here degrades the more extensions you have installed? Which probably pushes me to a or b

Note that we use the Civi::cache('metadata') more than Civi::cache('fields') - I think the policy is the same but metadata is flushed in more places - whenever an entity likely to relate to metadata (eg. PriceSet, MembershipType, Custom Field) is changed

@seamuslee001
Copy link
Contributor

Broadly this looks sensible to me and I can see how it would help. I tend to agree with Eileen re caching options

@colemanw
Copy link
Member Author

colemanw commented Jun 1, 2021

Tests are happy, let's merge this & do caching in a separate PR.

@colemanw colemanw merged commit a076350 into civicrm:master Jun 1, 2021
@colemanw colemanw deleted the id_field branch June 1, 2021 11:55
@totten
Copy link
Member

totten commented Jun 1, 2021

I think the performance here degrades the more extensions you have installed? Which probably pushes me to a or b

I don't think so... At the risk of being pedantic, my read on the trade-off (qty computations vs qty local-memory vs qty external IO calls) is that the typical resource-usage (Θ(...)) for a PHP request would behave as follows:

  • With (d) API-call-caching
    • Computation: Θ(total #API calls) (if you call Foo.create 10x, then you compute Foo metadata 10x)
    • Memory: Θ(max #nested API calls) (if you call Foo.replace which recurses to Foo.create, then you have 2x copies of Foo metadata)
    • I/O: Θ(1)
  • With(c) per PHP-request static-caching
    • Computation: Θ(#distinct activated API entities) (if you call Foo.create 10x, then you compute Foo 1x)
    • Memory is also Θ(#distinct activated API entities)
    • I/O: Θ(1)
  • (With (b), it depends on environment, but either it behaves like (c) or (a).)
  • With (a) long-caching (and no prefetching or batching)
    • Computation Θ(1)
    • Memory: Θ(#distinct activated API entities)
    • I/O: Θ(#distinct activated API entities)
  • With (a) long-caching (and some kind of prefetching or batching)
    • Computation: Θ(1)
    • Memory: Θ(#entities in the entire system) (if you have have extensions which cumulatively define 100 entities, then you load 100x metadata into memory)
    • I/O: Θ(#entities in the entire system)

I suppose in some scenarios, the #distinct activated entities and #entities in the entire system would be similar (e.g. long-running batch-processor with heterogeneous tasks; e.g. building a power-tool akin to API Explorer). In that edge-case, maybe (a)+prefetching is technically a little better, but probably not noticeably better, and I don't think those edge-cases are representative.

@totten
Copy link
Member

totten commented Jun 1, 2021

My gut agrees with you that we don't want Θ(#entities in the entire system). So probably not (a)+prefetch. (Though that could be wrong... it's just a gut sense...)

(c) and (a)+no-prefetch have the same formulas except they trade computations versus I/O. (Surprise!) IMHO, between those, it boils down to:

  • How much you fear stale caches. (Longer caches mean greater impact from cache-bugs...)
  • Whether you think 1x computation of getInfo()/entityFields() will be faster or slower than 1x external I/O for the same. (Gutguessgamblopinion: APCU-IO beats computation which beats SQL-IO, but Secretariat beats all of them.)

@eileenmcnaughton
Copy link
Contributor

The io is something like 2 or 3 file look ups per entity per enabled extension or module isn't it?

@totten
Copy link
Member

totten commented Jun 2, 2021

I'm not seeing that with respect to getInfo per se...

Ex: Suppose you run Contact.create. It needs metadata (defaults, idField, etc), so it calls getInfo()/entityFields(). It's true that this is influenced by 3x PHP files for the specific entity(CRM/Contact/DAO/Contact.php, CRM/Contact/BAO/Contact.php, Civi/Api4/Contact.php, etc). However, those 3x PHP files would be required anyway... because you can't read or write Contacts without all 3 files.

The reason why I think caching makes sense for getInfo() or entityFields() is that they involve additional work, above/beyond just requireing the PHP file. (To wit: scanning/filtering docblocks, traits, fields.) I think it's entirely reasonable to have 5 different callers that want to know the id_field, title, dao, etc -- but I don't think that each read of id_field should trigger a scan of docblocks+traits+fields.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants