DFC Connector Data Capture Feature & Store #24
Comments
I think to move forward there are two main blockers for now.
|
@jgaehring The triple store will be separate from all participating platforms. I'd prefer to stand something up on the Infomaniak Jelastic Cloud instance we're using to host the Shopify apps. At this stage we aren't trying to integrate with anything... just (securely) store the data somewhere, so it can be managed, by the members, as their data commons in the future. Let's have a quick chat about what might work... are you around tomorrow? I'm free 1-2pm or 4-4:30 (UK). On the other blocker - @lecoqlibre is on vacation this week, but I think he's around next week... maybe we should all talk together next week? |
For my own sake, I'm just noting the snapshot of the semantizer's mixin implementation as it stands right now, although it is considered unstable: |
@jgaehring Notes from our call: We agreed to modify the export function(s) in the static area of connector-codegen (for ts, ruby & php), to check a parameter & if TRUE, we POST the exported JSONLD to our triple store. We'll start with the PHP version (Big Barn), then TS, then Ruby. |
As discussed in today's tech call, this is the relevant part of the TypeScript codegen implementation (pending merge of PR #20) where the call to semantizer's connector-codegen/src/org/datafoodconsortium/connector/codegen/typescript/static/src/Connector.ts Lines 182 to 188 in 2c8507a
That "wrapper" can be moved lower down the stack to the internals of the semantizer, once it reaches its next stable release, but that later change shouldn't require breaking changes to either the connector or the semantizer's APIs. Therefore, I believe there should be no problem implementing the data capture feature with the existing alpha version of the semantizer, since costs prohibit that being upgraded in the near future regardless, without incurring significant tech debt once the stable release becomes available. |
What do you think about using the observer pattern to decouple the data-capture feature from the connector itself? We would have a method to register a new observer for the export method like Each time the In the client code you want to capture data from, you will just have to register a handler of your choice (which can be implemented in a separate package and even in a DFC related one if you want like You can also export a pre-configured
import { Connector } from "@datafoodconsortium/connector-data-capture";
const connector = new Connector();
connector.export(...); // this will trigger the data-capture handler |
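For illustration only, here is a minimal TypeScript sketch of the observer idea described above. Every name in it (`ObservableConnector`, `ExportObserver`, `attach`) is an assumption made for this example and not the connector's actual API; the `export` method is a stand-in that just serializes its input.

```typescript
// Hypothetical names throughout; this is not the connector's actual API.
type ExportObserver = (jsonld: string) => void | Promise<void>;

class ObservableConnector {
  private observers: ExportObserver[] = [];

  // Register an observer for export; returns a function that detaches it.
  attach(observer: ExportObserver): () => void {
    this.observers.push(observer);
    return () => {
      this.observers = this.observers.filter((o) => o !== observer);
    };
  }

  // Stand-in for the real export: serialize, then notify every observer.
  async export(subjects: object[]): Promise<string> {
    const jsonld = JSON.stringify({ "@graph": subjects });
    for (const observer of this.observers) {
      try {
        await observer(jsonld);
      } catch (_err) {
        // A failing observer must not break the export itself.
      }
    }
    return jsonld;
  }
}
```

The point of the sketch is that the connector only knows about a list of callbacks, so a data-capture handler could live in a separate package and be attached (or detached) by the client code.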
@jgaehring queried whether it's best to use Composer or PHAR for unit testing. @lecoqlibre will confirm here. |
So my working assumption has been that an instance of the
// composer.json (for PHP)
{
  "require": {
    "datafoodconsortium/connector": "^v2.0.0-rc.1"
  }
}
then add something like this to their
# .env file
EXPERIMENTAL_DATA_CAPTURE_ENABLED=true
EXPERIMENTAL_DATA_CAPTURE_EXPORT_URL=https://api.example.com/json-ld/
and that's it, a very minimal contract. To activate the data capture functionality, consumers only have to modify configuration files without the need to update their application code, which is the principal objective for this pre-release API. Although there is a simpler path to achieve that, I have taken your recommendation, @lecoqlibre, of employing the observer pattern for this plugin/mixin/whatchamacallit. The
$connector = new Connector();
$observer = new DataCapture("https://api.example.com/json-ld/");
$connector->attach($observer);
But for the aforementioned pre-release version, in order to eliminate the need for consumers to modify their application code, the Here's what that looks like in my current PHP implementation: connector-codegen/src/org/datafoodconsortium/connector/codegen/php/static/src/Connector.php Lines 42 to 50 in ed57bbc
That entire
$ofnCapture = new DataCapture($_ENV["OFN_CAPTURE_URL"]);
$connector->attach($ofnCapture);
but that's up to them. A separate option parameter could be included in the The connector-codegen/src/org/datafoodconsortium/connector/codegen/php/static/src/Connector.php Lines 139 to 142 in ed57bbc
The exact methods that will support the attachment of observers are declared explicitly and exposed as public constants: connector-codegen/src/org/datafoodconsortium/connector/codegen/php/static/src/Connector.php Lines 14 to 22 in ed57bbc
The private connector-codegen/src/org/datafoodconsortium/connector/codegen/php/static/src/Connector.php Lines 34 to 40 in ed57bbc
Any observer can then limit its event scope either by passing the correct string, accessing one of the constants, or omitting the parameter entirely so it defaults to all events:
// These 2 are equivalent:
$connector->attach($observer, "export");
$connector->attach($observer, $connector->EVENT_EXPORT);
// These 3 are equivalent:
$connector->attach($observer);
$connector->attach($observer, "*");
$connector->attach($observer, $connector->EVENT_WILDCARD);
To my mind, that's adequately decoupled from the data capture functionality, or any future plugins that someone wishes to develop.
The Burning Question 🔥 🤔
The burning question, which I raised with @RaggedStaff earlier today, is this: Do we want to roll that out with a temporary pre-release config option like I described above, where the environment variables are hardcoded and, if detected, the observer will be attached automatically in the constructor? Or do we jump straight to the intended stable API, where none of that's hardcoded, but the library's consumers must commit some minor changes to their application code in order to get the data capture plugin to work? |
Discussed in https://github.com/orgs/datafoodconsortium/discussions/30
Originally posted by jgaehring March 25, 2024
Objective
Enable remote data capture functionality in the DFC connector, as requested by
the FDC Governance Circle, so that data may be captured within the DFC
Network and relayed to an independent triple store that will act as a Data
Commons.
Proposal
While we discussed the possible necessity of incorporating the data capture
mechanism into the code generator's templates, I've realized that may not ever
be necessary or even desirable. In all three implementations of the connector,
the core request/response logic can be found within the main Connector class
or its modules (such as the JsonldStream importer and exporter in the case of
the TypeScript implementation), which are all contained within the static code
directories and not produced through code generation. Because these
import/export methods are indirectly invoked by all semantic object subclasses'
getters, setters, adders and removers, it would be the ideal place to inject
optional hooks that could extend the import/export behavior.
A good model for this kind of extension might be the axios library's
interceptor pattern:
Internally the axios interceptors are private members of the
InterceptorManager, with a separate instantiation for both request and
response cycle. The interceptors can also be "ejected":
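As a rough illustration of the use/eject mechanics just described, here is a simplified, self-contained sketch in TypeScript. It is not axios's actual source, and the generic `Interceptor<T>` type and the `run` method are assumptions made for this example.

```typescript
// Simplified illustration of the pattern; not axios's actual source.
type Interceptor<T> = (value: T) => T;

class InterceptorManager<T> {
  // Ejected slots are set to null so previously issued ids stay valid.
  private handlers: Array<Interceptor<T> | null> = [];

  // Register an interceptor and return an id that can later be ejected.
  use(handler: Interceptor<T>): number {
    this.handlers.push(handler);
    return this.handlers.length - 1;
  }

  // Remove a previously registered interceptor by its id.
  eject(id: number): void {
    if (id >= 0 && id < this.handlers.length) {
      this.handlers[id] = null;
    }
  }

  // Pass a value through every active interceptor, in registration order.
  run(value: T): T {
    let result = value;
    for (const handler of this.handlers) {
      if (handler !== null) result = handler(result);
    }
    return result;
  }
}

// One manager per direction, mirroring the request/response split.
const interceptors = {
  request: new InterceptorManager<string>(),
  response: new InterceptorManager<string>(),
};

const trimId = interceptors.request.use((body) => body.trim());
interceptors.request.run("  hello  "); // "hello"
interceptors.request.eject(trimId);
interceptors.request.run("  hello  "); // "  hello  " (interceptor ejected)
```

Returning an id from `use` keeps the caller from needing a reference to the original function in order to remove it later.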
Some consideration should be given to the API for the connector and the
corresponding getters and setters that will actually invoke the capturing logic.
The getters and setters can differ in behavior, with some being synchronous and
others asynchronous, while the capturing behavior will always be asynchronous.
But we could generally take an approach such as the following:
Where logger could be a function (or two functions, to handle both success and
error results), or an instance of a Logger class with a wider variety of
configurable options, or both.
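To make the "function or two functions" option concrete, here is a hedged TypeScript sketch of such a logger. The names `createLogger` and `Transport`, and the idea of injecting the transport rather than calling an HTTP client directly, are assumptions for this example; how the logger gets registered (for instance through something like `connector.export.use(logger)`) is left open.

```typescript
// Hypothetical sketch; none of these names exist in the connector today.
type Transport = (url: string, body: string) => Promise<void>;
type ExportLogger = (jsonld: string) => Promise<void>;

// Build a logger from a target URL, a transport, and two optional callbacks
// that handle the success and error results respectively.
function createLogger(
  url: string,
  transport: Transport,
  onSuccess: (jsonld: string) => void = () => {},
  onError: (err: unknown) => void = () => {},
): ExportLogger {
  return async (jsonld: string): Promise<void> => {
    try {
      await transport(url, jsonld);
      onSuccess(jsonld);
    } catch (err) {
      // Capture is always asynchronous and must never throw into the caller.
      onError(err);
    }
  };
}
```

Because the only thing the logger strictly needs is a location to send to, the transport can be swapped for a mock in unit tests and for a real HTTP POST in production.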
As for the triple store, where logs will be sent, there are a lot of options. To
begin, the DFC prototype could be used for running integration tests in the
local development environment. If that achieves much of the desired outcomes, a fork
of that could be prepared for deployment. A more customized solution could be
built with SemApps, but might require more development. Another extenuating
factor is the degree to which OFN's stakeholders would like this store to be
integrated with OFN's core software and regional server instances.
As for the triple store to send logs to, there are many options, depending upon
how tightly integrated with the core OFN software and server instances OFN's
stakeholders wish this to be, as opposed to a totally independent server that
core OFN knows nothing about. It may be more difficult to judge with much
accuracy the cost and time required to stand up a maintainable instance of the
triple store based on these decisions and a more detailed conversation. In any
case, however, the proposed logging interceptor should work just the same, since
the only parameter it will strictly require should be a location to send the
logs to. Different logging interceptors can be adapted to different behaviors as
desired, and even combined, since this would enable multiple interceptors. The
flexibility of the interceptor pattern may in fact allow for more incremental
development of the triple store and how it is deployed to production.
Requirements
.import.use() and export.use() method, a general interface for the function or
Interceptor class they would each accept as arguments, and the implementations
of those functions or classes as the actual ImportLogger and ExportLogger.
Obviously, the names for all these classes and methods can be decided upon
later. These will first be implemented in TypeScript.
implementation(s), extending the existing TypeScript connector tests as
appropriate. These will only mock the intended triple store behavior.
local instance of a triple store, possibly based on the DFC prototype or
SemApps, that can receive and store JSON-LD logs. Preferably this local
instance will be containerized so it's easy to replicate on a staging server,
or perhaps as the basis for a store that can eventually go into production for
the data commons.
Milestones
import.use() and export.use() methods, interfaces, classes, and corresponding
unit tests.
and the data capture interceptors specifically.
Estimated Time and Cost
Milestones 1 and 2 will each require roughly 15 hours of development time, and
their order is more or less interchangeable. Depending on decisions about how best to
develop, test, and deploy the triple store, milestone 3 could vary widely,
potentially as little as 6-12 development hours, or over 30 dev hours, if more
customization is required beyond simply running an off-the-shelf solution.
Similarly, milestone 4 is difficult to assess at this time, but would require at
least the same amount of dev hours, possibly more.
The contingencies in milestones 2 and 3 make this a very imprecise estimation,
costing anywhere from $4,410 to $12,600 and taking 1 to 3 months to
complete. We can speak in further detail on the expectations for the triple
store as we go ahead with the connector features, or wait until a clearer set of
requirements can be determined for all 3 milestones.
Further discussions have highlighted that the Semantizer libraries are having functionality upgraded to support mixins. This is a dependency for this work: the Data Capture functionality will be included as a mixin.