Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Design Questions and Concerns #19

Open
bradenmacdonald opened this issue Oct 3, 2024 · 1 comment
Open

Design Questions and Concerns #19

bradenmacdonald opened this issue Oct 3, 2024 · 1 comment

Comments

@bradenmacdonald
Copy link

Hey, thanks again for the work here. I'm trying to understand the approach taken so far, and I do have some concerns about a few parts.

  1. How does the overall architecture compare to edx-search and to the Studio Search Beta ?
  2. The indexer API seems to be tied to Django querysets - how do we index things like XBlocks, which don't have a corresponding Django model and aren't loaded using querysets?
  3. Are there any working indexers? How does this CoursewareContent thing relate to an indexer?
  4. Is there a way to run this in parallel with elasticsearch, to switch the frontend back and forth between the new backend and the old, and compare the results?
  5. Based on the README, it seems that the index configuration is defined in the django settings? What's the rationale for this? In my experience the index settings are complicated and at least the defaults need to be defined in python code by the project. We haven't yet had a use case for allowing installations to override the index config.
  6. How will the index be updated? It seems right now there is only a one-time management command to reindex everything.
  7. Can the existing index be used while an index is being updated?
  8. How do we know this API is abstract enough, if we only support one backend? We should at least add support for Algolia.
  9. How does one use this on the frontend? I would like to see a modern approach like this, not having the MFE load a .js file from the Django LMS.
  10. search_library.js is using really old-fashioned JavaScript code ("use strict", .prototype., XMLHttpRequest). It should be using TypeScript, ES modules, fetch, and have the interface and/or base class properly defined using TypeScript. I don't see why it needs to be loaded from the LMS if there's no dynamic data included in the script. It should just be bundled into the MFEs like any other frontend dependency. Again, please check out how the studio search has been implemented.
  11. The buildQuery API in search_library.js will need to be way more complicated to abstract the type of queries we actually need to make for use cases like Studio Search. See this example of a multi-search query, this "facet values" search, and this tags keyword search. Note that not only are these complex queries with lots of parameters, but they are using three different Meilisearch APIs depending on the type of the search.

The complexity of our queries in 11 really makes me wonder if it's feasible to abstract away our Studio use cases with an abstraction layer at all. If we're going to proceed with that, I think the best way is to actually: first, replace Meilisearch with Algolia or TypeSense (no abstraction layer) and see if they support the same use cases, then second, design a sufficient abstraction API.

Another option is to only provide an abstraction for the most basic content search use cases in the LMS and forum, and require Meilisearch for advanced Studio searches.

A third option is to just support Meilisearch alone, which is what we've done to date.

@qasimgulzar
Copy link
Contributor

qasimgulzar commented Oct 4, 2024

Thank you @bradenmacdonald for taking sometime and review the openedx-search-api. I will try to address you concerns point by point.

  1. edx-search is totally backend-bound architecture: it was missing essential paradigms to integrate Meilisearch and modern search engines. Even if we want to use Elasticsearch with token authentication, it is not there out of the box: we will have make significant changes into it. On the other hand we have covered those scenarios, like to utilise personalised tokens with search services. We now have one management command to index/reindex data to Meilisearch. It also has the capability to support filters and index selection so you can select specific index to populate.
  2. Well, we have added support for django queryset as well as the concept of content class, the data related to xblock will be formatted for indexing using content classes. here you can see an example of content class as it is edx-platform specific class so it is better suited to put in edx-platform.
  3. You will have to switch these edx-platform and frontend-app-learning branches to see the working demo. otherwise there is a demo video attached.
  4. Not exactly but it is direly required; we can make that change for testing purpose.
  5. Yes, index settings are very complicated to overcome this complexity, I came up with an idea to structure the configurations in to settings file you can override default settings by changing dict in LMS/CMS settings but otherwise the default setting is already there in plugin's settings
  6. Answered it in detail in point 1, just to mention again this command has capability to set index name and filters too.
  7. Yes we can use existing indexes if there is any usecase to use the same index. I created new one to keep it disconnected from legacy search engine integrations. that way it will be very helpful when we will start removing the legacy integration out of edx-platform or any other service we will be working with.
  8. I have tried it overriding with Algolia already just for my internal testing, but as we are only focused and concerned about bringing in opensource integrations. I didn't put extra effort to make it part of our API.
  9. We no longer require to load js file into MFE: it's an open option but we can also install it as a plugin. You can install by adding this to package.json like this.
  10. Your Point 10 and 11 are more relevant to each other so addressing collectively. Yes this is bit of a complex syntax. I wouldn't say it's a problem with architecture, but I am already planning to migrate JavaScript library to TypeScript. That way we can maintain it more in object-oriented fashion. It will also help use enhancing the readability and reduce the complexity as well. The old fashion xhr calls will also be replaced in that migration but as short term solution I will be pushing cleaner code with fetch API as initially the intention was to make it generic enough to utilised in app where fetch is not available. But as we are not going that far and sticking with MFEs only. It make sense to use fetch api instead of conventional xhr calls.

Your general recommendations are very practical and I would appreciate, but I would say: if we have to do it why not do it at once? By adding multiple steps to implement on functionality wouldn't be very helpful but add technical debt specially when the project is open source. The motivation behind project on of this search api was to reduce technical debt added by edx-search and introduce meilsearch into the platform. So I would recommend as of we both are on same page we need to put effort in same direction that way we can achieve this integration more effectively.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants