
[Feature] Idea: Add "official" web search UI #4627

Closed
JanPokorny opened this issue Jan 6, 2022 · 33 comments

Comments

@JanPokorny

There are already some web UIs for searching Scoop packages, like my favorite https://shovel.sh (no affiliation with the so-named fork) or https://rasa.github.io/scoop-directory/search, which covers more buckets (not necessarily a good thing?). They simplify work by searching beyond the locally added buckets and offering better UX (direct links to the manifest, app website, etc.).

I think that it would be a good idea to create a "canonical" one under this org (to centralize development) and feature it on the https://scoop.sh website. This would also serve as a nice showcase for Scoop, since any potential new user arriving on the website can check what apps are available to install.

(I would create the issue in the website's repo, but I couldn't find it. Please respond with the link if I'm just being blind/dumb.)

@rashil2000
Member

There have been many such efforts in the past, but none of them is official as of now. Since all of them work pretty nicely (and each with a different method/scope), merging one here has not been a priority. Though if any of the authors is willing to merge theirs here, PRs are most welcome!

> (I would create the issue in the website's repo, but I couldn't find it. Please respond with the link if I'm just being blind/dumb.)

The website is served from this repo's gh-pages branch, so there is no separate repo.

@JanPokorny
Author

JanPokorny commented Jan 6, 2022

While centralizing development is not necessary, at least putting a link/searchbar on the main page seems like a beneficial step. Ping @mertd -- would you be ok with a link (or a <form>) on scoop.sh leading to shovel.sh?

@rashil2000
Member

Putting a link to a single search utility would give users the impression that it is Scoop's official search implementation. We would certainly not like to give preference to any community search utility. That is, unless the author is willing to merge their implementation upstream.

@HUMORCE
Member

HUMORCE commented Jan 6, 2022

GitHub advanced search is my first choice for searching manifests, then extras/everything 😂

@rashil2000
Member

Well, if we're talking about personal use - I wrote a tiny PowerShell script to search for manifests, as I don't really like leaving the terminal 😜

https://github.com/rashil2000/scripts/blob/main/Find-Scoop.ps1

@mertd

mertd commented Jan 12, 2022

Thank you for the shout-out @JanPokorny! Of course I would not have any issues with the shovel search being linked or integrated on scoop.sh.

I also understand your concerns @rashil2000. Some ideas to address these (may be one or a combination of these):

  • compare existing web search utilities transparently and embrace one
  • mark the link/integration explicitly as a community effort
  • link multiple community sourced search tools (this may include CLI tools)
  • integrate upstream as per your suggestion -- will need some discussion with the author for future development
    • assimilate fully and add author to scoop.sh team?
    • add submodule and glue code?

@rasa
Member

rasa commented Jan 13, 2022

And we even could split this up into

  1. A back-end crawler that generates a db,
  2. A CLI search tool that queries the db,
  3. A web search tool that queries the db

I'm not attached to scoop-directory being chosen, but it or ScoopSearch seems to be a good candidate for 1.

Edit: shovel would also be a fine choice for 1. That said, none of these is built on top of PowerShell, which powers Scoop. Does that matter? In my opinion, no.
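
The three-stage split above could be sketched as follows. This is a hypothetical, minimal version of stages 1 and 2, assuming the crawler has a local checkout of a bucket's manifest JSON files; the schema and field names are illustrative, not any existing tool's:

```python
import json
import sqlite3
from pathlib import Path

def build_index(bucket_dir: str, db_path: str) -> int:
    """Stage 1: index every manifest JSON in bucket_dir into a SQLite DB."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS manifests "
        "(name TEXT PRIMARY KEY, version TEXT, description TEXT, homepage TEXT)"
    )
    count = 0
    for path in Path(bucket_dir).glob("*.json"):
        manifest = json.loads(path.read_text(encoding="utf-8"))
        conn.execute(
            "INSERT OR REPLACE INTO manifests VALUES (?, ?, ?, ?)",
            (path.stem,
             manifest.get("version", ""),
             manifest.get("description", ""),
             manifest.get("homepage", "")),
        )
        count += 1
    conn.commit()
    conn.close()
    return count

def search(db_path: str, term: str) -> list:
    """Stage 2/3: CLI- or web-style query by name/description substring."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT name FROM manifests WHERE name LIKE ? OR description LIKE ?",
        (f"%{term}%", f"%{term}%"),
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]
```

The point of the split is that both a CLI and a web front-end can query the same generated DB without re-crawling anything.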

@rashil2000
Member

rashil2000 commented Jan 13, 2022

Since shovel.sh searches the known buckets, it fits nicely into the "official" tool criteria.

>   • integrate upstream as per your suggestion -- will need some discussion with the author for future development
>     • assimilate fully and add author to scoop.sh team?

I think this is how we can go about it.

  • Transfer the repos shovel and shovel-data into ScoopInstaller org. Note that the author @mertd will continue to have full admin rights over these repos.
  • Rename them to scoop.sh and scoop.sh-data respectively.
  • In the shovel.sh website, add a homepage (currently it goes directly to the search form). This will serve as the new homepage for Scoop.
  • Finally, modify the DNS settings for scoop.sh domain to point to ScoopInstaller/scoop.sh@master instead of current ScoopInstaller/Scoop@gh-pages. This step needs to be done by @lukesampson since they own the domain.

@rashil2000
Member

rashil2000 commented Jan 13, 2022

> or ScoopSearch seem to be good candidates for 1.

We should rope @gpailler into the conversation too. ScoopSearch is as good a candidate as shovel.sh, and it sits in a convenient GitHub org of its own.

The procedure for merging upstream would be roughly the same as above.

@mertd

mertd commented Jan 13, 2022

I understand that the only change you would require to be made to code or other content would be to integrate the current contents of the scoop.sh front page. The author of the integrated web search (be that @gpailler or me) will retain copyright and ownership (under the respective license) of the transferred repositories, but should discuss major changes with other members of the @ScoopInstaller organization. Is this how you envisioned it @rashil2000?

If so, then I think I like this approach. The original goal for shovel.sh was to offer search functionality for scoop.sh the way formulae.brew.sh does for brew.sh. With this and some work to make the manifest pages indexable for external search engines (mertd/shovel#10), that goal would be fulfilled.

@rashil2000
Member

Yes, precisely

@rasa
Member

rasa commented Jan 14, 2022

Can we discuss the benefits and drawbacks of each option before we commit to a particular tool? For example, I think there are many benefits to the back-end crawler creating an indexed database that can be quickly searched. This seems preferable to a large JSON object. I guess it could create both, if we wanted, but the db would be preferable, imo.

@JanPokorny
Author

If you create an SQLite database, you can even quickly query it from the frontend https://phiresky.github.io/blog/2021/hosting-sqlite-databases-on-github-pages/

@rashil2000
Member

> If you create an SQLite database, you can even quickly query it from the frontend https://phiresky.github.io/blog/2021/hosting-sqlite-databases-on-github-pages/

Yes, this is what scoop-directory uses :)

Although the DB file isn't that big - just around 7-8MB

@gpailler

Thanks for your interest in ScoopSearch guys 😄

It would be great to have an official web search engine for Scoop packages, and I'm willing to help with that and to transfer the repos if it makes sense to you.

The ScoopSearch backend is hosted on Azure (Azure Search and Azure Functions) and costs USD 0/month (Free Tier). Even with more traffic, I think we will stay within the Free Tier. I can transfer/configure the backend to an "official" scoop.sh Azure subscription too.

I also checked quickly, and we can dump the full search index to JSON if we want to provide online and offline search capabilities.

Merging all our projects and providing a unified solution to search the Scoop packages is definitely the best solution for the community.

So up to you now !

@rashil2000
Member

> I also checked quickly, and we can dump the full search index to JSON if we want to provide online and offline search capabilities.

Just a quick question, how large would that JSON be? shovel.sh only indexes known buckets, and the single-line JSON there is already 4.81MB.

@mertd

mertd commented Jan 15, 2022

> Just a quick question, how large would that JSON be? shovel.sh only indexes known buckets, and the single-line JSON there is already 4.81MB.

The shovel.sh JSON was so large because I didn't filter out keys from the manifest files that are not needed for searching (mertd/shovel-data#9). After filtering, the file is down to 1.16MB.

I believe there are benefits to each of the approaches chosen by @gpailler, @rasa and I respectively. For shovel.sh, I went with an approach targeting simplicity and least cost: Generate the index as part of a scheduled GitHub pipeline and host it and the web app statically. Of course this has drawbacks too; executing the actual search on a back end will make search performance less dependent on the client.
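
The "scheduled GitHub pipeline" approach described above might look roughly like this as a workflow file. This is a sketch only; the script path, schedule, and output filename are all hypothetical:

```yaml
# Hypothetical scheduled indexer workflow; names and paths are illustrative.
name: rebuild-search-index
on:
  schedule:
    - cron: "0 */4 * * *"   # rebuild every 4 hours
  workflow_dispatch:         # allow manual runs too
jobs:
  index:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Generate filtered index
        run: python scripts/build_index.py --out index.json
      - name: Commit updated index
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add index.json
          git commit -m "Update search index" || echo "No changes"
          git push
```

The appeal of this design is exactly what mertd names: no server to run, so hosting cost stays at zero, at the price of doing the search work on the client.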

@rashil2000
Member

I see.

@gpailler

> Just a quick question, how large would that JSON be? shovel.sh only indexes known buckets, and the single-line JSON there is already 4.81MB.

I checked this morning and ended up with a JSON file of 9.95MB for ~15,300 documents in the index. The reason for this "small" size is that the manifests are parsed and only the relevant information is added to the index.

For example, only the following content is stored in the index for the 7-zip manifest

{
   "Id":"adde431fdac84b7bbf54205c3ef58594fef42a5d",
   "Name":"7zip",
   "NameSortable":"7zip",
   "NamePartial":"7zip",
   "NameSuffix":"7zip",
   "Description":"A multi-format file archiver with high compression ratios",
   "Homepage":"https://www.7-zip.org/",
   "License":"Freeware,LGPL-2.0-only,BSD-3-Clause",
   "Version":"21.07",
   "Metadata":{
      "Repository":"https://github.com/ScoopInstaller/Main",
      "OfficialRepository":true,
      "OfficialRepositoryNumber":1,
      "RepositoryStars":850,
      "BranchName":"master",
      "FilePath":"bucket/7zip.json",
      "AuthorName":"github-actions[bot]",
      "AuthorMail":"41898282\u002Bgithub-actions[bot]@users.noreply.github.com",
      "Committed":"2021-12-27T12:30:02Z",
      "Sha":"bcaca41c8cb6ca07841d4bacd722986c1e894609"
   }
}

As @mertd said, parsing the manifests adds some complexity but it was required with my approach as I had to populate the Azure Search index properly and keep the index size under control (the Free Tier limit is 50MB).
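
The key-filtering both authors describe can be sketched as follows. The function and its field list are hypothetical, modeled loosely on the 7-Zip index document above; a real indexer would handle more manifest shapes:

```python
# Only these manifest keys survive into the search index; everything else
# (install scripts, hashes, autoupdate blocks) is dropped to keep it small.
INDEXED_KEYS = ("description", "homepage", "license", "version")

def to_index_document(name: str, manifest: dict) -> dict:
    """Reduce a full Scoop manifest to the fields worth searching."""
    doc = {"Name": name}
    for key in INDEXED_KEYS:
        value = manifest.get(key, "")
        # Manifests may express the license as a nested object.
        if key == "license" and isinstance(value, dict):
            value = value.get("identifier", "")
        doc[key.capitalize()] = value
    return doc
```

Dropping the non-searchable keys is what brings a multi-megabyte raw dump down to the sizes quoted in this thread.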

@rashil2000
Member

> a JSON file of 9.95MB for ~15,300 documents in the index

That seems pretty reasonable to me, thanks for the info!

@JanPokorny
Author

Integrating CLI search into this seems like a difficult problem when you think about it. The CLI might have non-public buckets added (thus needing local indexing). Also, if the database is statically hosted and queried on the client, re-downloading the database from the backend crawler every time it is outdated (which will happen often) might be slower than the current scoop search implementation. A backend like Azure Search could potentially be used to speed up queries from the CLI (and, most importantly, to query buckets that aren't locally added), but a hybrid approach incorporating any local non-public, non-indexed buckets would be needed either way.
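
The hybrid merge step described here might look like the following. Both result sources are stand-ins; the function only illustrates how local-bucket hits and remote-index hits could be combined:

```python
def merge_results(local: list, remote: list) -> list:
    """Combine local-bucket hits with remote-index hits.

    Local results come first: they are installable right away and may
    include manifests from private buckets the remote index never sees.
    Duplicates from the remote index are dropped case-insensitively.
    """
    seen = set()
    merged = []
    for name in list(local) + list(remote):
        key = name.lower()
        if key not in seen:
            seen.add(key)
            merged.append(name)
    return merged
```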

@rashil2000
Member

> The CLI might have non-public buckets added (thus need local indexing).
>
> Potentially a backend like the Azure Search could be used to speed up queries from the CLI (and most importantly, query not-added buckets), but a hybrid approach incorporating any local non-public non-indexed buckets would be needed either way.

I feel we shouldn't need to worry about local indexing as of now, as scoop search already handles it.

The CLI @rasa is talking about would probably be a separate search tool with a non-PowerShell implementation.

> Also if the database is statically hosted and queried on the client, re-downloading the database from the backend crawler every time it is outdated (which will happen often) might be slower than the current scoop search.

We could set a time interval for this, like 4 hours (or maybe a day). Quite a few tools (like tealdeer) do this. It doesn't really add much delay.

For instance, the little search tool I mentioned in #4627 (comment) downloads the DB file if it's older than a day, and the 7-8MB file does not take more than a couple of seconds. A one-time update like this can be tolerated IMO.
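
The tealdeer-style staleness check described here could be sketched like this; the constant and function names are hypothetical:

```python
import time
from pathlib import Path

MAX_AGE_SECONDS = 24 * 60 * 60  # re-download after one day, as suggested

def needs_refresh(db_file: str, now=None) -> bool:
    """True if the cached index is missing or older than MAX_AGE_SECONDS."""
    path = Path(db_file)
    if not path.exists():
        return True
    current = now if now is not None else time.time()
    return (current - path.stat().st_mtime) > MAX_AGE_SECONDS
```

A CLI would call this once at startup and only hit the network when it returns `True`, which is why the update cost amortizes to near zero for repeated searches.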

@rasa
Member

rasa commented Jan 17, 2022

That seems to be how winget works: it downloads the database whenever it senses it's out of date.

We should compress the .json blob and/or SQLite .db to speed up downloading.
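
The compression idea can be sketched with the standard library; the function names are hypothetical. The win on a JSON index tends to be large because manifest documents repeat the same keys thousands of times:

```python
import gzip
import json

def compress_index(documents: list) -> bytes:
    """Serialize the index once and gzip it for faster downloads."""
    raw = json.dumps(documents, separators=(",", ":")).encode("utf-8")
    return gzip.compress(raw, compresslevel=9)

def load_index(blob: bytes) -> list:
    """Client side: decompress and parse the downloaded index."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))
```

In practice a static host can also serve the uncompressed file with `Content-Encoding: gzip` negotiated by the web server, which gets the same effect without client-side code.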

@rashil2000
Member

@rasa Should we start a poll to vote? I'll tag the active maintainers and some recent active contributors.

@rasa
Member

rasa commented Jan 22, 2022

A poll is a good idea, but I'm not sure how to structure it. My thought is that there are really three (or four) parts to our search functionality:

  1. Back-end crawler
  2. Web front-end
  3. CLI front-end
  4. GUI front-end

And each could be independent. Perhaps a poll for each? And maybe have an option to develop something from scratch, such as implementing a back-end crawler in Rust, Go, or PowerShell? Or am I overthinking?

@rashil2000
Member

This issue concerns only the "web" part, i.e. Back-end crawler and Web front-end, both of which already exist (in some form) as scoop-directory, shovel.sh and ScoopSearch. So I was thinking of a poll between these three.

I don't think many people have tried making a GUI for Scoop ("A command-line installer", after all). Nevertheless, this is separate from the website component and is being tracked here - #4660

Similarly, for CLI utilities, we can open a separate tracking issue (given that there are already 2 good options - https://github.com/shilangyu/scoop-search and https://github.com/tokiedokie/scoop-search - which can be extended to search the website's JSON too).

@rashil2000
Member

rashil2000 commented Feb 1, 2022

I have created a poll. Please vote!

The outcome of the poll will undergo the rough procedure described in #4627 (comment) to get integrated into Scoop upstream.



I am tagging some recent/frequent contributors/maintainers (in no particular order). Your feedback is valuable!

@ScoopInstaller/maintainers

@tech189 @littleli @igitur @hu3rror @Erisa @Lutra-Fs @segevfiner @LazyGeniusMan @Slach @jcwillox @phanirithvij @AntonOks @RavenMacDaddy @sitiom @wenmin92 @TheRandomLabs @amreus

(I just went through the recent merged PRs and picked these names. If your name isn't there, that doesn't mean you can't vote!)

@rashil2000
Member

rashil2000 commented Feb 1, 2022

You can also comment below to explain why you chose an option. I'll start.

ScoopSearch, because:

  • Searches practically all manifests on GitHub (close to 20k). With an optional (and very useful!) toggle to filter official manifests.
  • Sort results by match, name, date
  • Search by bucket
  • Includes some helpful info/links in each result - license, last updated date, last committer, bucket, popularity etc.
  • Directly copy code to add bucket or install an app
  • List all community buckets

@rashil2000 rashil2000 pinned this issue Feb 1, 2022
@rasa
Member

rasa commented Feb 1, 2022

Based on preliminary results, it looks like we're going with ScoopSearch. That sounds great! Let me know how I can support ScoopSearch moving forward. I will keep scoop-directory up for the foreseeable future, but will direct users to use ScoopSearch as the "official" search engine. Thanks to everyone for voting, and your past support!

@JanPokorny

This comment was marked as resolved.

@rashil2000

This comment was marked as resolved.

@JanPokorny

This comment was marked as resolved.

@rashil2000
Member

The brand new website for Scoop is up 🎉🎊
