
feat: Add spiderqueue configuration option #476

Merged 3 commits into master on Mar 10, 2023

Conversation

@jpmckinney (Contributor) commented Mar 8, 2023

closes #197

See #475 for updating the default spider queue.
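
For context, here is a minimal sketch (not the code in this PR) of how a dotted-path spider queue option could be resolved at startup. The scrapyd.spiderqueue.SqliteSpiderQueue default reflects Scrapyd's existing module layout, but the section name, option name, and wiring below are assumptions for illustration:

```python
# Illustrative sketch only, not the implementation merged in this PR.
# It assumes a "spiderqueue" option holding a dotted class path.
from configparser import ConfigParser
from importlib import import_module

DEFAULT_SPIDERQUEUE = "scrapyd.spiderqueue.SqliteSpiderQueue"


def load_object(path):
    """Import an object from a dotted path like 'pkg.module.ClassName'."""
    module_path, _, name = path.rpartition(".")
    return getattr(import_module(module_path), name)


config = ConfigParser()
config.read("scrapyd.conf")
queue_cls = load_object(
    config.get("scrapyd", "spiderqueue", fallback=DEFAULT_SPIDERQUEUE)
)
```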

codecov bot commented Mar 8, 2023

Codecov Report

Merging #476 (040cc67) into master (68cb43c) will decrease coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #476      +/-   ##
==========================================
- Coverage   87.31%   87.30%   -0.01%     
==========================================
  Files          41       41              
  Lines        1876     1875       -1     
==========================================
- Hits         1638     1637       -1     
  Misses        238      238              
| Flag | Coverage Δ |
|------|------------|
| unittests | 87.30% <100.00%> (-0.01%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|----------------|------------|
| scrapyd/jobstorage.py | 100.00% <100.00%> (ø) |
| scrapyd/spiderqueue.py | 95.45% <100.00%> (+0.21%) ⬆️ |
| scrapyd/tests/test_spiderqueue.py | 100.00% <100.00%> (ø) |
| scrapyd/utils.py | 89.25% <100.00%> (+0.08%) ⬆️ |


@pspsdev commented Mar 9, 2023

Probably not the correct solution, as different types of queues might need more arguments, like redis_host, redis_port, etc., not just the DB path. I'm wondering what the quickest and simplest way would be to improve Scrapyd performance for users who run high polling rates. Maybe add support for :memory: as dbpath, so SQLite runs in memory? Currently the code always produces a file-based database: scrapyd/spiderqueue.py defaults to :memory: correctly, but that can never happen because upstream code always sets a file path.

@jpmckinney (Contributor, Author) commented Mar 9, 2023

Ah, good call, we need to move the dbs_dir parsing to the queue's initialization.

For Redis, PostgreSQL, etc., dbs_dir can still be a single string, e.g. a connection string like redis://user:pass@host:port/database or postgresql://user:pass@host:port/database.

Edit: Or, we can pass the full config object to the queues. As-is, it's not out of the question for alternative queues to just read their configuration from the environment, rather than from the config file.
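
As a rough sketch of the connection-string idea (one possible shape, not the merged behaviour), a queue could branch on the scheme of dbs_dir; which schemes are recognised and how their parts map to queue settings are assumptions for illustration:

```python
# Sketch of the connection-string idea, not the merged implementation.
from urllib.parse import urlparse


def queue_backend(dbs_dir):
    parsed = urlparse(dbs_dir)
    if parsed.scheme in ("redis", "postgresql"):
        # e.g. redis://user:pass@host:port/database
        return {
            "scheme": parsed.scheme,
            "host": parsed.hostname,
            "port": parsed.port,
            "database": parsed.path.lstrip("/"),
        }
    # Otherwise treat dbs_dir as a filesystem directory (or ":memory:").
    return {"scheme": "sqlite", "path": dbs_dir}


print(queue_backend("redis://user:pass@localhost:6379/0"))
print(queue_backend("dbs"))
```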

@jpmckinney (Contributor, Author) commented Mar 9, 2023

> Currently the code always produces a file-based database: scrapyd/spiderqueue.py defaults to :memory: correctly, but that can never happen because upstream code always sets a file path.

I would say that its use of :memory: is incorrect, as it will cause all projects to use the same DB, which is not the intent. The connection string would need to be something like file:project1?mode=memory (see https://www.sqlite.org/inmemorydb.html).

Edit: Never mind, I misread some other documentation:

> Every :memory: database is distinct from every other. So, opening two database connections each with the filename ":memory:" will create two independent in-memory databases.
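
A quick demonstration of that behaviour with Python's built-in sqlite3 module; the named in-memory URI form follows the SQLite documentation linked above:

```python
import sqlite3

# Two plain ":memory:" connections get two independent databases.
a = sqlite3.connect(":memory:")
b = sqlite3.connect(":memory:")
a.execute("CREATE TABLE q (x)")
print(b.execute("SELECT name FROM sqlite_master").fetchall())  # [] -- 'b' is empty

# A named in-memory database with a shared cache can be shared between
# connections, e.g. one per project.
c = sqlite3.connect("file:project1?mode=memory&cache=shared", uri=True)
d = sqlite3.connect("file:project1?mode=memory&cache=shared", uri=True)
c.execute("CREATE TABLE q (x)")
print(d.execute("SELECT name FROM sqlite_master").fetchall())  # [('q',)]
```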

A commit was added to the pull request: …o spider queue implementation. Respect :memory: and URL values for dbs_dir.
@jpmckinney (Contributor, Author) commented
Okay, custom queues should be more extensible with the latest commit.
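
To illustrate what "more extensible" could look like for a third-party queue, here is a hypothetical in-memory queue that receives the config object; its constructor signature and method names are assumptions for the sketch, not the exact interface Scrapyd requires:

```python
# Hypothetical custom queue, illustrating the "pass the config to the queue"
# idea. The (config, project) constructor and the method names are
# assumptions for this sketch, not an interface guaranteed by Scrapyd.
import heapq
import itertools


class MemorySpiderQueue:
    def __init__(self, config, project):
        # With the full config object available, a queue can read any option
        # it needs (e.g. a Redis host) instead of only a database path.
        self.project = project
        self._counter = itertools.count()
        self._heap = []  # entries: (-priority, insertion order, name, args)

    def add(self, name, priority=0.0, **spider_args):
        heapq.heappush(self._heap, (-priority, next(self._counter), name, spider_args))

    def pop(self):
        if not self._heap:
            return None
        _, _, name, spider_args = heapq.heappop(self._heap)
        return dict(spider_args, name=name)

    def count(self):
        return len(self._heap)
```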

@pspsdev commented Mar 9, 2023

@jpmckinney this looks good. What else needs to be done to get it merged?

@jpmckinney jpmckinney merged commit 538357c into master Mar 10, 2023
@jpmckinney jpmckinney deleted the 197-spiderqueue branch March 10, 2023 16:39
jpmckinney added a commit that referenced this pull request on Jul 23, 2024: …an scrapyd.sqlite.initialize function (unnecessary support for dbs_dir URLs added in #476)
Successfully merging this pull request may close these issues: Configurable spider queue class