@steve8708 To gauge interest: I have done a big refactoring of the codebase to integrate these features:
excludeSelectors : Remove elements that you don't want in the output data
Cleaner output : Remove some
Refactoring of the full code
Concurrency
Multiple config
Config parsing now sets defaults for values that are not defined.
ProgressBar logging
Sub Routing namings
Output is now generated in its own folder
Changed output.json to output/data.json
Fix .gitignore
Added Prettier to the project. (Wouldn't mind reverting that if not wanted)
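To make the "multiple config" and "defaults if not defined" items concrete, here is a minimal sketch of how such config resolution could look. It is not the code from the refactoring: apart from `excludeSelectors` and the `output/data.json` path mentioned above, every field name, default value, and the `resolveConfig` / `resolveConfigs` helpers are assumptions made for illustration.

```typescript
// Hypothetical config shape. Only `excludeSelectors` and the default output
// path come from the list above; the other fields are assumed for the example.
interface CrawlConfig {
  url: string;
  match: string;
  selector?: string;
  excludeSelectors?: string[];
  maxPagesToCrawl?: number;
  outputFileName?: string;
}

// Same shape, but with every optional field filled in.
type ResolvedConfig = Required<CrawlConfig>;

// Assumed default values, used whenever a field is left undefined.
const DEFAULTS = {
  selector: "body",
  excludeSelectors: [] as string[],
  maxPagesToCrawl: 50,
  outputFileName: "output/data.json",
};

// Fill in defaults for a single config.
function resolveConfig(config: CrawlConfig): ResolvedConfig {
  return {
    url: config.url,
    match: config.match,
    selector: config.selector ?? DEFAULTS.selector,
    // Copy the array so resolved configs never share the default instance.
    excludeSelectors: [...(config.excludeSelectors ?? DEFAULTS.excludeSelectors)],
    maxPagesToCrawl: config.maxPagesToCrawl ?? DEFAULTS.maxPagesToCrawl,
    outputFileName: config.outputFileName ?? DEFAULTS.outputFileName,
  };
}

// "Multiple config": accept either one config or a list of them.
function resolveConfigs(input: CrawlConfig | CrawlConfig[]): ResolvedConfig[] {
  const list = Array.isArray(input) ? input : [input];
  return list.map(resolveConfig);
}
```

With something like this, each section of a site can get its own entry with its own `outputFileName`, while entries that omit it fall back to the shared default.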
Things that would be required to fully "complete" the PR:
CLI full support
Terminal logs fixed. (Mostly INFO and ERROR logs from PlaywrightCrawler)
My needs:
I wanted to create a knowledge base for Godot, but wanted to split each section into its own file. I managed to do that with multiple configs. Now that this is done and I have the output I needed, I am not interested in fixing the logging part. The logging was useful for spotting the occasional error, but not that helpful imo.
Current state
So the current changes are big and about 90% finished. Nonetheless, I think they are an improvement, just not a "fully stable" and complete one... Everything that was added is functional, but I still have issues with the terminal output. If lines get wrapped, the output gets ugly. Nx has a similar issue with their run-many CLI, so I don't know if it's VS Code, the terminal or the lib... I'm just not interested in completing the feature.
I made this multi progress bar because with concurrent crawling, the log was hard to follow. With it, things are easier to follow, but when other logging (errors, info and so on) happens in the meantime, it's a mess...
The issue :
When this "type" of line appears from PlaywrightCrawler, it breaks the multi progress bar:
INFO Statistics: null request statistics: {"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":4525,"requestsFinishedPerMinute":12,"requestsFailedPerMinute":0,"requestTotalDurationMillis":113118,"requestsTotal":25,"crawlerRuntimeMillis":120511,"retryHistogram":[25]}
The multi progress bar display then gets corrupted. I do not understand terminals and Playwright well enough to know exactly what to change to fix this.
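For whoever picks this up, here is a rough sketch of one generic way to print a log line without breaking a block of progress bars: move the cursor up over the bars, erase them, print the log line, then redraw the bars below it. It uses only standard ANSI escape sequences and is not tied to any progress bar library or to PlaywrightCrawler's logger. The function names are hypothetical. It assumes that each bar was written followed by a newline, so the cursor sits just below the last bar, and that `line.length` approximates the display width (which is not true for bars containing color codes or wide characters). Wrapped lines are counted as several rows, since miscounting them is one plausible reason wrapped output looks wrong.

```typescript
// CSI prefix for ANSI escape sequences.
const ESC = "\x1b[";

// Number of terminal rows a line occupies once wrapped at `columns`.
// Assumption: string length equals display width (no color codes, no wide chars).
function rowsFor(line: string, columns: number): number {
  return Math.max(1, Math.ceil(line.length / columns));
}

// Build the text to write so that `logLine` ends up above the bars and the
// bars are redrawn underneath it.
// Assumption: the cursor is at the start of the row just below the last bar.
function renderLogAboveBars(
  logLine: string,
  bars: readonly string[],
  columns: number,
): string {
  // Count wrapped rows too, otherwise the cursor does not move up far enough.
  const barRows = bars.reduce((sum, bar) => sum + rowsFor(bar, columns), 0);

  let out = "";
  if (barRows > 0) {
    out += `${ESC}${barRows}A`; // cursor up to the first bar row
  }
  out += "\r"; // back to column 0
  out += `${ESC}0J`; // erase from the cursor to the end of the screen
  out += logLine + "\n"; // the log line takes the place of the first bar
  for (const bar of bars) {
    out += bar + "\n"; // redraw every bar below the log line
  }
  return out;
}
```

In a Node terminal the result could be written with `process.stdout.write(renderLogAboveBars(line, bars, process.stdout.columns ?? 80))`. This only helps if every log line actually goes through such a function. How to route PlaywrightCrawler's own INFO and ERROR output through it is not covered here.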
Why Asking ?
I have no interest in fixing the terminal output as I got what I wanted, but the changes as a whole are an improvement, so I am asking whether I could open a PR and let someone else fix the issue in the PR and push it. I guess the concurrency part could be omitted, and that would "make the PR complete".
Other changes that I can omit if not wanted.
I use a "modern" Prettier config, and my editor formats using my own config if none exists in the repo I work on. I set up Prettier because I was already changing the formatting whenever I saved, but I'm OK with reverting this. I could also push it if wanted. I also copied in some files to configure that, as I wasn't planning to make big changes, but I'm willing to remove those too if there's no interest.
Here's a visual preview:
Won't push the config changes tho. (Maybe only the typing)