flychen59/chinaxivCrawler_mnbvc

chinaxiv full-site crawler

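A simple script that crawls the paper data of the entire chinaxiv site. The crawler first walks the site by category, then iterates over every article by date to collect its download links. The download links are saved to the pdf_links folder.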
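The traversal order described above (category first, then time) can be sketched as two nested loops. This is only an illustration under stated assumptions, not the script's actual code: `crawl_all`, `fetch_listing`, the period strings, and the one-file-per-category naming are all hypothetical, and the real page fetching and parsing are left to the caller-supplied function.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def crawl_all(
    categories: Iterable[str],
    periods: Iterable[str],
    fetch_listing: Callable[[str, str], Iterable[dict]],
    out_dir: str = "pdf_links",
) -> int:
    """Walk every category, then every time period, and save the records found.

    fetch_listing(category, period) is a hypothetical callable that returns
    records shaped like {"link": [...], "title": "...", "author": "..."}.
    Returns the total number of records written.
    """
    periods = list(periods)  # reused for each category, so materialize once
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    total = 0
    for category in categories:
        # Assumption: one JSON Lines file per category inside pdf_links.
        out_file = out / f"{category}.jsonl"
        with out_file.open("w", encoding="utf-8") as fh:
            for period in periods:
                for record in fetch_listing(category, period):
                    # ensure_ascii=False keeps Chinese titles readable on disk.
                    fh.write(json.dumps(record, ensure_ascii=False) + "\n")
                    total += 1
    return total
```

Passing the fetcher in as a parameter keeps the loop structure testable without network access; a real fetcher would request the listing page for the given category and period and extract the download links.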

Launch command

python chinaixv_crawl.py

Output format

{
    "link": ["..."],    // download links
    "title": "xxx",     // paper title
    "author": "xxx"     // author information
}
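For downstream processing, a record in this shape can be checked before use. The sketch below assumes each record is stored as one line of plain JSON (without the `//` annotations shown above, which are not valid JSON); `parse_record` is a hypothetical helper, not part of the crawler.

```python
import json


def parse_record(line: str) -> dict:
    """Parse one JSON record and verify it has the three documented fields."""
    record = json.loads(line)
    if not isinstance(record, dict):
        raise ValueError("record must be a JSON object")

    # "link" is documented as a list of download links.
    links = record.get("link")
    if not isinstance(links, list) or not all(isinstance(u, str) for u in links):
        raise ValueError("'link' must be a list of URL strings")

    # "title" and "author" are documented as plain strings.
    for key in ("title", "author"):
        if not isinstance(record.get(key), str):
            raise ValueError(f"'{key}' must be a string")
    return record
```

A caller would typically apply this to each line of a saved file and skip or log the lines that raise `ValueError`.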
