This tool extracts bibliographic metadata from PDF files using GROBID and stores it in BibSonomy. It was developed during the DESIR Code Sprint in Berlin (31.7.-2.8.2018).
There is an online version of this tool available here: http://track-b.desir.dariah.eu/
The tool consists of a Java-based backend (server) and a Node.js-based frontend (client).
# install dependencies
npm install
# serve dev version with hot reload at localhost:8080
npm run dev
# build for production with minification
npm run build
# build for production and view the bundle analyzer report
npm run build --report
First install asdf (see its installation instructions).
# install asdf plugin for nodejs
asdf plugin-add nodejs https://github.com/asdf-vm/asdf-nodejs.git
# build for production with minification
npm run build
GROBID can be used either with a local installation or through its REST-based web API.
For a local installation, the GROBID model files must be downloaded (e.g., https://dl.bintray.com/rookies/maven/org/grobid/grobid-home/0.5.5/grobid-home-0.5.5.zip) and placed into a folder that is configured via the option grobid.home.path in application.properties.
Copy the file install-files/application.properties into your application root and set the correct paths and keys:
grobid.home.path=/Users/YourUserName/Work/Grobid/grobid-home/
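Before starting the backend, it can help to confirm that the configured path really exists. The following is a minimal sketch, not part of the project: the helper names grobid_home_from_properties and check_grobid_home are hypothetical, and the script only handles the plain key=value form shown above.

```shell
#!/bin/sh
# Hypothetical helpers, not shipped with the tool.

# Print the value of grobid.home.path from a properties file.
# If the key appears several times, the last occurrence is printed.
grobid_home_from_properties() {
    sed -n 's/^[[:space:]]*grobid\.home\.path[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1
}

# Succeed only if the configured path exists as a directory.
check_grobid_home() {
    home_path=$(grobid_home_from_properties "$1")
    if [ -z "$home_path" ]; then
        echo "grobid.home.path is not set in $1" >&2
        return 1
    fi
    if [ ! -d "$home_path" ]; then
        echo "grobid.home.path points to a missing directory: $home_path" >&2
        return 1
    fi
    echo "GROBID home found: $home_path"
}

# Usage: check_grobid_home application.properties
```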
To start the application use
mvn spring-boot:run
Or (if you want to use your local installation of GROBID):
mvn -Dspring.config.location=file:/....../DESIR-CodeSprint/trackB/backend/application.properties spring-boot:run
where you replace file:/....../DESIR-CodeSprint/trackB/backend/application.properties with the path to your local configuration file.
Make a copy of the configuration template install-files/trackB.conf and add it to the installation folder. This lets the init.d script use extra property files for your server.
In the build folder:
mvn clean package
Or (if you want to use your local installation of GROBID):
mvn -Dspring.config.location=file:/....../DESIR-CodeSprint/trackB/backend/application.properties clean package
Copy the executable (.jar) to the installation folder.
Create a symbolic link (ln -s) from /opt/trackB/trackB.jar to /etc/init.d/trackB to be able to launch the tool as a service (usable on CentOS 6.x servers, for example).
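The direction of the link is easy to get wrong: the jar is the target and the init.d entry is the link name. A small sketch, assuming the two paths named above; the wrapper function link_service is hypothetical and only adds a check that the jar exists.

```shell
#!/bin/sh
# link_service JAR LINK: expose the executable jar under an init.d-style name.
# Hypothetical helper; JAR is the link target, LINK is the name that gets created.
link_service() {
    jar=$1
    link=$2
    if [ ! -f "$jar" ]; then
        echo "jar not found: $jar" >&2
        return 1
    fi
    ln -s "$jar" "$link"
}

# Usage (run as root, paths taken from the text above):
#   link_service /opt/trackB/trackB.jar /etc/init.d/trackB
```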
Set the owner of the files (for simplicity, I use the same user as for apache2 on Ubuntu):
sudo chown -R www-data:www-data /opt/trackB
Copy the configuration file install-files/trackB.service to /etc/systemd/system/ and adjust the path to the jar file and the user that launches the command.
service trackB start
service httpd restart
The server should now listen on port 8080 by default.
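One way to confirm that the service came up is to poll it with curl until it answers. This is a sketch under assumptions: the function name wait_for_http is hypothetical, and the URL in the usage line combines the default port with the /trackB/ context path that appears in the Apache example, which may differ in your setup.

```shell
#!/bin/sh
# wait_for_http URL [TRIES]: poll URL once per second until the server answers.
# Any HTTP response counts as "up", including error pages, because curl is
# called without -f. Hypothetical helper, not part of the tool.
wait_for_http() {
    url=$1
    tries=${2:-30}
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -s: silent, -o /dev/null: discard the body, --max-time: do not hang
        if curl -s -o /dev/null --max-time 2 "$url"; then
            echo "server is up: $url"
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    echo "no answer from $url after $tries attempts" >&2
    return 1
}

# Usage: wait_for_http http://localhost:8080/trackB/
```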
Here is an example of a configuration file for Apache httpd that uses SSL and forwards requests from port 443 (SSL) to our application running on port 8080. Port 80 is also redirected to 443 and therefore reaches the application on port 8080 as well. (Example using the server trackB.dariah.eu)
NameVirtualHost *:80
NameVirtualHost *:443
<VirtualHost *:443>
SSLEngine on
SSLProxyEngine On
SSLCertificateFile /etc/letsencrypt/live/trackB.dariah.eu/cert.pem
SSLCertificateKeyFile /etc/letsencrypt/live/trackB.dariah.eu/privkey.pem
SSLCertificateChainFile /etc/letsencrypt/live/trackB.dariah.eu/chain.pem
ServerName https://trackB.dariah.eu/
Redirect / https://trackB.dariah.eu/trackB/
ProxyPass /trackB/ http://localhost:8080/trackB/
ProxyPassReverse /trackB/ http://localhost:8080/trackB/
</VirtualHost>
<VirtualHost *:80>
ServerName http://trackB.dariah.eu/
DocumentRoot /var/www/
ErrorLog /var/log/httpd/trackB_error_log
CustomLog /var/log/httpd/trackB_access_log combined
Redirect / https://trackB.dariah.eu/
</VirtualHost>
BibSonomy is a social bookmarking system that helps you organize your scientific work. Use BibSonomy to collect publications and bookmarks, to collaborate with your colleagues, and to discover interesting research for your daily work.
You can get your BibSonomy API key from the settings page. Do not put your API key into a public repository.
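A simple safeguard is to keep the local application.properties, which holds the key, out of version control. The sketch below assumes a Git checkout and that your copy of the file sits in the directory where you run it; the function name ignore_local_config is hypothetical.

```shell
#!/bin/sh
# ignore_local_config ENTRY: add ENTRY to .gitignore in the current directory.
# Safe to call repeatedly; the entry is appended only if it is not listed yet.
ignore_local_config() {
    entry=$1
    touch .gitignore
    # -q: quiet, -x: match the whole line, -F: treat ENTRY as a fixed string
    if ! grep -qxF "$entry" .gitignore; then
        printf '%s\n' "$entry" >> .gitignore
    fi
}

# Usage: ignore_local_config application.properties
```

Note that .gitignore has no effect on a file that is already tracked; in that case it would first have to be removed from the index with git rm --cached.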
GROBID is a machine learning library for extracting, parsing and re-structuring raw documents, such as PDF documents, into structured TEI-encoded ones.
Contributions are welcome! Just fork and send your pull requests.
Created at the DESIR CodeSprint by yoannspace, rjoberon, ChristophHubeL3S, ctot-nondef, and schmima. See contributors.
This project is licensed under the Apache License 2.0 - see the LICENSE.md file for details.