This tool combines various open source tools to give insight into accessibility and performance metrics for a list of URLs. Its main parts are:
- The application requires at least one CSV with a column header labeled "Address" and one URL per line (other comma-delimited data is ignored).
- A crawl can also be executed (currently using a licensed version of the Screaming Frog SEO Spider CLI tools: https://www.screamingfrog.co.uk/seo-spider/).
- Runs Deque AXE for all URLs and produces both a detailed and a summary report (including updating the associated Google Sheet). See: https://pypi.org/project/axe-selenium-python/
- Runs the Lighthouse CLI for all URLs and produces both a detailed and a summary report (including updating the associated Google Sheet). See: https://github.com/GoogleChrome/lighthouse
- Runs a PDF audit for all PDF URLs and produces both a detailed and a summary report (including updating the associated Google Sheet).
To get started, follow the installation instructions below. Once complete:
- Create and activate the virtual environment ( python -m venv venv && source venv/bin/activate )
- Run the application: python app.py (or start app.py)
- Navigate to http://127.0.0.1:8888/reports/ or http://localhost/reports/ where the sample "DRUPAL" report will be visible.
- View the report by clicking on the report address or by providing the link directly, e.g. http://localhost/reports/?id=DRUPAL
- Here is a link to the sample data Google Sheet report: DRUPAL Google Sheet
NOTE: At the moment, no database is used due to an initial interest in CSV data only. The system creates one folder per report, with the following subfolders (under /REPORTS/your_report_name):
- /AXE (used to store AXE data)
- /CSV (CSVs to analyse; PDF CSV requests are appended with a PDF qualifier)
- /LIGHTHOUSE (used to store Lighthouse data)
- /logs (tracks progress and requests)
- /SPIDER (used to store crawl data)
At this point, a database would make more sense, along with a function to "Export to CSV", etc.
As mentioned, simply provide a CSV with a list of URLs (column header = "Address") and select the tests to run through the web form.
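For reference, here is a minimal sketch of validating such a CSV before uploading it; the filename urls.csv is just a placeholder and this is not the application's own code:

```python
import csv

# Illustrative check: confirm the file has an "Address" column and
# collect the URLs from it, ignoring any other comma-delimited data.
with open("urls.csv", newline="", encoding="utf-8") as handle:
    reader = csv.DictReader(handle)
    if "Address" not in (reader.fieldnames or []):
        raise ValueError('CSV must contain a column header named "Address"')
    urls = [row["Address"].strip() for row in reader if row["Address"].strip()]

print(f"Found {len(urls)} URLs to audit")
```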
The application is configured through environment variables. On startup, the application will also read environment variables from a .env file (a sketch of reading these in Python follows the list below).
- HOST (defaults to 127.0.0.1)
- PORT (defaults to 8888)
- SECRET_KEY (no default, used to sign the Flask session cookie. Use a cryptographically strong sequence of characters, like you might use for a good password.)
- ALLOWED_EXTENSIONS (defaults to "csv", comma separated list)
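As a rough sketch of how these variables map to code (assuming python-dotenv for the .env handling, which is an assumption rather than a statement about the application's internals):

```python
import os

from dotenv import load_dotenv  # assumes the python-dotenv package

load_dotenv()  # pull variables from a .env file into the environment

HOST = os.environ.get("HOST", "127.0.0.1")
PORT = int(os.environ.get("PORT", "8888"))
SECRET_KEY = os.environ.get("SECRET_KEY")  # no default; must be set
ALLOWED_EXTENSIONS = {
    ext.strip() for ext in os.environ.get("ALLOWED_EXTENSIONS", "csv").split(",")
}
```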
To get all tests running, the following steps are required:
sudo apt update
sudo apt install git
sudo apt-get install python3-pip
sudo apt-get install python3-venv
sudo apt-get update
sudo apt-get install software-properties-common
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt-get install python3.6
git clone https://github.com/soliagha-oc/perception.git
cd perception
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python app.py
Browse to http://127.0.0.1:8888/ (or to port 5000, Flask's default, if PORT is not set in the .env file)
Install the following CLI tools for your operating system:
- Download and install the matching/required chromedriver:
- Download the latest version from the official website and unzip it (here, for instance, version 2.29 to ~/Downloads):
wget https://chromedriver.storage.googleapis.com/2.29/chromedriver_linux64.zip
- Move it to /usr/local/share (or any folder) and make it executable:
sudo mv -f ~/Downloads/chromedriver /usr/local/share/
sudo chmod +x /usr/local/share/chromedriver
- Create symbolic links:
sudo ln -s /usr/local/share/chromedriver /usr/local/bin/chromedriver
sudo ln -s /usr/local/share/chromedriver /usr/bin/chromedriver
OR
export PATH=$PATH:/path-to-extracted-file/
OR add it to .bashrc
- Go to the geckodriver releases page (https://github.com/mozilla/geckodriver/releases) and download the latest version of the driver for your platform.
- Extract the file with:
tar -xvzf geckodriver*
- Make it executable:
chmod +x geckodriver
- Add the driver to your PATH so other tools can find it:
export PATH=$PATH:/path-to-extracted-file/
OR add it to .bashrc
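Once both drivers are installed, a throwaway script like the one below can confirm that Selenium can launch each browser (a verification sketch only; newer Selenium releases can manage drivers themselves, but with the versions assumed here the drivers must be on PATH):

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options as ChromeOptions
from selenium.webdriver.firefox.options import Options as FirefoxOptions

# Confirm chromedriver is reachable.
chrome_options = ChromeOptions()
chrome_options.add_argument("--headless")
chrome = webdriver.Chrome(options=chrome_options)
chrome.get("https://example.com")
print("chromedriver OK:", chrome.title)
chrome.quit()

# Confirm geckodriver is reachable.
firefox_options = FirefoxOptions()
firefox_options.add_argument("-headless")
firefox = webdriver.Firefox(options=firefox_options)
firefox.get("https://example.com")
print("geckodriver OK:", firefox.title)
firefox.quit()
```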
- Install node: https://nodejs.org/en/download/
curl -sL https://deb.nodesource.com/setup_14.x | sudo -E bash -
sudo apt-get install -y nodejs
- Install npm (use the sudo variant if needed):
npm install npm@latest -g
sudo npm install npm@latest -g
- Install lighthouse (use the sudo variant if needed):
npm install -g lighthouse
sudo npm install -g lighthouse
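To smoke-test the Lighthouse install against a single URL, a call like the following mirrors what a Python wrapper around the CLI might do (a sketch; the URL and output path are placeholders):

```python
import subprocess

url = "https://example.com"  # placeholder URL

# Run Lighthouse headlessly and save a JSON report.
subprocess.run(
    [
        "lighthouse",
        url,
        "--output=json",
        "--output-path=./lighthouse-report.json",
        "--chrome-flags=--headless",
        "--quiet",
    ],
    check=True,
)
```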
Download the Xpdf command line tools: https://www.xpdfreader.com/download.html
To install this binary package:
- Copy the executables (pdfimages, xpdf, pdftotext, etc.) to /usr/local/bin.
- Copy the man pages (*.1 and *.5) to /usr/local/man/man1 and /usr/local/man/man5.
- Copy the sample-xpdfrc file to /usr/local/etc/xpdfrc. You'll probably want to edit its contents (as distributed, everything is commented out); see xpdfrc(5) for details.
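The PDF audit depends on the Xpdf binaries being on PATH. A quick check from Python might look like this (illustrative only; sample.pdf is a placeholder):

```python
import subprocess

# Extract the text layer with Xpdf's pdftotext; an empty or missing
# text layer is a common accessibility red flag for PDFs.
subprocess.run(["pdftotext", "-layout", "sample.pdf", "sample.txt"], check=True)

with open("sample.txt", encoding="utf-8", errors="replace") as handle:
    text = handle.read()
print("Extracted characters:", len(text))
```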
See this "Quick Start" guide to enable the Drive API: https://developers.google.com/drive/api/v3/quickstart/python
Complete the steps described in that guide to create a simple Python command-line application that makes requests to the Drive API.
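The quickstart ends with a credentials.json file; the authorization flow it produces looks roughly like this (adapted from the published quickstart pattern, so treat it as a sketch rather than this project's exact code):

```python
import os.path

from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

SCOPES = ["https://www.googleapis.com/auth/drive"]

creds = None
if os.path.exists("token.json"):
    creds = Credentials.from_authorized_user_file("token.json", SCOPES)
if not creds or not creds.valid:
    if creds and creds.expired and creds.refresh_token:
        creds.refresh(Request())
    else:
        flow = InstalledAppFlow.from_client_secrets_file("credentials.json", SCOPES)
        creds = flow.run_local_server(port=0)
    # Cache the token for later runs.
    with open("token.json", "w") as token:
        token.write(creds.to_json())

drive = build("drive", "v3", credentials=creds)
print("Drive service ready:", drive is not None)
```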
See: https://www.screamingfrog.co.uk/seo-spider/user-guide/general/#commandlineoptions
ScreamingFrog SEO CLI tools provide the following data sets (the required files are annotated below):
- crawl_overview.csv (used to create the report DASHBOARD)
- external_all.csv
- external_html.csv (used to audit external URLs)
- external_pdf.csv (used to audit external PDFs)
- h1_all.csv
- images_missing_alt_text.csv
- internal_all.csv
- internal_flash.csv
- internal_html.csv (used to audit internal URLs)
- internal_other.csv
- internal_pdf.csv (used to audit internal PDFs)
- internal_unknown.csv
- page_titles_all.csv
- page_titles_duplicate.csv
- page_titles_missing.csv
Note: There are spider config files located in the /conf folder. You will require a licence to alter the configurations. (A hedged example of running a headless CLI crawl is sketched below.)
Note: If a licence is not available, simply provide a CSV where at least one column has the header "Address". See the DRUPAL example.
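For licence holders, a headless CLI crawl can generate the CSVs this tool consumes. The sketch below uses flag names from the Screaming Frog CLI documentation linked above; verify them against your installed version, and treat the URL, folder, and export names as placeholders:

```python
import subprocess

# Headless Screaming Frog crawl exporting the tabs and report this tool reads.
# Flag names are assumptions based on the Screaming Frog user guide; confirm
# against your installed version before relying on them.
subprocess.run(
    [
        "screamingfrogseospider",
        "--crawl", "https://example.com",
        "--headless",
        "--output-folder", "./REPORTS/your_report_name/SPIDER",
        "--export-tabs", "Internal:HTML,Internal:PDF,External:HTML,External:PDF",
        "--save-report", "Crawl Overview",
    ],
    check=True,
)
```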
axe-selenium-python is installed via pip install -r requirements.txt
See: https://pypi.org/project/axe-selenium-python/ and https://github.com/dequelabs/axe-core
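A single-URL AXE run follows the pattern documented on the axe-selenium-python page (a minimal sketch; the application layers its own reporting on top of this):

```python
from selenium import webdriver
from axe_selenium_python import Axe

driver = webdriver.Firefox()  # geckodriver must be available
driver.get("https://example.com")

axe = Axe(driver)
axe.inject()  # inject the axe-core JavaScript into the page
results = axe.run()  # run the accessibility audit
axe.write_results(results, "axe_results.json")
print("Violations found:", len(results["violations"]))

driver.quit()
```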
Lighthouse is an open-source, automated tool for improving the performance, quality, and correctness of your web apps.
When auditing a page, Lighthouse runs a barrage of tests against the page, and then generates a report on how well the page did. From here you can use the failing tests as indicators on what you can do to improve your app.
- Quick start guide on using Lighthouse: https://developers.google.com/web/tools/lighthouse/
- View and share reports online: https://googlechrome.github.io/lighthouse/viewer/
- Github source and details: https://github.com/GoogleChrome/lighthouse
While there is a /reports/ dashboard, the system can also write to a Google Sheet. To do this, set up credentials for Google API authentication at https://console.developers.google.com/apis/credentials to obtain a valid "credentials.json" file.
To facilitate branding and other report metrics, a "non-coder/sheet formula template" is used. Here is a sample template. When a report is run from the /reports/ route, the template is loaded (the template report and folder ID are found in globals.py and need to be set up/updated once), and the Google Sheet is either created or updated (a unique report ID is auto-generated and recorded in /REPORTS/your_report_name/logs/_gdrive_logs.txt).
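Writing summary rows into the sheet then goes through the Sheets API, roughly as below (a sketch that reuses the token saved by the Drive quickstart and assumes it was authorized for the spreadsheets scope; the spreadsheet ID and row values are placeholders):

```python
from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build

SCOPES = ["https://www.googleapis.com/auth/spreadsheets"]
creds = Credentials.from_authorized_user_file("token.json", SCOPES)
sheets = build("sheets", "v4", credentials=creds)

# Placeholder row: URL, AXE violations, Lighthouse performance score.
values = [["https://example.com", "2", "95"]]
sheets.spreadsheets().values().append(
    spreadsheetId="YOUR_SPREADSHEET_ID",
    range="Sheet1!A1",
    valueInputOption="RAW",
    body={"values": values},
).execute()
```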
If you have a Screaming Frog SEO Spider licence, be sure to add the CLI to your PATH. Even if Screaming Frog SEO Spider is not installed, a CSV can be provided to guide the report tools. Once the application is installed, try running the sample CSV. To do this:
- Visit http://127.0.0.1:8888/
- Enter a report name and email. Leave URL blank.
- Click on "Choose File" under "Spider SEO Reports" to upload a file with a list of URLs, column header = 'address'.
- Select the tests you wish to run.
NOTE: This excludes PDFs, which require a list of exclusively PDF URLs.
- As these tests can take a while to run, please check back at the http://127.0.0.1:8888/reports/ page for progress.
Running a sample can be accomplished in two ways: using the samples provided in the "/REPORTS/DRUPAL/" folder, or by downloading and installing Screaming Frog SEO Spider and running a free crawl (500 URL limit and no configuration/CLI tool access). Once the crawl is completed or the files created, create/save the following CSVs:
- crawl_overview.csv (via "Reports >> Crawl Overview" in the ScreamingFrog menu) - used to create the Report Overview. Without this CSV, the Report Overview will be missing (work is underway to calculate these results directly and eliminate the need for this file)
- internal_html.csv (via "Export" button in the ScreamingFrog interface) - used to point the reporting tools to the desired URLs
- internal_pdf.csv (via "Export" button in the ScreamingFrog interface) - used to point the reporting tools to the desired URLs
- external_html.csv (via "Export" button in the ScreamingFrog interface) - used to point the reporting tools to the desired URLs
- external_pdf.csv (via "Export" button in the ScreamingFrog interface) - used to point the reporting tools to the desired URLs
If another method is used to crawl a base URL, save the results in CSV files where at least one header (first row) reads "Address" and each row provides a web or PDF URL. Ensure that the filenames match those listed above and place the files in the "/REPORTS/your_report_name/SPIDER/" folder. At least one *_html.csv file is required in that folder.
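As a concrete illustration of that fallback (the report name and URLs below are placeholders), a minimal internal_html.csv could be generated by hand like this:

```python
import csv
import os

report = "your_report_name"  # placeholder
spider_dir = os.path.join("REPORTS", report, "SPIDER")
os.makedirs(spider_dir, exist_ok=True)

# Minimal crawl substitute: an "Address" header plus one URL per row.
with open(os.path.join(spider_dir, "internal_html.csv"), "w", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerow(["Address"])
    writer.writerow(["https://example.com/"])
    writer.writerow(["https://example.com/about"])
```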
When crawling and scanning sites, it is possible to encounter various security risks. Be sure to have a virus scanner enabled to protect against JavaScript and other attacks, or disable JavaScript in the configuration.