An API to render a page inside a real Chromium (with JavaScript enabled) and send back the raw HTML.
This project is written for, and consumed by, the Algolia Crawler.
Secure: Leverages Context to isolate each page, prevent cookie sharing, control redirection, etc.
Performant: Ignores resources that are unnecessary for rendering HTML (e.g. images, video, fonts) and bundles an AdBlocker by default.
Automated: Renderscript abstracts everything needed to render a page and log in to a website, with minimal configuration required.
yarn dev
Go to: http://localhost:3000
docker build . -t algolia/renderscript
docker run -p 3000:3000 -it algolia/renderscript
curl -X POST http://localhost:3000/render \
-H 'Content-Type: application/json' \
-d '{"url": "https://www.algolia.com/", "ua": "local_renderscript"}'
Main endpoint. Renders the page and returns a JSON object with all the page information.
{
/**
* URL to render (for hash and query params support, use `encodeURIComponent` on it)
*/
url: string;
/**
* User-Agent to use.
*/
ua: string;
/**
* Enables AdBlocker
*/
adblock?: boolean;
/**
* Define the range of time.
* Minimum and maximum execution time.
*/
waitTime?: {
min?: number;
max?: number;
};
/**
* Headers to Forward on navigation
*/
headersToForward?: {
[s: string]: string;
};
}
{
/**
* HTTP Code of the rendered page.
*/
statusCode: number | null;
/**
* HTTP Headers of the rendered page.
*/
headers: Record<string, string>;
/**
* Body of the rendered page.
*/
body: string | null;
/**
* Metrics from different tasks during the rendering.
*/
metrics: Metrics;
/**
* The redirection renderscript caught.
*/
resolvedUrl: string | null;
/**
* Has the page reached timeout?
* When timeout has been reached we continue the rendering as usual
* but reduce other timeout to a minimum.
*/
timeout: boolean;
/**
* Any error encountered along the way.
* If this field is filled that means the rest of the payload is partial.
*/
error: string | null;
}
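As a minimal sketch of how a client might read this payload: the `error` field signals a partial result, while `timeout` only says the render was cut short. The `RenderOutcome` type and the `interpretRender` helper below are hypothetical names used for illustration and are not part of Renderscript. `Metrics` is not defined in this section, so it is typed as `unknown`.

```typescript
// Response shape copied from the definition above.
// `Metrics` is not defined in this section, so it is typed as `unknown` here.
interface RenderResponse {
  statusCode: number | null;
  headers: Record<string, string>;
  body: string | null;
  metrics: unknown;
  resolvedUrl: string | null;
  timeout: boolean;
  error: string | null;
}

// Hypothetical result type (not part of Renderscript).
type RenderOutcome =
  | { kind: 'complete'; html: string; finalUrl: string | null; timedOut: boolean }
  | { kind: 'partial'; reason: string; html: string | null };

// Hypothetical helper: classifies a parsed response from POST /render.
function interpretRender(res: RenderResponse): RenderOutcome {
  // A non-null `error` means the rest of the payload is partial.
  if (res.error !== null) {
    return { kind: 'partial', reason: res.error, html: res.body };
  }
  // `body` is nullable, so guard against it before using the HTML.
  if (res.body === null) {
    return { kind: 'partial', reason: 'no body returned', html: null };
  }
  // A timeout does not stop the rendering, so it is surfaced as a flag only.
  return {
    kind: 'complete',
    html: res.body,
    finalUrl: res.resolvedUrl,
    timedOut: res.timeout,
  };
}
```

The helper takes the already parsed JSON, so it works with any HTTP client used to send the request shown in the curl example.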
Used for debug purposes. Dumps the HTML directly for easy inspection in your browser.
Parameters: see POST /render parameters.
CSP headers are set to prevent script execution on the rendered page.
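A sketch of building a URL for this debug endpoint, so it can be pasted into a browser. It assumes the endpoint is reachable as GET /render and reads `url` and `ua` from the query string; neither the path nor the parameter transport is stated in this section. `buildDebugUrl` is a hypothetical helper. The target URL goes through `encodeURIComponent`, as the parameter description recommends, so that its own query string and hash survive.

```typescript
// Hypothetical helper: builds a URL for the debug endpoint.
// Assumption: the endpoint is GET /render and takes `url` and `ua` as query parameters.
function buildDebugUrl(baseUrl: string, targetUrl: string, ua: string): string {
  const query = [
    // Encoding keeps the target's own `?` and `#` parts inside the `url` parameter.
    `url=${encodeURIComponent(targetUrl)}`,
    `ua=${encodeURIComponent(ua)}`,
  ].join('&');
  return `${baseUrl}/render?${query}`;
}

console.log(
  buildDebugUrl(
    'http://localhost:3000',
    'https://www.algolia.com/?q=test#results',
    'local_renderscript'
  )
);
// prints "http://localhost:3000/render?url=https%3A%2F%2Fwww.algolia.com%2F%3Fq%3Dtest%23results&ua=local_renderscript"
```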
This endpoint loads a given login page, looks for input fields, enters the given credentials and submits the form.
It allows programmatically retrieving a session cookie from websites with CSRF protection.
{
/**
* URL to render (for hash and query params support, use `encodeURIComponent` on it)
*/
url: string;
/**
* User-Agent to use.
*/
ua: string;
/**
* Username to enter on the login form. Renderscript expects to find an `input[type=text]` or `input[type=email]` on the login page.
*/
username: string;
/**
* Password to enter on the login form. Renderscript expects to find an `input[type=password]` on the login page.
*/
password: string;
/**
* Define the range of time.
* Minimum and maximum execution time.
*/
waitTime?: {
min?: number;
max?: number;
};
/**
* Boolean (optional).
* If set to true, Renderscript will return the rendered HTML after the login request. Useful to debug visually.
*/
renderHTML?: boolean;
}
{
/**
* HTTP Code of the rendered page.
*/
statusCode: number | null;
/**
* HTTP Headers of the rendered page.
*/
headers: Record<string, string>;
/**
* Metrics from different tasks during the rendering.
*/
metrics: Metrics;
/**
* Has the page reached timeout?
* When timeout has been reached we continue the rendering as usual
* but reduce other timeout to a minimum.
*/
timeout: boolean;
/**
* Any error encountered along the way.
* If this field is filled that means the rest of the payload is partial.
*/
error: string | null;
/**
* Cookies generated from a successful login.
*/
cookies: Cookie[];
/**
* The URL at the end of a successful login.
*/
resolvedUrl: string | null;
/**
* Body at the end of a successful login.
*/
body: string | null;
}
If renderHTML: true, returns text/html.
CSP headers are set to prevent script execution on the rendered page.
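A minimal sketch of turning the `cookies` array from a login response into a `Cookie` request header for later requests. The `Cookie` type is not defined in this section; the sketch assumes each entry exposes at least a `name` and a `value`. `toCookieHeader` and `sessionHeaders` are hypothetical helpers, and other attributes such as domain, path and expiry are ignored.

```typescript
// Assumption: each cookie has at least `name` and `value`
// (the `Cookie` type is not defined in this section).
interface Cookie {
  name: string;
  value: string;
}

// Subset of the login response used by this sketch.
interface LoginResult {
  error: string | null;
  cookies: Cookie[];
}

// Hypothetical helper: serializes cookies as `name=value` pairs separated by "; ".
function toCookieHeader(cookies: Cookie[]): string {
  return cookies.map((c) => `${c.name}=${c.value}`).join('; ');
}

// Hypothetical helper: returns headers carrying the session, or throws if the login failed.
function sessionHeaders(login: LoginResult): Record<string, string> {
  if (login.error !== null) {
    throw new Error(`Login failed: ${login.error}`);
  }
  if (login.cookies.length === 0) {
    throw new Error('Login returned no cookies');
  }
  return { Cookie: toCookieHeader(login.cookies) };
}

console.log(
  sessionHeaders({
    error: null,
    cookies: [
      { name: 'session', value: 'abc123' },
      { name: 'csrf', value: 'xyz' },
    ],
  })
);
// prints { Cookie: 'session=abc123; csrf=xyz' }
```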
Lists currently open pages. Useful for debugging.
Health Check for Kubernetes and others.
This project was heavily inspired by GoogleChrome/rendertron.
It was originally based on puppeteer-core, but we switched to Playwright.