Add entries in documentation how to scrape HTML
NielsSteensma committed Apr 22, 2024
1 parent 0c16bfc commit 9d04ed7
Showing 1 changed file with 16 additions and 10 deletions.
26 changes: 16 additions & 10 deletions README.md
@@ -5,9 +5,10 @@


## Features
* Generate PDFs from pages
* Generate PDFs from html ( external images/stylesheets supported )
* Capture a screenshot of a webpage
* Generate PDFs from webpages
* Generate PDFs from HTML ( external images/stylesheets supported )
* Capture screenshots from webpages
* Scrape HTML from webpages



@@ -26,37 +27,42 @@ Install puppeteer in your application's root directory:

<sub>Dhalang and Puppeteer require Node ≥ 18 and Puppeteer ≥ 22</sub>
## Usage
__Get a PDF of a website url__
__PDF of a website url__
```ruby
Dhalang::PDF.get_from_url("https://www.google.com")
```
It is important to pass the complete URL; leaving out https://, http:// or www. will result in an error.
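
As a rough sketch of guarding against incomplete URLs before calling Dhalang, the check below uses Ruby's standard `uri` library. `complete_url?` is a hypothetical helper, not part of Dhalang. It only verifies that the string has an http(s) scheme and a host; it does not check for `www.` and may not match whatever validation Dhalang itself performs.

```ruby
require "uri"

# Hypothetical helper (an assumption, not a Dhalang API): returns true when
# the string parses as an absolute http:// or https:// URL with a host.
def complete_url?(url)
  uri = URI.parse(url)
  # URI::HTTPS is a subclass of URI::HTTP, so this covers both schemes.
  uri.is_a?(URI::HTTP) && !uri.host.to_s.empty?
rescue URI::InvalidURIError
  false
end

complete_url?("https://www.google.com") # => true
complete_url?("www.google.com")         # => false (no scheme)
```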

__Get a PDF of a HTML string__
__PDF of a HTML string__
```ruby
Dhalang::PDF.get_from_html("<html><head></head><body><h1>examplestring</h1></body></html>")
```

__Get a PNG screenshot of a website__
__PNG screenshot of a website__
```ruby
Dhalang::Screenshot.get_from_url("https://www.google.com", :png)
```

__Get a JPEG screenshot of a website__
__JPEG screenshot of a website__
```ruby
Dhalang::Screenshot.get_from_url("https://www.google.com", :jpeg)
```

__Get a WEBP screenshot of a website__
__WEBP screenshot of a website__
```ruby
Dhalang::Screenshot.get_from_url("https://www.google.com", :webp)
```

All methods return a string containing the PDF or JPEG/PNG/WEBP in binary.
__HTML of a website__
```ruby
Dhalang::Scraper.html("https://www.google.com")
```

The methods above return either a string containing the PDF/JPEG/PNG/WEBP in binary or the scraped HTML.
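
Because the PDF and screenshot methods return binary strings, one way to persist the result is `File.binwrite`, which writes the bytes without newline or encoding conversion. This is a minimal sketch: `save_binary` and the `google.pdf` filename are illustrative assumptions, not part of Dhalang, and the Dhalang call only runs when the gem is loaded.

```ruby
# Hypothetical helper (not a Dhalang API): writes a binary string to disk
# unchanged and returns the number of bytes written.
def save_binary(path, data)
  File.binwrite(path, data)
end

# Guarded so the snippet also loads in an environment without Dhalang.
if defined?(Dhalang)
  pdf = Dhalang::PDF.get_from_url("https://www.google.com")
  save_binary("google.pdf", pdf)
end
```

In a web framework the same string could instead be sent directly as the response body with the matching content type.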



## Custom PDF/screenshot options
## Custom options
To override the default options set by Dhalang, pass a hash with the custom options you want to set as the last argument.

For example to set custom margins for PDFs:
