GitHub - ramonclaudio/HTTParser: Python library for parsing web content over HTTP, with optional JavaScript rendering via Selenium.
Every time I needed to scrape or crawl a new site I'd end up writing the same boilerplate: requests for static pages, Selenium for JavaScript-heavy ones, BeautifulSoup for parsing. I made this so I don't have to keep redoing that every time. Import it, pass a format flag, done.
Install

    git clone https://github.com/ramonclaudio/HTTParser.git
    cd HTTParser
    pip install -r requirements.txt

JavaScript rendering is optional and needs extra setup; see JavaScript rendering setup below.
Usage
HTML

    from httparser import HTTParser

    r = HTTParser(url="https://httpbin.org/html", method="get", response_format="html")
    print(r.response())

JSON

    from httparser import HTTParser

    # GET
    r = HTTParser(url="https://httpbin.org/json", method="get", response_format="json")

    # POST
    r = HTTParser(
        url="https://httpbin.org/anything",
        method="post",
        response_format="json",
        payload={"key": "value"},
    )
    print(r.response())

JavaScript (dynamic)

    from httparser import HTTParser

    r = HTTParser(
        url="https://httpbin.org/delay/3",
        method="get",
        response_format="js",
        browser_path="/path/to/browser",
        chromedriver_path="/path/to/chromedriver",
    )
    print(r.response())

Parameters
| Parameter | Required | Format |
|---|---|---|
| url | yes | string |
| method | yes | "get" or "post" |
| response_format | yes | "html", "json", or "js" |
| headers | no | {"header": "value"} |
| params | no | {"param": "value"} |
| payload | no | {"key": "value"} (POST only) |
| browser_path | no | path to any Chromium-based browser binary (js only) |
| chromedriver_path | no | path to ChromeDriver (js only) |
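
The constraints in the table can be checked before a request is made. The sketch below is not part of HTTParser: `build_kwargs` is a hypothetical helper, and whether the library itself rejects these combinations is an assumption that has not been verified. It only encodes the rules as the table states them.

```python
# Hypothetical pre-flight check; not part of HTTParser itself.
ALLOWED_METHODS = {"get", "post"}
ALLOWED_FORMATS = {"html", "json", "js"}
JS_ONLY = ("browser_path", "chromedriver_path")
OPTIONAL = {"headers", "params", "payload", *JS_ONLY}


def build_kwargs(url, method, response_format, **optional):
    """Validate arguments against the parameter table and return them as a dict."""
    if not isinstance(url, str) or not url:
        raise ValueError("url must be a non-empty string")
    if method not in ALLOWED_METHODS:
        raise ValueError(f"method must be one of {sorted(ALLOWED_METHODS)}")
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"response_format must be one of {sorted(ALLOWED_FORMATS)}")

    unknown = set(optional) - OPTIONAL
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    # The table marks payload as POST only.
    if "payload" in optional and method != "post":
        raise ValueError("payload only applies to POST requests")

    # The table marks both path parameters as js only.
    for name in JS_ONLY:
        if name in optional and response_format != "js":
            raise ValueError(f"{name} only applies when response_format is 'js'")

    return {"url": url, "method": method, "response_format": response_format, **optional}
```

The returned dict can then be unpacked into the constructor, for example `HTTParser(**build_kwargs("https://httpbin.org/json", "get", "json"))`.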
JavaScript rendering setup
Download the ChromeDriver release that matches your browser version from https://chromedriver.chromium.org/downloads. It works with Chrome, Chromium, Edge, Brave, Arc, Dia, Vivaldi, Opera, Helium, and other Chromium-based browsers.
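
A version mismatch between the browser and ChromeDriver is a common reason the js format fails to start. A minimal sketch of a check, assuming that comparing major versions is enough and that both binaries print a dotted version string when run with `--version`; the helper names and sample strings are illustrative, not part of HTTParser:

```python
import re


def major_version(version_output):
    """Return the first major version number found in a `--version` style string."""
    match = re.search(r"(\d+)\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number in {version_output!r}")
    return int(match.group(1))


def versions_match(browser_output, driver_output):
    """True when the browser and ChromeDriver report the same major version."""
    return major_version(browser_output) == major_version(driver_output)


# Illustrative strings; real output varies by browser and platform.
print(versions_match("Chromium 114.0.5735.198", "ChromeDriver 114.0.5735.90"))
```

In practice the two strings could come from running each binary with `--version` via `subprocess.run(..., capture_output=True, text=True)`. Some Chromium-based browsers report their own version number ahead of the Chromium one, so the first match may not be the right one for every browser.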
Errors
Errors log to Error.log.
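
When a call returns nothing useful, the log is the first place to look. A small sketch for reading its most recent entries, assuming Error.log is a plain-text file in the current working directory; the entry format is not documented here, so the hypothetical `tail_log` helper returns raw lines without parsing them:

```python
from pathlib import Path


def tail_log(path="Error.log", n=5):
    """Return the last n non-empty lines of the log, or [] if the file does not exist."""
    log = Path(path)
    if not log.is_file():
        return []
    text = log.read_text(encoding="utf-8", errors="replace")
    lines = [line for line in text.splitlines() if line.strip()]
    return lines[-n:]


for entry in tail_log():
    print(entry)
```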
License
MIT