New Python HTML Libraries 2026
last commit 2 years ago html5lib/html5lib-python 1K +1
added 1 year ago
Standards-compliant library for parsing and serializing HTML documents and fragments in Python.
last commit 7 months ago alir3z4/html2text 2K +5
added 1 year ago
Convert HTML to Markdown-formatted text.
last commit 4 months ago gawel/pyquery 2K +1
added 1 year ago
A jQuery-like library for Python.
this project has been archived mozilla/bleach 2K
added 1 year ago
Bleach is an allowed-list-based HTML sanitizing library that escapes or strips markup and attributes.
last commit 4 months ago buriy/python-readability 2K
added 1 year ago
Given an HTML document, extract and clean up the main body text and title.
last commit 4 days ago lxml/lxml 3K +1
added 1 year ago
lxml is the most feature-rich and easy-to-use library for processing XML and HTML in the Python language.
last commit 4 months ago scrapy/parsel 1K +5
added 1 year ago
Parsel lets you extract data from XML/HTML/JSON documents using XPath or CSS selectors.
last commit 3 years ago psf/requests-html 13K -4
added 1 year ago