Extracting article text from HTML documents



In the world of web scraping, text mining, and article-reading utilities (such as the Readability bookmarklet), there is an ever-growing demand for tools that can distinguish the parts of an HTML document that make up an article from other common website building blocks such as menus, headers, footers, and ads.
