A few days ago, I needed to compile a database of local business email addresses without spending a lot of time or effort. Being a Linux fan, I landed on the following approach: combine GNU Wget with grep's regular expression (regex) capabilities.
Wget is an open-source tool for downloading files over HTTP, HTTPS, and FTP. It can also mirror a site recursively, fetching pages from any directory that is not access-restricted, whatever generates them (static HTML, ASP.NET, PHP, et cetera).
Step One: Download and install Wget for Linux or Windows.
Step Two: Run the following in a terminal or command prompt:
wget -nv -nH -r -A html -e robots=off --ignore-tags=img,link www.example.com
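For readers unfamiliar with the short flags, here is a rough sketch of the same crawl spelled out with Wget's long option names, one per line. The function name `mirror_html` is made up for illustration; defining it does not download anything until it is called.

```shell
#!/bin/sh
# Sketch only: mirror_html is a hypothetical wrapper name, not part of Wget.
mirror_html() {
  # --no-verbose            (-nv) print only errors and basic progress
  # --no-host-directories   (-nH) do not create a top-level directory named after the host
  # --recursive             (-r)  follow links and download the pages they point to
  # --accept html           (-A)  keep only files whose names end in "html"
  # --execute robots=off    (-e)  run a wgetrc-style command; here, ignore robots.txt
  # --ignore-tags=img,link        skip URLs found in <img> and <link> tags
  wget --no-verbose \
       --no-host-directories \
       --recursive \
       --accept html \
       --execute robots=off \
       --ignore-tags=img,link \
       "$1"
}
```

Calling `mirror_html www.example.com` should behave like the one-line command in Step Two. Note that `robots=off` only takes effect when passed through `-e` (or `--execute`); on its own, Wget would treat it as another URL to fetch.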
Step Three: Run grep with the following regex over the downloaded files:
grep -Eiorh '([[:alnum:]_.]+@[[:alnum:]_]+\.[[:alpha:].]{2,6})' ./ > output.txt
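To see what the grep step produces without crawling anything, the sketch below runs a pattern modeled on Step Three against a throwaway sample page and then removes duplicates. The temporary directory, the file name `contact.html`, and the addresses are all invented for the example; the lowercasing and `sort -u` stages are an addition, not part of the original steps.

```shell
#!/bin/sh
# Sketch: try the email pattern on a made-up sample page.
workdir=$(mktemp -d)
cat > "$workdir/contact.html" <<'EOF'
<p>Sales: <a href="mailto:sales@example.com">sales@example.com</a></p>
<p>Support: Help.Desk@example.org</p>
EOF

# -E extended regex, -i ignore case, -o print only the matched text,
# -r recurse into the directory, -h omit file names from the output.
# tr lowercases the matches and sort -u drops repeated addresses.
emails=$(grep -Eiorh '[[:alnum:]_.]+@[[:alnum:]_]+\.[[:alpha:].]{2,6}' "$workdir" \
  | tr '[:upper:]' '[:lower:]' | sort -u)

printf '%s\n' "$emails"
rm -rf "$workdir"
```

On the sample page this should print two addresses, `help.desk@example.org` and `sales@example.com`, even though the second appears twice in the HTML. The pattern is deliberately loose: it will miss domains containing hyphens and can pick up a trailing period at the end of a sentence, so the output is worth a quick manual review.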


