wget
to recursively download a web site
wget(1)
is a great tool to download web cotnent from the command line, and it does support also recursive downloading, that is it can follow links to get more than a single URL content.
One problem with the recursive download is that it does respect the
robots.txt
file, that instruments automated tools (robots) to not download the content nor inspect the content.
Clearly there’s a way to instrument
wget
to be rude and do what we want:
% wget -e robots=off -r '<your-URL>'
The above will turn off the
robots.txt
check, and will download recursively -r
all the content up to a deep level of 5, that can be tuned to your need.