Site Downloader?

timjpriebe's picture

He has: 2,667 posts

Joined: Dec 2004

I recently had a client who had lost contact with the person who previously maintained their site. Unfortunately, they had no passwords or anything, and needed to switch hosting to be able to update the site.

The site was fairly small, so I just went through and downloaded each page through my browser. The images and such were placed in directories like "index_files", so I had to move all of those into folders like "images" and change the hrefs and such.
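That by-hand cleanup (moving the browser's "index_files" folders into "images" and fixing the paths) can be scripted. A minimal sketch with made-up file names, assuming GNU sed (on a Mac you'd need `sed -i ''` instead):

```shell
# Demo fixture: one page saved by a browser, plus its "index_files" folder,
# so the commands below have something to act on.
mkdir -p index_files
printf '<img src="index_files/logo.gif">\n' > index.html
touch index_files/logo.gif

# Move the assets into an "images" folder, then rewrite every reference
# in the saved HTML to point at the new location.
mkdir -p images
mv index_files/* images/
rmdir index_files
sed -i 's|index_files/|images/|g' index.html
```

With more than one saved page, the same `sed` line can run over `*.html` in a loop.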

Anyway, before doing that, I went online and saw that there were a bunch of programs to download entire sites. But the first couple I tried didn't quite fit the bill. This was a small site, so I did it by hand to save time, but if I ever have to do this with a larger site, I'll be way better off using one of those programs.

Does anyone know of a good site downloader? I'd prefer that it have the capabilities of downloading either all or just specified file types, saving relative locations, and limiting downloads to one or more specified domains. And free would be nice, though I'd probably be willing to pay as much as $50 or so if I had to.

karmaman's picture

He has: 82 posts

Joined: Nov 2003

Hi Tim, I think HTTrack will do what you want. It can be found at httrack.com. Hope this helps.
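For what it's worth, HTTrack can also be driven from the command line. A sketch with a placeholder URL (`-O` sets the output folder, the `"+"` filter keeps the crawl on one domain); the command is printed rather than executed here so nothing is actually downloaded:

```shell
# Shape of a command-line HTTrack run (placeholder URL; printed only):
echo httrack "http://www.example.com/" -O ./mirror "+*.example.com/*" -v
```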

Busy's picture

He has: 6,151 posts

Joined: May 2001

Is Go!Zilla still around?
Most of them are free. Here is a list of the ones I block; it only gives the user agents, but it could help you find one:

almaden
Anarchie
ASPSeek
attach
BackWeb
Bandit
BatchFTP
Bot\ mailto:craftbot@yahoo.com
BlackWidow
Buddy
CherryPicker
ChinaClaw
Collector
Copier
CICC
Crescent
Custo
DA
DISCo\ Pump
Download\ Demon
Download\ Wonder
Downloader
Drip
DSurf15a
dts\ agent$
EasyDL
eCatch
EirGrabber
EmailSiphon
EmailWolf
EyeNetIE
Express\ WebPictures
ExtractorPro
FileHound
FlashGet
frontpage
^GetRight
GetSmart
GetWeb!
gigabaz
Go!Zilla
Go-Ahead-Got-It
gotit
Grabber
GrabNet
Grafula
grub-client
HMView
HTTrack
httpdown
ia_archiver
Image\ Stripper
Image\ Sucker
Indy\ Library
InterGET
Internet\ Ninja
Iria
Irvine
Java
JetCar
JOC
JustView
^larbin
LeechFTP
LexiBot
lftp
linkwalker
likse
marcopolo
Magnet
Mag-Net
Mass\ Downloader
Memo
MIDown\ tool
Mirror
Mister\ PiX
MJ12bot
Moozilla
MS\ FrontPage
MSIECrawler
MSProxy
NaverRobot
^NaverBot
Navroad
NearSite
NetAnts
NetSpider
Net\ Vampire
Netwu
NetZip
Ninja
NICErsPRO
NPBot
obot
Octopus
Offline\ Explorer
Offline\ Navigator
PageGrabber
Papa\ Foto
pavuk
pcBrowser
Pockey
Program\ Shareware
^psbot
Pump
QuepasaCreep
RealDownload
Reaper
Recorder
ReGet
Siphon
SiteSnagger
SlySearch
SmartDownload
Snake
SpaceBison
^Steeler
Stripper
Sucker
SuperBot
SuperHTTP
Surfbot
tAkeOut
Teleport\ Pro
^TurnitinBot
Tutorial\ Crawler
Vacuum
VoidEYE
WebCapture
WebCopier
Webster
^WebSauger
Web(\ Image|\ Sucker|Auto|bandit|Fetch|site|ZIP|.*er)
Wget
Whacker
Widow
WISEnutbot
Xaldon
Zao
zyborg

This is cut down from my .htaccess, so ignore the backslashes, ^, $, etc.
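For anyone wondering how a list like that is actually used: each entry is a regex matched against the User-Agent header by mod_rewrite (which is why spaces are backslash-escaped). A minimal .htaccess sketch, using three agents from the list as examples:

```apache
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^Wget [NC,OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Teleport\ Pro [NC]
RewriteRule .* - [F,L]
```

The [F] flag returns 403 Forbidden to any request whose user agent matches one of the conditions.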

They have: 34 posts

Joined: May 2006

Busy, if anybody decides to download your site, .htaccess will not help you.

Busy's picture

He has: 6,151 posts

Joined: May 2001

Lord Maverick wrote: Busy, if anybody decides to download your site, .htaccess will not help you.

It's like trying to stop a car thief: you can have lock nuts on your wheels, an alarm, even a pit bull, but if they want it, they are going to take it. You can only slow them down some.
The .htaccess does stop a lot of the script kiddies who can only download a program to do it for them.

Greg K's picture

He has: 2,115 posts

Joined: Nov 2003

I use GetRight myself, and even though it is in Busy's list, it has an option under Advanced to change the user agent. (It also has options to automatically figure out the referrer to send when following links.)
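Changing the user agent isn't unique to GetRight; wget can do the same thing, which is exactly why a UA blocklist is only a speed bump. A sketch with a placeholder URL and an old browser UA string (the command is printed, not run, so nothing is downloaded):

```shell
# Send a browser-like User-Agent and a referrer so a UA blocklist won't match.
UA='Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)'
echo wget --user-agent="$UA" --referer="http://www.example.com/" -r "http://www.example.com/"
```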

-Greg

This space intentionally left blank...

02bunced's picture

He has: 412 posts

Joined: May 2005

Wget is nice and powerful (since you have a Mac, you can install the command-line version). Then, to use it, just switch your command line to the directory you want the files downloaded into and use the following format:

wget http://www.domain.com/ -r
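Wget's other switches cover the rest of Tim's wish list (specific file types, working relative links, staying on chosen domains). A sketch with a placeholder domain, printed rather than run so nothing is downloaded:

```shell
# -r -l inf   recurse the whole site      -k   rewrite links to work locally
# -p          grab page requisites        -np  don't climb above the start dir
# -H -D ...   span hosts, but only onto the listed domains
# -A ...      keep only the listed file types
echo wget -r -l inf -k -p -np -H -D example.com -A 'html,jpg,gif,png' "http://www.example.com/"
```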

Serfaksan's picture

They have: 18 posts

Joined: Jun 2006

Wow, I never thought about downloading a site, but now that you mention it, and after looking at all the options, I think I'm going to try it on some pages. XD
