Extracting useful stuff from a SnipSnap wiki
The MSL lab at CMU (where I work) uses wiki software called SnipSnap. SnipSnap is Java-based, apparently standalone (you don't need a webserver), and as far as I can tell stores all its files in a database somewhere. We had a scare this spring when the wikis went down and we didn't know how to get them back up. Thankfully, the guy who installed them was still around and fixed things, but the experience prompted me to back up our important files in a format that doesn't require a working copy of SnipSnap to read.
SnipSnap lets you back up the wiki to XML, but for some reason this wasn't working on our site. I could've gone through each page, right-clicked each attachment, and copy-pasted all the page text into text files (or, even lazier, just saved all the HTML somewhere), but that didn't sound like fun. Instead, I wrote a Python script to do all of that for me. It converts a SnipSnap wiki into a file-system hierarchy of text files, comments, and attachments that anyone with a computer can browse.
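Screenscraping here just means fetching each page's rendered HTML and pulling out its text and links, which is what all that copy-pasting and right-clicking would have done by hand. The real script relies on Beautiful Soup for this; the minimal sketch below swaps in the standard library's html.parser so it runs with no extra downloads. The PageScraper class and scrape() function are hypothetical names, and on a real SnipSnap page you would probably want to narrow the text to the content area rather than the whole page.

```python
from html.parser import HTMLParser


class PageScraper(HTMLParser):
    """Collect visible text and link targets from one page's HTML.

    A stdlib stand-in for the Beautiful Soup parsing the real script does.
    """

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            # attrs is a list of (name, value) pairs; keep only real hrefs
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore script/style bodies and whitespace-only runs
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def scrape(html):
    """Return (page text, list of link hrefs) for an HTML string."""
    parser = PageScraper()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.links


if __name__ == "__main__":
    sample = ('<html><body><p>Meeting notes</p>'
              '<a href="/space/notes.pdf">notes</a></body></html>')
    text, links = scrape(sample)
    print(text)   # Meeting notes notes
    print(links)  # ['/space/notes.pdf']
```

Note that link labels count as page text, which is why "notes" appears twice in the sample output; the hrefs collected this way are the candidates for attachment downloads.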
snipsnap_extract.py uses the excellent Beautiful Soup library for screenscraping, so you'll need to download that and put it somewhere on your Python path. You give the script the web address of the site and a file with the names of all the pages you'd like to download, and it sucks them all down, maintaining the page hierarchy that's in the wiki. For each page, you get a directory containing all the attachments to that page, as well as the files snipsnap_$pagename.txt with the body text and snipsnap_comments.txt with the comments listed by poster and date.
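The output layout can be sketched as a few small helpers: one that maps a page name to its directory and file names, one that formats comments by poster and date, and one that writes both files. This assumes that hierarchical SnipSnap page names use "/" as a separator and that $pagename means the last component of the name; both are guesses, not details confirmed by the script, and the function names and comment format are hypothetical.

```python
import os


def page_paths(out_root, page_name):
    """Return (page_dir, body_file, comments_file) for one wiki page.

    Assumption: hierarchical names use "/" (e.g. "projects/robots"),
    and each level becomes a nested directory under out_root.
    """
    parts = [p for p in page_name.split("/") if p]
    if not parts:
        raise ValueError("empty page name")
    page_dir = os.path.join(out_root, *parts)
    # Assumption: $pagename is the last component of the hierarchical name.
    body_file = os.path.join(page_dir, "snipsnap_%s.txt" % parts[-1])
    comments_file = os.path.join(page_dir, "snipsnap_comments.txt")
    return page_dir, body_file, comments_file


def format_comments(comments):
    """Render (poster, date, text) tuples as plain text, one block each."""
    return "".join("%s (%s):\n%s\n\n" % (poster, date, text)
                   for poster, date, text in comments)


def write_page(out_root, page_name, body, comments):
    """Create the page directory and write its body and comments files.

    Attachments for the page would be downloaded into page_dir as well.
    """
    page_dir, body_file, comments_file = page_paths(out_root, page_name)
    os.makedirs(page_dir, exist_ok=True)
    with open(body_file, "w", encoding="utf-8") as f:
        f.write(body)
    with open(comments_file, "w", encoding="utf-8") as f:
        f.write(format_comments(comments))
    return page_dir
```

Keeping the path logic in its own pure function makes it easy to check the layout without touching the disk; write_page() is the only piece that creates directories.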
Usage:
    $ ./snipsnap_extract.py {-p,--prefix} <URL prefix> {-n,--name} <page name>
    $ ./snipsnap_extract.py {-p,--prefix} <URL prefix> {-f,--file} <file listing page names>
    $ ./snipsnap_extract.py {-u,--url} <full URL>

Examples:
    $ ./snipsnap_extract.py -p http://www.foo.com/path/comments -n our_startpage
    $ ./snipsnap_extract.py -p http://www.foo.com/to/space/ -f files_i_want.txt
    $ ./snipsnap_extract.py -u http://www.foo.com/wiki/comments/our_startpage
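The three invocation forms in the usage text map naturally onto a command-line parser with a mutually exclusive choice among --name, --file, and --url. The sketch below is a hypothetical reconstruction using argparse, not the script's actual option handling. It also assumes a page's URL is just the prefix, a slash, and the URL-quoted page name, which fits the examples above but is an inference.

```python
import argparse
from urllib.parse import quote


def build_parser():
    parser = argparse.ArgumentParser(
        description="Download SnipSnap pages into a directory tree.")
    parser.add_argument("-p", "--prefix", help="URL prefix shared by all pages")
    # Exactly one way of naming the page(s) to fetch
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("-n", "--name", help="a single page name")
    group.add_argument("-f", "--file", help="file listing page names, one per line")
    group.add_argument("-u", "--url", help="full URL of a single page")
    return parser


def page_urls(args):
    """Turn parsed options into the list of page URLs to fetch."""
    if args.url:
        return [args.url]
    if not args.prefix:
        raise SystemExit("--prefix is required with --name or --file")
    if args.name:
        names = [args.name]
    else:
        with open(args.file, encoding="utf-8") as f:
            names = [line.strip() for line in f if line.strip()]
    # Assumption: URL = prefix + "/" + quoted name; a trailing slash on the
    # prefix is tolerated, and "/" inside hierarchical names is left alone.
    base = args.prefix.rstrip("/")
    return ["%s/%s" % (base, quote(name)) for name in names]


def main(argv=None):
    """Print the URLs that would be fetched for the given arguments."""
    for url in page_urls(build_parser().parse_args(argv)):
        print(url)
```

In a standalone script, main() would be called from an `if __name__ == "__main__":` guard; with `-p http://www.foo.com/to/space/ -f files_i_want.txt` it would print one URL per non-blank line of files_i_want.txt.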
If you find this useful, please drop me an email at katie (thingy) rivard.org.