* Spring Cleaning (Or How To Install Your Own Desktop Search Engine)

How to install SWISH-E for your own Google-like desktop search engine.

By Erik J. Heels

First published 6/1/2005; Law Practice magazine, “nothing.but.net” column; American Bar Association

A few weeks ago, our oil tank sprang a leak. Small damage, big expense. In order to make room for the new and improved double-hull oil tank, we had to clear boxes and boxes of stuff out of our storage room. Stuff like old taxes, old letters, old clothes. In a fit of resolve, my wife and I decided not to replace any box before reviewing (and possibly discarding) its contents. “Keep your old love letters, throw away your old bank statements,” says a Baz Luhrmann song that I’ve mentioned previously (see http://www.erikjheels.com/?p=131). Getting rid of all of the junk made it much easier to find the things worth finding, like old letters from friends and family.

Motivated by our now lean, clean storage room, I decided to conduct spring cleaning on my computer files as well. I have saved computer files dating back to 1986 and email dating back to 1992. Before my spring cleaning, I had about 19,000 files (about 1,000 per year) in my archives. After spring cleaning, I have about 11,000 files (about 600 per year). I know it sounds like a lot, but my archives also include archive, development, and production versions of all of the websites I’ve ever published, plus some that I’ve saved for their historical value (like O.J.’s “not guilty” verdict from 1995, or Mark McGwire’s “record setting” 62nd home run from 1998). Not including stuff like this, there are only about 1,200 files post-purge.

Next I installed a search engine to make finding the good stuff easier. I already use a Google-like search engine on my weblog and my law firm’s intranet, so I decided to use the same search engine for indexing my desktop computer. Forget about proprietary bloatware like Google Desktop, which is also a privacy breach waiting to happen. (See http://www.informationweek.com/showArticle.jhtml?articleID=50500707.) You need a search engine that you can take with you, no matter which operating system you move to.

How To Install Your Own Google-Like Search Engine In Five Easy Steps

Step one, install the Apache web server. Apache (http://www.apache.org/) is free, open source, popular, and well documented. Apache runs on Windows, Macintosh, UNIX, UNIX-like operating systems (FreeBSD, Linux), and your toaster. Apache is included with OS X and with most Linux distributions. Under Windows, I run Apache as a system service, which means that it starts automatically when Windows starts. I can start, stop, restart, and configure Apache from Start menu shortcuts. If you can read and follow instructions, you can install and configure Apache.

Step two, install the SWISH-E (Simple Web Indexing System for Humans – Enhanced) search engine and index your documents. SWISH-E (http://www.swish-e.org/) is simple, enhanced, and for humans. What more could you want? OK, you could want a CGI script for your web server.

Step three, install the swish.cgi script on your web server. The swish.cgi script is included in the source SWISH-E distribution in the “example” directory.

Step four, create a “Scheduled Task” to periodically re-index your data. In Windows, I created a single-line batch file (C:/progra~1/SWISH-E/swish-e.exe -S fs -c C:/progra~1/SWISH-E/conf/my-swish.conf) that I run as a Scheduled Task. I re-index my weblog and intranet daily, my personal data weekly.
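If you would rather schedule a script than a batch file, the same re-index command can be wrapped in a few lines of Python. This is only a sketch: the two paths are copied from the batch file above, the function names are made up for illustration, and the script assumes Python is installed on the machine. By default it just prints the command; pass --run to actually start the indexer.

```python
import subprocess
import sys

# Paths copied from the batch file quoted above; adjust them for your install.
SWISH_EXE = "C:/progra~1/SWISH-E/swish-e.exe"
SWISH_CONF = "C:/progra~1/SWISH-E/conf/my-swish.conf"


def build_index_command(exe=SWISH_EXE, conf=SWISH_CONF):
    """Return the argument list for a re-index run.

    The options are the same ones used in the batch file:
    -S fs (index from the file system) and -c (configuration file).
    """
    return [exe, "-S", "fs", "-c", conf]


def reindex(exe=SWISH_EXE, conf=SWISH_CONF):
    """Run the indexer and return its exit code."""
    return subprocess.run(build_index_command(exe, conf)).returncode


if __name__ == "__main__":
    if "--run" in sys.argv:
        # Real run: this is the mode a Scheduled Task would use.
        sys.exit(reindex())
    # Dry run: show the command without executing it.
    print(" ".join(build_index_command()))
```

Keeping the command in a list, rather than one long string, avoids quoting problems if you later move SWISH-E to a path that contains spaces.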

Step five, customize and fine-tune your settings. Once you get your search engine working, you can spend time customizing it to your heart’s content. There are templates for customizing the HTML that SWISH-E outputs by default. You can choose to index the filenames, the contents of files, both, or neither, depending on the file type. Unlike most desktop search programs, SWISH-E can also index and search the contents of ZIP files, PDF files, and OpenOffice (.sxw, .sxi, .sxc) files.
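To make the "filenames, contents, both, or neither" choice concrete, here is a small Python sketch that writes out a starter configuration file. It is not my actual configuration. The directory, index file name, and suffix lists are hypothetical, and the function name is invented for this example. The directive names (IndexDir, IndexFile, IndexOnly, NoContents) are the ones I understand the SWISH-E configuration documentation to use, so check them against the documentation for your version before relying on this.

```python
def make_swish_config(index_dir, index_file, content_suffixes, name_only_suffixes):
    """Build the text of a minimal SWISH-E configuration file.

    content_suffixes:   file types whose contents are indexed.
    name_only_suffixes: file types indexed by file name only.
    File types in neither list are skipped entirely.
    """
    all_suffixes = list(content_suffixes) + list(name_only_suffixes)
    lines = [
        f"IndexDir {index_dir}",      # where to look for files
        f"IndexFile {index_file}",    # where to write the index
        "IndexOnly " + " ".join(all_suffixes),
    ]
    if name_only_suffixes:
        # These types are listed in the index but their contents are not read.
        lines.append("NoContents " + " ".join(name_only_suffixes))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical paths and file types, for illustration only.
    print(make_swish_config(
        "C:/archive",
        "C:/archive.index",
        [".html", ".txt"],
        [".jpg", ".gif"],
    ), end="")
```

With the example values, text and HTML files are searchable by content, images are findable by name, and everything else is left out of the index.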

For more info on how to tweak SWISH-E to index PDF files, see:
http://swish-e.org/docs/swish-config.html

For more info on how to tweak SWISH-E to index OpenOffice files, see:
http://www.mail-archive.com/linux-br@bazar2.conectiva.com.br/msg28336.html

While writing this article, I used SWISH-E multiple times to find information on my computer. I have even set up SWISH-E to index my Eudora email. I can search within Eudora for information, but it is much quicker to search my email with SWISH-E from a web browser. If I need to forward a particular email message, SWISH-E tells me the name of the mailbox that contains the message.

Conclusion: Visit Your Stuff.

As we were chasing the Internet in the 1990s, my family moved frequently. My wife and I began to refer to our all-too-frequent packing and unpacking as “visiting our stuff.” But there was some stuff we never visited. Our spring cleaning forced us to visit all of our stuff, and to make decisions about what to keep and what to discard. The decisions were surprisingly simple, both for cardboard boxes and for electronic files. It is now much easier to find things in our storage room and on my computer. Maybe now we’ll get around to putting the things worth saving in scrapbooks!
