InfoQuest! Information Services

How to Conduct Research on the Internet
By Terry Brainerd Chadwick

This guide is designed to be used in conjunction with the course How to Conduct Research on the Internet. It can be bookmarked, saved as an HTML file and run from your computer, or printed out for use as a hard copy reference tool. (It is 110k, and about 31 pages printed.)

I. Research Overview

The first thing to understand about research is that there is a reason that it is called RE - search. Unless you are looking for something that is both very common and has a unique name, research is usually an "if at first you don't succeed, try, try, again" type of activity. You choose the words you first think will lead to the information you want, go down one path, and check it out. If you don't find what you are looking for, you rethink your search terms -- using what you learned during the first search -- and search again: re-search, research. Doing research is kind of like being a treasure hunter, ferreting out clues here and there that finally lead to the hidden treasure -- the information you want.

From a research perspective, the Internet is just one of many pathways to various sites that contain information. Much of the information that can be found by using the Internet can also be found by traveling other paths that may be faster, less expensive, and/or easier to access. It is also important to realize that the information you want may not be available through the Internet.

When you consider using the Internet as an information resource, approach it the same way you would approach any research: examine the purpose and goals of the project and use the tools and resources that are appropriate to meet those goals. Keep in mind that the Internet is just one tool that can or should be used. Most of the hard data resources that are on the Internet can be found in libraries, books, CD-ROMs, and commercial online databases. Depending on your project, these may be more effective and efficient sources to use. The strength and uniqueness of the Internet is as a communications tool, a way to share ideas with and ask questions of others about things that aren't covered in the hard or static data resources. The Internet can give you access to experts and specialists on almost any topic imaginable.

II. Research Case Examples

How can all of the information and expert assistance on the Internet help you? Here are fifteen ways that businesses have used the Internet for research.

  1. Obtain upgrades of computer software programs directly from the manufacturer, as soon as they are available.
  2. Test software applications before you buy them.
  3. Get evaluations of software and other products from people who have used them the way you want to use them.
  4. Locate a supplier of products you need in your business.
  5. Locate buyers, or new markets, for your products and services.
  6. Find market information for a business plan.
  7. Track industry and company activities on the stock and bond markets.
  8. Track the latest regulatory and legislative initiatives related to your industry.
  9. Discover what your customers think, and are telling others, about your products and services.
  10. Discover what people are saying about your competitors' products and services.
  11. Track government and industry tenders, procurement bids and contract awards.
  12. Research innovative developments and patent filings in your industry.
  13. Search for a new job, or employees, in your industry.
  14. Stay current with what's going on in your industry: new products, conferences, executives' comings and goings.
  15. Locate new customers for your product or service.

Why I Use the Internet for Research

Note: Although this section talks about research I did years ago, in the early days of the public Internet, and the examples are dated, they still illustrate the power of the Internet as a communications-based information-finding medium. Think of them as a bit of Internet research history.

Shortly after I got my first Internet account in 1991 -- when the Internet was very young and didn't have the wide range of commercial search resources now available -- I received a number of research questions that I couldn't answer from the in-house or commercial online resources I had available. I went to the Internet and found success. Using Archie, one of the very first search engines, to search FTP sites solved the first case; the rest of my questions were answered through participation in discussion groups.

Finding a Software Patch

My organization's Unix-based computer had developed a glitch. The systems operator knew there was a public domain software program to fix the problem but didn't know where to find it. I used Archie to find which anonymous FTP locations carried the program. (Today's equivalent would be using Gnutella or BlueBear to find which peer-to-peer (P2P) hosts carried the files you wanted.) The file was available at about fifty locations around the world. I retrieved the software and had it to the systems operator about an hour following the request.

Food dehydrators in Hungary

My client wondered if there was a market for home-sized electric food dehydrators in Hungary. After striking out using the traditional commercial databases, I posted a query on the Hungarian discussion group (HUNGARY@GWUVM.BITNET) and the Eastern European Business Network (E-EUROPE@PUCC.PRINCETON.EDU).

Within fifteen minutes of the posting, my client received a phone call concerning the post. Within an hour there were two replies in my email box. All of these were from the U.S., and none were encouraging. But, within two days I received two encouraging replies from Hungary, one giving me the name and email address for another company in Hungary who might help out. I sent them a message and they responded that they didn't know if there was a market for food dehydrators, but they would check. Four weeks later, I received an email message that they had located someone in Hungary who wanted to distribute the product.

World Blueberry Statistics

An Oregon agriculture department official needed blueberry production statistics by country ASAP, so I queried the FAO's (Food and Agriculture Organization) Library Bulletin Service (AGRIS-L@IRMFAO01 -- no longer available). I received several responses directing me to good sources, but not exactly what I needed. Two days after I posted the request, I received a fax from the Rome headquarters of FAO with exactly the statistics my client needed.

Database Product Assistance

Another time, a client wanted to know if there were any library cataloging programs using the database program PC-FILE. I posted this question to the Public-Access Computer Systems Forum (PACS-L@UHUPVM1.UH.EDU). Within two days I received several answers via email (half from outside the U.S.) and one long distance phone call from a man who had heard about my question from a colleague who subscribed to PACS-L. He spent over an hour telling me about how PC-File could be used for cataloging, and its advantages and disadvantages.

I haven't received answers to all the questions I've posted, nor have I found every piece of software I've looked for, but the cases listed above were enough to make the Internet a regular research tool for me.

Airborne Videography Case Study

A more involved research project I worked on was about airborne videography. I made extensive use of interest groups to find contacts, potential customers and competitors, and background information for the project. This was a complex project, with many objectives, and many Internet resources used. The research strategy for it is outlined on the following page.

III. Guidelines for Using the Internet

For all that the Internet offers, you need to be careful in how you navigate and operate on the Internet. Communication on the Internet is fast, and a hasty hot-tempered word can literally go around the world in minutes, leaving an impression with thousands of people that you may regret -- and if those words get archived, they're around for a long time to haunt you. A seemingly innocent, to you, question posted to a place where it's not wanted can get you flamed, blacklisted, and truly damage your business.

Taking a copyrighted piece of data and using it without permission and attribution can land you in legal trouble. A "great" piece of information that you didn't validate can turn out to be out-dated or just plain wrong. Warning: do your homework before you venture out.

Netiquette

Like any group or community, the Internet has some common practices, customs, conventions, and expectations. On the Internet these are called netiquette (net etiquette). If you follow netiquette you will get more and better responses to your requests for information.

The Internet consists of thousands of individual networks which allow information to pass among them. Each network has its own policies and procedures. Practices and behaviors which are routinely allowed on one system may be controlled, or forbidden, on another network. It is the user's responsibility to abide by the policies and procedures of these other systems. The fact that you can perform a particular action does not mean that you should take that action -- examples are sending bulk email and hacking.

If you fail to observe netiquette you can lose your access to the network. Most internet service providers have acceptable use policies (AUP) that outline the kinds of actions considered inappropriate on their system. Inappropriate conduct generally includes the placing of unlawful information on a system, the use of objectionable language in email or chat messages, the sending of broadcast messages to lists or individuals, and any other types of use which would cause congestion of the networks or otherwise interfere with the work of others.

Some DOs and DON'Ts for conducting effective research on the Internet

   No unsolicited email.
Don't send bulk mail or announcements to people who haven't requested them. Respect other people's privacy. Realize that it costs the recipient to receive email, just like a fax.

   Requests for information that appear in Internet mail and usenet interest groups must be appropriate and relevant to those groups.
Post to no more than five interest groups, no matter how relevant. If people are interested, they'll alert others to your question. (Warning: even five groups can be too much if someone thinks that you are out of line in your posting.)

   Keep it brief.
Try to keep your information request to two screens or less. Don't waste people's time and bandwidth. (While bandwidth may not be a big problem for most people living in the United States who pay a fixed fee for unlimited use of the Internet, please keep in mind that the Internet is global. In many parts of the world, people pay by the minute to use the phone. Therefore, long messages cost them more money.)

   Share information and knowledge.
Provide a summary to the interest group of the results of your research. Participate in relevant interest groups, even when you're not directly searching for information. You'll be more likely to get answers to your questions if you answer the questions of others.

Flames, spam cancels, and blacklists

These are some of the things that might happen to you if you ignore Netiquette.

Flames/flaming
Flames are incendiary reactions to someone's posting, often insulting or nasty in tone. A flame can occur if you post something that is irrelevant to a discussion group, or simply if someone doesn't like the way you say something. The rule is no matter how upset you are about what someone has written, don't knock off a quick answer and reply immediately. Sit on the answer for a day; re-read it to make sure that you want millions of people around the world to see you in that light; and, if you still feel strongly about the issue -- and someone else hasn't already made your point for you -- send the message directly to the person with whom you are upset, not back to the discussion group.

Spam/spamming
Spam is a number of things. One early definition is excessive multi-posting and cross-posting, i.e., too many copies, of the same (or slightly altered) message to newsgroups and mailing lists. With the growth of the web and private email accounts, the definition has evolved to unsolicited bulk-sent commercial email, or even just unsolicited email sent by someone wanting to pitch something (whether it be for commercial, non-profit, or issue-based purposes).

There are a number of reasons that spamming the Net is considered bad. The most basic reason is financial. Unlike direct mail advertising through the postal service, where the cost of the message is borne by the advertiser, the cost of a message sent by email is borne by the recipient, much like being forced to receive a collect phone call. Many people using the Internet pay for each message they receive -- the comment to just delete it or reply with REMOVE in the subject comes a bit late -- or for storage or for time spent on the system. Would you accept a collect call from a telemarketer?

The SpamCon Foundation has published some statistics about the negative impacts of spam:

  • Spam substantially increases ISPs' costs, which are passed on to consumers.
    In 1997, America Online estimated that between 5% and 30% of its email server time at any given moment was exclusively dedicated to handling spam.
    http://content.techweb.com/wire/story/TWB19971218S0007


  • Between $2 and $3 of a consumer's monthly Internet bill is for handling spam, according to the 1998 Washington State Commercial Electronic Messages Select Task Force Report.
    http://www.wa-state-resident.com/finalrpt.pdf


  • 7% of Internet users who switch ISPs do so because of spam. This translates to a loss by attrition of more than $250,000 per month for an ISP with 1 million subscribers.
    Reported by the Gartner Group in 1999. http://www.brightmail.com/pdfs/gartner_rebuilt.pdf


  • Spam increases the Digital Divide. In its 1999 report, "Falling through the Net II: Defining the Digital Divide", the National Telecommunications and Information Administration found that 16.8 percent of all households that own a computer do not access the Internet for reasons of cost -- a cost that is $2-3 higher because of spam.
    http://www.ntia.doc.gov/ntiahome/fttn99/contents.html

  • Rural email users pay more. Users who have few or no local choices in Internet service providers may further pay per-minute ISP and/or toll call fees to receive each piece of spam.


  • Spam disproportionately affects disabled email users. For blind or sight-impaired users who employ a speech-synthesis device to read email, spam represents additional time delay in accessing information. For mobility-impaired users, deleting excess email and weeding out spam may be difficult or painful. Finally, spam decreases the utility of email-enabled text pagers used by deaf persons for immediate remote communications.

Interest groups are just that, communities gathered to talk about a particular interest. People who send messages that don't meet the interests of a particular group are intruding upon that group, and wasting the time of the people involved in the group. Also, people may subscribe to several discussion groups in the same interest area. They don't want to receive duplicate messages on all of the groups to which they subscribe.

Messages that have been identified as spam face being cancelled; i.e., deleted from the Internet. Spam cancels have nothing to do with the contents of a message. It doesn't matter if the message is an advertisement, if it's abusive, if it's on-topic, or whether it's for a good cause. Spam cancels are based only on how many times and places the message was sent. To avoid having your message spam cancelled, Chris Lewis, the author of the FAQ Current Spam Thresholds and Guidelines, recommends that one posting be cross-posted to no more than 10 groups, no more often than once every two weeks. (This FAQ is available through the newsgroup news.admin.net-abuse.misc. You can also find it at Google Groups: http://groups.google.com.)
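
The two limits Lewis recommends can be written down as a simple check to run before posting. The Python sketch below is only an illustration: the function name within_spam_thresholds and its parameters are hypothetical, the dates are made up, and it encodes nothing beyond the two limits quoted above (no more than 10 groups, no more often than once every two weeks). It does not reproduce how spam cancellers actually evaluate messages.

```python
from datetime import date, timedelta

# Limits quoted in the text above; the names are illustrative only.
MAX_CROSSPOST_GROUPS = 10
MIN_INTERVAL = timedelta(weeks=2)


def within_spam_thresholds(groups, post_date, last_post_date=None):
    """Return True if a planned posting stays inside both recommended limits.

    groups         -- newsgroup names the message would be cross-posted to
    post_date      -- date of the planned posting
    last_post_date -- date the same message was last posted, or None
    """
    # Count each group once, even if it was listed twice by mistake.
    if len(set(groups)) > MAX_CROSSPOST_GROUPS:
        return False
    # A repeat posting must wait at least two weeks.
    if last_post_date is not None and post_date - last_post_date < MIN_INTERVAL:
        return False
    return True


# A first posting to two groups is inside the limits.
print(within_spam_thresholds(["biz.misc", "biz.comp.accounting"], date(2002, 3, 1)))  # True
# Reposting the same message one week later is not.
print(within_spam_thresholds(["biz.misc"], date(2002, 3, 8), date(2002, 3, 1)))  # False
```

Staying inside these numbers only avoids a spam cancel; an off-topic message can still draw flames or a blacklist entry, as described below.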

Spams, or sending any kind of unsolicited email, can also affect you in other ways. Since most providers have acceptable use policies that prohibit spamming, if enough people complain about something you post, i.e., identify you as a spammer, you can have your internet account closed.

Blacklists

These are lists of spam and net abuse offenders that are widely circulated around the Net. The blacklists are intended to curb inappropriate advertising on interest groups by describing offenders and their offensive behavior, expecting that people who read the list will punish the offenders in one way or another. The guidelines for inclusion are broader than those that trigger a spam cancel because they are based on content rather than number of messages. They include:

unsolicited commercial junk email (UCE)
posting commercial messages to unrelated newsgroups and mailing lists
advertising via irc or talk (live chat groups)
posting commercial messages to relevant interest groups if excessive or unwanted

You can avoid being put on a blacklist by following Netiquette and never sending unsolicited commercial email (UCE) to individuals or irrelevant messages to interest groups. If you find yourself on a blacklist, you can appeal to the owner of the blacklist and explain why you don't belong there.

Spam has become such a great problem for Internet Service Providers (ISPs) that many have joined services that report spammers. If a domain appears on one of these lists, ISPs will not accept email from it. The Mail Abuse Prevention System (MAPS) is one of the primary anti-spam services. MAPS maintains a series of databases, including the Realtime Blackhole List (RBL), containing the Internet addresses of Internet sites that don't follow MAPS' suggested email abuse policies. Microsoft and other large service providers use the MAPS lists to help decide whom to block from their systems.

MAPS website: http://www.mail-abuse.com
MAPS' Basic Mailing List Management Principles for Preventing Abuse: http://www.mail-abuse.org/manage.html

More Anti-Spam Resources

   The Coalition Against Unsolicited Commercial Email (CAUCE)
http://www.cauce.org/
This is an organization of Internet users who are fed up with spam and have formed a coalition whose purpose is to promote legislation which amends 47 USC 227, the section of U.S. law that bans "junk faxing", so that it will cover electronic mail as well.

   Blacklist of Internet Advertisers
http://math-www.uni-paderborn.de/~axel/BL/blacklist.html
http://math-www.uni-paderborn.de/~axel/blacklist_philosophy.html
Axel Boldt runs one of the most popular blacklists, the Blacklist of Internet Advertisers. His website contains information about the Blacklist and lists the people/organizations who are currently on it.

   Broadcast Fax and Junk Email
http://www.markwelch.com/faxlaw.htm
Mark Welch's page contains the text of 47 USC 227 and a similar California code, has pointers to other anti-spam information, and highlights some of the worst spammers.

   SpamCop
http://spamcop.net/
SpamCop is a free service that will send email on your behalf to the appropriate network administrator. It explains what it considers to be spam and tells you how to report spam using its service. For a fee, SpamCop will also process your mail and filter out the spam before you ever see it. The site also shows you who is currently committing spam, the ISPs who are the greatest sources of spam, who resolves spam complaints the fastest, and related statistics.

   Junkbusters
http://www.junkbusters.com/
The Junkbusters site gives you all sorts of information about ways to defend yourself against privacy-invading marketing. This site doesn't just cover email spam, but also gives tips on ways to defend yourself from telemarketing, junk mail, web ads and cookies.

   Essays on Junk E-Mail (Spam) by Brad Templeton.
http://www.templetons.com/brad/spume/
This page contains a series of articles about spam, why it is so bad, and how to combat it. Templeton, former publisher of Clarinet News and NewsEdge, writes his essays with a wry wit that makes serious topics a pleasure to read. I particularly recommend "The Evils of Spam."

   Slamming Spamming
http://www.uic.edu/depts/accc/newsletter/adn29/spam.html
This university site describes the origins of spam, why it's bad, how to effectively complain about spam, and tricks to minimize email spam.

   SpamCon Foundation
http://www.spamcon.org/
This non-profit organization's mission is to protect email as a viable communication and commerce medium by supporting measures to reduce the amount of unsolicited email that crosses private networks, while ensuring that valid email reaches its destination. It provides a lot of tips about combating spam.

Netiquette Resources

   The Net: User Guidelines and Netiquette, by Arlene Rinaldi
http://www.fau.edu/netiquette/net/netiquette.html
This guide, available in German, Italian, Spanish, French, Japanese, Portuguese, and English, is probably the most comprehensive primer on Internet DOs and DON'Ts in existence. Besides covering how to appropriately use discussion groups, it also covers ftp, telnet, WWW, and email etiquette.

   Dear Emily PostNews, by Brad Templeton
http://www.templetons.com/brad/emily.html
This is an old, venerated, satirical piece on how not to use the Internet.

   Netiquette
http://www.albion.com/netiquette/index.html
This site contains the Core Rules of Netiquette, excerpted from Virginia Shea's book, "Netiquette." The site also contains a Netiquette quiz and a netiquette mailing list for those who want to keep up with issues related to the subject.

Copyright

The Internet makes it very easy to obtain information and transmit and copy it to another location. It is easy to forget that almost all of the information on the Internet, including interest group discussions, is copyrighted and that all laws and penalties relating to copyright and other intellectual property apply on the Internet. The general rule of thumb is to assume that any information you find is copyrighted and to seek permission before using it. This includes discussion group postings, graphics, documents, audio and video clips. To learn more, check out the resources below.

   10 Big Myths about copyright explained: An attempt to answer common myths about copyright seen on the net and cover issues related to copyright and USENET posting, by Brad Templeton. Nice and concise.
http://www.templetons.com/brad/copymyths.html

   Copyright Law FAQ
http://www.cs.ruu.nl/wais/html/na-dir/law/Copyright-FAQ/.html
This is a long document prepared by Terry Carroll in 1994 that looks at Copyright Law, particularly from the perspective of electronic media. Its six sections are: Introduction, Copyright Basics, Common miscellaneous questions, International aspects, Further copyright resources, and Appendix (a note about legal citation form).

   US Copyright Office
http://lcweb.loc.gov/copyright/
The Library of Congress sponsors the US Copyright website that includes copyright basics, copyright laws and practices, how to register for a copyright, and a searchable copyrights database.

   The Copyright Network
http://www.benedict.com/
This site covers the fundamentals of copyright, copyright registration, fair use, software issues, public domain, news, and other copyright related information.

   Berne Convention for the Protection of Literary and Artistic Works
http://www.law.cornell.edu/treaties/berne/overview.html
The Berne Convention is the international treaty on intellectual property and copyright that a majority of the world's nations have subscribed to.

   Copyright and Fair Use by Stanford University Libraries
http://fairuse.stanford.edu/
This site covers primary materials: the major statutes, treaties and conventions, regulations and judicial opinions; current legislation, cases and issues; and resources on the Internet, including other sites that contain overviews about copyright issues.

   IP Law
http://www.law.com/professionals/iplaw.html
This is the site of the IP (Intellectual Property) Law Practice Center. It contains current news, decisions, practice papers and tools, a law dictionary and much more about copyright, patents, trademarks, and other intellectual property issues.

IV. Interest Groups

Using the Internet as a business communications tool

Although the Internet is known as a vast depository of information, communications is still what the Internet is all about. More people have some sort of email access to the Internet than any other kind of account, and the popularity of interest groups is exploding. There are more than 90,000 mailing lists, 30,000 newsgroups, tens of thousands of live chat groups, and a growing number of web forums on the Internet that provide opportunities for people to ask questions, share ideas, sound off, and just plain converse with each other on almost any topic imaginable.

An interest group is nothing more than a group of people who get together to discuss shared interests. In the "real" world, this would include people who get together to meet face-to-face at chess clubs, Chambers of Commerce special committees, computer clubs, garden clubs, etc. But in the "real" world, it can be hard to find time to meet with people who are interested in real estate investment [misc.invest.real-estate], or to find someone else in your community who shares your passion for growing orchids.

On the Internet you aren't limited by geography or time constraints. If you are a plumber with a question about plumbing software on a Sunday afternoon, you can post your question on the PlumbNet Web Forum and have an answer back, from Australia no less, by the time you get up in the morning. If you are part of an organization and find that it is nearly impossible to get a quorum together for a face-to-face meeting, conduct most of your business through a mailing list. There are no time constraints and you can save the face-to-face meetings for the most important issues.

In the business world, interest groups are a great way to keep up with what's going on in your industry and to find answers to many questions. Interest groups can be used for:

  1. Accessing experts in your industry
  2. Finding out what people say about your company
  3. Finding out what people say about your competitors
  4. Asking questions or clarifying issues about business methods, industry trends, particular business applications for office products, and more.
  5. Discussing issues, planning projects, and other things that are normally done in face-to-face meetings within a company or organization that is geographically or time-constrained.

If you are going to use an interest group primarily for the purpose of finding information, rather than to establish a long term relationship with colleagues, you need to proceed carefully. If you do, the rewards can be great.

Tips:

  1. Take time to read an interest group for a while, a week or two, before posting. Learn the culture of the group:
    • how frequently people post
    • how they post (formal, informal)
    • how much flaming is tolerated
    • signal-to-noise ratio (informative to trivial message ratio)
    • level of commerciality/advertising tolerated

    If the group meets your needs, be sure to frame your postings and questions to fit within the group's culture. If you can't wait a week or so to post a question, try to use a group in which you already participate, or one that is highly tolerant of newbies.

  2. Be considerate. Avoid getting in flame wars. If someone tells you you've acted inappropriately, apologize. If you must answer, do so in private email, not on the list.
  3. Make sure that your requests for information are appropriate and relevant to the groups to which you post. Post to no more than five interest groups, no matter how relevant. If people are interested, they'll alert others to your question.
  4. Keep it brief. Try to keep your information request to two screens or less.
  5. Provide a summary to the interest group of the results of your research.
  6. Participate in relevant interest groups, even when you're not directly looking for information. You'll be more likely to get answers to your questions if people know you and if you answer the questions of others.
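
Tips 3 and 4 put numbers on a posting, so they lend themselves to a quick check before you hit send. The Python sketch below is a rough illustration rather than a rule: the function names are hypothetical, and the size of a "screen" (24 lines of 80 characters) is an assumption, since the tips above do not define one.

```python
import textwrap

MAX_GROUPS = 5       # limit suggested in tip 3
MAX_SCREENS = 2      # limit suggested in tip 4
SCREEN_LINES = 24    # assumption: one text screen is 24 lines...
SCREEN_WIDTH = 80    # ...of 80 characters each


def screens_needed(message):
    """Estimate how many screens a message fills once long lines wrap."""
    lines = 0
    for raw_line in message.splitlines():
        # A blank line still takes one row; a long line wraps onto several.
        wrapped = textwrap.wrap(raw_line, SCREEN_WIDTH) or [""]
        lines += len(wrapped)
    # Ceiling division without importing math.
    return -(-lines // SCREEN_LINES)


def posting_problems(message, groups):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if len(set(groups)) > MAX_GROUPS:
        problems.append("posted to more than %d groups" % MAX_GROUPS)
    if screens_needed(message) > MAX_SCREENS:
        problems.append("longer than %d screens" % MAX_SCREENS)
    return problems


# A one-line question sent to a single group raises no problems.
print(posting_problems("Can anyone recommend accounting software?", ["biz.comp.accounting"]))  # []
```

A check like this covers only length and number of groups. Whether the question is relevant to the group, which is the heart of tip 3, still takes reading the group first.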

Newsgroups, mailing lists and Web forums: what they are and how to use them

Discussion groups come in many forms, as the examples above indicate. There are Usenet newsgroups, Internet mailing lists, and Web forums. Although they do similar things -- they allow people in a group to read and write messages to each other -- there are important differences between them.

Usenet Newsgroups

Usenet Newsgroups were around before the Internet became public. The key thing to know about newsgroups is that there are more than 30,000 of them, and you may not have access to all or any of them. You subscribe to a newsgroup through your Internet or corporate network host provider and they decide what newsgroups they will allow on their system. Therefore, you only have access to the newsgroups that your provider subscribes to. For instance, if your provider decides not to allow commercial newsgroups, then you won't be able to subscribe to biz.misc or biz.comp.accounting.

To access newsgroups, you use a newsreader -- either a standalone program or part of a browser such as Netscape or Netcruiser -- to locate, subscribe to, read, and post to newsgroups. Subscribing is generally as simple as clicking on the name of the newsgroup that interests you. When you want to read or participate in what is going on in a newsgroup, you start up your newsreader and click on the name of a group to which you are subscribed.

If you don't have access to newsgroups through your Internet Service Provider (ISP), you can read and participate in them through a number of websites, including Google Groups.

For more information about Usenet read the following FAQs (frequently asked questions):

What is Usenet, by Salzenberg, Spafford and Moraes
http://www.cis.ohio-state.edu/hypertext/faq/usenet/usenet/what-is/part1/faq.html
What is Usenet? A second opinion, by Vielmetti
http://www.cis.ohio-state.edu/hypertext/faq/usenet/usenet/what-is/part2/faq.html

Mailing Lists

For most users, the main distinction between a newsgroup and a mailing list is that you go to read the discussions in a newsgroup to which you are subscribed whenever you want, but the discussions in a mailing list come to you via email. A mailing list is simply an automated electronic mail program that takes a message sent to it and sends that message to all subscribers' email boxes. Unlike newsgroups, which are host dependent, anyone with an email address can belong to a mailing list.

There are a number of different automated mailing programs, and they all work in slightly different ways. They include listserv, listproc, majordomo, mailserv, mailbase, lyris and internet-style mailing lists. Almost all mailing lists have you send subscribe and unsubscribe messages to the mailing list program address, e.g., for orchids-can the mailing list program address is "majordomo@chebucto.ns.ca"; while messages to the discussion group are sent directly to the group, e.g., "orchids-can@chebucto.ns.ca". It is considered very poor Netiquette to send subscription-related requests directly to the interest group. If you do so, you may get flamed.
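
The difference between the two addresses is easy to see when the messages are laid out side by side. The Python sketch below composes, but does not send, a subscription request and an ordinary posting using the orchids-can addresses quoted above. The function names are hypothetical, the addresses may no longer be active, and the "subscribe listname" body follows the Majordomo convention, which is an assumption here: other mailing list programs use different commands, so check the instructions for the list you are joining.

```python
from email.message import EmailMessage

# Addresses taken from the example in the text above.
COMMAND_ADDRESS = "majordomo@chebucto.ns.ca"   # the mailing list program
LIST_ADDRESS = "orchids-can@chebucto.ns.ca"    # the discussion group itself
LIST_NAME = "orchids-can"


def build_subscribe_request(sender):
    """Compose (but do not send) a subscription request for the program address."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = COMMAND_ADDRESS
    # Assumption: Majordomo-style command in the body; other programs differ.
    msg.set_content("subscribe %s\n" % LIST_NAME)
    return msg


def build_list_post(sender, subject, body):
    """Compose (but do not send) a message for the whole group to read."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = LIST_ADDRESS
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


request = build_subscribe_request("you@example.com")
print(request["To"])  # majordomo@chebucto.ns.ca
```

Keeping the two addresses in separate, clearly named constants mirrors the rule in the text: commands go to the program, conversation goes to the group.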

A good place to find out about the various forms of mailing lists and how to subscribe to them is Bill Goffe's Resources for Economists on the Internet, because it also lists many of the business-related mailing lists for each type:
http://rfe.org

It used to be that the only way to own and operate a discussion group was to have an ISP who offered the option, usually for a fee. Today, almost anyone can start (host) an email discussion group through a number of web-based services that provide mailing list hosting for free, in exchange for putting advertising on each email message that is sent. Yahoo! Groups and Topica are two of the largest free mailing list hosting sites. These aren't pure email lists, since posting and reading messages can also be done through the website. Many of the "traditional" types of email-based discussion groups also offer people access through either (or both) newsgroups and the web to give people the maximum amount of choice in how to participate in the group.

Web Forums

Now that so many people are using the World Wide Web as their primary access to the Internet, it's not surprising that interest groups are springing up on a number of websites. These are similar to newsgroups in that you go to the website to be able to use them, but they come in many formats. The one thing they have in common with the other kinds of discussion groups is that they bring people of similar interests together to share ideas, questions, and information about those interests. For more information about Web forums see David Woolley's Conferencing Software for the Web (Discussion Forums, Groupware, and BBS/Bulletin Boards):
http://thinkofit.com/webconf/

Other Interest Groups

There are also interest groups that operate in real time on the Internet. Internet Relay Chat (IRC), Multiple User Dialogues (MUD), and Web chat groups are different forms of live discussions available on the Internet. They are not covered here because they tend to be transitory; that is, the discussions come and go, and the real time aspect adds many of the same constraints that face-to-face meetings face. Instant messaging isn't covered here for the same reason. For more information about real-time interest groups, see:
IRC FAQ (includes a search engine for finding chats): http://www.irchelp.org/
MUD Resource collection (includes sites with lists of MUDS): http://www.godlike.com/muds//
Web chat groups: http://www.yahoo.com/Computers_and_Internet/Internet/World_Wide_Web/Chat/
Instant Messaging: http://dir.yahoo.com/Computers_and_Internet/Software/Internet/Instant_Messaging/

Finding Interest Groups

With more than 150,000 interest groups to choose from, how do you find the ones that fit your areas of interest and your information finding needs? First of all, check out the various books and magazines about the Internet that are available on the market. Even though Internet books are outdated before they are published, they provide a good starting place. And the various Internet magazines contain sections on new interest groups and articles about resources available in various interest areas.

One of the very best ways to find discussion groups that will meet your general information needs is to find one group that you monitor and participate in all of the time. Notices of new groups in the same broad interest area are likely to be posted to that group.

There are a variety of tools available on the Internet to help you find the discussion groups that best meet your needs. Since none of the interest group search tools listed below covers all of the groups available through the Internet, use more than one of these tools to ensure a comprehensive search.

   Topica
http://www.topica.com/
Topica not only lets you search for email lists, but also helps you manage the groups you belong to and allows you to create your own discussion group.
   Yahoo! Groups
http://groups.yahoo.com/
Yahoo! Groups allows members to create and manage email lists, or subscribe to a wide variety of existing lists, free. It also gives list owners calendars so you can easily schedule meetings, events, and set up automatic email reminders to keep your group informed; chat facilities; and polls so you can learn more about your members by polling them quickly and easily.
   Google Groups
http://groups.google.com
Google provides a way to search the archives of newsgroups going back several years. You can search for particular topics, track threads of conversations, and respond to items. You can also check the posting history of posters, to see what other newsgroups they use.
   Tile.Net
http://www.tile.net/
Tile.net offers a searchable reference to mailing lists, newsgroups, and FTP sites. The mailing list service searches interest groups that use the Lyris, LISTSERV, ListProc and Majordomo list software.
   Publicly Accessible Mailing Lists
http://paml.net/
Compiled by Stephanie da Silva and updated monthly, this list is searchable by name and subject. PAML claims to be not the biggest list directory, but the most accurate.
   Internet FAQ Archives
http://www.faqs.org/faqs/
Search for Usenet Frequently Asked Questions (FAQ) documents. This will help you determine the scope of discussion and the rules for posting to the newsgroup. FAQs are also a valuable research tool since they contain the answers to the most frequently asked questions in the newsgroup. This site allows you to search for FAQs by Usenet hierarchy, full text, category, archive name, type of FAQ, and more.
   Usenet Info Center Launch Pad
http://sunsite.unc.edu/usenet-i/home.html
This is another source for information about Usenet and newsgroups that includes an introduction to Usenet, Usenet history, several links to sources for browsing and searching newsgroups and FAQs, and more.

V. Using Search Tools

The Internet contains a world of information. One of the Net's attractions is that it is easy to publish on the Internet and anyone who wants to can. However, there is no single authority governing the quality of resources on the Net, nor is there a single indexing strategy or way to find the resources. With millions of unqualified resources available, it is particularly important to carefully plan a research strategy, understand the nature of the search tools you use, and critically think about the quality of information that you find.

Establishing a research strategy

   Define your goals.
Analyze your information-finding project. Break it down into its component parts. Determine why you need this information and what you plan to do with it. This will make your information search clear to you and easier to conduct.

   Determine the types of information you need to answer your question.
Knowing the kinds of information you want will help you choose the resources and tools that will best meet your needs.

For instance, if you just want general background information on a topic you might look at an encyclopedia, textbook, article in a popular magazine, or country background paper. But if you need an in-depth treatment of a subject you could look at technical reports, trade association publications, and professional papers. If you just need the address and contact information for a company, a check of their website or a company directory service might do the trick. But if you need a complete profile of the company you would also want to check magazine articles, industry reports, annual reports, and interest group discussions that mention the firm.

In what form do you want the information? Do you need tabular or textual information? Are you looking for a full-text source online or something you can find in hard copy offline? If you are looking for statistics in a spreadsheet format, then using a search engine to look for Excel files might be your best bet (e.g., using AltaVista with url:xls or anchor:xls or Hotbot's FTP search on xls). If you want a bibliography of books and/or magazine articles that you can find elsewhere, then going to an online library catalog would be appropriate. If you want expert commentary on a subject then interest groups or an academic website could be the places to direct your search. Product reviews can be found in the websites of industry publications or in product-related discussion groups.

Do you need current information or historical information? Are you looking for information from a particular type of Internet site? Commercial, government, educational, military, non-profit organization? Or do you want to limit your information search to that of a particular country? All of these things should be considered before you start your search.
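
Limiting results to a type of site or a country comes down to looking at the last part of the host name. The sketch below shows the idea in Python on a short list of addresses. Three of the URLs appear elsewhere in this guide; the .ca address and the function names are made up for the example.

```python
from urllib.parse import urlsplit

RESULTS = [
    "http://www.lib.utah.edu/instruction/handouts/webeval.html",
    "http://www.altavista.com/",
    "http://www.lub.lu.se/auto_new/UDC.html",
    "http://www.example.ca/",  # hypothetical Canadian site
]


def top_level_domain(url: str) -> str:
    """Return the final label of the host name, e.g. 'edu' or 'se'."""
    host = urlsplit(url).hostname or ""
    return host.rsplit(".", 1)[-1].lower()


def limit_to(urls: list[str], *domains: str) -> list[str]:
    """Keep only the URLs whose top-level domain is one of those requested."""
    wanted = {d.lower() for d in domains}
    return [u for u in urls if top_level_domain(u) in wanted]


print(limit_to(RESULTS, "edu"))
print(limit_to(RESULTS, "se", "ca"))
```

A search engine applies this kind of limit for you; the point is that deciding on the site type or country beforehand gives you something concrete to filter on.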

How to formulate a research strategy

   Identify keywords, phrases, and subject categories to use in your search.
Play with your search terms. Think of as many ways to describe your topic as you can. Identify synonyms, distinctive terms, alternative spellings -- the Internet is international and while your truck may ride on tires, a British lorry uses tyres. Think of broad topic categories, the names of companies, products, or people whom you know to be associated with your search topic. If you don't succeed at one, try another. If you find a couple of good documents, examine them for new keywords and concepts and search again.

   Search mechanisms
Boolean Logic
Computers use a number of different mechanisms to search for or limit information. The most common search mechanism used in programming and in commercial databases is called Boolean Logic. Boolean Logic uses three primary search operators: AND, OR, and NOT.

Using AND narrows the search. A search using the terms dog AND cat will return documents that contain both words.

Using OR broadens search results. A search using the terms dog OR cat will return documents that contain either or both words. Only one of the words needs to be present to retrieve a document. Use the OR operator if you have a question that has many synonyms.

The NOT operator will drop any documents that contain the excluded term. For instance, if you are looking for semiconductor chips, you might want to state your search as chip NOT potato. But use this operator sparingly. If you search on Mexico NOT New, you will exclude any documents that talk about new products manufactured in the country as well as all documents about the state.
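
The three operators behave like set operations, which a short Python sketch can show. The five "documents" are invented for the example, and matching is on whole words only, which is simpler than what a real database does.

```python
# A tiny made-up collection: document id -> text.
DOCS = {
    1: "my dog chased the cat",
    2: "the dog barked all night",
    3: "a cat sat on the mat",
    4: "potato chip recipes",
    5: "semiconductor chip design",
}


def containing(term: str) -> set[int]:
    """Return the ids of the documents that contain the term as a whole word."""
    return {doc_id for doc_id, text in DOCS.items() if term in text.split()}


dog_and_cat = containing("dog") & containing("cat")          # AND narrows
dog_or_cat = containing("dog") | containing("cat")           # OR broadens
chip_not_potato = containing("chip") - containing("potato")  # NOT excludes

print(sorted(dog_and_cat))
print(sorted(dog_or_cat))
print(sorted(chip_not_potato))
```

AND keeps only the documents in both sets, OR keeps the documents in either set, and NOT removes every document in the second set regardless of why the excluded word appears there.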

Other Boolean operators increase precision of searching by allowing some kind of proximity searching. The terms include ADJ for words that are adjacent to one another; NEAR or N# (where # is the number of words within which key terms should be found) usually catches words close to each other in any order; WITH or W# is usually used for words within the same sentence; and SAME usually denotes words within the same paragraph.
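
Proximity operators can be pictured as comparing word positions. The Python sketch below is one possible reading of NEAR and ADJ; how distance is counted, and whether ADJ cares about word order, varies from one search service to another, so treat the rules here as assumptions.

```python
def near(text: str, first: str, second: str, distance: int) -> bool:
    """True if the two words occur within `distance` words of each other, in any order."""
    words = text.lower().split()
    first_positions = [i for i, w in enumerate(words) if w == first]
    second_positions = [i for i, w in enumerate(words) if w == second]
    return any(abs(i - j) <= distance
               for i in first_positions
               for j in second_positions)


def adjacent(text: str, first: str, second: str) -> bool:
    """True if `second` comes immediately after `first`."""
    words = text.lower().split()
    return any(a == first and b == second for a, b in zip(words, words[1:]))


print(adjacent("I visited New Mexico last year", "new", "mexico"))
print(near("new products made in Mexico", "new", "mexico", 3))
```

A search for new ADJ mexico would match the first sentence but not the second, which is a more precise way to find the state than excluding a word with NOT.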

Relevancy Searching
Most World Wide Web search tools do not support true Boolean searches. Instead they operate using some form of Relevancy Ranking, made popular by the first widespread search tool on the Internet, the Wide Area Information Server (WAIS). Relevancy searching ranks each document on a score of 1 to 1000, based on how well it matches the user's question: how many of the words it contains, their importance in the document, etc. Usually the documents containing the highest frequency of the most words are ranked highest, and not all of your words need to be in the document for you to see it in your search results. Relevancy ranking automatically assumes that there is an OR operator between words in the search term. There are different types of this kind of search tool. Some place the highest emphasis on the first word in the search string; others allow you to specify that some words must (or must not) be present in the search to be considered, in other words adding a form of Boolean limiting to the equation. Most of the relevancy-based search tools now allow at least three forms of search expressions:
ALL, which corresponds to the Boolean operator AND;
ANY, which corresponds to the Boolean operator OR; and
PHRASE, which looks for the search terms adjacent to each other.
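
A toy version of relevancy ranking with the ALL, ANY, and PHRASE options might look like the following Python sketch. The three documents are invented, and the score is a plain count of matching words rather than the weighted 1 to 1000 scale that real tools compute.

```python
# Made-up documents: name -> text.
DOCS = {
    "a": "internet research guide for internet beginners",
    "b": "research methods in the library",
    "c": "a guide to gardening",
}


def score(text: str, query: str) -> int:
    """Count how often the query words occur in the text (a crude relevance score)."""
    words = text.lower().split()
    return sum(words.count(term) for term in query.lower().split())


def matches(text: str, query: str, mode: str = "ANY") -> bool:
    """Decide whether a document qualifies under the ANY, ALL, or PHRASE option."""
    words = text.lower().split()
    terms = query.lower().split()
    if mode == "ALL":
        return all(t in words for t in terms)
    if mode == "PHRASE":
        n = len(terms)
        return any(words[i:i + n] == terms for i in range(len(words) - n + 1))
    return any(t in words for t in terms)  # ANY, the default


def search(query: str, mode: str = "ANY") -> list[str]:
    """Return the names of qualifying documents, highest score first."""
    hits = [name for name, text in DOCS.items() if matches(text, query, mode)]
    return sorted(hits, key=lambda name: score(DOCS[name], query), reverse=True)


print(search("internet research"))         # ANY
print(search("internet research", "ALL"))
print(search("research guide", "PHRASE"))
```

With ANY, document "b" still appears even though it lacks one of the two words; it simply ranks below "a", which contains both.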

The most powerful search engines also support truncation, which allows you to specify a stem of a word, such as educ*, which could stand for educate, education, educator, etc. Other truncation operators include !, ?, $ or % -- which symbol is used depends on the search engine. Some search engines automatically stem words; others allow internal wildcards so you can ask for wom?n to get woman or women or womyn. Some search engines allow you to specify retrieval of the most current items, or by domain name extension (com, gov, edu, mil, org), or by part of the world, using the two character country codes such as ca for Canada, de for Germany, and tw for Taiwan.
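
Truncation and internal wildcards can be tried out with Python's standard fnmatch module. The sketch assumes * stands for any run of characters and ? for exactly one character; as noted above, the actual symbols depend on the search engine. The word list is made up.

```python
from fnmatch import fnmatchcase

VOCABULARY = [
    "educate", "education", "educator", "edit",
    "woman", "women", "womyn", "wombat",
]


def expand(pattern: str) -> list[str]:
    """Return the vocabulary words that match the pattern."""
    return [word for word in VOCABULARY if fnmatchcase(word, pattern)]


print(expand("educ*"))
print(expand("wom?n"))
```

The stem educ* picks up every word that begins with those four letters, while wom?n matches only five-letter words and so leaves out "wombat".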

   Subject vs. Keyword searching
Using a subject guide on the Internet is similar to using a subject catalog in a library. Somebody has taken the time to classify the documents/sites covered by the guide into a number of broad, and often hierarchical, subject categories. A subject guide can be useful if you are not very familiar with a topic and only know the broad category, or if you know a topic but also want to learn about the broader area. You can start with a broad topic like business, and move down the sub-categories until you find what you are looking for. A subject guide may also be preferable when the terms you wish to search on are common or have many meanings. Subject searching can be difficult if the category isn't obvious.

A keyword search allows you to find more information because the computer looks at words in the title and content of a resource as well as the subject. The challenge is to refine your topic so that the search yields an adequate number of useful citations. A keyword search generally results in more false hits than a subject search; that is, more records that match the search criteria but aren't relevant to your project. For instance, a keyword search for information on the country Turkey will also give you information about the fowl and recipes for cooking it. A subject search would be more targeted. To find the country Turkey using the subject categories on Yahoo you would start with Regional, move to Countries, and then to Turkey. A keyword search in Yahoo yields records for both the country and the fowl.
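
The Turkey example can be mimicked with a small nested dictionary in Python. The Regional, Countries, Turkey path comes from the paragraph above; the food category and the entry titles are invented for illustration.

```python
SUBJECT_GUIDE = {
    "Regional": {
        "Countries": {
            "Turkey": ["Turkey: country profile"],
        },
    },
    # Hypothetical category, added only to create a false hit.
    "Food": {
        "Poultry": ["Roast turkey recipes"],
    },
}


def browse(*path: str):
    """Subject search: walk down the named categories one level at a time."""
    node = SUBJECT_GUIDE
    for category in path:
        node = node[category]
    return node


def all_entries(node) -> list[str]:
    """Collect every entry stored anywhere below this point in the guide."""
    if isinstance(node, list):
        return list(node)
    entries = []
    for child in node.values():
        entries.extend(all_entries(child))
    return entries


def keyword_search(word: str) -> list[str]:
    """Keyword search: match the word anywhere, ignoring the categories."""
    return [e for e in all_entries(SUBJECT_GUIDE) if word.lower() in e.lower()]


print(browse("Regional", "Countries", "Turkey"))
print(keyword_search("turkey"))
```

Browsing returns only the country entry, while the keyword search returns the recipe as well.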

The best bet is to use both a keyword search engine and a subject index; preferably more than one of each. This is the same as using more than one book or print index or CD-ROM in the library. You maximize your opportunities to find what you want.

How to choose the best search tools for your questions

As the preceding section showed, there are any number of ways that you can search for information on the Internet. The keys to being successful in searching for information on the Internet are knowing the kind of information you want and understanding how each search tool works so you can decide if it is relevant and how best to phrase your search question.

   Read the instructions and tips and techniques for using the search tools you use.
Understand the types of information each tool covers and the kinds of search options you have by reading the instructions for using each search tool. Read FAQs (frequently asked questions documents) and welcome instructions for interest groups before posting.

Some of the things that you should look for, which may or may not be included in the instructions, include:
   What parts of a document does the search tool search?
Directory paths only, titles, titles and top header, all headers, the full text of the document? If only titles or top headers are searched you will need to think carefully about the types of words that are relevant to your question that might be included in a brief title.

   How does the search tool let you structure your search?
Does it let you do a Boolean search (AND, OR, NOT), search by frequency/relevancy, or both? Can you do natural language searching or search by phrase? Can you truncate words or use wild cards? Is there an option for both subject and keyword searches?

   How does the search tool get its information?
How does data get into the database? Does a robot comb the Internet for information? If so, how much of the Internet does it search? Do people enter their own information? If so, is it checked for accuracy? Is there anything that the database doesn't cover or won't let you search? Does it cover only WWW sites and not ftp, interest groups, or internal databases? Does it have restrictions on the terms on which you can search? For instance, a number of search sites restrict the ability to search for "adult" material. Some sites that search newsgroups don't cover any or all of the "alt" groups (interest groups most likely to include sex-related topics).

   How frequently is the database updated?
How often is new information added to the database? How often does it make a sweep of the Internet to update information?

Finally:

   Use more than one source and search tool.
Although there is duplication of sources among sites, queries performed on different search engines usually produce different results. For best results, use more than one search engine. Try specialized sites. Use search tools that search not only the WWW, but interest groups, ftp, gopher, and other sites. Use both keyword and subject guides.

Practice Critical Thinking

Because anyone can be a "publisher" on the Net and there is no qualifying authority to alert you to validity or relevance, you need to take a critical look at any information you locate. Once you have some results, you should evaluate the information you find. Don't accept what you read as the truth: get confirming sources, ask questions, talk to experts, probe for motivations, and use your intuition -- if something just doesn't sound right, check it out further.

Ask yourself these questions:

   What is the source of the information? Reliability.
Did it come from an academic, government or commercial site or an interest group? If the information was obtained from a commercial site, what is the site designed to sell? Do the goals of the sponsoring organization or individual affect the quality or objectivity of the information provided? What is the reputation of the information source?

   Who made it available? Authority.
What are the author's credentials? Postings to interest groups frequently represent the author's individual opinion, which may or may not be backed by supportable data. You may be getting an answer from the top person in the field or from someone who just remembers reading something somewhere.

   What was the reason for making it available? Objectivity.
When reading a document, find out the purpose of the publication or sponsoring organization. Is it purely to collect and publish data, or are there any political, ideological, or other agendas? Is the information presented objectively, or does it reflect the biases of its author? Does the information appear to be valid and well-researched, or is it questionable and unsupported by evidence? How thorough is the coverage compared to other sources? If you have questions about a piece of data, call the provider and ask for the original source of data to be identified. Ask about data collection techniques.

   How timely is the information? Relevance.
How old is the information? When was it published? Do you have the most recent version of the document? While it is easy to publish a document on the Internet, maintenance and updating are not necessarily done with any regularity. This is particularly true when a document is copied from its original source site and stored on another site. Although the document on the original site may have been updated, revised, or even removed, there is no guarantee that other sites have followed suit. If it seems old, look for a more recent version.

The University of Utah has developed a questionnaire to guide you through the critical thinking process in evaluating a website:
http://www.lib.utah.edu/instruction/handouts/webeval.html

VI. Finding Information: The Traditional Internet Services

People were doing research on the Internet long before the advent of the World Wide Web by using telnet, ftp, gopher, and WAIS Internet tools. Although many of the sites that used these tools (particularly gopher) are no longer being actively maintained, these traditional Internet tools still contain valuable information and can be accessed using stand-alone software programs or via the WWW. For instance, a search in AltaVista for URLs with gopher in them (URL:gopher) had 65,010 results in September 2001. Many of the hits were for older documents, archives for sites now being maintained on the Web. Although you may not find a reason to directly use any of these search tools or sites, you may run across them while searching the web, so it makes sense to learn a little about them, or know where to find them and how to use them.

FTP: Finding Software with Archie

FTP (file transfer protocol) is one of the three "protocols" that are the foundation of the Internet. (The others are telnet and email.) FTP is simply a system for transferring files across the Net. Anonymous FTP allows you to connect to a remote computer in order to transfer public files back to your local computer. (An Anonymous FTP site is one that doesn't require passwords and permissions to gain entry to the site.) It is primarily used for retrieving software, pictures, and large documents. Think of it as an early version of Napster and other peer-to-peer file sharing services.
To get to an anonymous ftp site using a WWW browser, type ftp://site.name. The WWW browser handles the rest of the login.
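
What the browser does with an ftp:// address can be sketched with Python's standard ftplib module. The host "site.name" is the same placeholder used above, not a real server, and the function names are made up for this example. Only the first function runs here; the second needs a network connection to a real anonymous FTP site.

```python
from ftplib import FTP
from urllib.parse import urlsplit


def ftp_target(url: str) -> tuple[str, str]:
    """Split an ftp:// URL into the host to contact and the directory to open."""
    parts = urlsplit(url)
    if parts.scheme != "ftp":
        raise ValueError(f"not an ftp URL: {url}")
    return parts.hostname, parts.path or "/"


def list_public_files(url: str) -> list[str]:
    """Log in anonymously and return the file names in the directory."""
    host, path = ftp_target(url)
    with FTP(host, timeout=30) as ftp:
        ftp.login()    # called with no arguments, logs in as "anonymous"
        ftp.cwd(path)  # move to the requested directory
        return ftp.nlst()


print(ftp_target("ftp://site.name"))
```

The anonymous login is the step the browser handles silently: it supplies the user name "anonymous" so that you never see a password prompt.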

Archie is a search tool for finding ftp files. Archie was one of the first Internet search engines. To try a search on Archie, visit the following site at Idaho State University.
http://www.isu.edu/departments/comcom/internet/archie_form.html

A number of Web-based search engines will search FTP sites.

Telnet: Diving into remote databases

The telnet protocol allows your system's computer to connect to another computer on the Internet. You can then operate that remote computer as if you were directly connected to it. With telnet you can use BBSs, databases, search tools, and other services not otherwise available through your host system. Telnet is still one way to access many online library catalogs.

To access a telnet site using a Web browser you first must have specified the stand-alone program you are using under the options and preferences section of the browser. Windows comes with a telnet program that most web browsers automatically use. If you want to use something else, you can find a telnet program at Download.com (http://www.download.com/). Then you can type telnet://site.name in the location box, and you're on your way. Be sure to jot down the login and logout instructions when you connect, as well as instructions on how to use the service or you may get "stuck" and have to disconnect.

Hytelnet is a search tool developed by Peter Scott for locating telnet sites. It was available in PC, Mac, and Unix formats and could be accessed directly through many Internet providers. Although the Hytelnet indices are no longer being updated, you can still access many of the old files. To find out more about Hytelnet visit:
http://www.lights.com/hytelnet/

Gopher: Using Veronica and Jughead to search Internet archives

Gopher was the first real internet-spanning software program. (The gopher is the mascot of the University of Minnesota where the system was developed, hence the name.) Gopher provides a standardized hierarchical text-based menu-based system for finding and retrieving files of all kinds from throughout the Internet. It's best at retrieving plain ASCII text files or providing access to telnet applications. Gopher provided the first organized means of searching and retrieving documents on the Internet. One advantage of gopher over the WWW is that you don't have to specially code the documents (in HTML) to read them. Although most gopher sites have been discontinued or have been converted to WWW sites, there is still a lot of valuable information stored in gopher sites. You can find information about gopher and see how it works at the University of Minnesota gopher site.
gopher://gopher.tc.umn.edu:70/11/

Veronica and Jughead

Veronica (Very Easy Rodent Oriented Net-wide Index to Computerized Archives) is a search engine for gophers. It uses Boolean search techniques and supports a number of options to better narrow or focus a search. As with gopher itself, it is difficult to find a working Veronica search engine, but the Point Loma Nazarene University in San Diego has decided to make the experience available to interested users. It has Veronica-2, the "reincarnated Gopher search engine" that covers 2.5 million selections (approximately 45 percent of Gopherspace).
gopher://gopher.ptloma.edu:70/7/v2/vs
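
The two gopher addresses in this section can be taken apart with a few lines of Python. In a gopher URL, the first character after the host and port is a one-character item type defined by the Gopher protocol (RFC 1436), and the rest is the selector sent to the server. The function name and the short type table below are choices made for this sketch, and only a handful of types are listed.

```python
from urllib.parse import urlsplit

# A few item types from the Gopher protocol (RFC 1436).
ITEM_TYPES = {
    "0": "text file",
    "1": "menu (directory)",
    "7": "search server",
    "8": "telnet session",
}


def describe(url: str) -> dict:
    """Break a gopher:// URL into host, port, item type, and selector."""
    parts = urlsplit(url)
    path = parts.path
    # With no path at all, the top-level menu (type 1) is implied.
    item_type = path[1] if len(path) > 1 else "1"
    return {
        "host": parts.hostname,
        "port": parts.port or 70,  # 70 is the standard gopher port
        "type": ITEM_TYPES.get(item_type, "other"),
        "selector": path[2:],
    }


print(describe("gopher://gopher.tc.umn.edu:70/11/"))
print(describe("gopher://gopher.ptloma.edu:70/7/v2/vs"))
```

Read this way, the Minnesota address points at a menu, while the Point Loma address points at a search server, which is what you would expect for Veronica-2.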

Jughead is like Veronica except it just searches for programs within a particular host system rather than the entire Internet.

WAIS: Relevance-based keyword searching for documents

WAIS (wide area information server) is a distributed information service which offers simple natural language input, indexed searching for fast retrieval, and a "relevance feedback" mechanism which allows the results of initial searches to influence future searches. It is the type of search engine upon which most early WWW search engines were based. WAIS servers usually search the entire text of the documents indexed but don't search the entire Internet, only particular databases. Like gopher, WAIS sites are no longer being updated and are hard to find. You can search an archive of WAIS sites at the Nordic W4 Project, Automatic indexing and classification of WAIS databases website.
http://www.lub.lu.se/auto_new/UDC.html

VII. WWW Search Tools

The major advantage of the World Wide Web over other Internet services is its ease of use. You move from place to place around the Internet by clicking on pieces of highlighted text. You can do ftp, telnet, gopher, newsgroups and email from within a Web browser. The graphical web browsers also allow the ability to see graphics, view movies, and listen to audio clips while online. The Web has almost become synonymous with the Internet, having all but cancelled out telnet, gopher and ftp.

The first WWW search engines were pretty simple and crude, offering primarily a very basic WAIS search. But the number, and power, of the search engines and subject guides has grown immensely. The distinction between subject guides and keyword-based search engines is diminishing as most subject guides are searchable and many search engines have instituted subject indexing to facilitate searching large databases.

None of the search engines or directories can be considered comprehensive. In August 2001, Search Engine Watch reported that Google covered the largest number of web pages, indexing over 1 billion pages, and providing results for more than 1.3 billion pages. (Source: SearchEngine Watch.) The 1.5 billion or so indexable web pages are but a drop in the bucket of the number of publicly accessible but "invisible" (to search engines) web pages. Bright Planet estimates that the "deep Web" contains nearly 550 billion individual documents compared to the 1 billion plus of the surface Web. (Source: Complete Planet FAQ) Clearly, no one search tool even comes close to covering the information contained in the sites connecting to the World Wide Web -- one reason why it is desirable to carefully determine what information is needed before conducting a search and to use more than one search tool. (Note: The size estimates provided by the search engines should be viewed skeptically. Greg Notess' Search Engine Showdown compares the self-claimed estimates with its own "effective size" estimates and they are very different.)

The list of WWW search engines and subject guides that follows is by no means complete. They are the ones that I regularly use or that colleagues have recommended to find information on the Internet. Each of the search engines and subject guides has unique features, either in content, output arrangement, search options, or size.

I use a lot of different search tools and what I use depends on what I'm looking for, how much time I have, how much I know about the subject, and other criteria. When I'm looking for general information on the Net, I go first to Google, one of the most popular, powerful, and no frills search engines on the Web. When I want a Web search that includes publications from commercial sources, such as magazines, journals, and investment reports, I search Northern Light, which also offers custom search folders that automatically organize the results of a search into subject categories. I also like Northern Light's precision searching capability. Precision searching is the reason that AltaVista remains one of my favorite search engines as its advanced search uses Boolean search parameters. If I want to quickly survey a number of search engines, I use Metacrawler or DogPile. My favorite subject guide for non-commercial subjects is the Librarians Index to the Internet, although I use Yahoo when I want to search for educational institutions, Internet resources, and country/regional information because of its excellent subject collections. The Clearinghouse for Subject-Oriented Internet Resource Guides is very useful when I want a good guide to Internet resources on a particular subject that is written by an expert on the subject. And Gary Price's Direct Search is an outstanding guide for finding those web-based databases that aren't covered by most search engines -- a small subset of the "deep web."

Meta Search Sites

The Meta Search sites, which cover multiple search tools, are especially good when doing complex research, because they allow for custom-tailored searches and cut down on the amount of time usually spent in searching several services. Some of these sites are simply collections of search engines which you search separately. Others are "meta-search engines" that search a number of the other search engines all in one search. The advantage of this is that you save time and duplication. The disadvantage is that you lose the distinct search options that distinguish each of the search tools.
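
The way a meta-search engine collates results and removes duplicates can be sketched in Python as follows. The engine names and result lists are made up, and the rule for deciding that two URLs are the same page (same host, path, and query, ignoring a trailing slash) is an assumption chosen for the example, not the method of any particular service.

```python
from urllib.parse import urlsplit


def page_key(url: str) -> tuple[str, str, str]:
    """Reduce a URL to a key so that trivial variations count as one page."""
    parts = urlsplit(url)
    return (parts.hostname or "").lower(), parts.path.rstrip("/"), parts.query


def merge(results_by_engine: dict[str, list[str]]) -> list[tuple[str, list[str]]]:
    """Combine the result lists, keeping one entry per page plus who found it."""
    merged = {}  # page key -> (first URL seen, engines that returned it)
    for engine, urls in results_by_engine.items():
        for url in urls:
            _, engines = merged.setdefault(page_key(url), (url, []))
            if engine not in engines:
                engines.append(engine)
    # Pages returned by more engines come first; ties keep first-seen order.
    return sorted(merged.values(), key=lambda item: len(item[1]), reverse=True)


results = merge({
    "Engine A": ["http://paml.net/", "http://groups.google.com"],
    "Engine B": ["http://PAML.NET", "http://www.tile.net/"],
})
for url, engines in results:
    print(url, engines)
```

The merged list is shorter than the two originals combined, but notice what is lost: whatever special search options each engine offered never enter the picture.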

   MetaCrawler.
http://www.metacrawler.com/
Now owned by Go2Net, the oldest metasearch engine on the Net searches several of the most popular finding tools on the Net, and then verifies and collates the results, eliminating duplicates. Metacrawler currently searches AltaVista, Excite, GoTo.com, Infoseek, LookSmart, Lycos, Thunderstone, WebCrawler, FindWhat, DirectHit, RealNames, Kanoodle, Sprinks, About, and Google. Search parameters include by phrase, any term (OR), or all terms (AND). You can choose to search the Web, the Open Directory project, newsgroups, audio/MP3 (MP3Board, AudioGalaxy, AstraWeb and GigaBeat), images from ditto.com, or auctions (200 different auction sites). When you use the PowerSearch option, you can limit the search by search engine, geographic region or type of organization (edu, com, gov, etc.), as well as specifying how long you're willing to wait for the answer. A custom option gives you some formatting and refining options. I use this to get a comprehensive search without duplication.

   DogPile.
http://www.dogpile.com./
DogPile, like MetaCrawler, is now owned by Go2Net. Dogpile lets you search the Web, newsgroups, ftp, newswires, business news, stock quotes, weather, maps, yellow pages and white pages, images, audio/MP3, jobs/careers, and small business. Dogpile searches three search engines at a time, listing the results from these search engines on one page. If you don't get at least 10 documents matching your query, it will automatically move on to the next three, and so on, until all are searched or 10 matches are found. You can continue on by pressing "Next Set of Search Engines." In custom mode you can search each source singly, or set the parameters to search another or every one sequentially. Dogpile searches the following sources: Web -- LookSmart, GoTo.com, Web Catalog, Yahoo!, Dogpile Open Directory, Sprinks from About.com, Lycos, InfoSeek, Direct Hit, FindWhat, AltaVista, Kanoodle, Real Names and Google; Usenet -- Altavista; FTP -- Fast FTP Search; NewsCrawler -- Yahoo! News, Infoseek News ; BizNews; yellow and white pages and maps by InfoSpace; jobs/careers by headhunter.net; weather by Weather Underground; auctions from GoTo.com; audio/MP3 from Astraweb, AudioGalaxy, Gigabeat, MP3Board; images from Ditto.com; stock quotes by SiliconInvestor; small business by Hypermart. Its Geographic search option, which uses cookies, allows you to search by city, state or zip code.
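
The three-at-a-time approach described above can be sketched in Python. The "engines" here are stand-in functions that return made-up results, and the batch size and ten-match threshold are taken from the description, so this illustrates the strategy rather than Dogpile's actual code.

```python
from typing import Callable

Engine = Callable[[str], list[str]]


def make_engine(name: str, hits: int) -> Engine:
    """Create a stand-in engine that always returns `hits` made-up results."""
    return lambda query: [f"{name}: {query} #{i}" for i in range(1, hits + 1)]


def batched_search(query: str, engines: list[Engine],
                   batch_size: int = 3, wanted: int = 10) -> tuple[list[str], int]:
    """Query engines a batch at a time until enough matches have been found."""
    results: list[str] = []
    searched = 0
    for start in range(0, len(engines), batch_size):
        batch = engines[start:start + batch_size]
        for engine in batch:
            results.extend(engine(query))
        searched += len(batch)
        if len(results) >= wanted:
            break  # enough matches, so the remaining engines are skipped
    return results, searched


engines = [make_engine(f"engine{n}", 2) for n in range(1, 10)]
results, searched = batched_search("orchids", engines)
print(len(results), "results from", searched, "engines")
```

With nine engines that return two matches each, the first batch of three falls short of ten, so a second batch is searched and the last three engines are never contacted.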

   c|net's Search.com.
http://www.search.com/
A very good metasearch engine, SavvySearch, has been incorporated into c|net's Search.com. It allows you to search once for results from over 800 search engines, Web directories, auctions, storefronts, news sources, discussion groups, reference sites, and more. You can also metasearch within one or more of 15 selected major categories, or their subcategories. Results can be sorted by source, relevance or date. Search.com has quicklinks to horoscopes, jobs, maps, shopping, coupons, and people. It also links to c|net's other websites, including download.com, news.com, help.com, builder.com, and gamecenter.com.

Search Engines

   AltaVista
http://www.altavista.com/
This venerable search engine has had its ups and downs. Once the favorite of professional researchers, it has yielded that distinction to Google, but continues to be a favorite because of its precision search features. AltaVista indexes the full text of more than 550 million unique web pages. AltaVista uses text relevance and link analysis in producing results, but can also be searched with Boolean logic. Searching can be by exact phrase and by stating that certain words or phrases must, or must not, be in the search. In addition, you can constrain searches to certain portions of the document such as title, anchor, host, link, text, applet, image, and URL. The most powerful feature of AltaVista is that you can mix these constraints within a single search. AltaVista can also be searched by category and it has a number of specialty search engines and tools including images, audio/MP3, video, shopping, entertainment, news, auctions, maps, yellow pages, a Family Filter, a translation facility, search guides, and more. AltaVista indexes some dynamically-generated pages.

   Northern Light
http://www.northernlight.com/
This service's distinguishing feature is that it allows you to search the web, a special fee-based collection of articles from 7100 full-text publications licensed from major commercial database publishers, or both. Northern Light indexed approximately 350 million web pages in August 2001. When you do a search you get the results listed both individually and organized in subject folders that allow you to easily narrow your search. Northern Light supports full Boolean search expressions (and, or, not, and nesting), has truncation and automatic plurals, and you can limit the search to URLs or titles. You can also do an industry search where you can limit your search to one or a number of industries and limit the output by date and press release, product reviews, or job listings. The special collection articles are available for between $1 and $4, payable by credit card for each document you download. There are also special subscription accounts.

   HotBot
http://www.hotbot.com/
HotBot was developed from and uses the Inktomi search engine, which covers about 500 million web pages, split between 110 million "Best of the Web" documents and 390 million "Rest of the Web" documents. You can search by AND, OR, exact and Boolean PHRASE, URL, page title, or Person, but you can't combine these options within a search. You can specify that a page must include an image, MP3, video or JavaScript. The advanced search option lets you specify that a search must, should, or must not contain particular words. You can limit a search to pages published within a certain period of time (from the last week to the last two years); by media type (images, audio, shockwave, java, MP3, JavaScript, VRML, video, acrobat, VB script, real audio/video, winmedia, etc.); by location (domain, country code, or continent); and by language. You can also control the depth of your search within each website (home page only or a specified number of levels) and enable word stemming.

   Google
http://www.google.com
In October 2000, Google received the WIRED Readers Raves award for "Most Intelligent Agent." Although Google encourages users to just "type in a few descriptive words and hit the 'enter' key," it does allow advanced search limits of AND, NOT, PHRASE, and, only recently, OR. Unless specifically indicated, Google automatically ANDs terms. Google does not support truncation or stemming, and is not case sensitive. Google uses both relevance and linking (the number of pages linking to a page and what they have to say about it) to determine page ranking. In addition to the basic search, Google allows you to search by stock ticker symbol for stock quotes, by address to find maps, and offers 60 interface languages. You can search for pages linking to a site. Google is also one of the few places where you can find Adobe PDF files and dynamically generated pages indexed. The Google Scout feature will find similar pages for a particular result. Google also has an "I'm Feeling Lucky" submit button that takes you to the one site that Google thinks will best answer your question.
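
To make the matching rules described for Google concrete (terms are ANDed automatically, phrases must match exactly, there is no truncation or stemming, and case is ignored), here is a rough Python sketch of a page matcher. It is an illustration only, not how Google works internally. The query syntax is an assumption: double quotes mark a phrase and a leading minus sign marks a word that must NOT appear. The names parse_query and matches are invented for the example, and OR is left out to keep it short.

```python
import re
from typing import List, Tuple


def parse_query(query: str) -> Tuple[List[str], List[str]]:
    """Split a query into required and excluded terms (assumed syntax)."""
    required: List[str] = []
    excluded: List[str] = []
    # Each token is an optional minus sign followed by a "quoted phrase" or a word.
    for minus, phrase, word in re.findall(r'(-?)(?:"([^"]+)"|(\S+))', query.lower()):
        term = phrase or word
        if minus:
            excluded.append(term)
        else:
            required.append(term)
    return required, excluded


def _contains(words: List[str], term: str) -> bool:
    """True if the term's words appear in order, side by side, as whole words."""
    needle = re.findall(r"\w+", term)
    n = len(needle)
    return n > 0 and any(words[i:i + n] == needle for i in range(len(words) - n + 1))


def matches(query: str, page_text: str) -> bool:
    """Implicit AND: every required term must appear and no excluded term may."""
    words = re.findall(r"\w+", page_text.lower())  # lowercasing ignores case
    required, excluded = parse_query(query)
    return (all(_contains(words, term) for term in required)
            and not any(_contains(words, term) for term in excluded))
```

Under these assumptions, the query internet research matches a page titled "How to Conduct Research on the Internet" because both words are present, while internet researching does not, since without stemming "researching" is a different word from "research".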

   Fast Search
http://www.alltheweb.com/
Fast is one of the fastest and largest search engines, covering about 625 million web pages. You can limit searches by AND, OR, NOT, or by PHRASE, but Fast does not offer truncation and is not case sensitive. You can limit searches to title, text, link name, URL, link, language and domain. Fast's search result rankings are based on relevancy, link analysis, placement of text on the page, and the use of keywords. Fast offers ftp, WAP, MP3, and rich media searching through Lycos. Fast also has "offensive content reduction" which automatically filters out most offensive content unless the default is changed.

   Excite.
http://www.excite.com/
Excite is one of the oldest search engines. Its search engine reviews the information content of Web pages, their meta-tags, referring anchor text, and link popularity to determine relevance and ranking. Excite covers about 300 million web pages (8/2001). You can also search for news and news photos from over 300 Web-based publications and major newswires, for audio and video files, and for information about thousands of cities around the globe from Excite Travel. Excite's directory service is provided by LookSmart. If you prefer, you can use advanced query language searching that allows full Boolean searching including nesting of terms. You can also specify/limit by language, country or domain. Excite doesn't support truncation and is not case sensitive. The site carries horoscopes, stock quotes, news, maps, an address book, weather information, a people finder, yellow pages, chat groups, and more. You can personalize how you see the page, and also get a free email and voicemail account accessible through the Excite website.

   Lycos.
http://www.lycos.com/
Lycos has changed a lot since it began as one of the first search engines run by Carnegie Mellon University. It now offers several searching options. Lycos has discontinued using its own search engine and now gets results from both Fast and Inktomi. Lycos uses the Open Directory for its directory service. The Lycos Network offers many other services that make it one of the most popular portals on the Web. Services include searching for ftp, audio, MP3, video and images; the Lycos Top 50 (searches); chat; a people finder; maps; auctions; classifieds; and much more. Lycos Network is home to a large online community with over 5 million registered Tripod and Angelfire members.

Subject Guides and Directories

   Yahoo!
http://www.yahoo.com/
Yahoo! is everybody's "favorite" subject guide. Since Yahoo! doesn't use robots to search the web for sites, it relies on people to register their sites with it. However, Yahoo! editors review each submission and not all sites get listed in their directory. By itself, Yahoo!'s coverage is much less than most search engines'. However, just as most search engines have directory components, Yahoo! has a search engine to complement it: Google. If you use the search box, Yahoo! will first search its own database, but if it doesn't find anything it searches Google. In Yahoo!, you can look for documents by moving through the menu categories or do a keyword search at any category level. The advanced search option allows you to search by Intelligent Default, by exact phrase, AND, OR, limiting by date, Yahoo! categories only, or the Web. Yahoo! also offers yellow pages, maps, stock quotes, chat, shopping, auctions, email/web discussion groups, and My Yahoo!, which is a customizable version of the site that includes email. There are also regional Yahoo! sites that link to websites and information relevant to the country, city or other geographic area.
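
The "directory first, search engine second" behavior of the Yahoo! search box is a simple fallback pattern. The short Python sketch below shows the idea only. The function name search_with_fallback and the two search functions passed into it are hypothetical stand-ins, not real Yahoo! or Google interfaces.

```python
from typing import Callable, List, Tuple

# Hypothetical search function: takes a query, returns a list of result URLs.
SearchFn = Callable[[str], List[str]]


def search_with_fallback(
    query: str,
    directory_search: SearchFn,
    web_search: SearchFn,
) -> Tuple[str, List[str]]:
    """Try the hand-built directory first; use the web index only if it is empty."""
    hits = directory_search(query)
    if hits:
        return "directory", hits
    # Nothing in the directory, so fall back to the partner search engine.
    return "web", web_search(query)
```

The label in the returned pair tells you which source answered, which matters when you evaluate the results: directory listings were reviewed by an editor, while web index matches were not.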

   LookSmart.
http://www.looksmart.com/
LookSmart has reviewed and organized more than 2 million Web sites (indexed into more than 200,000 categories, organized hierarchically) covering everything from gardening and books to motor racing and space exploration. You can either search by keyword or drill down through the categories. The keyword search results give you both a list of relevant categories and the relevant sites. If you don't find what you want in LookSmart's collection of links, you can do a search of the web that uses Fast, Inktomi, Direct Hit and RealNames. LookSmart is the directory for AltaVista, MSN, Excite, and CNN.

(Unfortunately, the mandatory fee-based guaranteed review for commercial sites by both LookSmart and Yahoo! means that very small sites that choose a .com domain name, and that provide great content but are not money-generating, are at a major disadvantage in getting listed in these crucial directories. The assumption is that the best site to answer a particular question is a financially-rich one -- a bad assumption from my experience. That is why you won't find bid-for-rankings sites like Overture, formerly known as GoTo, recommended in this guide.)

   About.com
About.com
Formerly known as the Mining Company, this site contains hundreds of guides on as many topics, all put together by experts on each subject. You can keyword search for information, or drill down through 36 top-level categories (about 700 lower-level topic areas) to find what you want. About also has geographic sites for Australia, Canada, India, Ireland, UK, and soon Japan. Many of the subject areas have forums, chats, and/or e-newsletters that you may subscribe to.

   Open Directory Project
http://www.dmoz.org/
The Open Directory Project was started as an alternative to Yahoo! and machine-based search engines. Their mission statement is "Humans do it better." It has a hierarchical network of categories, with volunteer editors who review sites for each category level. There are sixteen major category levels: arts and entertainment, business, computing and Internet, games, health and fitness, news, recreation, reference, regional, science and technology, shopping, society and lifestyles, sports, and World (in German, Spanish, French, Japanese, Chinese, Italian, Portuguese, Russian, Polish, and Indonesian). The Open Directory is used as the directory service for a number of search engines and sites, including AOL, Netscape, Lycos, HotBot, All the Web (Fast), Northern Light, and others.

   Gary Price's Direct Search (Search Tools and Directories)
http://gwis2.circ.gwu.edu/~gprice/direct.htm
This is a guide to direct links to numerous search interfaces of resources that are not easily searchable from general search tools because they are located in databases or other "hidden" places, also called the "invisible web." Categories include library archives and catalogs, full text book sources, humanities, science, social sciences, bibliographies, government, business/economics, ready reference, and news sources & serials. Price also maintains a List of Lists, Speech and Transcript Center, Streaming Media: News & Public Affairs Resources, Newscenter, and Congressional Research Service Requests, all linked to from the Direct Search pages.

   Librarians' Index to the Internet
http://lii.org/
This is an extensive index to great sites on the Internet compiled by Carole Leita at the University of California, and other librarians. There are 36 major categories, some of which are arts, automobiles, business, computers, disabled, education, families, food, geography, history, health, medicine, jobs, Internet, kids, law, language, literature, libraries, media, music, organizations, people, politics, recreation, reference desk, religion and philosophy, science, sports, women, and seniors. The site can be browsed by category and subcategory, or searched by keyword, subject, title, description or links. There is an advanced search option that allows fielded searches and adds author, URL, publisher name and indexer. You can choose to use stemming or no stemming, and limit your search to various categories. You can subscribe to a weekly list of new sites added to the index, which serves as a great way to keep up-to-date on good reference resources.

   WWW Virtual Library: Data Sources by Subject.
http://vlib.org/Overview.html
This is another site where each subject is compiled and hosted by an expert in the field. The Virtual Library is the oldest catalog of the web, started by Tim Berners-Lee, the creator of html and the web itself. The Virtual Library may be browsed by category or alphabetical title, or searched by keyword. The Virtual Library is co-ordinated by an elected council, with major decisions decided by the membership at large. A drawback to this distributed search guide is that the subject guides are erratically and infrequently updated, although there are efforts to keep the directories more up-to-date.

   Argus Clearinghouse for Subject-Oriented Internet Resource Guides (UMich)
http://www.clearinghouse.net/
This site also contains Resource Guides to the Internet by subject authored by experts in the field. This is usually the first place I start on a research project in a field that I am unfamiliar with. The Argus Clearinghouse is a nonprofit venture run by a small group of people who rely on submissions to populate the Clearinghouse. The Clearinghouse selects sites that meet their Collection Policy and submission guidelines. The selected guides are great because they usually include an explanation of the subject and/or how to use the Internet, and usually have sections for ftp, telnet, gopher, email, and WWW resources. The site reviews and rates the resource guides to give you an idea of the overall quality based on five different criteria: level of resource development, level of resource evaluation, guide design, guide organizational scheme, and guide meta-information. Each month Argus gives the Digital Librarian's Award to a different guide that best represents the Argus site criteria. You can search the site by browsing through the fourteen categories or by doing a keyword search.

There have been a number of articles and websites that describe and compare the different search engines.
Sheila Webber has compiled a list of articles written about searching and search engines in Business Sources on the Internet: Reviews of Search Engines.
http://www.dis.strath.ac.uk/business/search.html
To keep up with all of the changes in the various search engines, you can't beat Danny Sullivan's Search Engine Watch. Search Engine Watch also has a free newsletter that you can subscribe to, and a fee-based newsletter and section of the website that provides more information in greater detail.
http://searchenginewatch.com/
Search Engine Showdown by Greg Notess is another site that describes how the various search engines work. It has a chart with the basic features of the major search engines, as well as a search engine size comparison chart.
http://www.searchengineshowdown.com/
Carole Leita, who maintains the Librarians' Index to the Internet, has put together both a Search Tools Chart and a Search Engine's Quick Guide that are available at the Infopeople website.
http://www.infopeople.org/search/chart.html
http://www.infopeople.org/search/guide.html

VIII. Some Good Research Databases

These are just a very small sampling of the business-related research sites and databases available through the Internet.

Library catalogs, books, serials, and bibliographies

   Library of Congress.
http://www.loc.gov
This site contains the holdings of the US National library which includes all items holding a US copyright. The site also includes the Copyright database, the American Memory site, and a link to Thomas, the Congressional website.

   Library Catalogs give you a chance to find all of the books in the world. One of the most comprehensive library catalogs is Melvyl, now known as California Digital Libraries. It contains more than 9 million different titles representing over 13.8 million holdings. It also has the California Periodicals Database of over 814,000 serials in California libraries. Besides being a comprehensive catalog, it makes producing a bibliography easy. Melvyl also has links to many other college library catalogs and databases.
http://www.melvyl.ucop.edu/

   Ingenta: journal and article database and document delivery service.
http://uncweb.carl.org/
This "old" Internet service, formerly known as CARL UnCover, is still hard to beat as a magazine/journal database and document delivery service. It indexes and carries the tables of contents to more than 17,000 periodicals. You can search the database and get citations and some brief abstracts for over 7,000,000 articles which have appeared since Fall 1988. Then you can order the documents, pay for them by credit card, and have them mailed or faxed to you with copyright fees covered.

   Hermograph Press
http://www.hermograph.com/njd/freemags.htm
This subset of the Net.Journal Directory lists magazines that offer free online articles. Although scientific and computer-related journals predominate, there are a fair number of business and popular magazines represented.

   Find Articles
http://www.findarticles.com/PI/index.jhtml
FindArticles is a content-distribution partnership between LookSmart, which provides the search infrastructure, and the Gale Group, which provides the published editorial content. This site lets you search for articles in more than 300 magazines and journals. You can keyword search, search by category, magazine title or subject, and browse alphabetically by title.

Government, legislative and regulatory information

   The White House
http://www.whitehouse.gov/
Besides providing a tour of the White House and a welcome from the President, this site lets you get to all of the Executive agencies.

   FirstGov: Your First Click to the U.S. Government
http://www.firstgov.com/
FirstGov opened in late summer 2000 as the U.S. government's "easy-to-search, free-access website designed to give you a centralized place to find information from local, state, and U.S. Government Agency websites." The plan is to eventually index all government information on the Internet. There are three ways to find information in FirstGov: 1) search by keyword; 2) browse the Interesting Topics section; and 3) click on one of the three branches of the U.S. Government.

   Fedworld Information Network
http://www.fedworld.gov/
Fedworld was the first gateway service to US government electronic sites. It was developed in 1992 by the National Technical Information Service (NTIS), a self-supporting federal agency which does not receive appropriated government funds and is run like a business to provide a service. Fedworld contains Federal job announcements, U.S. Customs Traveler Information, U.S. Business Advisor, Internal Revenue Service, press releases and other US government documents. It has access to 20 Federal databases, to government electronic bulletin board systems, ftp files, and subscriptions.

   GPO Access
http://www.access.gpo.gov/su_docs/index.html
GPO Access, a service of the U.S. Government Printing Office, gives you access to many of the government's best information products, including the Federal Register, Commerce Business Daily (CBD), Code of Federal Regulations (CFR), Congressional Record, US Budget, and US Code (USC).

   Thomas: Legislative Information Online.
http://thomas.loc.gov/
This site includes the full text of legislation from the 103rd through the 106th Congresses and the Congressional Record. Information on past Congresses is also available. There are also links to the House and Senate websites and the Library of Congress.

Legal and political information: cases, opinions and treaties

   Legal Information Institute at Cornell.
http://www.law.cornell.edu/
This site has legal information and court decisions, including US Supreme Court, District Courts, International laws, and links to many other legal and judicial information sites.

   Lex Mercatoria: International Trade/Commercial Law Monitor
http://lexmercatoria.org/
This site, one of the first law sites on the Net, has information about international treaties, model laws and arbitration agreements, international organizations, and other international legal information. The site has been bought by Cameron May and has been extended to include a number of other international law areas.

   FindLaw
http://www.findlaw.com/index.html
This is a searchable index of legal resources that includes law schools, law reviews, legal publishers, legal associations & organizations, continuing legal education, law student resources, a link to the Law Crawler, statutes & laws, consultants & experts, federal and state resources, international resources, legal forms, legal employment and more.

Economic, demographic, trade, and other statistics

   US Census Bureau
http://www.census.gov/
The Census Bureau site has population and housing data, maps, City and County Data Book, Statistical Abstract, Economic Census, data on agriculture, international trade, manufacturing, materials for teachers, and more.

   CIA World Factbook:
http://www.odci.gov/cia/publications/factbook/index.html
Very good demographic, economic, and political information about countries: the World Factbook can be found on many websites, including the CIA's.

Company, market and financial information

Thousands of companies have websites on the Internet. These are some sites that contain collections of company-related information, market research reports, and financial data.

   Securities and Exchange Commission
http://www.sec.gov
The primary mission of the U.S. SEC is to protect investors and maintain the integrity of the securities markets. The site has the EDGAR database that contains securities information and SEC filings on public companies.

   Hoovers Online
http://www.hoovers.com/
This site contains the MasterList Plus database which has the name, ticker symbol, location, and/or sales on over 10,000 companies.

   Thomas Register of American Manufacturers
http://www.thomasregister.com/
Database for product and service suppliers in 52,000 categories.

   Wall Street Research Net.
http://www.wsrn.com/
This site provides 250,000 links to company information, US and international stock and bond market data, research publications, economic information, and other information to help professional and private investors.

   NASDAQ Stock Market
http://www.nasdaq.com/
Frequently updated stock information in graphical format for composites, index values, NASDAQ 100, company information, etc.

   New York Stock Exchange (NYSE)
http://www.nyse.com/
This site has information about listed companies, a daily market summary, general investment information, and other information about the NYSE.

   Finance Wat.ch
http://Finance.Wat.ch/
This is a Swiss site that covers financial goings-on around the world. It includes a list of international stock exchanges, international financial news, and links to other financial-related websites.

   PR Newswire
http://www.prnewswire.com/
This is one of the leading sources for corporate press releases. It archives news releases from participating members for one year. You can search for information by company name. The site also contains industry information and links to the daily online magazine Money Talks.

   Business Wire
http://www.businesswire.com/
This is another leading source for corporate press releases. You can search by company name or industry to bring up information on a company that may include stock and SEC information as well as news releases and a link to the company's website. The site also contains information on tradeshows, high tech news, retail reports, and other business information.

General Business Information

   US Small Business Administration.
http://www.sbaonline.sba.gov/
Has information on starting, financing, and expanding your business, as well as SBA program offices, and links to other business resources.

   Internet.com's Electronic Commerce Guide
http://ecommerce.internet.com/
Contains articles about electronic commerce, examples of how the Internet is used to promote business, information on how to promote your business on the Internet, and more.

   US Business Advisor.
http://www.business.gov/
The U.S. Business Advisor exists to provide business with one-stop access to federal government information, services, and transactions. Contains a Q&A section, business news, finance, international trade, and more.

   Khera Communications' Business Resource Center.
http://www.morebusiness.com/
The Center was created to help businesses get useful information to help them grow. Resources include information on marketing, managing, and financing a business, as well as links to procurement sites, how to write a business plan, etc.

   Federal Express Tracking Service.
http://www.fedex.com/

   UPS Parcel Tracking Service.
http://www.ups.com/

Buyers & suppliers, trade leads, procurement resources

   Commerce Business Daily. There are a number of Internet sources for the CBD, which lists federal, state, and other procurement opportunities. Many of these are fee-based, or require some affiliation with an organization. A few of the services offering the CBD via the Internet are listed below.
GPO Access CBDNet. CBDNet is the official FREE online listing of Government contracting opportunities which are published in the Commerce Business Daily. It describes how the CBD works, lists the classification codes, and has a number of search options including general keyword search, search by field, and browse.
http://cbdnet.access.gpo.gov/index.html

Loren Data Corp. CBD Today. Under agreement with the U.S. Department of Commerce, Loren Data Corp. is a 100% web-based access point for government EDI, including provision of the current day's Commerce Business Daily (CBD) free through the Internet, as well as searchable CBD archives back to 1995. The site is searchable and output is very brief in format. They also offer a fee-based email service where they deliver targeted lists to you on a daily basis.
http://www.ld.com

   GovCon: The Government Contractor Resource Center.
http://www.govcon.com
GovCon provides free access to the CBD (current and archives), government regulations, databases, contractors, a Teaming Opportunities database for companies with specific subcontracting or collaboration needs, an Employment opportunities database, and more. Like most of the other CBD providers, GovCon offers targeted CBD listings via email. By Khera Communications.

   National Association of Women Business Owners (NAWBO)
http://www.nawbo.org/
This organization provides services and assistance to women business owners. The website describes the services and has a members only area offering many resources.

   SBA Procurement Hotlist.
http://www.sbaonline.sba.gov/hotlist/procure.html
Links to a large number of procurement-related pages, primarily US Government.

   Tenders Electronic Daily.
http://ted.eur-op.eu.int/ojs/html/index2.htm
TED is the European Community's procurement database. Fee-based; requires Java capability.

   Trade lead and business opportunities interest groups. There are a number of newsgroups and mailing lists that provide opportunities to advertise your business and products, or to post buy and sell opportunities. A few of those follow.

alt.business.import-export
This is a newsgroup where you can post import and export-oriented buy and sell opportunities. There are also announcements of various import/export-related services.

eeurope-business digest
http://www.ijs.com/naafetee/
This is a discussion group, sponsored by the North American Association For Exports To Eastern Europe (NAAFETEE), that focuses on trade opportunities to and from Eastern Europe. The website contains the archives and subscription sign-up page.

News and media sources

The Internet has become a hot spot for traditional media sources. Newspapers, newswires, TV, and radio are all well represented. Some of the sites do charge a fee for full access, but most give you headlines and synopses of lead programs.

   News Link
http://www.newslink.org/
News Link, sponsored by the American Journalism Review, has more than 9,000 links to newspapers, magazines, television and radio stations, and news services worldwide. It rates top news sites, has a site of the week, on-line news research, newspaper headline gaffes, and more.

   Online Newspaper Services Resource Directory
http://www.mediainfo.com/
This site lists newspaper publishers with online services in operation or under development, as well as resources of interest to the newspaper new media community.

   New York Times
http://www.nytimes.com/
Most items in the New York Times are available on the Web, though with different headlines.

   Washington Post
http://www.washingtonpost.com/
This is a great site for up-to-date information, particularly on news coming out of the US capital. Current articles are available free, and the archives are searchable, with articles available for a small fee.

   CNN
http://www.cnn.com
As befits a world-wide cable news organization, this site has news in print, as well as a variety of audio and video formats to meet the needs of almost everyone.

IX. 8 Steps for Doing Research on the Internet

In conclusion, here are 8 steps to successfully finding what you want on the Internet.

  1. Define your goals.
Analyze your information-finding project. Break it down into its component parts. Determine why you need this information and what you plan to do with it. This will make your information search clear to you and easier to conduct.

  2. Determine the types of information you need to answer your question.
Knowing the kinds of information you want will help you choose the resources and tools that will best meet your needs. Are you looking for statistics, in-depth books, magazine articles, expert comments, product reviews, etc.?

  3. Identify keywords, phrases, and subject categories to use in your search.
Play with your search terms, use synonyms, distinctive terms, alternative spellings (the Internet is international). Search by topic, company, product, person's name, etc.

  4. Read the instructions and tips and techniques for using the search tools you use.
Understand the types of information each tool covers and the kinds of search options you have. Read FAQs and welcome instructions for interest groups before posting.

  5. Use more than one source and search tool.
Although there is duplication among sources on sites, queries performed on different search engines usually produce different results. Try specialized sites. Use search tools that search not only the WWW, but interest groups, ftp, gopher, and other sites. Use both keyword and subject guides.

  6. Practice Netiquette.
Be considerate of others in doing your research. This applies not just to interest groups: pay attention to the time of day and the length of time you spend on a service, and log off when you are done. Filling out registration and comment forms and viewing sponsors' information will help keep information online.

  7. Review your progress.
Look at some of the most promising records and see if there are other terms that you can use to sharpen or widen your search. Compare what you've learned with what you decided you wanted to learn in step 1. Make adjustments or redirect your focus if necessary.

  8. Practice critical thinking!
Evaluate the information you find. Don't accept what you read as the truth: get confirming sources, ask questions, talk to experts, probe for motivations, and use your intuition -- if something just doesn't sound right, check it out further.
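
Steps 3 and 5 lend themselves to a small worked example. The Python sketch below is one possible way to play with search terms and to combine what several search tools return; it is not a prescribed method. The function names expand_queries and merge_results are made up for the illustration, and the duplicate check (ignoring a trailing slash) is a deliberately crude assumption.

```python
from itertools import product
from typing import List


def expand_queries(concepts: List[List[str]]) -> List[str]:
    """Step 3: build one query for every combination of alternative terms.

    Each inner list holds the synonyms or alternative spellings for one concept.
    """
    return [" ".join(combo) for combo in product(*concepts)]


def merge_results(result_lists: List[List[str]]) -> List[str]:
    """Step 5: combine results from several tools, dropping repeated URLs.

    URLs that differ only by a trailing slash are treated as the same page.
    The first version seen is kept, in the order it was found.
    """
    seen = set()
    merged: List[str] = []
    for results in result_lists:
        for url in results:
            key = url.rstrip("/")
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged
```

For example, expand_queries([["color", "colour"], ["printer"]]) produces the two queries "color printer" and "colour printer", which covers both the American and British spellings. Running each query on more than one search tool and passing the lists to merge_results gives you a single list to review in step 7.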

Copyright 1995-2001 InfoQuest! Information Services
Last updated: September 21, 2001
Please send any comments to Terry Brainerd Chadwick at tbchad@tbchad.com or 1-503-228-4023.

URL= http://www.tbchad.com/resrch.html