The capabilities of many Internet searchers rest on Boolean principles, and both logical and positional operators may come into play. The logical operators are and, or, not, and sometimes xor. And is the operator of logical conjunction: it returns items containing both terms. Or is the operator of logical disjunction: it returns items containing either term or both. Not is the operator of logical negation: it excludes items containing the term that follows it. Xor (exclusive or) returns items containing one term or the other, but not items containing both. In some search systems, particularly on the Internet, these operators are expressed by symbols: | for or; & for and; &! for not. Other search engines express the operators in words: all these terms; any of these terms. Still others have special rules governing how operators must be written in a search: only in upper case letters, or enclosed in single or double quotes. The user should determine which conventions apply before beginning a search.
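The four logical operators correspond to ordinary set operations on the items that match each term. The following is a minimal sketch in Python, assuming a hypothetical four-item collection (DOCS) and a hypothetical helper (matching); it is not how any particular search engine is implemented.

```python
# Hypothetical toy collection: item number -> text of the item.
DOCS = {
    1: "water pollution in rivers",
    2: "air pollution in cities",
    3: "clean water supply",
    4: "air travel",
}


def matching(term):
    """Return the set of item numbers whose text contains the term as a whole word."""
    return {doc_id for doc_id, text in DOCS.items() if term in text.split()}


water = matching("water")          # items 1 and 3
pollution = matching("pollution")  # items 1 and 2

and_result = water & pollution  # and: items containing both terms
or_result = water | pollution   # or: items containing either term, or both
not_result = water - pollution  # water not pollution
xor_result = water ^ pollution  # xor: items containing exactly one of the terms

print(and_result, or_result, not_result, xor_result)
```

Here, and narrows the results to item 1, or broadens them to items 1, 2 and 3, not leaves only item 3, and xor returns items 2 and 3.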
Logical operators are governed by algebraic rules: a search is evaluated in a set order and returns predictable results. Just as in algebra, operations inside parentheses (or, in some searchers, quotation marks) are dealt with first, followed by not statements, then and statements, then or statements. Thus, according to Boolean logic, the search water and pollution or air and pollution is equivalent to (water and pollution) or (air and pollution) and, because and distributes over or, is also equivalent to (water or air) and pollution. Logical operators cause searches on terms to return particular items according to the logical relationship of those terms.
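The equivalence can be checked by brute force over every combination of terms being present or absent. Python's own operators happen to follow the same order (not, then and, then or), so the sketch below uses them directly; the function names are made up for illustration, and the last function shows a different grouping that does change the results.

```python
from itertools import product


def unparenthesised(water, air, pollution):
    # No parentheses: `and` binds more tightly than `or`.
    return water and pollution or air and pollution


def grouped(water, air, pollution):
    return (water and pollution) or (air and pollution)


def factored(water, air, pollution):
    return (water or air) and pollution


def misgrouped(water, air, pollution):
    # A different placement of parentheses, which is NOT equivalent.
    return water and (pollution or air)


# Every combination of "term present" (True) and "term absent" (False).
cases = list(product([False, True], repeat=3))

all_equivalent = all(
    unparenthesised(*c) == grouped(*c) == factored(*c) for c in cases
)
differs = [c for c in cases if unparenthesised(*c) != misgrouped(*c)]

print(all_equivalent)
print(differs)
```

The first three forms agree in all eight cases. The misgrouped form disagrees in two: an item about air and pollution with no mention of water, and an item about water and air with no mention of pollution.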
Positional (or proximity) operators include adj, near, and foll. Positional operators cause searches on terms to return items in which the terms have a particular positional relationship to each other. The operator adj between two search terms restricts the items returned to those in which the search terms appear immediately adjacent to each other. The operator near restricts the items returned to those in which the terms appear in any order but with no more than a certain number of terms separating them; in some systems, near can be accompanied by a number to specify how many intervening terms are acceptable. The operator foll (followed by) generally fixes the order of the terms and requires their immediate adjacency; in some search engines, adj is equivalent to foll. Two other positional operators, with and same, are often employed by specific database searchers. With between two search terms may restrict the items returned to those in which the search terms appear in the same sentence, while same may restrict them to items in which the terms appear in the same field.
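Positional operators can be pictured as tests on word positions within an item. The sketch below makes some assumptions that vary between real systems: adj is taken to accept either order, near defaults to at most five intervening words, and the function and parameter names are invented for illustration.

```python
def positions(tokens, term):
    """Return every index at which the term occurs in the list of words."""
    return [i for i, tok in enumerate(tokens) if tok == term]


def foll(tokens, first, second):
    """True if `first` is immediately followed by `second`."""
    return any(
        j - i == 1
        for i in positions(tokens, first)
        for j in positions(tokens, second)
    )


def adj(tokens, a, b):
    """True if the terms are immediately adjacent, in either order (an assumption)."""
    return any(
        abs(j - i) == 1
        for i in positions(tokens, a)
        for j in positions(tokens, b)
    )


def near(tokens, a, b, max_between=5):
    """True if the terms occur in either order with at most `max_between` words between them."""
    return any(
        0 < abs(j - i) <= max_between + 1
        for i in positions(tokens, a)
        for j in positions(tokens, b)
    )


tokens = "pollution of water and air".split()

print(foll(tokens, "water", "and"))                      # order matters
print(adj(tokens, "and", "water"))                       # order does not matter
print(near(tokens, "air", "pollution", max_between=3))   # three words in between
```

In this example, "and" foll "water" fails because the order is reversed, while "and" adj "water" succeeds under the either-order assumption.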
Internet searchers may use these operators, or variants of them, as well. The Twitter® search engine, for example, uses field searching (from, to, filter:links), the Boolean operator OR, and the positional operators near and within (for instance, sent near:san diego within:15 mi).
Which of these operators are active and exactly how each should be employed varies from search engine to search engine. Order may be very important as well. For instance, in Elsevier Science Direct®, the order of the search works:
The default operator also changes from resource to resource and from searcher to searcher. This will affect how your search behaves after you enter it. Not knowing the default operator may mean that you will not get the results you expect or the information that you need even though it is actually available.
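To see how much the default operator matters, consider the same two-word search run under each default. This is a minimal sketch with a hypothetical search function and a hypothetical three-item collection, not the behaviour of any named resource.

```python
# Hypothetical toy collection: item number -> text of the item.
DOCS = {
    1: "water pollution in rivers",
    2: "air pollution in cities",
    3: "clean water supply",
}


def search(query, docs, default_operator="and"):
    """Join the query words with the default operator and return matching item numbers."""
    terms = query.lower().split()
    # With "and" every word must be present; with "or" any one word is enough.
    combine = all if default_operator == "and" else any
    return [
        doc_id
        for doc_id, text in sorted(docs.items())
        if combine(term in text.lower().split() for term in terms)
    ]


with_and = search("water pollution", DOCS, default_operator="and")
with_or = search("water pollution", DOCS, default_operator="or")

print(with_and)
print(with_or)
```

Under a default of and, only item 1 is returned; under a default of or, all three items are returned for exactly the same typed search.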
Most searchers utilize fields to search or sort information. Internet searchers might use fields such as the url, meta tags, title tags, and the like, while database searchers are more likely to use fields such as author, title, and subject.
Care should be taken with regard to truncation or stem searching. Unless you construct your search carefully, stem searching may occur without your realizing it. For example, a nested search on (serial murderers) may return the television series Murder, She Wrote. In small databases, the difference may be fifty or sixty items; on the Internet this could increase your results list by thousands. Both left-hand and right-hand truncation may also occur without your realizing it. Unless you know what to look for, you may be puzzled that your search on edi journals (edi stands for electronic data interchange, a hot topic in electronic commerce) returns medical journals and edited journals. Google®, for instance, uses stem technology by default: a search on "dietary needs" will also return items containing similar or related words (such as diets). Some searchers will allow you to control the search to your specifications by entering, for example, .edi. journals or a similar stop character to restrict your returns.
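The edi example can be reproduced in a few lines. The sketch below contrasts a match that allows truncation on both sides (a plain substring test) with a match restricted to the whole word, using Python's standard re module; the list of titles and the function names are hypothetical.

```python
import re

# Hypothetical titles in a small database.
TITLES = ["EDI journals", "Medical journals", "Edited journals"]


def substring_match(term, titles):
    """Left- and right-hand truncation: the term may appear inside a longer word."""
    return [t for t in titles if term.lower() in t.lower()]


def whole_word_match(term, titles):
    """No truncation: the term must stand alone as a complete word."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [t for t in titles if pattern.search(t)]


truncated = substring_match("edi", TITLES)
exact = whole_word_match("edi", TITLES)

print(truncated)
print(exact)
```

The truncated search returns all three titles, because "edi" occurs inside "Medical" and "Edited"; the whole-word search returns only "EDI journals".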
Some web site creators attempt to skew your results by adding pages of words to their coding to increase the number of hits on their sites and to fool automated searchers into ranking their sites higher in the items returned. Many searchers use some form of relevance, basing part of the ranking of the returned items on, for example, the 100 most weighty words in a site. Slightly dishonest, but creative, folks may add the words Christmas, sex, holiday, Santa Claus and cartoons to their home pages, thus increasing the number of hits on their pages and giving themselves greater standing with those who believe quantity and quality are synonymous (at least, until someone is bright enough to check the source code!). Sometimes, the number of sites which link to a particular page will influence the ranking. In other cases, searchers have reportedly sold the rights to certain terms, helping sites to ensure a high ranking when a particular term is searched.
As mentioned above, many searchers use some form of relevance searching; others allow natural language searching. Relevance searching ranks the items returned based on a combination of determinants such as: the number of words in the item; the number of times the search terms appear in the item; and the prominence of the search terms in the item (using the h1 to h6 tags in the coding, for instance). Natural language searching supposedly allows you to query the search engine more easily by asking, for example, Tell me about the history of the Battle of Gettysburg -- from which the search engine is supposed to extract the relevant phrase and return items containing it. However, natural language searchers may base returns on the parts of speech, ranking, for instance, a noun higher than an adjective, and an adjective higher than a dependent clause. Thus, in our example, you may get a lot of highly ranked items about history. Not very helpful.
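A rough picture of relevance ranking, and of how it can misfire on the Gettysburg question, is sketched below. The scoring formula, the heading weight of 3.0, and the two sample items are all assumptions made for illustration; real searchers combine these determinants in their own undisclosed ways.

```python
# Hypothetical items, each with a heading (as in an h1 tag) and body text.
ITEMS = [
    {
        "heading": "Battle of Gettysburg",
        "body": "the battle of gettysburg was fought in 1863",
    },
    {
        "heading": "History of baseball",
        "body": "a long history of the game and its history",
    },
]


def relevance(item, terms, heading_weight=3.0):
    """Score an item by how often the terms occur, how prominent they are, and its length."""
    body = item["body"].lower().split()
    heading = item["heading"].lower().split()
    total_words = len(body) + len(heading)
    if total_words == 0:
        return 0.0
    body_hits = sum(body.count(term) for term in terms)
    heading_hits = sum(heading.count(term) for term in terms)
    # Occurrences in the heading count for more; longer items are scaled down.
    return (body_hits + heading_weight * heading_hits) / total_words


def rank(items, terms):
    """Return the headings of the items, highest score first."""
    ordered = sorted(items, key=lambda item: relevance(item, terms), reverse=True)
    return [item["heading"] for item in ordered]


phrase_only = rank(ITEMS, ["gettysburg"])
with_history = rank(ITEMS, ["history", "gettysburg"])

print(phrase_only)
print(with_history)
```

When only gettysburg is scored, the battle item ranks first. Once history is also treated as a search term, the baseball item overtakes it, which is the unhelpful outcome described above.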
Constructing good searches takes time and practice. It may not seem very important at first, but a well-constructed search can save you hours of frustration -- especially when you begin searching large databases on the Internet or the Internet itself. When searching, you should consider the controlled vocabulary (or the lack of one), the active operators, and any special functions and characters which may affect a search.
A controlled vocabulary is made up of the words and terms used to describe the contents of items in a bibliographic database. It offers the user some assurance that she has found most of the relevant items available. In many North American university library catalogues, the Library of Congress Subject Headings are used. Many commercial bibliographic databases, such as CINAHL and Sociological Abstracts, have their own controlled vocabularies, generically known as "thesauri". Usually these can be accessed right from the resource. The Internet has no controlled vocabulary.
Internet searchers can only do what they are told. You are the only available intelligent searcher.
This page copyright, created and maintained by Linda Hansen.
Comments and suggestions to: lhansen16@gmail.com
Created: 1996/12/09 Last updated: 2010/08/19
Terms of Copyright
This document: ...search/bool.htm