Clustering may refer to the coordination of two or more computer systems or servers to handle variable workloads and to keep operating if one of them fails. It may also refer to data clustering, a data analysis technique that divides a data set into subsets whose elements share similar characteristics.
The objective of search result clustering is to alter the manner in which users conduct online searches by organizing search results into folders that group similar items together.
Why Clustering is Important
Unless an efficient method of organization is provided, the vast amount of online information cannot be used to its full potential. Clustering engines group search results based on linguistic and textual similarity. This basic similarity is supplemented by heuristics that programmers code according to users’ preferences about what they want to see in clustered documents. Clusters are presented using a folder and subfolder structure.
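As a rough illustration of grouping by textual similarity, the sketch below compares result snippets by the share of words they have in common (Jaccard similarity) and places each snippet in the first group that is similar enough. The function names, the sample snippets, and the 0.25 threshold are assumptions chosen for the example. Production clustering engines use far richer linguistic analysis and the heuristics mentioned above.

```python
import re

def tokens(text):
    """Lowercase a snippet and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Share of words two token sets have in common (0.0 to 1.0)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def cluster_snippets(snippets, threshold=0.25):
    """Greedy single-pass grouping (illustrative, not any engine's method).

    Each snippet joins the first cluster whose first member is similar
    enough; otherwise it starts a new cluster.
    """
    clusters = []  # list of (tokens of first member, member snippets)
    for snippet in snippets:
        words = tokens(snippet)
        for seed, members in clusters:
            if jaccard(seed, words) >= threshold:
                members.append(snippet)
                break
        else:
            clusters.append((words, [snippet]))
    return [members for _, members in clusters]

# Hypothetical result snippets for an ambiguous query.
results = [
    "Jaguar car dealers and prices",
    "Jaguar the big cat habitat",
    "Used Jaguar car prices",
    "Big cat conservation and habitat",
]
groups = cluster_snippets(results)
for group in groups:
    print(group)
```

With these sample snippets the car-related results land in one group and the animal-related results in another. The threshold controls how readily groups merge: lowering it produces fewer, broader groups.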
When a search engine returns millions of results for a particular query, the user can either sift through the countless pages of results or rely on the search engine’s determination of the most relevant results. Neither can guarantee that the desired information is accessible, as it may be buried within pages of results or fail to meet the search engine’s criteria. In the same way that everything else is clustered or organized, the world of web searching would become more useful if search results were organized.
Clustering engines automatically group search results into categories that are determined from the words and phrases found within the search results. Categories are designed to achieve human-level precision and provide hierarchical drill-down capability in a familiar folder-style interface. Users do not need to scroll through or give up on mind-numbing lists, because the primary themes found in the first 300 to 500 results are displayed on the first page. A brief summary of the types of information available on a particular topic is provided so that the area of interest can be brought into sharp focus immediately.
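The folder-style drill-down described above can be sketched as follows. This is a minimal example under stated assumptions: folder labels are single words shared by at least two results, the query's own words are excluded, and the stopword list, function names, and sample snippets are all hypothetical. It is not the algorithm of any particular engine.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption, not a standard one).
STOPWORDS = {"the", "and", "of", "a", "in", "for", "to"}

def words(text):
    """Lowercase words in a snippet, minus stopwords."""
    found = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in found if w not in STOPWORDS]

def shared_terms(snippets, exclude, min_size=2):
    """Map each term found in at least min_size snippets to those snippets."""
    by_term = defaultdict(list)
    for snippet in snippets:
        for term in set(words(snippet)) - exclude:
            by_term[term].append(snippet)
    return {t: docs for t, docs in by_term.items() if len(docs) >= min_size}

def build_folders(snippets, query):
    """Build a two-level folder tree labeled by shared terms."""
    exclude = set(words(query))
    top = shared_terms(snippets, exclude)
    tree = {}
    for label, docs in top.items():
        # A term whose results all sit inside a larger folder is left
        # out here; it shows up as a subfolder of that larger folder.
        if any(set(docs) < set(other) for other in top.values()):
            continue
        subs = shared_terms(docs, exclude | {label})
        # Keep only subfolders that actually narrow the parent folder.
        subs = {t: d for t, d in subs.items() if len(d) < len(docs)}
        tree[label] = {"results": docs, "subfolders": subs}
    return tree

snippets = [
    "Jaguar car dealers",
    "Jaguar car prices and reviews",
    "Jaguar cat habitat",
    "Jaguar cat diet in the wild",
    "Jaguar car prices for used models",
]
folders = build_folders(snippets, query="jaguar")
for label, folder in sorted(folders.items()):
    print(label, len(folder["results"]), sorted(folder["subfolders"]))
```

For the sample data this yields a "car" folder with a "prices" subfolder and a separate "cat" folder, which mirrors the folder and subfolder presentation in a very reduced form.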
As search engines became far better at returning large numbers of relevant results, it became more difficult to navigate through those results in a meaningful manner. A typical searcher does not view results beyond the first page, making it highly likely that relevant and useful results will be overlooked. Clusters allow results on the tenth page to be accessible with a single click. Additionally, related objects can be viewed together without much effort. Even unexpected relationships between words, ideas, and concepts are revealed.
A cluster is considered to be of high quality if its description is readable and if it helps refine a search to obtain precise results.
A clustering engine queries multiple search engines, combines the results, and displays them on a single screen. Each result list includes the total number of results retrieved and clustered. The clustering engine’s own heuristics determine which pages are favored. Search engines occasionally return multiple copies of the same page with slightly different URLs, but search result clustering minimizes this because clustering engines do not repeat similar results, and clusters are specific enough that repeated documents are extremely uncommon.
Some clustering engines also provide advanced search capabilities, allowing users to specify the sources to be searched, the number of results desired, the waiting time allowed, the language to be used, and the filtering out of offensive content.
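The combining of results from several engines, and the suppression of copies that differ only slightly in their URLs, might look something like the sketch below. The normalization rules (ignoring the scheme, a leading "www.", a trailing slash, a trailing "index.html", and the fragment) are assumptions made for the example, as are the function names and sample URLs. Real engines apply their own heuristics.

```python
from urllib.parse import urlsplit

def normalize_url(url):
    """Reduce a URL to a comparison key (illustrative rules only)."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    if path.endswith("/index.html"):
        path = path[: -len("/index.html")]
    key = host + path
    if parts.query:
        key += "?" + parts.query  # keep the query: it may select a different page
    return key

def merge_results(result_lists):
    """Interleave ranked URL lists, keeping the first copy of each page."""
    seen = set()
    merged = []
    longest = max(map(len, result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalize_url(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged

# Hypothetical ranked results from two engines.
engine_a = ["http://www.example.com/", "http://example.org/docs/index.html"]
engine_b = [
    "https://example.com",
    "http://example.org/docs/",
    "http://example.net/page?id=2",
]
merged = merge_results([engine_a, engine_b])
retrieved = len(engine_a) + len(engine_b)
print(f"{len(merged)} unique results from {retrieved} retrieved")
```

Interleaving by rank is one simple way to combine lists without favoring a single engine. An engine could instead weight its sources, which is where its own heuristics for favored pages would come in.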
Search Engines that Group Results
Google Sets does not return results, but rather helps find terms that are similar to those entered. This enables the user to create more complex queries in one area and to consider alternative ways to construct a search. Google Sets is the clustering agent for Google Labs.
WiseNut is a full-text search engine that provides related topics, under the name WiseGuide, in addition to the number of search results for any query. Some results will include subtopics that appear beneath the clustered results. There is a link next to each clustered result whose keywords can be used to conduct a new search. In addition to the page results, a distinct set of clustered results will also be generated. This search engine has recently been acquired by LookSmart.
Teoma has been dubbed the “Google Killer” because its clustering technology is so innovative. A single search will yield four distinct sets of results. Top left are sponsored results, bottom left are non-sponsored website results, top right are suggestions for refining the search, and bottom right are link collections from experts and enthusiasts. The link collections are appropriate for general information needs, whereas the suggestions are intended for more specific queries. A click on any link will cause the search to be rerun with a new set of site results displayed. Ask Jeeves has acquired the Teoma domain.
Infonetware.com is a demonstration of the company’s Real Term Technology rather than a search engine. The results page is framed so that the left frame contains topics related to the search term, while the right frame contains the web page search results. It supports comprehensive searching.
Oingo’s search source is the Open Directory Project. The search results page includes a drop-down menu of possible definitions. The list of categories in order of relevance to the search can be found beneath it, along with the directory’s site results. It is more helpful for searches involving broad categories or general terms.
Vivisimo is a meta-search engine that organizes its results into groups. It provides a very straightforward home page with organized search results. The page layout makes it simple to navigate multiple categories without “losing your place.” Clusty is a consumer search engine owned and operated by Vivisimo. It queries Ask, MSN, Open Directory, LookSmart, Gigablast, and WiseNut for results. These websites were chosen due to their precise results and rapid response times.
On the left side of the homepage, Query Server provides various search options. Each search has a nearly identical interface and clustered results. The search results are displayed in a frame on the site’s right side.
Surfwax offers both paid and free services. After a search is entered, focus links appear in the upper left corner. These key phrases may be used in conjunction with the search term. They are separated into narrower and broader categories and contain generic terms rather than links to specific individuals or locations.
Northern Light News searches are clustered into folders only when a minimum number of results is returned. The folder listing does not provide information regarding the contents of a specific folder, although subfolders are provided for broad topics. The search results are sorted by date.
Clustering search engines divide several hundred results into manageable chunks. Suggestions are provided to maximize the use of information and make the search process much simpler. A search query cannot always be precise enough to immediately target the desired information.