
Posts

Showing posts from August, 2012

Online Searching

WebCrawler

Brian Pinkerton of the University of Washington released WebCrawler on April 20, 1994. It was the first crawler to index entire pages. It soon became so popular that it was unusable during daytime hours. AOL eventually purchased WebCrawler and ran it on its network. Then, in 1997, Excite bought WebCrawler, and AOL began using Excite to power its NetFind. WebCrawler opened the door for many other services to follow suit: within a year of its debut came Lycos, Infoseek, and OpenText.

Parts of a Search Engine:

Search engines consist of three main parts. Search engine spiders follow links on the web to request pages that either are not yet indexed or have been updated since they were last indexed. These pages are crawled and added to the search engine index (also known as the catalog). When you search using a major search engine, you are not actually searching the web; you are searching a slightly outdated index of content that roughly represents the content of the web. The third part of a search engine is the search interface and relevancy software. For each search query, search engines typically do most or all of the following: accept the user's query, checking for any advanced syntax and checking whether the query is misspelled so they can recommend more popular or correct spelling variations; check whether the query is relevant to other vertical search databases (such as news search or product search) and place relevant links to a few items from th...
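The spelling-suggestion step described above can be illustrated with a small sketch. It assumes a hypothetical table of known query terms with popularity counts (`QUERY_COUNTS`, made-up data) and uses a simple edit-distance-1 candidate search, a well-known textbook technique. Real search engines draw on query logs and far more elaborate models, so this only shows the idea, not how any particular engine does it.

```python
from __future__ import annotations

import string

# Hypothetical popularity counts for known query terms (made-up illustrative data).
QUERY_COUNTS = {"search": 900, "engine": 700, "crawler": 300, "index": 500}


def edits1(word: str) -> set[str]:
    """Return every string one deletion, transposition, replacement or insertion away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    transposes = {left + right[1] + right[0] + right[2:] for left, right in splits if len(right) > 1}
    replaces = {left + c + right[1:] for left, right in splits if right for c in letters}
    inserts = {left + c + right for left, right in splits for c in letters}
    return deletes | transposes | replaces | inserts


def suggest(term: str, counts: dict[str, int] = QUERY_COUNTS) -> str | None:
    """Return the most popular known term one edit away, or None if no correction applies."""
    term = term.lower()
    if term in counts:
        return None  # the term is already known, so no suggestion is needed
    candidates = [w for w in edits1(term) if w in counts]
    if not candidates:
        return None
    # Prefer the most popular candidate, mirroring "recommend more popular spellings".
    return max(candidates, key=lambda w: counts[w])


print(suggest("serch"))   # search
print(suggest("Engine"))  # None
```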

What is a Bot?

Computer robots are simply programs that automate repetitive tasks at speeds impossible for humans to reproduce. On the internet, the term bot usually describes anything that interfaces with the user or that collects data. Search engines use "spiders," which search (or spider) the web for information. They are software programs that request pages much like regular browsers do. In addition to reading the contents of pages for indexing, spiders also record links. Link citations can be used as a proxy for editorial trust. Link anchor text may help describe what a page is about. Link co-citation data may be used to help determine which topical communities a page or website exists in. Links are also stored to help search engines discover new documents to crawl later. Another example of a bot is the chatterbot, which is built around extensive resources on a specific topic. These bots attempt to act like humans and converse with people about that topic.
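To illustrate how a spider records links and their anchor text while reading a page, here is a minimal sketch built on Python's standard `html.parser` module. The class name `LinkRecorder`, the sample page, and the URLs are hypothetical; a real crawler would also fetch pages over HTTP, honor robots.txt, and cope with much messier markup.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkRecorder(HTMLParser):
    """Collect (absolute URL, anchor text) pairs from <a href> tags, as a spider might."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[tuple[str, str]] = []
        self._href: str | None = None  # URL of the <a> tag currently open, if any
        self._text: list[str] = []     # text fragments seen inside that <a> tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL so they can be crawled later.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the anchor text.
            anchor = " ".join("".join(self._text).split())
            self.links.append((self._href, anchor))
            self._href = None


page = '<p>See <a href="/about.html">About   our team</a> and <a href="https://example.com/">Example</a>.</p>'
recorder = LinkRecorder("https://example.org/index.html")
recorder.feed(page)
recorder.close()
for url, anchor in recorder.links:
    print(url, "->", anchor)
```

The recorded URLs feed the crawl queue, while the anchor text is kept as a hint about what the linked page is about.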

How Web Search Engines Work

A search engine operates in the following order: web crawling, indexing, and searching. Web search engines work by storing information about many web pages, which they retrieve from the HTML itself. These pages are retrieved by a Web crawler (sometimes also known as a spider), an automated Web browser that follows every link on the site. Site owners can exclude pages from crawling with a robots.txt file. The contents of each page are then analyzed to determine how it should be indexed (for example, words can be extracted from the titles, page content, headings, or special fields called meta tags). Data about web pages is stored in an index database for use in later queries. A query can be as simple as a single word. The purpose of an index is to allow information to be found as quickly as possible. Some search engines, such as Google, store all or part of the source page (referred to as a cache) as well as information about the web pages, whereas others, s...
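The paragraph says data about pages is stored in an index database so that it can be found quickly. One common structure for this is an inverted index, which maps each word to the pages that contain it. The sketch below assumes that structure, with hypothetical page names and text, and answers a query by intersecting the page sets for each query word. It is an illustration only, not a description of how Google or any other engine actually stores its index.

```python
from __future__ import annotations

import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index: each word maps to the set of pages containing it."""
    index: defaultdict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the pages that contain every word in the query (Boolean AND)."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical pages standing in for crawled documents.
pages = {
    "a.html": "WebCrawler indexed entire pages",
    "b.html": "Lycos and Infoseek followed WebCrawler",
    "c.html": "Search engines store an index of pages",
}
index = build_index(pages)
print(sorted(search(index, "webcrawler")))        # ['a.html', 'b.html']
print(sorted(search(index, "WebCrawler pages")))  # ['a.html']
```

Because the index is built ahead of time, answering a query only means looking up a few word entries rather than rescanning every page.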

17 SEO Tips for Small Businesses & Entrepreneurs

Short but Effective Titles: Top search engines rank pages for the words shown in their title tag. The title is often referred to as the most important element of SEO, but that’s only true if it matches the content. A good title should contain your item of interest followed by your website or business name, and should be fewer than 10 words.

Keyword Selection & Density: Selecting the right keywords for your website is an important part of the SEO process. The most successful keywords are two- to four-word phrases that people might realistically type into a search engine. Though it should not be a huge concern, the ideal keyword density is about 7% throughout your content. Simply write your content for human consumption and include your keywords. Keywords are important because they dictate the overall focus of the page itself.

Detailed Meta Description Tags: While ‘Meta Keyword’ tags are a topic of de...
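The title-length and keyword-density guidelines above can be checked mechanically. The sketch below is a rough helper under stated assumptions: words are runs of letters and digits, and keyword density is taken as the share of all words covered by occurrences of the keyword phrase (definitions vary between SEO tools). The 10-word title limit comes from the tip above; the function names, title, and body text are hypothetical.

```python
from __future__ import annotations

import re


def words(text: str) -> list[str]:
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def title_ok(title: str, max_words: int = 10) -> bool:
    """True if the title has fewer than max_words words, per the tip above."""
    return len(words(title)) < max_words


def keyword_density(text: str, phrase: str) -> float:
    """Percentage of all words in text covered by occurrences of the phrase (one assumed definition)."""
    tokens = words(text)
    target = words(phrase)
    n = len(target)
    if not tokens or n == 0:
        return 0.0
    # Count every position where the full phrase appears as consecutive words.
    hits = sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == target)
    return 100.0 * hits * n / len(tokens)


# Hypothetical title and body text for a small business page.
title = "Handmade Soap for Sensitive Skin | Example Soap Co"
body = "Handmade soap for sensitive skin. Our handmade soap uses natural oils."
print(title_ok(title))                                   # True (8 words)
print(round(keyword_density(body, "handmade soap"), 1))  # 36.4
```

A snippet this short inflates the density figure; on a full page the number is only a rough signal, and as the tip says, the content should be written for readers first.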