ElasticSearch – ElasticSearch Overview

The search engine data model has its roots in schema-free, document-oriented databases, and as the #nosql movement has shown, this model proves to be very effective for building applications.

The ElasticSearch data model is JSON, which is slowly emerging as the de facto standard for representing data. Moreover, JSON makes it simple to represent semi-structured data with complex entities, and it feels natural in most programming languages, which have first-class parsers for it.
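
To make the "semi-structured data with complex entities" point concrete, here is a small sketch of a nested document serialized with Python's standard json module. The field names (title, author, tags, comments) are hypothetical and chosen only for illustration; they are not taken from ElasticSearch itself.

```python
import json

# A hypothetical blog-post document: nested objects and lists sit
# alongside plain fields, and no schema is declared up front.
document = {
    "title": "ElasticSearch Overview",
    "author": {"name": "Jane Doe", "email": "jane@example.com"},
    "tags": ["search", "json", "nosql"],
    "comments": [
        {"user": "sam", "text": "Nice overview."},
        {"user": "kim", "text": "How does it compare to Solr?"},
    ],
}

# Serialize to JSON text, then parse it back with the built-in parser.
encoded = json.dumps(document, indent=2)
decoded = json.loads(encoded)

print(encoded)
```

The round trip through json.dumps and json.loads returns an equal Python structure, which is what makes JSON convenient to work with from application code.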


Auto-Suggest From Popular Queries Using EdgeNGrams

A popular feature of most modern search applications is the auto-suggest or auto-complete feature where, as a user types their query into a text box, suggestions of popular queries are presented. As each additional character is typed in by the user the list of suggestions is refined. There are several different approaches in Solr to provide this functionality, but we will be looking at an approach that involves using EdgeNGrams as part of the analysis chain.
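
The idea behind edge n-grams can be sketched in a few lines of plain Python. This is only an in-memory illustration of the technique, not Solr's analysis chain: the function names, the min_gram and max_gram parameters, and the sample queries are all assumptions made for the example. Each popular query is indexed under every prefix of itself, so whatever the user has typed so far can be looked up directly.

```python
from collections import defaultdict


def edge_ngrams(text, min_gram=1, max_gram=20):
    """Return the prefixes ("edge n-grams") of text, shortest first."""
    text = text.lower()
    upper = min(max_gram, len(text))
    return [text[:n] for n in range(min_gram, upper + 1)]


def build_index(popular_queries):
    """Map every edge n-gram to the queries that start with it.

    popular_queries is assumed to be sorted most popular first, so
    each suggestion list keeps that order.
    """
    index = defaultdict(list)
    for query in popular_queries:
        for gram in edge_ngrams(query):
            index[gram].append(query)
    return index


def suggest(index, typed, limit=5):
    """Return up to limit suggestions for the text typed so far."""
    return index.get(typed.lower(), [])[:limit]


index = build_index(["solr tutorial", "solr faceting", "sphinx search"])
print(suggest(index, "s"))    # all three queries share this prefix
print(suggest(index, "sol"))  # only the two Solr queries remain
```

As more characters are typed, the lookup key gets longer and the list of matching queries shrinks, which is the refinement behaviour described above. The cost is a larger index, since every query is stored once per prefix.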


What’s new with Apache Solr

Apache Solr has added many new features and performance improvements since the Search smarter with Apache Solr series was published. In this article, Solr and Lucene committer Grant Ingersoll details the improvements in Solr 1.3, including distributed search, easy database imports, integrated spell checking, new extension APIs, and much more.


PageRank Explained

PageRank is Google’s way of deciding a page’s importance. It matters because it is one of the factors that determines a page’s ranking in the search results. It isn’t the only factor that Google uses to rank pages, but it is an important one.

This article goes into the details of how PageRank is calculated.
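
As a rough illustration of the calculation, here is a minimal power-iteration sketch. It uses the common normalized textbook formulation, in which ranks sum to 1, with a damping factor of 0.85; the linked article may present a different variant. The function name, the tiny three-page link graph, and the handling of pages without outgoing links are assumptions made for this example, and the sketch assumes every link target also appears as a key in the graph.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate PageRank for a dict mapping page -> list of linked pages."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start from a uniform guess

    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its damped rank evenly among its links.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Assumption: a page with no links spreads its rank
                # evenly over all pages so that no rank is lost.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this graph, page c is linked from both a and b, so it ends up with the highest rank, while b, which receives only half of a's vote, ends up lowest.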


Sphinx – Free open-source SQL full-text search engine

Sphinx is a full-text search engine, distributed under GPL version 2. A commercial license is also available for embedded use.

Generally, it’s a standalone search engine, meant to provide fast, size-efficient and relevant full-text search functions to other applications. Sphinx was specially designed to integrate well with SQL databases and scripting languages. Currently, the built-in data sources support fetching data either via a direct connection to MySQL or from an XML pipe.


sitemaps.org

Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site.
Sitemap 0.90 is offered under the terms of the Attribution-ShareAlike Creative Commons License and has wide adoption, including support from Google, Yahoo!, and Microsoft.
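
A Sitemap file in its simplest form can be produced with Python's standard library alone. The sketch below assumes the element names defined by the Sitemap protocol (urlset, url, loc, lastmod, changefreq, priority) and its 0.9 namespace; the build_sitemap helper and the example.com URLs are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build a Sitemap XML string from a list of dicts.

    Each dict needs a "loc" key; "lastmod", "changefreq" and
    "priority" are optional metadata and are skipped when absent.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap([
    {
        "loc": "http://www.example.com/",
        "lastmod": "2010-01-01",
        "changefreq": "monthly",
        "priority": 0.8,
    },
    {"loc": "http://www.example.com/about"},
])
print(sitemap_xml)
```

Only loc is supplied for the second URL, which shows that the additional metadata is optional for each entry. A file written for production use would normally also start with an XML declaration, which this sketch leaves out.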