Protean Career: Basic concepts in ESP

In my previous articles, I talked about "How to install FAST ESP in Windows" and "Hello World! -- Set up a test search in ESP". In this article, I'm going to talk about some basic concepts in ESP.
-----------------------------------------------------------
Basic Concepts
Most of the following concepts come from the product overview.

Concept	Description
Data Flow Overview	(diagram in the original post; surviving labels: "Document set description, indexes" and "Applies algorithms or business rule-based ranking to the results")
Module Overview	General ideas behind all ESP modules
Basic Concepts	The core concepts in ESP
Content	Data that has not yet been submitted to the FAST ESP system
Document	Processed, searchable content is called a document
Collections	Documents are grouped into collections. Each collection can be processed and indexed in its own way (Index Profile). By setting a priority for each collection, we can also specify the order in which documents are processed.
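To make the priority idea concrete, here is a minimal Python sketch of a priority-ordered processing queue. It assumes that a lower number means a higher priority and that documents with equal priority keep their submission order. The function name, the collection names and the numbering scheme are all hypothetical, not ESP's actual configuration.

```python
import heapq
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(order=True)
class QueuedDocument:
    priority: int  # assumption: 1 is the highest priority
    seq: int  # submission counter, keeps FIFO order within one priority
    collection: str = field(compare=False)
    doc_id: str = field(compare=False)


def processing_order(batches: Dict[str, List[str]],
                     priorities: Dict[str, int]) -> List[Tuple[str, str]]:
    """Return (collection, doc_id) pairs in the order they would be processed."""
    queue: List[QueuedDocument] = []
    seq = 0
    for collection, doc_ids in batches.items():
        for doc_id in doc_ids:
            heapq.heappush(queue, QueuedDocument(priorities[collection], seq, collection, doc_id))
            seq += 1
    order = []
    while queue:
        item = heapq.heappop(queue)
        order.append((item.collection, item.doc_id))
    return order


# "archive" is submitted first, but "news" has the higher priority.
print(processing_order({"archive": ["a1", "a2"], "news": ["n1"]},
                       {"archive": 2, "news": 1}))
# [('news', 'n1'), ('archive', 'a1'), ('archive', 'a2')]
```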
Search Profile	Defines what to search and how queries and results should be processed and displayed
Document and Document Element	Each content item is converted into a document, and each property of the content item is converted into a document element.
Index Schema, Profile	The FAST Search Engine maps a document's elements to fields. Fields are the document elements that are defined to be searchable, and they are defined in the Index Profile. Multiple fields may be grouped into composite fields, allowing a query to be executed on several fields at the same time.
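Below is a short Python sketch of how document elements map to fields and how a composite field lets one query cover several fields. It is a toy in-memory model, not how the FAST Search Engine stores its index. The element names, the field names and the `content` composite field are hypothetical, and a simple substring match stands in for real term matching.

```python
from typing import Dict, List

# Hypothetical mapping from document element names to field names.
FIELD_MAP: Dict[str, str] = {"headline": "title", "text": "body", "writer": "author"}
# Hypothetical composite field that groups two ordinary fields.
COMPOSITE_FIELDS: Dict[str, List[str]] = {"content": ["title", "body"]}


def map_to_fields(document: Dict[str, str]) -> Dict[str, str]:
    """Keep only the elements that map to fields, renamed to their field names."""
    return {FIELD_MAP[name]: value for name, value in document.items() if name in FIELD_MAP}


def search(indexed: List[Dict[str, str]], field_name: str, term: str) -> List[Dict[str, str]]:
    """Case-insensitive substring match on one field or on every member of a composite field."""
    targets = COMPOSITE_FIELDS.get(field_name, [field_name])
    term = term.lower()
    return [doc for doc in indexed
            if any(term in doc.get(f, "").lower() for f in targets)]


docs = [
    {"headline": "ESP basics", "text": "Collections and pipelines", "internal_id": "42"},
    {"headline": "Crawler tips", "text": "Configure the ESP crawler"},
]
indexed = [map_to_fields(d) for d in docs]
print(len(search(indexed, "title", "esp")))    # 1: only the first headline matches
print(len(search(indexed, "content", "esp")))  # 2: title OR body is searched
```

Elements without a field mapping (here `internal_id`) are simply not searchable.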
Enterprise Crawler	Used to access content on websites
File Traverser	The file traverser scans specified directories on file servers.
Pushing Content to the Search Engine Using the Content API	The Content API can be used directly to push content to the search engine.
Query Side	There are three ways to query: the Search API, the HTTP-based Query Interface and the FAST Web Service Interface
Content Interface	Integrates applications via C++, Java and .NET
Search Interface
Document Processing Interface	Allows customer-defined document processors to be included
Query/Result Processing Interface	Provides an interface for dynamic linking of custom query and result processors
Administration Interface	Supports API integration for system administration and collection configuration
Security Integration	Security Access Module provides document-level security capabilities for integration with your content and portal infrastructure
SDKs	ESP Content SDK, Search SDK, and Application SDK provide various interfacing capabilities.
Web Service Interface	Web services are a collection of standards and protocols that allow computers to communicate across the internet using XML and the ubiquitous HTTP protocol
Document processing is defined per collection
Document Processing Engine, Pipeline, Stage	One search engine can contain multiple pipelines, but each collection can have only one pipeline. A pipeline contains multiple stages, and each stage performs a particular document processing task: it takes one or more document elements as input and outputs new or modified elements that may be processed further.
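The pipeline/stage relationship amounts to function composition. Each stage reads some elements and writes new or modified ones, and the next stage sees the result. The Python sketch below models only that idea. The stage functions, the element names and the `news` collection are hypothetical and do not reflect ESP's document processor API.

```python
from typing import Callable, Dict, List

Document = Dict[str, str]
Stage = Callable[[Document], Document]


def normalize_title(doc: Document) -> Document:
    # Input element: "title"; output: a new "title_normalized" element.
    return {**doc, "title_normalized": " ".join(doc.get("title", "").lower().split())}


def count_words(doc: Document) -> Document:
    # Input element: "body"; output: a new "wordcount" element.
    return {**doc, "wordcount": str(len(doc.get("body", "").split()))}


def run_pipeline(doc: Document, stages: List[Stage]) -> Document:
    # Stages run in order, each receiving the previous stage's output.
    for stage in stages:
        doc = stage(doc)
    return doc


# One pipeline per collection (hypothetical collection name).
PIPELINES: Dict[str, List[Stage]] = {"news": [normalize_title, count_words]}

result = run_pipeline({"title": "  Hello   ESP ", "body": "one two three"}, PIPELINES["news"])
print(result["title_normalized"], result["wordcount"])  # hello esp 3
```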
Entity Extraction	Entity extraction is the process of detecting, extracting and normalizing entities in documents
Extract other entities	There are two ways to extract other entities: (1) use the Admin UI to specify an additional extractor; (2) use a regular expression document processor
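The regular-expression option boils down to two steps. First, match a pattern against an element. Second, write the normalized matches to a new element. Below is a minimal Python sketch under that assumption. The ticket-ID pattern and the `body`/`tickets` element names are hypothetical examples, not a built-in ESP extractor.

```python
import re

# Hypothetical entity: support ticket IDs such as "INC-12345", in any letter case.
TICKET_PATTERN = re.compile(r"\bINC-\d{5}\b", re.IGNORECASE)


def extract_tickets(doc: dict) -> dict:
    """Detect ticket IDs in 'body', normalize them to upper case and de-duplicate them."""
    matches = TICKET_PATTERN.findall(doc.get("body", ""))
    normalized = dict.fromkeys(m.upper() for m in matches)  # keeps first-seen order
    return {**doc, "tickets": list(normalized)}


print(extract_tickets({"body": "See INC-12345 and inc-12345; also INC-67890."})["tickets"])
# ['INC-12345', 'INC-67890']
```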
Search Engine Clusters	Search Engine instances are grouped into search engine clusters. A search engine cluster is a group of Search Engine instances that share the same index schema, which is provided by an index profile.
Search Columns and Rows	A search engine cluster is organized in columns and rows. The indexed documents are partitioned across the search columns to scale data volume, so each column holds a different subset of the documents. The rows replicate the columns: every search engine instance in the same column (one per row) holds the same set of indexed documents. When a query is sent to a cluster, it is sent to all search engine instances within one search row, so adding rows scales the query rate.
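The column/row layout can be illustrated with a small in-memory Python model. It assumes that columns partition the documents and that each row holds a full replica of all columns. The CRC32-based partitioning, the round-robin row selection and the class name are assumptions made for the illustration. Real ESP clusters use their own distribution and dispatching logic.

```python
import zlib
from typing import Dict, List


class SearchCluster:
    """Toy cluster: columns partition the documents, rows replicate the columns."""

    def __init__(self, num_columns: int, num_rows: int) -> None:
        self.num_columns = num_columns
        self.num_rows = num_rows
        # nodes[row][column] is one search engine instance (doc_id -> text).
        self.nodes: List[List[Dict[str, str]]] = [
            [{} for _ in range(num_columns)] for _ in range(num_rows)
        ]
        self._next_row = 0

    def index(self, doc_id: str, text: str) -> None:
        # Pick one column for the document (assumed CRC32 partitioning) ...
        column = zlib.crc32(doc_id.encode("utf-8")) % self.num_columns
        # ... and store a copy on that column's instance in every row.
        for row in self.nodes:
            row[column][doc_id] = text

    def query(self, term: str) -> List[str]:
        # Rows take turns (round-robin), which is how extra rows add query capacity.
        row = self.nodes[self._next_row]
        self._next_row = (self._next_row + 1) % self.num_rows
        # Within the chosen row, every column is searched and the hits are merged.
        hits: List[str] = []
        for instance in row:
            hits.extend(doc_id for doc_id, text in instance.items() if term in text)
        return sorted(hits)


cluster = SearchCluster(num_columns=2, num_rows=3)
for doc_id, text in [("d1", "fast search"), ("d2", "enterprise search"), ("d3", "web crawler")]:
    cluster.index(doc_id, text)
print(cluster.query("search"))  # ['d1', 'd2']
```

Whichever row answers, the result is the same, because every row holds all columns.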
Index profile	An index profile is an XML-based configuration file. It’s an index schema that defines the way documents are searchable. It specifies search properties such as: (1) which document elements become searchable fields; (2) which document elements become fields that are returned as part of a result; (3) how to calculate the values that are used for sorting and ranking
The relationship between Document Processing, Indexing and Search Engine Clusters
Index Profile Structure
Scope Search	Used to index customer XML content without any knowledge of its DTD/schema, so that a more dynamic field structure can be indexed using the Scope Search framework
Relevancy, Data mining
Linguistic processing
Sorting
Rank value calculations
Query context analysis
Navigation
Contextual Insight
Ranking Concept
Quality
Freshness Boosting
WebAnalyzer	The WebAnalyzer is a FAST ESP module that uses links between documents to improve search relevancy
Tools to modify rank for individual documents	Two tools can modify the rank of individual documents: Search Business Center and the Boost Bulk Tool
Three boost mechanisms	Absolute Query Boost: places a document at a specified absolute ranking position for a specified query, or excludes the document from the results for that query. Relative Query Boost: ensures a document is always displayed within the first N results for a specified query. Relative Document Boost: ensures a document is always displayed within the first N results, whatever query the user submits.
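The three mechanisms can be modelled as re-ordering a ranked list of document IDs. The Python sketch below follows the descriptions above literally. Several details are assumptions, not ESP's actual boost algorithm:

- the parameter shapes;
- applying absolute boosts after relative ones;
- placing a relatively boosted document at exactly position N.

```python
from typing import Dict, List, Optional


def apply_boosts(query: str,
                 ranked: List[str],
                 absolute: Optional[Dict[str, Dict[str, Optional[int]]]] = None,
                 relative_query: Optional[Dict[str, Dict[str, int]]] = None,
                 relative_doc: Optional[Dict[str, int]] = None) -> List[str]:
    """Re-order `ranked` (best first) for `query`; positions are 1-based.

    absolute:       {query: {doc_id: position}}; position None excludes the doc
    relative_query: {query: {doc_id: n}}; keep doc within the top n for that query
    relative_doc:   {doc_id: n}; keep doc within the top n for every query
    """
    results = list(ranked)

    # Relative boosts: pull a document up to position n if it ranks lower.
    top_n = dict(relative_doc or {})
    top_n.update((relative_query or {}).get(query, {}))
    for doc_id, n in top_n.items():
        if doc_id in results and results.index(doc_id) >= n:
            results.remove(doc_id)
            results.insert(n - 1, doc_id)

    # Absolute boosts run last so they override relative ones.
    pinned = (absolute or {}).get(query, {})
    present = [doc_id for doc_id in pinned if doc_id in results]
    for doc_id in present:
        results.remove(doc_id)  # excluded documents simply stay removed
    for position, doc_id in sorted((pinned[d], d) for d in present if pinned[d] is not None):
        results.insert(position - 1, doc_id)
    return results


ranked = ["a", "b", "c", "d", "e"]
absolute = {"esp": {"d": 1, "b": None}}  # pin d first, hide b, for query "esp"
print(apply_boosts("esp", ranked, absolute=absolute, relative_doc={"e": 3}))
# ['d', 'a', 'e', 'c']
print(apply_boosts("crawler", ranked, absolute=absolute, relative_doc={"e": 3}))
# ['a', 'b', 'e', 'c', 'd']
```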
Proximity Ranking and Matching	The term proximity denotes the degree to which a query and a document match, based on the distance between the query terms within the document. There are two types of proximity: explicit proximity and implicit proximity.
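One common way to quantify proximity is the length of the shortest window of tokens that contains every query term: the shorter the window, the closer the terms. The Python sketch below computes that window and turns it into a score. It is a generic illustration of the idea, not FAST ESP's proximity formula, and the scoring function is an arbitrary choice.

```python
from collections import Counter
from typing import List, Optional


def min_window(tokens: List[str], terms: List[str]) -> Optional[int]:
    """Length of the shortest token window containing every distinct term, or None."""
    needed = set(terms)
    if not needed:
        return None
    counts: Counter = Counter()
    covered = 0  # number of distinct needed terms inside the current window
    best: Optional[int] = None
    left = 0
    for right, token in enumerate(tokens):
        if token in needed:
            counts[token] += 1
            if counts[token] == 1:
                covered += 1
        # Shrink from the left while the window still covers all terms.
        while covered == len(needed):
            width = right - left + 1
            if best is None or width < best:
                best = width
            dropped = tokens[left]
            if dropped in needed:
                counts[dropped] -= 1
                if counts[dropped] == 0:
                    covered -= 1
            left += 1
    return best


def proximity_score(text: str, query: str) -> float:
    """1.0 when all query terms are adjacent, smaller as they drift apart, 0.0 if any is missing."""
    terms = query.lower().split()
    window = min_window(text.lower().split(), terms)
    return 0.0 if window is None else len(set(terms)) / window


print(proximity_score("FAST search engine", "fast search"))                    # 1.0
print(proximity_score("fast and scalable enterprise search", "fast search"))  # 0.4
```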
Field Collapsing	There are two kinds of field collapsing: (1) field collapsing that removes the collapsed documents; (2) field collapsing that does not remove the collapsed documents (the default)
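Both kinds of field collapsing can be sketched as a single pass over a ranked hit list that keeps the first hit for each value of the collapse field. In this Python sketch, the `site` field and the `collapsed` flag are hypothetical. Modelling the non-removing variant as "keep the hit but mark it collapsed" is an assumption about how a front end might use it.

```python
from typing import Dict, List


def collapse(hits: List[Dict], field: str, remove: bool = False) -> List[Dict]:
    """Collapse hits that share a value in `field`, keeping the best-ranked one.

    remove=True drops the other hits; remove=False keeps them, flagged as collapsed.
    Hits without the field are never collapsed.
    """
    seen = set()
    out = []
    for hit in hits:  # hits are assumed to be sorted best first
        value = hit.get(field)
        if value is not None and value in seen:
            if not remove:
                out.append({**hit, "collapsed": True})
        else:
            if value is not None:
                seen.add(value)
            out.append({**hit, "collapsed": False})
    return out


hits = [{"id": 1, "site": "a.com"}, {"id": 2, "site": "a.com"}, {"id": 3, "site": "b.com"}]
print([h["id"] for h in collapse(hits, "site", remove=True)])  # [1, 3]
print([h["collapsed"] for h in collapse(hits, "site")])        # [False, True, False]
```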
Boundary Matching
Duplicate Removal	There are several ways to detect and remove duplicate documents: (1) crawler duplicate removal, done by the FAST crawler; (2) dynamic (result-side) duplicate removal, which can detect and remove duplicates across collections and allows a more flexible definition of what counts as a duplicate; (3) field collapsing
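Result-side duplicate removal can be sketched in two steps. First, compute a signature for each hit at query time. Second, drop any later hit whose signature has already been seen, regardless of which collection it came from. The normalization and the SHA-1 signature below are assumptions chosen for simplicity. They show the idea, not how ESP defines a duplicate.

```python
import hashlib
import re
from typing import Dict, List


def signature(text: str) -> str:
    """Hash of the lower-cased word sequence, so case, punctuation and spacing are ignored."""
    normalized = " ".join(re.findall(r"\w+", text.lower()))
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def remove_duplicates(hits: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first (best-ranked) hit for each signature, across all collections."""
    seen = set()
    unique = []
    for hit in hits:
        sig = signature(hit["body"])
        if sig not in seen:
            seen.add(sig)
            unique.append(hit)
    return unique


hits = [
    {"id": "news/1", "body": "FAST ESP released!"},
    {"id": "archive/9", "body": "fast esp   released"},  # same words, different collection
    {"id": "news/2", "body": "Something else"},
]
print([h["id"] for h in remove_duplicates(hits)])  # ['news/1', 'news/2']
```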
GEO Search Overview	The Geo Search feature provides capabilities for filtering, sorting and boosting query results based on geographical location.
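Filtering, sorting and boosting by location all start from a distance calculation. The Python sketch below uses the haversine great-circle formula with a mean Earth radius of 6371 km. The hit format, the radius filter and the `geo_boost` decay function are illustrative assumptions, not the Geo Search feature's actual implementation.

```python
import math
from typing import Dict, List

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def geo_filter_sort(hits: List[Dict], lat: float, lon: float, radius_km: float) -> List[str]:
    """Keep hits within radius_km of (lat, lon), nearest first."""
    with_dist = [(haversine_km(lat, lon, h["lat"], h["lon"]), h["id"]) for h in hits]
    return [doc_id for dist, doc_id in sorted(with_dist) if dist <= radius_km]


def geo_boost(score: float, distance_km: float, scale_km: float = 10.0) -> float:
    """Halve the score at scale_km, and keep decaying with distance."""
    return score / (1.0 + distance_km / scale_km)


hits = [
    {"id": "bergen", "lat": 60.39, "lon": 5.32},
    {"id": "drammen", "lat": 59.74, "lon": 10.20},
    {"id": "oslo", "lat": 59.91, "lon": 10.75},
]
print(geo_filter_sort(hits, 59.91, 10.75, radius_km=100))  # ['oslo', 'drammen']
```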
Query Modifications	Query processing is configured globally, and there are three ways to modify a query in FAST ESP: (1) as an automatic rewrite of the query before it is executed against the index; (2) as a suggested rewrite, typically presented as a search tip on the result page; (3) as a combination of the two: the query is first executed in its original form and, if it returns no hits, it is automatically resubmitted using the automatic rewrite, and the new result is presented to the user
Query Resubmission	The resubmission is set per query and is used to switch to a suggested transformation of the user’s query. There are three kinds of query transformation: (1) Modify: the query is modified automatically, and the modified query is executed and its result set returned; (2) Conditional Modify: the query is modified automatically only if the executed query returns no hits; (3) Suggest: the query is never modified, but a suggested transformed query is returned together with the result set
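The difference between the three transformation kinds is easiest to see side by side. In this Python sketch, the caller supplies the search and rewrite functions: here, a dictionary lookup and a hard-coded spelling fix, both hypothetical stand-ins. The mode strings only mirror the names above; they are not ESP parameter values.

```python
from typing import Callable, List, Optional, Tuple


def run_query(query: str,
              search: Callable[[str], List[str]],
              rewrite: Callable[[str], str],
              mode: str) -> Tuple[List[str], Optional[str]]:
    """Return (hits, suggested_query) for 'modify', 'conditional_modify' or 'suggest'."""
    rewritten = rewrite(query)
    if mode == "modify":
        # Always run the transformed query.
        return search(rewritten), None
    if mode == "conditional_modify":
        # Run the original first; fall back to the transformed query on zero hits.
        hits = search(query)
        return (hits, None) if hits else (search(rewritten), None)
    if mode == "suggest":
        # Never modify; just offer the transformation alongside the results.
        return search(query), (rewritten if rewritten != query else None)
    raise ValueError(f"unknown mode: {mode}")


INDEX = {"search engine": ["doc1", "doc2"]}


def lookup(q: str) -> List[str]:
    return INDEX.get(q, [])


def spell_fix(q: str) -> str:
    return q.replace("serch", "search")  # toy spell check


print(run_query("serch engine", lookup, spell_fix, "modify"))              # (['doc1', 'doc2'], None)
print(run_query("serch engine", lookup, spell_fix, "conditional_modify"))  # (['doc1', 'doc2'], None)
print(run_query("serch engine", lookup, spell_fix, "suggest"))             # ([], 'search engine')
```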
FAST Query Language (FQL)
Navigator	Navigators provide functionality for drilling down into the query results based on value distribution of one or more individual fields.
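A field navigator essentially does two things. It counts how the values of one field are distributed over the current result set, and it lets the user narrow the results to one of those values. Here is a minimal Python sketch of that idea, using a hypothetical `filetype` field. It does not model the different navigator variants (deep, shallow, scope and so on).

```python
from collections import Counter
from typing import Dict, List, Tuple


def field_navigator(hits: List[Dict[str, str]], field: str, top: int = 5) -> List[Tuple[str, int]]:
    """Most frequent values of `field` in the result set, with their hit counts."""
    counts = Counter(hit[field] for hit in hits if field in hit)
    return counts.most_common(top)


def drill_down(hits: List[Dict[str, str]], field: str, value: str) -> List[Dict[str, str]]:
    """Narrow the result set to hits whose `field` equals the selected value."""
    return [hit for hit in hits if hit.get(field) == value]


hits = [
    {"id": "1", "filetype": "pdf"},
    {"id": "2", "filetype": "html"},
    {"id": "3", "filetype": "pdf"},
    {"id": "4"},  # no filetype: not counted
]
print(field_navigator(hits, "filetype"))                       # [('pdf', 2), ('html', 1)]
print([h["id"] for h in drill_down(hits, "filetype", "pdf")])  # ['1', '3']
```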
Field Navigator
Deep Navigator
Shallow Navigator
Scope Navigator
Contextual Navigator
Field Navigators for Values in Scope Fields
Taxonomy
FAST Classifier
Unsupervised Clustering