-----------------------------------------------------------
Basic Concepts
Most of the following concepts come from the product overview.
Concept
|
Description
|
Document set description, indexes
| |
Applies algorithms or business rule-based ranking to the results
| |
Data Flow Overview
| |
Module Overview
|
A general overview of all ESP modules
|
Basic Concepts
|
An overview of the concepts used in ESP
|
Content
|
Data that has not yet been submitted to the FAST ESP system
|
Document
|
Processed, searchable content is called a document.
|
Collections
|
Documents are grouped into collections. Each collection can have its own way of being processed and indexed (its index profile). By setting a priority for each collection, we can also specify the order in which documents are processed.
|
Search Profile
|
Defines what to search and how queries and results should be processed and displayed
|
Document and Document Element
|
Each piece of content is converted into one document. Each property of the content is converted into a document element.
|
Index Schedule, Profile
|
The FAST search engine maps a document's elements to fields. Fields are the document elements that are defined to be searchable; they are defined in the index profile. Multiple fields may be grouped into composite fields, allowing a query to be executed on several fields at the same time.
|
Enterprise Crawler
|
Use the Enterprise Crawler to access content on web sites.
|
File Traverser
|
The file traverser scans specified file directories of file servers.
|
Pushing Content to Search Engine Using Content API
|
Use the Content API directly to push content to the search engine.
|
Query Side
|
Three ways to submit queries and retrieve results: the Search API, the HTTP-based Query Interface, and the FAST Web Service Interface
|
Content Interface
|
Integration of applications via C++, Java, and .NET
|
Search Interface
| |
Document Processing Interface
|
Allows inclusion of customer-defined document processors
|
Query/Result Processing Interface
|
Provides an interface for dynamic linking of custom query and result processors
|
Administration Interface
|
Supports API integration for system administration and collection configuration
|
Security Integration
|
The Security Access Module provides document-level security capabilities for integration with your content and portal infrastructure.
|
SDKs
|
ESP Content SDK, Search SDK, and Application SDK provide various interfacing capabilities.
|
Web Service Interface
|
Web services are a collection of standards and protocols that allow computers to communicate across the internet using XML and the ubiquitous HTTP protocol.
|
Document processing is defined per collection
| |
Document Processing Engine, Pipeline, Stage
|
One search engine contains multiple pipelines, but one collection can only have one pipeline. One pipeline contains multiple stages. Each stage performs a particular document processing task: it takes one or more document elements as input and produces new or modified elements as output, which may be processed further by later stages.
|
Entity Extraction
|
Entity extraction means detecting, extracting, and normalizing entities from documents.
|
Extract other entities
|
Two ways to extract other entities:
Using the Admin UI to specify an additional extractor
Using a regular expression document processor
|
Search Engine Clusters
|
Search engine instances are grouped into search engine clusters. A search engine cluster is a group of search engine instances that share the same index schema, which is provided by an index profile.
|
Search Columns and Rows
|
Each set of indexed documents is stored in all search engine instances within one search column; adding columns scales data volume. In other words, every node in a search column holds the same set of indexed documents. When a query is sent to a cluster, it is sent to all search engine instances within one search row; adding rows scales query rate.
|
Index profile
|
An index profile is an XML-based configuration file. It’s an index schema that defines the way documents are searchable. It specifies search properties like:
Which document elements are to become searchable fields
Which document elements are to become fields that are returned as part of a result
How to calculate values that are used for sorting and ranking
|
The relationship between Document Processing, Indexing and Search Engine Clusters
| |
Index Profile Structure
| |
Scope Search
|
Used for
Indexing customer XML content without any knowledge of the DTD/Schema.
Indexing a more dynamic field structure using the Scope Search framework.
|
Relevancy, Data mining
| |
Linguistic processing
| |
Sorting
| |
Rank value calculations
| |
Query context analysis
| |
Navigation
| |
Contextual Insight
| |
Ranking Concept
| |
Quality
| |
Freshness Boosting
| |
WebAnalyzer
|
The WebAnalyzer is a FAST ESP module that uses links between documents to improve search relevancy
|
Tools to modify rank for individual documents
|
Two tools for modifying rank
Search Business Center
Boost Bulk Tool
|
3 boost mechanisms
|
Absolute Query Boost: Specify an absolute ranking position for a document for a specified query, or exclude the document from the results of that query.
Relative Query Boost: Ensure a document always appears within the first N results for a specified query.
Relative Document Boost: Ensure a document always appears within the first N results, regardless of the query the user submits.
|
Proximity Ranking and Matching
|
The term proximity denotes the degree to which a query and a document match, based on the distance between the query terms within a document.
Two types of proximity:
Explicit Proximity
Implicit Proximity
|
Field Collapsing
|
Two kinds of field collapsing:
Field collapsing which removes collapsed documents
Field collapsing which does not remove collapsed documents (default)
|
Boundary Matching
| |
Duplicate Removal
|
Different ways of detecting and removing duplicate documents.
Crawler Duplicate Removal (The FAST Crawler)
Dynamic (Result-Side) Duplicate Removal (may be used to detect and remove duplicates across collections, and also enable a more flexible definition of perceived duplicates)
Field Collapsing
|
GEO Search Overview
|
The Geo Search feature provides capabilities for filtering, sorting and boosting query results based on geographical location.
|
Query Modifications
|
Query processing is configured globally. There are three ways to modify a query in FAST ESP:
As an automatic rewrite of the query before execution against the index
As a suggested rewrite, typically presented as a search tip on the result page
A combination of the two above: The query is first executed in its original form. In case of no hits, the query is automatically resubmitted using the automatic rewrite option, and the new result is presented to the user
|
Query Resubmission
|
Resubmission is set per query and controls whether the suggested transformation of the user's query is used. There are three kinds of query transformation:
Modify: The query is automatically modified. The modified query is executed and its result set is returned.
Conditional Modify: The query is automatically modified only if the original query returns no hits.
Suggest: The query is never modified, but a suggested transformed query is returned together with the result set.
|
FAST Query Language (FQL)
| |
Navigator
|
Navigators provide functionality for drilling down into the query results based on value distribution of one or more individual fields.
|
Field Navigator
| |
Deep Navigator
| |
Shallow Navigator
| |
Scope Navigator
| |
Contextual Navigator
| |
Field Navigators for Values in Scope Fields
| |
Taxonomy
| |
FAST Classifier
| |
Unsupervised Clustering
|
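To make the three Query Resubmission modes from the table above (Modify, Conditional Modify, Suggest) more concrete, here is a small Python sketch. It is not FAST ESP code and does not use any ESP API: `Resubmission`, `SearchResult`, `run_query`, `toy_search` and `toy_rewrite` are all made-up names, and the index and the spelling rewrite are toy data I invented for illustration only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class Resubmission(Enum):
    """Hypothetical names for the three transformation modes."""
    MODIFY = "modify"
    CONDITIONAL_MODIFY = "conditional_modify"
    SUGGEST = "suggest"


@dataclass
class SearchResult:
    executed_query: str               # the query that produced the hits
    hits: List[str]
    suggestion: Optional[str] = None  # only filled in SUGGEST mode


def run_query(
    query: str,
    mode: Resubmission,
    rewrite: Callable[[str], str],
    search: Callable[[str], List[str]],
) -> SearchResult:
    rewritten = rewrite(query)

    if mode is Resubmission.MODIFY:
        # Always execute the rewritten query.
        return SearchResult(rewritten, search(rewritten))

    hits = search(query)

    if mode is Resubmission.CONDITIONAL_MODIFY:
        # Resubmit with the rewrite only when the original finds nothing.
        if not hits and rewritten != query:
            return SearchResult(rewritten, search(rewritten))
        return SearchResult(query, hits)

    # SUGGEST: never change the executed query, just attach the rewrite.
    suggestion = rewritten if rewritten != query else None
    return SearchResult(query, hits, suggestion)


# Toy stand-ins for the index and the query rewriter (assumptions).
TOY_INDEX = {"search": ["doc1", "doc2"]}


def toy_search(query: str) -> List[str]:
    return TOY_INDEX.get(query, [])


def toy_rewrite(query: str) -> str:
    return {"serach": "search"}.get(query, query)


if __name__ == "__main__":
    for mode in Resubmission:
        print(mode.name, run_query("serach", mode, toy_rewrite, toy_search))
```

With the misspelled query "serach", Modify and Conditional Modify both end up executing "search", while Suggest executes the original query and only returns "search" as a tip. The difference between the first two shows up when the original query does have hits: Conditional Modify then leaves it alone.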
Data Flow
In this section, I'll talk about the data flow. The first part covers how ESP crawls data.
The second part covers how ESP handles a user search.