Finding minimal "query term windows"

The textbook discusses using query term proximity (i.e., how close together the query terms occur in a given document) as a ranking signal, and generating good result snippets to show the user for the highest-ranked documents. These problems are related: in both cases we typically want to find the smallest "window", i.e., the shortest sequence of text in a document that contains all (or most) of the query terms. The textbook doesn't explain how to actually compute this, though, and I have received some questions about the algorithmic details of finding such "windows". For those interested, this and this might be good reads.
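To make the idea of a minimal "window" concrete, here is a small sketch of one common way to compute it: a sliding window over the document's tokens. This is not taken from the textbook or from the linked material. It assumes the document is already tokenized and normalized, that every query term must be present (the "most of" variant is not covered), and the names smallest_window, tokens and query_terms are made up for illustration.

```python
from collections import Counter


def smallest_window(tokens, query_terms):
    """Return (start, end), inclusive token indices of the shortest span
    that contains every distinct query term, or None if no span does."""
    needed = set(query_terms)
    if not needed:
        return None

    counts = Counter()  # occurrences of each query term inside the window
    covered = 0         # number of distinct query terms inside the window
    best = None
    left = 0

    for right, token in enumerate(tokens):
        if token not in needed:
            continue
        counts[token] += 1
        if counts[token] == 1:
            covered += 1

        # While the window covers all terms, record it and shrink from the left.
        while covered == len(needed):
            if best is None or right - left < best[1] - best[0]:
                best = (left, right)
            dropped = tokens[left]
            if dropped in needed:
                counts[dropped] -= 1
                if counts[dropped] == 0:
                    covered -= 1
            left += 1

    return best


if __name__ == "__main__":
    doc = "the quick brown fox jumps over the lazy dog while the fox naps".split()
    print(smallest_window(doc, ["fox", "the"]))
```

Each token enters and leaves the window at most once, so the sketch runs in time linear in the document length. The width of the returned window could serve as a proximity signal, and the span itself as a starting point for a snippet.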

Published Apr. 8, 2016 10:00