Shoji KUMAGAI  committed b780550

2010.10.18 shkumagai Add daytime works result.

  • Parent commits 061663a

Files changed (1)

File source/chapter01/an_example_information_retrieval_problem.rst

    at least as quickly as the speed of computers, and we would now like to be able to search
    collections that total in the order of billions to trillions of words.
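-  1. To process huge collections of documents quickly.
-     The amount of online data has grown at least as quickly as the speed of computers,
-     and today we want to be able to search collections on the order of billions to trillions of words.
+1. To process huge collections of documents quickly. \
+   The amount of online data has grown at least as quickly as the speed of computers, \
+   and today we want to be able to search collections on the order of billions to trillions of words.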
 .. 2. To allow more flexible matching operations. For example, it is impractical to perform
    the query Romans NEAR countrymen with grep, where NEAR might be defined as “within 5 words”
    or “within the same sentence”.
+2. To allow flexible matching operations. \
+   For example, it is impractical to run the query "Romans NEAR countrymen" with grep, \
+   where the *NEAR* operator might be defined as "within 5 words" or \
+   "within the same sentence".
 .. 3. To allow ranked retrieval: in many cases you want the best answer to an information need
    among many documents that contain certain words.
+3. To allow ranked retrieval: in many cases you want the best answer to an information need \
+   among a large number of documents that contain certain words.
 .. The way to avoid linearly scanning the texts for each query is to index the documents in
-   advance. Let us stick with Shakespeares Collected Works, and use it to introduce the basics
+   advance. Let us stick with Shakespeare's Collected Works, and use it to introduce the basics
    of the Boolean retrieval model. Suppose we record for each document – here a play of
-   Shakespeares – whether it contains each word out of all the words Shakespeare used (Shakespeare
+   Shakespeare's – whether it contains each word out of all the words Shakespeare used (Shakespeare
    used about 32,000 different words). The result is a binary term-document incidence matrix,
    as in Figure 1.1. Terms are the indexed units (further discussed in Section 2.2); they are
    usually words, and for the moment you can think of them as words, but the information retrieval
    or columns, we can have a vector for each term, which shows the documents it appears in,
    or a vector for each document, showing the terms that occur in it. [2]_
+In this context, things that are not usually thought of as words, such as I-9 or Hong Kong, are referred to as *terms*.
 .. Figure 1.1
    A term-document incidence matrix. Matrix element (t, d) is 1 if the play in column d contains
    the word in row t, and is 0 otherwise.
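
The binary term-document incidence matrix described in the hunk above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the document names and texts in `TOY_DOCS` are hypothetical placeholders (not Shakespeare's text), tokenization is plain whitespace splitting, and the helper names `build_incidence_matrix` and `docs_matching_all` are invented for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical toy collection; the names and contents are illustrative only.
TOY_DOCS: Dict[str, str] = {
    "play_a": "brutus caesar calpurnia",
    "play_b": "brutus antony",
    "play_c": "caesar antony cleopatra",
}


def build_incidence_matrix(
    docs: Dict[str, str],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Return (terms, doc_names, matrix).

    matrix[t][d] is 1 if the document in column d contains the term in
    row t, and 0 otherwise.
    """
    doc_names = sorted(docs)
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    tokens = {name: set(docs[name].lower().split()) for name in doc_names}
    terms = sorted(set().union(*tokens.values()))
    matrix = [[1 if t in tokens[d] else 0 for d in doc_names] for t in terms]
    return terms, doc_names, matrix


def docs_matching_all(
    terms: List[str],
    doc_names: List[str],
    matrix: List[List[int]],
    query_terms: List[str],
) -> List[str]:
    """Boolean AND query: bitwise-AND the row vectors of the query terms."""
    result = [1] * len(doc_names)
    for q in query_terms:
        q = q.lower()
        if q not in terms:
            return []  # an unknown term cannot match any document
        row = matrix[terms.index(q)]
        result = [a & b for a, b in zip(result, row)]
    return [name for name, bit in zip(doc_names, result) if bit]


if __name__ == "__main__":
    terms, doc_names, matrix = build_incidence_matrix(TOY_DOCS)
    for term, row in zip(terms, matrix):
        print(f"{term:10s} {row}")
    print(docs_matching_all(terms, doc_names, matrix, ["brutus", "caesar"]))
```

Reading the matrix by rows gives a vector per term (the documents it appears in); reading it by columns gives a vector per document (the terms it contains). The AND query simply intersects the term rows, which avoids rescanning the texts for each query.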