Posted by Chung Leong on 08/22/06 05:24
Here's the rest of the tutorial I started earlier:
Aside from text within a document, Indexing Service lets you search on
meta information stored in the files. For example, MusicArtist and
MusicAlbum let you find MP3 and other music files based on the singer
and album name; DocAuthor lets you find Office documents created by a
certain user; DocAppName lets you find files created by a particular
program; and so on.
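As a rough sketch, a property search could be built like this. The helper
name build_property_query() is made up for illustration, and I'm assuming
CONTAINS() accepts a property name in the same position where the earlier
example used Contents; check that against your own catalog before relying
on it.

```php
<?php
// Hypothetical helper: builds a query that matches on a document property
// (MusicArtist, DocAuthor, ...) instead of the file contents.
function build_property_query(string $dir, string $property, string $value): string
{
    // Accept only plain identifiers as property names.
    if (!preg_match('/^[A-Za-z][A-Za-z0-9]*$/', $property)) {
        throw new InvalidArgumentException("Unexpected property name: $property");
    }
    // Double any single quote so the value cannot end the SQL string literal.
    $value = str_replace("'", "''", $value);

    return "SELECT filename, size, path "
         . "FROM SCOPE('DEEP TRAVERSAL OF \"$dir\"') "
         . "WHERE CONTAINS($property, '\"$value\"')";
}

// 'Some Artist' is a placeholder value.
$sql = build_property_query('C:\\music', 'MusicArtist', 'Some Artist');
```

The resulting string can be passed to oledb_query() exactly like the
queries in the other examples.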
Indexing Service uses plug-ins known as iFilters to extract information
from files it indexes. A default installation of Windows has iFilters
for many common file formats like HTML, Word, PowerPoint, and Excel.
You can extend Indexing Service's capability by installing additional
iFilters. Many are listed at http://www.ifilter.org/, with support
available for PDF, Photoshop, ZIP, Visio, Open Office, and others.
In the previous example, we used CONTAINS(Contents, '$keyword') to
search for a particular keyword. Only files containing that exact word
would be returned. If $keyword is "date", Indexing Service would find
files containing the word "date" but not those containing only "dates".
To relax the criteria somewhat, we can use the FORMSOF (INFLECTIONAL,
<word>) construct. Example:
$dir = 'C:\\htdocs';
$keyword = 'FORMSOF (INFLECTIONAL, date)';
$sql = "SELECT filename, size, path
FROM SCOPE('DEEP TRAVERSAL OF \"$dir\"')
WHERE CONTAINS(Contents, '$keyword')";
$res = oledb_query($sql, $link);
Now Indexing Service will look for all the inflected forms of the word:
date, dates, dating, dated, etc. If the word specified is "good," then
it'd look for good, better, best, and well.
To search on a partial word, we use the * sign:
$keyword = ' "kn*" ';
The double-quotation marks indicate a wild-card search. The above
pattern means any word starting with "kn" is considered a match.
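Since $keyword is dropped straight into the SQL string, anything coming
from a search form should be cleaned first. Here is a minimal sketch of
two helpers that produce the keyword forms shown above. The function names
are my own, and the "letters, digits, and hyphens only" rule is a
conservative assumption, not something Indexing Service requires.

```php
<?php
// Hypothetical helpers; neither is part of Indexing Service or the oledb extension.

// Strips everything except letters, digits, and hyphens, so the word
// cannot close the quoted CONTAINS() string it is interpolated into.
function clean_word(string $word): string
{
    return (string) preg_replace('/[^\p{L}\p{N}-]/u', '', $word);
}

// Matches all inflected forms of a word.
function inflectional(string $word): string
{
    return 'FORMSOF (INFLECTIONAL, ' . clean_word($word) . ')';
}

// Matches any word starting with the given prefix.
function starts_with(string $prefix): string
{
    return '"' . clean_word($prefix) . '*"';
}

$keyword = inflectional("date'");   // FORMSOF (INFLECTIONAL, date)
$keyword = starts_with('kn');       // "kn*"
```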
Indexing Service also supports the use of the <field> LIKE '%pattern%'
and <field> = 'value' SQL expressions. They are best avoided, however,
as they can be incredibly slow: matching against the value of a field
often means reading from the files.
To sort the results, we add an ORDER BY clause:
$dir = 'C:\\htdocs';
$keyword = 'FORMSOF (INFLECTIONAL, good)';
$sql = "SELECT filename, size, path
FROM SCOPE('DEEP TRAVERSAL OF \"$dir\"')
WHERE CONTAINS(Contents, '$keyword')
ORDER BY size DESC";
$res = oledb_query($sql, $link);
The above example lists the files found from the biggest to the
smallest. "ORDER BY write DESC" would list the more recently modified
files first, while "ORDER BY create DESC" would list the more recently
created ones first. You can, of course, also use these file attributes
as search criteria.
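To illustrate both points, here is a sketch that adds a size filter to the
WHERE clause and checks the sort column against a short list before
building the ORDER BY. The function name and the $minSize parameter are
hypothetical, and I'm assuming a plain "size > N" comparison can be
combined with CONTAINS() using AND.

```php
<?php
// Hypothetical helper: adds a size filter and a validated ORDER BY clause.
function build_sorted_query(string $dir, string $keyword, string $sortBy, int $minSize = 0): string
{
    // Columns mentioned in the post: size, write (last modified), create.
    $sortable = ['filename', 'size', 'write', 'create'];
    if (!in_array($sortBy, $sortable, true)) {
        throw new InvalidArgumentException("Cannot sort by: $sortBy");
    }

    $sql = "SELECT filename, size, path "
         . "FROM SCOPE('DEEP TRAVERSAL OF \"$dir\"') "
         . "WHERE CONTAINS(Contents, '$keyword')";

    if ($minSize > 0) {
        // A file attribute used as a search criterion.
        $sql .= " AND size > $minSize";
    }

    return $sql . " ORDER BY $sortBy DESC";
}

// Files over 10 KB, most recently modified first.
$sql = build_sorted_query('C:\\htdocs', 'FORMSOF (INFLECTIONAL, good)', 'write', 10240);
```

Restricting $sortBy to a fixed list matters if the sort order comes from a
query-string parameter, since the column name cannot be quoted like a value.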
Thus far we have been searching on the computer's default catalog. If
searching will be done only in a particular folder, it's worthwhile to
create a separate catalog. You can do this in the Computer Management
console. To search a different catalog through OLE-DB, specify the
catalog name in the connection string as the data source:
$link = oledb_open("Provider=MSIDXS; Data Source=web_cat");
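If the catalog name comes from configuration, a small wrapper keeps the
connection string in one place. This is only a sketch: the function name
is made up, and limiting catalog names to letters, digits, and underscores
is my own cautious assumption rather than a documented rule.

```php
<?php
// Hypothetical helper: builds the MSIDXS connection string for a named catalog.
function catalog_connection_string(string $catalog): string
{
    // Reject anything that could add extra key=value pairs to the string.
    if (!preg_match('/^[A-Za-z0-9_]+$/', $catalog)) {
        throw new InvalidArgumentException("Unexpected catalog name: $catalog");
    }
    return "Provider=MSIDXS; Data Source=$catalog";
}

$conn = catalog_connection_string('web_cat');
// $conn can then be handed to oledb_open() as in the line above.
```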
Finally, what if you want to search files residing on a network server?
While it's possible to index a network drive, it's not terribly
efficient. Instead, you'd want to enable Indexing Service on that
computer and perform the search there.
To search a remote catalog, we prefix the SCOPE() function with the
computer name and the catalog name:
$dir = '\\\\fileserver\\projects';
$keyword = 'FORMSOF (INFLECTIONAL, bad)';
$sql = "SELECT filename, size, path
FROM fileserver.System..SCOPE('DEEP TRAVERSAL OF \"$dir\"')
WHERE CONTAINS(Contents, '$keyword')";
$res = oledb_query($sql, $link);
Note that the double period is not a typo. Windows Authentication is
used to determine what files are visible. For the code above to work,
the web server has to run as a user on the network.
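Backslashes in UNC paths are easy to get wrong in PHP strings, so it may
help to build the remote FROM clause in one place. A minimal sketch, with
a made-up function name and the machine, catalog, and share names taken
from the example above:

```php
<?php
// Hypothetical helper: builds the machine.catalog..SCOPE() expression
// for a search against a remote catalog.
function remote_scope(string $machine, string $catalog, string $share): string
{
    // Inside double quotes, "\\\\" yields the two leading backslashes
    // of a UNC path and "\\" yields the single separator.
    $dir = "\\\\$machine\\$share";
    return "$machine.$catalog..SCOPE('DEEP TRAVERSAL OF \"$dir\"')";
}

$from = remote_scope('fileserver', 'System', 'projects');
// $from is: fileserver.System..SCOPE('DEEP TRAVERSAL OF "\\fileserver\projects"')

$sql = "SELECT filename, size, path FROM $from "
     . "WHERE CONTAINS(Contents, 'FORMSOF (INFLECTIONAL, bad)')";
```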