SharePoint Blogs / SharePoint University

MOSS Search Results Can Be Near Real Time

Well, I'm not sure how many times I can mention KnowledgeLake and "Transactional Content Management" without getting flogged by the blog hosts for peddling our wares again... but here I go again.

So once again, I'll set the stage with the world I work in every day.  KL is all about facilitating document processing all the way from paper to grave.  By grave I mean the end of a document lifecycle.  So after KL Capture Server blasts a batch of documents into SharePoint we often take advantage of some form of workflow to kick off additional document/account processing. 

For example, imagine a lending branch scanning in and releasing a series of documents related to a loan application.  Upon receipt of the actual application document a workflow might be initiated.  Here's where it gets interesting.  During loan application processing there might be several approval steps that are based on peripheral documents such as income statements and/or loan collateral documentation.  If the institution is processing many loans per day, they don't have time to wait around for an incremental crawl to take an hour or sometimes even 15 minutes.

So what can we do to really tighten down search result availability?  Well, in this type of environment I would architect the farm a certain way and set up the incremental crawl for the content source to fire literally every minute.  The information below outlines how I would configure the farm to squeeze the absolute most performance out of crawl processing.

Implementation:
  • The farm should include a separate (and beefy) machine for the Index Server.  I recommend a box with a MINIMUM of 4 (64-bit) CPU cores and 16GB of RAM.  The Query role should not be enabled on this server.  Note that you can't mix 32-bit and 64-bit WFEs in the farm, so if you're running 32-bit front ends, stick with a 32-bit Index Server.
  • In order to get that hefty Index Server to take advantage of available resources, we need to force it to use more threads while crawling content.  We can do that using 1 of 2 possible techniques:
  • OPTION 1: When configuring the "Office SharePoint Server Search" role on the Index Server, set the Indexer Performance to "Maximum":

image 

  • OPTION 2: We can create a crawler impact rule in Application Management => Manage search service => Crawler Impact Rules => Add Rule
    NOTE: Crawler Impact Rules take precedence over Indexer Performance Settings and since the default simultaneous requests is based on the number of processors on the index server, it's possible that the "Maximum" indexer performance setting could be overridden by the default crawler impact setting (even if no crawler impact rules exist).

image

  • Then, regardless of which option is chosen, we need to set the "Target" Web Front End to be the Index Server itself (the WFE role must be enabled on it) or possibly a dedicated "target" WFE machine that is not used for serving content to end users.

image

  • Finally, we set the incremental crawl schedule to fire in 1 minute increments.  Navigate to the Shared Services Administration page for your SSP.  Then click Search Settings => Content sources and crawl schedules => [Content Source Name].  Then click "Create[/Edit] schedule" under the Incremental Crawl field.  Set the start time to 12:00 AM and repeat every 1 minute (for 1440 minutes to cover the full day), as shown below, and click OK => OK.

image

That should do it.  You've just configured the search service to kick off incremental crawls in 1 minute intervals!  Shortly after an incremental crawl completes, if any changes were made to any of the index files, those changes will be propagated out to the Query (Search) servers.  Once that propagation has been processed, the content will be available for searching!
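
To make the schedule arithmetic concrete, here is a minimal Python sketch of the trigger times that a "start at 12:00 AM, repeat every N minutes for 1440 minutes" schedule would produce.  It only models the schedule and is not SharePoint code.  The function name `crawl_start_times` is hypothetical, and the assumption that the repeat window excludes its end point is mine.

```python
from datetime import datetime, timedelta


def crawl_start_times(start, repeat_minutes, duration_minutes=1440):
    """Return the times at which an incremental crawl would be triggered.

    Assumption: triggers fall at start, start + repeat, ... and stop
    before start + duration (the window's end point is excluded).
    """
    if repeat_minutes <= 0 or duration_minutes <= 0:
        raise ValueError("repeat_minutes and duration_minutes must be positive")
    return [
        start + timedelta(minutes=offset)
        for offset in range(0, duration_minutes, repeat_minutes)
    ]


midnight = datetime(2008, 2, 28, 0, 0)  # arbitrary example date, 12:00 AM
every_minute = crawl_start_times(midnight, 1)
every_quarter_hour = crawl_start_times(midnight, 15)

print(len(every_minute))        # 1440 triggers per day
print(len(every_quarter_hour))  # 96 triggers per day
print(every_quarter_hour[1].strftime("%H:%M"))  # 00:15
```

Keep in mind that a trigger is only a request to start a crawl.  The crawler runs one incremental crawl at a time, so the crawl duration, not just the schedule, decides how fresh the index really is.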

Monitoring Performance:
  • Keep an eye on the "Manage Content Sources" page in the SSP administration site.  It will tell you the indexing status. 
    • You want to watch the Indexing Status field.  It will say "Crawling Incremental" when it's crawling.  It should say "Idle" when it is finished crawling.  Refresh often to ensure that at some point during the 1 minute interval it is able to finish the incremental crawl.
    • If Indexing Status never changes to Idle then unfortunately you don't have the horsepower to maintain a 1 minute incremental crawl interval.  You should increase the interval by 1 minute at a time until you verify that your crawl can complete in the allotted amount of time.
  • Keep an eye on the performance of your Index Server, Target Server (if applicable), and your SQL Server.  If ramping up crawl performance has created an uncomfortable increase in system resource utilization on ANY of these servers, you can back down the crawl threads (Crawler Impact Rules/Indexer Performance), increase the incremental crawl interval, or both.
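
The tuning procedure above (start at 1 minute and add a minute until the crawl finishes inside the interval) can be sketched in Python.  This is a hypothetical helper, not a SharePoint API.  The name `smallest_workable_interval` and the lists of observed crawl durations are assumptions, and you would collect those durations yourself by watching the Indexing Status field.

```python
def smallest_workable_interval(crawl_durations_seconds, start_minutes=1):
    """Return the smallest whole-minute interval that every observed
    incremental crawl would have finished within.

    Mirrors the manual procedure: begin at start_minutes and add one
    minute at a time until the slowest observed crawl fits with time
    to spare (so the status can actually reach "Idle").
    """
    if not crawl_durations_seconds:
        raise ValueError("need at least one observed crawl duration")
    slowest = max(crawl_durations_seconds)
    interval = start_minutes
    # A crawl that takes the whole interval never goes idle, so require
    # the slowest crawl to be strictly shorter than the interval.
    while interval * 60 <= slowest:
        interval += 1
    return interval


# Illustrative (made-up) timings in seconds, not real measurements.
print(smallest_workable_interval([20, 45, 38]))   # 1
print(smallest_workable_interval([50, 130, 95]))  # 3
```

Sizing against the slowest observed crawl is a deliberately conservative choice.  If your timings have rare outliers, you may prefer to size against a typical crawl and accept an occasional overrun.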

Additional Points of Interest:
  • There are many factors related to crawl performance, everything from how powerful your Index, Target, and SQL Servers are to the I/O performance of the SQL Server databases.  The SSP Search database is particularly vulnerable as it can become very large quickly.
  • Not all environments are the same.  Your mileage may vary.  For example, KnowledgeLake solutions often revolve around high volumes of TIFF files.  There is no TIFF iFilter available for MOSS out of the box so the "NULL" iFilter is used.  This means that the document metadata is gathered and inserted into the property store in the SSP Search database but the actual binary file doesn't have to be parsed.  So our indexing speed is often much faster.
  • With such a high load created on the Index Server and SQL Server during crawl processing, it's recommended that any Full Crawls be scheduled during off peak times (evenings and weekends, etc).  This is because the Full Crawl will obey the same threading rules used by the incremental crawl.  This could yield a very high level of stress on the SQL Server over an extended period of time.

OK.  That's about all I have to say about that.  Once again, the cool thing about SharePoint is that it is so configurable!  If the changes I specified here don't work for you, please don't flame me :)  !  Just back off of the threading or put the settings back where they started and you'll be just fine.


Posted 02-28-2008 1:58 PM by Russ Houberg

Comments

Tim wrote re: MOSS Search Results Can Be Near Real Time
on 08-18-2008 7:54 AM

Is there a way to have a crawl schedule broken down by every 15 minutes instead of every hour only?  I.E., instead of 1:00 AM - 2:00 AM etc., it would be 1:00 AM - 1:15 AM - 1:30 AM - 1:45 AM - 2:00 AM - 2:15 AM etc.

Russ Houberg wrote re: MOSS Search Results Can Be Near Real Time
on 08-19-2008 7:43 AM

The crawl schedule does have a specific start time, in the case of the picture above, it's 12:00am.  But it is the repeat every X minutes that determines the schedule.  

Currently, in the picture above, it's set to 1 minute.  So the crawl will run virtually all the time.  You could easily set this to 15 minutes such that it would kick off in 15 minute intervals.  

You can't control the end time of the crawl.  It ends when it runs out of objects to crawl.

Hope that helps!

Alan Lambkin wrote re: MOSS Search Results Can Be Near Real Time
on 01-05-2009 11:44 AM

It would be really nice if the system could kick off a search every time it completes one. Setting the time accordingly is tough when the amount of content keeps growing. Have you heard any rumors of this feature floating around?

Nice post, Thanks!

Russ Houberg wrote re: MOSS Search Results Can Be Near Real Time
on 01-05-2009 2:40 PM

Well, it sounds like you're talking about the crawl schedule and not a "search schedule".  One thing to note is that the crawler will only run 1 incremental crawl at a time.  

So if you're interested in continuously running an incremental crawl, you could set your crawl repeat schedule to every (1) minutes for 1440 minutes.  That way you're literally crawling every minute of the day to keep your indexes as fresh as possible.  

As long as you're not dive bombing your SQL Server, then there is no reason that this couldn't work.

Lee Goergen wrote re: MOSS Search Results Can Be Near Real Time
on 01-12-2009 3:44 PM

Russ,

I am working at a client site with KL products and I am having trouble: the incremental crawl is not running and I am having a hard time proving it.  Is there anything in the server logs that tells when the gatherer starts and stops?  I have access to the server's SharePoint and KL logs, but not the crawl log.

I don't have routine access to the crawl logs (I have to bother the admin).  What I have seen so far seems to show it is not running.

Any help would be wonderful --Lee

... wrote re: MOSS Search Results Can Be Near Real Time
on 03-05-2009 9:25 PM

Interesting information.

... wrote re: MOSS Search Results Can Be Near Real Time
on 03-11-2009 8:44 PM

Good work here! Good content.

Steve Garner wrote re: MOSS Search Results Can Be Near Real Time
on 06-01-2009 4:14 PM

Great post, however:

In the Add Crawler Impact Rule dialog above, note that the instructions include: "Do not include the protocol (for example 'http://')."

But the example given contains the protocol:  http://*

So, assuming this works, anyone who follows the directions would never know.  How strange.

Posts (c) their respective authors. Everything else (c) 2009 SharePoint Experts, Inc.