EPiServer - Simple search and shared blocks

Introduction

From time to time we have to build a simple search solution for our clients.
They usually have a website with up to 10 000 pages and don't need enterprise search, statistics, or complex queries; they just want something simple to begin with.
The only requirement they have is: it should work like Google!

Well, Google has more than 50 000 full-time employees, and their search engine is anything but simple :)

While EPiFind is an excellent enterprise search product, not all clients are willing to pay for the license and hosting.

Standard EPiSearch is free, but it cannot index the content of shared blocks.

For that reason, I created a simple search engine based on Lucene.NET that can be used in non-demanding EPiServer search solutions. Since it uses Lucene.NET under the hood, it's 100% free, and requires no maintenance.

It listens to the PublishedContent event, and every time a page or block is published, it re-indexes all pages affected by the change.

The source code is available on the following link: https://github.com/dejancaric/EPi.SimpleSearch

How it works

SimpleSearch subscribes to the IContentEvents.PublishedContent event to detect when a page or shared block has been published, and then works out which pages are affected by that change.

It then uses a web crawler to parse the actual rendered HTML (not just the searchable properties) and re-indexes the affected pages.

SimpleSearch comes with a scheduled job that needs to be started manually only once; after that, re-indexing is triggered automatically.
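To make the "affected pages" idea concrete, here is a minimal, self-contained sketch. It is not the SimpleSearch source: the class, the method, and the usedBy map are hypothetical names made up for illustration, and the sketch assumes the walk stops once it reaches a page. It follows the "used by" references upwards from the published item, so a block nested in another block still leads to the page that renders it.

```csharp
using System;
using System.Collections.Generic;

public static class AffectedPagesDemo
{
    // usedBy:  content id -> ids of the content items that reference it directly (hypothetical model)
    // pageIds: ids that are pages (everything else is treated as a shared block)
    public static ISet<string> FindAffectedPages(
        string publishedId,
        IDictionary<string, string[]> usedBy,
        ISet<string> pageIds)
    {
        var affected = new HashSet<string>();
        var visited = new HashSet<string>();
        var queue = new Queue<string>();
        queue.Enqueue(publishedId);

        while (queue.Count > 0)
        {
            var current = queue.Dequeue();
            if (!visited.Add(current))
            {
                continue; // already handled, protects against circular references
            }

            if (pageIds.Contains(current))
            {
                // Assumption: a page is re-indexed, but the walk does not continue past it.
                affected.Add(current);
                continue;
            }

            // A block: keep walking up to whatever uses it.
            if (usedBy.TryGetValue(current, out var parents))
            {
                foreach (var parent in parents)
                {
                    queue.Enqueue(parent);
                }
            }
        }

        return affected;
    }
}
```

For example, if a "teaser" block is used on the "home" page and inside a "sidebar" block, and "sidebar" is used on the "article" page, publishing "teaser" marks both "home" and "article" for re-indexing.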

Page types

Usually we don't want to index all page types. For example, Thank you pages, listing pages, etc. should not be indexed.

All other page types should implement the ISearchablePage interface:

[ContentType(GUID = "9CCC8A41-5C8C-4BE0-8E73-520FF3DE8267")]
public class MyPage : PageData, ISearchablePage
{
    [Display(
        GroupName = SystemTabNames.Content,
        Order = 100)]
    public virtual bool ExcludeFromSearch { get; set; }
}

The ExcludeFromSearch property lets editors manually exclude individual pages from the index.

Views

What about the HTML? We usually don't want to index the navigation, header, footer, aside content, etc. They are just "noise". The web crawler skips every element that has a data-nosearch attribute, together with everything inside it.

Let's say we have the following HTML snippet:

<div id="1">
    <p>DIV 1</p>
</div>
<div id="2">
    <p>DIV 2</p>
</div>

If we want to skip the first DIV, all we have to do is add the data-nosearch attribute to it:

<div id="1" data-nosearch="">
    <p>DIV 1</p>
</div>
<div id="2">
    <p>DIV 2</p>
</div>
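The skipping rule can be sketched in a few lines. This is only an illustration, not the crawler SimpleSearch actually ships: it assumes the markup is well-formed enough to be parsed as XML with System.Xml.Linq (real-world HTML usually needs a proper HTML parser), and NoSearchDemo, VisibleText and Extract are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public static class NoSearchDemo
{
    // Yields the text found under an element, ignoring any subtree
    // whose root element carries a data-nosearch attribute.
    public static IEnumerable<string> VisibleText(XElement element)
    {
        if (element.Attribute("data-nosearch") != null)
        {
            yield break; // skip this element and everything inside it
        }

        foreach (var node in element.Nodes())
        {
            if (node is XText text && !string.IsNullOrWhiteSpace(text.Value))
            {
                yield return text.Value.Trim();
            }
            else if (node is XElement child)
            {
                foreach (var childText in VisibleText(child))
                {
                    yield return childText;
                }
            }
        }
    }

    // Wraps the fragment in a single root element so it can be parsed as XML.
    public static string Extract(string html)
    {
        var root = XElement.Parse("<root>" + html + "</root>");
        return string.Join(" ", VisibleText(root));
    }
}
```

Passing the second snippet above to Extract returns "DIV 2", because the first DIV and its paragraph are never visited.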

API

Simple pagination:

var searchService = ServiceLocator.Current.GetInstance<ISearchService>();
var result = searchService.Search(new SearchRequest
{
    PageNumber = 1,
    PageSize = 5,
    Text = "alloy"
});

Filter by page type:

var searchService = ServiceLocator.Current.GetInstance<ISearchService>();
var result = searchService.Search(new SearchRequest
{
    PageNumber = 1,
    PageSize = 5,
    Text = "alloy",
    FilteredTypes = new[] { typeof(ArticlePage), typeof(ProductPage) }
});

Autocomplete:

var searchService = ServiceLocator.Current.GetInstance<ISearchService>();
var result = searchService.AutocompleteSearch("al", 10);
