U4-5194 - v7: Backoffice search is imprecise, doesn't handle multiple words correctly

Created by Asbjørn Riis-Knudsen 07 Jul 2014, 20:32:38 Updated by Shannon Deminick 22 Jul 2014, 23:18:42

Relates to: U4-5235

Relates to: U4-5045

Relates to: U4-5167

Most of my pages have titles with several words in them. I would expect backoffice search results to narrow down as I type more words from the name. Instead, the opposite happens: I get ''more'' results, not fewer, as I add words. It seems that the search combines the words with a boolean OR rather than the expected AND. This is highly counter-intuitive, and because searching for quoted phrases is not supported either, there is no workaround.
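In Lucene's query syntax, the expected AND behavior can be expressed by prefixing each clause with `+`, which marks it as required. The minimal C# sketch below assumes that the backoffice search ends up as a raw Lucene query string against the `__nodeName` field, which is the field name used in the comments below. `AllWordsQuery.Build` is a hypothetical helper and is not part of Umbraco or Examine. It also does no escaping of Lucene special characters.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only; not part of Umbraco or Examine.
public static class AllWordsQuery
{
    // Builds a raw Lucene query in which every word is a required (+) prefix
    // match against __nodeName, so each extra word narrows the result set.
    // Assumption: the words contain no Lucene special characters (not escaped here).
    public static string Build(string query)
    {
        var clauses = query
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(w => "+__nodeName:" + w.ToLowerInvariant() + "*");
        return string.Join(" ", clauses);
    }
}
```

For example, `AllWordsQuery.Build("John Doe")` returns `+__nodeName:john* +__nodeName:doe*`. Only nodes whose name matches both prefixes would be returned.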

Comments

Asbjørn Riis-Knudsen 16 Jul 2014, 19:22:15

From a bit of research it seems that the backoffice search doesn't handle multiple words correctly. For instance, if I search for "John Doe" (without the quotes), the query looks like this: __nodeName:john doe

This causes Umbraco to search for nodes where the name contains "John" and ''any'' field contains "Doe". That is not what I want. I would expect the query to look more like this: __nodeName:(john doe)

This causes Umbraco to return much more relevant results, because the title now should contain ''both'' "john" and "doe".
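The difference between the two queries comes down to how a field prefix binds words in Lucene's query syntax:

- `field:term` applies the field to that one term.
- `field:(a b)` applies it to every term in the group.
- A bare term goes to the query parser's default field.

The deliberately simplified C# sketch below models only that binding. `FieldBinding.Bind` is hypothetical and is not Lucene or Umbraco API. It ignores operators, quotes, wildcards, escaping and nesting, and reports only which field each word is searched in.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, deliberately tiny model of Lucene field binding; not Lucene or
// Umbraco API. Ignores operators, quotes, wildcards, escaping and nesting.
public static class FieldBinding
{
    // Returns (field, term) pairs showing which field each word is searched in.
    public static List<(string Field, string Term)> Bind(string query, string defaultField)
    {
        var result = new List<(string Field, string Term)>();
        string groupField = null; // field applied to every term inside "( ... )"

        foreach (var token in query.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            var term = token;
            var field = groupField ?? defaultField; // bare term: group field or default field

            int colon = term.IndexOf(':');
            if (colon > 0)
            {
                // "field:term" or the start of a group "field:(term ..."
                field = term.Substring(0, colon);
                term = term.Substring(colon + 1);
                if (term.StartsWith("("))
                {
                    groupField = field;
                    term = term.Substring(1);
                }
            }

            bool closesGroup = term.EndsWith(")");
            if (closesGroup) term = term.Substring(0, term.Length - 1);
            if (term.Length > 0) result.Add((field, term));
            if (closesGroup) groupField = null;
        }
        return result;
    }
}
```

The two queries bind differently:

- `Bind("__nodeName:john doe", "<default>")` yields `(__nodeName, john), (<default>, doe)`.
- `Bind("__nodeName:(john doe)", "<default>")` yields `(__nodeName, john), (__nodeName, doe)`.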

I modified the searching code in Umbraco.Web.Editors.EntityController accordingly (starting at line 290):

```csharp
var sb = new StringBuilder();

// RemoveEmptyEntries: repeated spaces would otherwise produce a bare "*" term
var querywords = query.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

// node name exactly, boost x 10
sb.Append("+(__nodeName:");
sb.Append("\"");
sb.Append(query.ToLower());
sb.Append("\"");
sb.Append("^10.0 ");

// node name normally, with wildcards
sb.Append(" __nodeName:");
sb.Append("(");
foreach (var w in querywords)
{
    sb.Append(w.ToLower());
    sb.Append("* ");
}
sb.Append(") ");

foreach (var f in fields)
{
    // additional fields normally
    sb.Append(f);
    sb.Append(":");
    sb.Append("(");
    foreach (var w in querywords)
    {
        sb.Append(w.ToLower());
        sb.Append("* ");
    }
    sb.Append(")");
    sb.Append(" ");
}

// must match index type
sb.Append(") +__IndexType:");
sb.Append(type);
```

For me, this produces much more concise and relevant results, but I'm far from a Lucene expert.


Asbjørn Riis-Knudsen 18 Jul 2014, 15:48:34

I have made a pull request with a fix for this issue (and issue U4-5167): https://github.com/umbraco/Umbraco-CMS/pull/426


Shannon Deminick 22 Jul 2014, 22:26:26

Great!

The quotes should be handled differently, by escaping them for Lucene syntax. I'll have a look at that later; for now your fix is better than what is currently there :)
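Escaping for Lucene syntax means putting a backslash in front of every character the query parser treats specially, so that a quote or colon typed into the search box is searched for literally instead of breaking the generated query. The C# sketch below is hypothetical (`LuceneEscaping.Escape` is not Umbraco API). It uses the character set handled by Lucene's own `QueryParser.escape`, and Lucene.Net exposes the same behavior as `QueryParser.Escape`. This issue does not say which approach was eventually adopted.

```csharp
using System.Text;

// Hypothetical helper for illustration; not part of Umbraco.
// Mirrors the special characters escaped by Lucene's QueryParser.escape.
public static class LuceneEscaping
{
    // Backslash, operators, grouping, quote, wildcard and range characters.
    private const string Special = "\\+-!():^[]\"{}~*?|&/";

    public static string Escape(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (var c in input)
        {
            // Prefix each special character with a backslash; whitespace is left alone.
            if (Special.IndexOf(c) >= 0) sb.Append('\\');
            sb.Append(c);
        }
        return sb.ToString();
    }
}
```

In the modified `EntityController` code, each word could be passed through such a helper before the wildcard is appended, for example `LuceneEscaping.Escape(w.ToLower()) + "*"`, so that the trailing `*` stays a real wildcard.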


Priority: Normal

Type: Bug

State: Fixed

Assignee: Shannon Deminick

Difficulty: Normal

Backwards Compatible: True

Fix Submitted: Pull request

Affected versions: 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.1.4, 7.1.5, 7.2.0

Due in version: 7.1.5
