A 3-minute read, written by Terry
July 30, 2015
I was recently tasked with building an online gallery for a client, which also needed to allow users to filter data based on various criteria. It sounded easy enough, but the data for this gallery would be in the Solr search appliance for the site.
The project was a lot of fun but posed a few struggles for a newb like me.
Solr is an open-source search platform built on Apache Lucene™. Once a website’s data has been indexed into a Solr collection, developers can add search functionality to the site, as well as run queries against the collection much like a database.
The next few paragraphs describe how to group data, examine a problem I encountered and, finally, explain how I resolved it. Before proceeding, please note that all of the schema.xml examples apply to Solr version 4.2. If you’re using a newer version, the code may be somewhat different, but the logic should be very similar.
In SQL we can simply write GROUP BY columnName and it’s done. With Solr it wasn’t so simple.
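To return grouped data from a Solr query (in my case, grouped by a field called “category1”), there are two parameters we need to set: group=true, which turns grouping on, and group.field=category1, which names the field to group by.
You can also use the “fl” parameter to select which fields you’d like returned.
To do this via the Solr admin interface, enter these parameters in the Raw Query Parameters field, for example group=true&group.field=category1.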
If we are directly accessing our Solr appliance with a URL string, our URL would look something like this:
http://localhost:8983/solr/my-collection/select?q=*%3A*&fl=category1%2Ctitle&wt=json&indent=true&group=true&group.field=category1
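As a rough sketch, the same kind of URL can also be assembled in code. The Python below uses only the standard library. The host, port and collection name are copied from the example URL, and the helper name build_group_query_url is made up for illustration. It only builds the URL and never contacts Solr.

```python
from urllib.parse import urlencode

# Select handler copied from the example URL (assumes a local Solr instance).
SOLR_SELECT_URL = "http://localhost:8983/solr/my-collection/select"


def build_group_query_url(group_field, fields, query="*:*"):
    """Return a select URL asking Solr to group results by group_field.

    Hypothetical helper for illustration only.
    """
    params = [
        ("q", query),                  # *:* matches every document
        ("fl", ",".join(fields)),      # fields to return
        ("wt", "json"),
        ("indent", "true"),
        ("group", "true"),             # turn grouping on
        ("group.field", group_field),  # field to group by
    ]
    # urlencode percent-encodes reserved characters such as ':' and ','.
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_group_query_url("category1", ["category1", "title"])
print(url)
```

The encoded string may differ slightly from a hand-written URL (for instance, urlencode also escapes the asterisks), but Solr decodes both to the same parameters.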
If you are using the Apache_Solr_Service class, your code would look something like this:
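$collection_path = "/solr/my-collection";
// SOLR_IP and SOLR_PORT are constants defined elsewhere in the application.
$solr = new Apache_Solr_Service(SOLR_IP, SOLR_PORT, $collection_path);

$query = "*:*"; // match every document, as in the URL example
$start = 0;
$num_rows = 1000;

// Extra parameters passed through to Solr.
$additionalParameters = array(
    'fl' => 'title,category1',
    'group.field' => 'category1',
    'group' => 'true'
);

// Magic quotes only exist on old PHP versions.
if (function_exists('get_magic_quotes_gpc') && get_magic_quotes_gpc() === 1) {
    $query = stripslashes($query);
}

$output = $solr->search($query, $start, $num_rows, $additionalParameters);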
I encountered one major issue when grouping data. Depending on the type of field you’re using, Solr will tokenize the content of that field. You can learn more about tokenizers in the Solr documentation.
This is where you will have to revisit your schema.xml.
In my schema, my category1 field is a “text_general” field type. The “text_general” field type was being tokenized using the solr.StandardTokenizerFactory class. This class splits the text field into tokens, treating whitespace and punctuation as delimiters.
<field name="category1" type="text_general" indexed="true" stored="true"/>

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
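This caused issues with grouping: because category1 was indexed as individual lowercased tokens, Solr grouped on those tokens instead of on the full category value, so a multi-word category did not come back as a single group.
Obviously, this was a problem. I couldn’t just change the type of the field, because tokenized fields allow for more efficient and accurate indexing and searching.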
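To see roughly why a tokenized field makes a poor grouping key, here is a small Python sketch. It is only an approximation: it assumes that splitting on non-alphanumeric characters and lowercasing is close enough to StandardTokenizerFactory plus LowerCaseFilterFactory for illustration, and it ignores stop words and Lucene’s real segmentation rules. The category values are invented and the function name approximate_tokens is hypothetical.

```python
import re


def approximate_tokens(value):
    """Very rough stand-in for the index-time analysis of text_general.

    Assumption: lowercase the value and split on anything that is not a
    letter or digit. Real Solr analysis is more sophisticated.
    """
    return re.findall(r"[a-z0-9]+", value.lower())


# Hypothetical category values, not taken from the original project.
categories = ["Oil Paintings", "Oil Pastels", "Black & White Photography"]

for category in categories:
    # Each category is broken into several terms, so no single indexed
    # term represents the whole category any more.
    print(repr(category), "->", approximate_tokens(category))
```

In this sketch “Oil Paintings” and “Oil Pastels” both produce the term “oil”, which shows how distinct categories can blur together once only tokens are indexed.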
After doing a bit of research I discovered that the solution was actually pretty simple. I needed to create an additional field that would not be tokenized by Solr and then copy the value from my original field into that new field. Then I could use this new field for grouping.
I used the “string” field type. In my schema.xml I have my original field in the <fields> node:
<field name="category1" type="text_general" indexed="true" stored="true"/>
I then added my new field to the <fields> node:
<field name="category1_full" type="string" indexed="true" stored="true"/>
Outside of the <fields> node, I copied the contents of the original field to my new field:
<copyField source="category1" dest="category1_full"/>
Once the data has been re-indexed so that the new field is populated, I can run the same queries I did earlier, but instead of using category1 as my group field, I am using category1_full.
http://localhost:8983/solr/my-collection/select?q=*%3A*&fl=category1_full%2Ctitle&wt=json&indent=true&group=true&group.field=category1_full
If you are using the Apache_Solr_Service class, your code would look something like this:
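$collection_path = "/solr/my-collection";
// SOLR_IP and SOLR_PORT are constants defined elsewhere in the application.
$solr = new Apache_Solr_Service(SOLR_IP, SOLR_PORT, $collection_path);

$query = "*:*"; // match every document, as in the URL example
$start = 0;
$num_rows = 1000;

// Group on the untokenized copy of the field.
$additionalParameters = array(
    'fl' => 'title,category1,category1_full',
    'group.field' => 'category1_full',
    'group' => 'true'
);

// Magic quotes only exist on old PHP versions.
if (function_exists('get_magic_quotes_gpc') && get_magic_quotes_gpc() === 1) {
    $query = stripslashes($query);
}

$output = $solr->search($query, $start, $num_rows, $additionalParameters);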
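A grouped response is shaped differently from a normal result list: the documents sit under a “grouped” key, one entry per group. As a hedged sketch of reading it, the Python below works on a trimmed-down sample in the JSON shape Solr returns for group=true. The sample values are invented, and the helper name group_counts is hypothetical.

```python
import json

# Hypothetical sample response for group.field=category1_full with wt=json.
# The titles and categories are made up for illustration.
sample_response = json.loads("""
{
  "grouped": {
    "category1_full": {
      "matches": 3,
      "groups": [
        {"groupValue": "Oil Paintings",
         "doclist": {"numFound": 2, "start": 0,
                     "docs": [{"title": "Sunset",
                               "category1_full": "Oil Paintings"}]}},
        {"groupValue": "Black & White Photography",
         "doclist": {"numFound": 1, "start": 0,
                     "docs": [{"title": "Harbour",
                               "category1_full": "Black & White Photography"}]}}
      ]
    }
  }
}
""")


def group_counts(response, group_field):
    """Map each group value to the number of documents found in that group."""
    groups = response["grouped"][group_field]["groups"]
    return {g["groupValue"]: g["doclist"]["numFound"] for g in groups}


print(group_counts(sample_response, "category1_full"))
```

Because the group field is now an untokenized string, each groupValue in a response like this is a complete category name rather than a fragment of one.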
This is one way to solve this issue. As with most problems, there are likely several solutions. If you know a different way, then please post a comment on this blog and let me know. I’d love to hear different approaches.
Terry Palmer
As a developer and our support specialist, Terry can help you with almost everything on your website. No matter what you need, he's your guy, specializing in all our core technologies.
Terry's our lead lunchtime gym attendee, and he coordinates our (multiple) company hockey teams. You might find him listening to sports radio. Constantly.
Source: How to group data when querying Solr
© copyright 2015 by yellowpencil