SolrCloud is a set of Solr features that expands the capabilities of Solr's distributed search, simplifying the creation and management of Solr clusters. SolrCloud is still under active development, but already supports the following features:

  • Central configuration for the entire cluster
  • Automatic load balancing and fail-over for queries
  • ZooKeeper integration for cluster coordination and configuration

For an introduction to SolrCloud, and how it is different from index replication, see the LucidWorks Knowledgebase article What is SolrCloud?. In addition, the Apache Solr Reference Guide includes an extensive section on SolrCloud, which includes background information and configuration instructions. Some changes have been made for LucidWorks Search, however, which are described below.

LucidWorks Search implements SolrCloud as a purely Solr feature; to manage SolrCloud shards and replicas, you should refer to and use instructions designed for a purely Solr installation. There are only a few caveats and modifications for LucidWorks Search, detailed below, specifically for bootstrapping ZooKeeper and the cluster nodes.

Topics discussed in this section:

  • Enabling SolrCloud Mode
  • Using the Embedded ZooKeeper
  • Starting LucidWorks Search
  • Bootstrapping Solr vs. LucidWorks Search
  • How SolrCloud Works with LucidWorks
  • Collections APIs
  • Using a Stand-Alone ZooKeeper Instance or Ensemble

Enabling SolrCloud Mode

LucidWorks Search includes an installer that can install the application on each node of the planned SolrCloud cluster. For details on using this approach, see the section SolrCloud Cluster Installation. This approach will allow you to install three ZooKeeper instances to create a quorum, and then install as many LucidWorks Search nodes as needed.

The standard instructions for starting SolrCloud are modified slightly for LucidWorks Search. Commands within the installer take these modifications into account, but if starting without the installer, refer to the modifications described below.

While much of the SolrCloud documentation in the Apache Solr Reference Guide section on SolrCloud can be used, it is important to only start LucidWorks Search in SolrCloud mode with the instructions included here.

Using the Embedded ZooKeeper

It's possible to make two standalone (single-server) installations communicate with each other in SolrCloud mode using the ZooKeeper instance embedded with LucidWorks Search. This can be useful for creating a simple two-node cluster when you are just starting to learn how this functionality can work for your search application. With this approach, two separate installations are made (as described in the section Single Server Installation). One installation is then started with commands that bootstrap the configurations and start ZooKeeper.

Because we need two servers for this example, we will make two installations of LucidWorks, one on the server "example" and the other on the server "example2". During installation, do not start LucidWorks Search. Instead, start the two installations manually, as shown below.

We recommend that you only install LucidWorks Search using the installer application; copying an existing LucidWorks Search directory to another location to create another server may cause port conflicts. Information on installing LucidWorks Search is available in the section on Installation.

The installation on example should use port 8983 for the LWE-Core component, which means changing it from the default during the installation process. The installation on example2 should use the default port (8888) for the LWE-Core component. If you enable other components, be sure to adjust their ports for each installation as well. If you are new to LucidWorks Search, see the section Working With LucidWorks Search Components for more information about the components. Your port selections might look like this:

Component | example Ports | example2 Ports
LWE-Core | 8983 | 8888
LWE-Connectors | 8965 | 8765
LWE-UI | 8889 | 8989

ZooKeeper will run on the LWE-Core port + 1000, so in this scenario we expect ZooKeeper to run on port 9983. It's important to keep that in mind while planning the installation ports so there isn't an inadvertent conflict with LucidWorks Search ports.

SolrCloud uses ZooKeeper to manage nodes, and it's worth taking a look at the ZooKeeper website to understand how ZooKeeper works before configuring SolrCloud. Solr can embed ZooKeeper, but for production use it's recommended to run a ZooKeeper ensemble, as described in the ZooKeeper section of the SolrCloud wiki page.

Starting LucidWorks Search

To start LucidWorks Search in SolrCloud mode, use the usual LucidWorks start script, but pass some Java options to it. To start example, you would use a command like this:

Start 'example'
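
The exact command listing isn't reproduced here; a minimal sketch, based on the options described below (bootstrap_conf, zkRun, and numShards, with two shards assumed for this two-node example), would be:

  ./start.sh -lwe_core_java_opts "-Dbootstrap_conf=true -DzkRun -DnumShards=2"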

The bootstrap_conf option pushes the configuration files for each collection into ZooKeeper so they are available to all nodes, while zkRun starts the embedded ZooKeeper. The numShards value defines how many shards the index will be split across. Be sure to set this accurately, as Solr cannot yet easily increase the number of shards without re-bootstrapping the cluster.

You only need to pass bootstrap_conf and numShards the first time LucidWorks Search is started in SolrCloud mode. On subsequent restarts, start this node (the one running the embedded ZooKeeper) with ./start.sh -lwe_core_java_opts "-DzkRun". Alternatively, -DzkRun can be added to master.conf, in which case the start.sh script alone starts ZooKeeper each time.

To start the next nodes of the cluster, we still use the start script, but with some different options. This would start example2:

Start 'example2'
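
Again as a sketch, the second node only needs to be pointed at the ZooKeeper running on example (LWE-Core port 8983 + 1000 = 9983, as noted below):

  ./start.sh -lwe_core_java_opts "-DzkHost=example:9983"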

Note that the port defined as the zkHost is the port of the LWE-Core component + 1000. So, if LWE-Core on our example server was defined at port 8983, ZooKeeper would be started at port 9983.

The above instructions assume a Linux-based operating system. For Windows-based systems, use start.bat as in these examples:

Start example:
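
A sketch mirroring the Linux command above, with the same assumptions:

  start.bat -lwe_core_java_opts "-Dbootstrap_conf=true -DzkRun -DnumShards=2"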

Start example2:
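
And for the second node, again as a sketch:

  start.bat -lwe_core_java_opts "-DzkHost=example:9983"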

If you used the installer to install LucidWorks Search in SolrCloud mode, the required commands have already been added to the master.conf file for each server, and no special start or stop instructions are required for restarts. In that case, you would not run the embedded ZooKeeper; instead you would have installed and configured a ZooKeeper quorum, and the zkHost parameter would already be set in the master.conf file.

Bootstrapping Solr vs. LucidWorks Search

This table outlines the differences between the Solr instructions for bootstrapping SolrCloud mode and the LucidWorks Search instructions. It is meant as a summary if you are already familiar with how SolrCloud works.

SolrCloud | LucidWorks Search
Use start.jar | Use start.sh or start.bat with -lwe_core_java_opts defined
Use bootstrap_confdir to upload configuration files to ZooKeeper | Use bootstrap_conf=true
Use collection.configName | Not needed with bootstrap_conf=true
Default configuration directory is ./solr/collection1/conf | Default configuration directory is $LWS_HOME/conf/solr/cores/collection1_0/conf

How SolrCloud Works with LucidWorks

There are some caveats to using SolrCloud with LucidWorks Search, as it is so far only partially integrated with the system. Future releases of LucidWorks Search will contain tighter integration with SolrCloud functionality.

Replicated Configurations

When running LucidWorks Search in SolrCloud mode, some LucidWorks Search-specific features are not yet fault tolerant and highly available. While the index and configuration files are fully supported by SolrCloud, the following are not currently replicated across the nodes of the cluster:

  • Data sources and their related metadata (such as crawl history)
  • The LucidWorks user database, which stores manually created users (such as the default "admin" user)
  • User alerts
  • LDAP configuration files
  • SSL configuration

Even though these features aren't replicated, they can still be used with LucidWorks Search in SolrCloud mode. The files that hold this metadata are in the $LWS_HOME/conf folder and could be copied to the other nodes in the cluster to act as a backup if the main node goes down for any length of time. This is a manual process and is not yet automated by LucidWorks Search.
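
As an illustration of that manual copy, here is a minimal sketch assuming SSH access between the nodes; the target host and backup path are hypothetical:

  rsync -av $LWS_HOME/conf/ example2:/backups/example-lws-conf/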

Using the Admin UI in SolrCloud Mode

To accommodate the lack of replicated configurations, we recommend doing a full LucidWorks Search installation (i.e., all components) on every machine in your cluster. You should then choose one node to run the Admin UI. This is the node that will store your data sources and associated metadata. Another node can be chosen to do the crawling, or you can use the same node that runs the Admin UI. Document updates will still be distributed to the nodes via the index update processes that make up SolrCloud functionality.

If the node used for the Admin UI goes down, you can choose another node to act as the Admin UI node, but unless the related configuration files have been copied to that node, it will not have the same user accounts and data sources. Once you bring the original Admin UI node back online, it should still have your data sources and other LucidWorks-specific metadata.

You can configure LucidWorks Search not to start the Admin UI by editing $LWS_HOME/conf/master.conf and setting the lweui.enabled parameter to 'false'.
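
For example, the relevant line in $LWS_HOME/conf/master.conf would read as follows (only this property is shown; the other entries in the file are omitted):

  lweui.enabled=false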

Feature Limitations

The following LucidWorks features may encounter significant problems when working in SolrCloud mode:

  • Click Scoring cannot be used in SolrCloud mode at this time.
  • If auto-complete is enabled, suggestions should be pulled from a single index node by adding '&distrib=false' to the query (see the sketch after this list). Distributed auto-complete indexing is possible, but requires configuring auto-complete indexing on each node and adding a 'query' component to the autocomplete requestHandler in solrconfig.xml.
  • De-duplication does not work in SolrCloud due to a bug in Solr (SOLR-3473).
  • SSL does not work with SolrCloud due to a bug in Solr (SOLR-3854).
  • Log indexing and query statistics in the Admin UI will be inconsistent. If you are using LucidWorks Search in SolrCloud mode or with each component installed on a different server, please see the section Log Indexing with Separated Components for details on how to make sure your logs are fully indexed.
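
As a sketch of the auto-complete workaround mentioned above: the host, collection, and request handler path are illustrative assumptions, and only the distrib=false parameter comes from this section.

  curl "http://example:8983/solr/collection1/autocomplete?q=luc&distrib=false"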

Collections APIs

LucidWorks Search and Solr both have Collections APIs. They are not duplicates of each other, even though they share the same parameters. It is important, however, to use only the LucidWorks Search Collections API to create collections, because of the issues described in the section Replicated Configurations above. The LucidWorks Search Admin UI also uses the LucidWorks Collections API to create collections.

When creating a new collection in SolrCloud mode (with either the Admin UI or the API), you can specify the number of shards to split it into. This number, however, cannot be higher than the number of shards defined when LucidWorks Search was bootstrapped.
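
As a rough sketch of creating a collection with a specific number of shards through the LucidWorks Search Collections API: the endpoint path and the num_shards parameter name are assumptions for illustration and should be checked against the Collections API reference, and the port is the LWE-Core port used in the examples above.

  curl -X POST -H 'Content-type: application/json' \
       -d '{"name":"mynewcollection","num_shards":2}' \
       http://example:8983/api/collections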

Behind the scenes, the LucidWorks Search Collections API updates LucidWorks Search-specific collection configuration files and also uses Solr's Collections API to create the collection in Solr. This has some ramifications for LucidWorks Search:

  • Solr's Collections API does not allow defining the instanceDir or dataDir, so LucidWorks Search cannot instruct Solr to create a new collection's directories in the same place on the filesystem as the pre-existing collections that ship with LucidWorks Search. Solr creates collections by default with the conf and data directories in the same location, while the LucidWorks Search directory structure separates them into $LWS_HOME/conf/solr/cores and $LWS_HOME/data/solr/cores. Because the paths cannot be set explicitly, new collections created in SolrCloud mode end up in Solr's default location: under $LWS_HOME/conf/solr, with the data directory not under $LWS_HOME/data/solr. This is normal and has no impact on document indexing or searching.
  • Solr's Collections API uses Solr's CoreAdmin API to asynchronously create cores on each node. As a result, each core is named <collection>_shard<x>_replica<y>, which can make the collection appear to have been renamed. LucidWorks Search will mostly display the collection name, but the directory on each server will show the core name (and each core on each node is named differently). The Solr Admin UI also displays the core name in the Core dropdown list, which may cause some initial confusion if you access the Solr Admin UI on several different nodes. Essentially, LucidWorks Search displays information about a collection, while Solr displays information about the specific core you are looking at. For more information about the differences between cores and collections in Solr, refer to the SolrCloud Glossary and the other SolrCloud pages in the Apache Solr Reference Guide.

Using a Stand-Alone ZooKeeper Instance or Ensemble

If you review the Solr Reference Guide or any of the Solr documentation about SolrCloud, you may notice that using the Apache ZooKeeper instance embedded with Solr is not recommended for real production systems. This is because the embedded ZooKeeper does not provide sufficient failover: each embedded ZooKeeper instance depends on its Solr instance, so if a Solr instance is shut down, the associated ZooKeeper instance is shut down with it.

For this reason, the LucidWorks installer includes the ability to install a ZooKeeper quorum while installing LucidWorks Search.

If you have an existing ZooKeeper ensemble, or an existing SolrCloud setup, the Apache Solr Reference Guide provides information about how to use a stand-alone ZooKeeper instance in Setting Up an External ZooKeeper Ensemble. That information is worth reviewing before installing a stand-alone ZooKeeper. The same instructions apply when used with LucidWorks Search, with the exception of the bootstrapping instructions described in the section Starting LucidWorks Search above.
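
For reference, a minimal zoo.cfg for a three-server ensemble looks something like the following; the host names and dataDir are placeholders, and the standard ZooKeeper client port 2181 is assumed:

  # zoo.cfg - one copy on each ZooKeeper server, with a matching myid file in dataDir
  tickTime=2000
  initLimit=10
  syncLimit=5
  dataDir=/var/zookeeper
  clientPort=2181
  server.1=zk1.example.com:2888:3888
  server.2=zk2.example.com:2888:3888
  server.3=zk3.example.com:2888:3888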

When using a stand-alone ZooKeeper with LucidWorks Search, take care to keep your ZooKeeper version in sync with the version distributed with Solr and LucidWorks Search. Because it runs as a stand-alone application, it is not upgraded when you upgrade LucidWorks Search.

Solr 4.0 and LucidWorks 2.5.0 and 2.5.1 use Apache ZooKeeper v3.3.6.

Solr 4.1 and higher, and LucidWorks 2.5.2 and higher, use Apache ZooKeeper v3.4.5.

Related Topics