Since v0.8.0
Solr Clouds are complex distributed systems, and thus require additional help when trying to scale up or down.
Scaling/Autoscaling can mean different things in different situations, and this is true even within the SolrCloud.spec.scaling section.
The following sections describe all the features that the Solr Operator currently supports to aid in scaling & autoscaling SolrClouds.
The scaling section in the SolrCloud CRD can be configured in the following ways:
spec:
  scaling:
    vacatePodsOnScaleDown: true # Default: true
    populatePodsOnScaleUp: true # Default: true
Solr can be scaled up & down either manually or by HorizontalPodAutoscalers; however, no matter how the SolrCloud.Spec.Replicas value changes, the Solr Operator must implement this change the same way.
For now, replicas are not scaled up and down themselves; they are just moved to utilize new Solr pods or vacate soon-to-be-deleted Solr pods.
Note: Scaling actions with replica movements are executed via Cluster Operation Locks. Please refer to the documentation for more information about how these operations are executed.
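As an illustration of the autoscaling path, the following is a minimal sketch of a HorizontalPodAutoscaler that targets a SolrCloud resource. It assumes the SolrCloud resource exposes the Kubernetes scale subresource. The resource names, replica bounds, and CPU threshold are hypothetical values chosen for the example, not recommendations.

```yaml
# Sketch only: names, bounds, and the CPU target are illustrative assumptions.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-solr-hpa          # hypothetical HPA name
spec:
  scaleTargetRef:
    apiVersion: solr.apache.org/v1beta1
    kind: SolrCloud
    name: example                 # hypothetical SolrCloud name
  minReplicas: 3                  # lower bound for the number of Solr pods
  maxReplicas: 6                  # upper bound for the number of Solr pods
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # assumed threshold
```

Whether the replica count is changed by an autoscaler like this or by hand, the Solr Operator applies the change the same way.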
When the desired number of Solr Pods that should be run (SolrCloud.Spec.Replicas) is decreased, the SolrCloud.spec.scaling.vacatePodsOnScaleDown option determines whether the Solr Operator should move replicas off of the pods that are about to be deleted.
When a StatefulSet, which the Solr Operator uses to run Solr pods, has its size decreased by x pods, it’s the last x pods that are deleted. So if a StatefulSet tmp has size 4, it will have pods tmp-0, tmp-1, tmp-2 and tmp-3.
If tmp is then scaled down to size 2, pod tmp-3 will be deleted first, followed by tmp-2, because they are tmp’s last pods numerically.
If Solr has replicas placed on the pods that will be deleted as a part of the scale-down, then it has a problem. Solr will expect that these replicas will eventually come back online, because they are a part of the clusterState. The Solr Operator can update the cluster state to handle the scale-down operation by using Solr APIs to move replicas off of the soon-to-be-deleted pods.
If the scaling.vacatePodsOnScaleDown option is not enabled, then whenever SolrCloud.Spec.Replicas is decreased, that change will be reflected in the StatefulSet immediately.
Pods will be deleted even if replicas live on those pods.
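For reference, a minimal sketch of a SolrCloud that opts out of the managed scale-down is shown below. The resource name and replica count are hypothetical. With this setting, lowering spec.replicas removes the highest-numbered pods without first moving their replicas.

```yaml
# Sketch only: the name and replica count are illustrative assumptions.
apiVersion: solr.apache.org/v1beta1
kind: SolrCloud
metadata:
  name: example                    # hypothetical SolrCloud name
spec:
  replicas: 2                      # lowered from a previous, larger value
  scaling:
    vacatePodsOnScaleDown: false   # pods are deleted even if replicas live on them
```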
If the scaling.vacatePodsOnScaleDown option is enabled, which it is by default, then the following steps occur:
Because of the available Solr APIs, the StatefulSet can only be scaled down one pod at a time. This is why the scale-down step is repeated until the StatefulSet size reaches the desired size.
If SolrCloud.spec.replicas is set to 0, then the SolrCloud will set the StatefulSet replicas to 0 without moving or deleting replicas.
The data will be saved in PVCs if the SolrCloud is set to use persistent storage and dataStorage.persistent.reclaimPolicy is set to Retain.
If the reclaimPolicy is set to Delete, these PVCs will be deleted when the pods are scaled down.
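The scale-to-zero case can be sketched as the following configuration. The resource name and storage size are hypothetical, and the pvcTemplate layout is shown as an assumption about how persistent storage is declared on the SolrCloud resource.

```yaml
# Sketch only: the name and storage size are illustrative assumptions.
apiVersion: solr.apache.org/v1beta1
kind: SolrCloud
metadata:
  name: example                  # hypothetical SolrCloud name
spec:
  replicas: 0                    # scale to zero; replicas are neither moved nor deleted
  dataStorage:
    persistent:
      reclaimPolicy: Retain      # keep the PVCs when pods are scaled down
      pvcTemplate:
        spec:
          resources:
            requests:
              storage: "5Gi"     # assumed size
```

With reclaimPolicy set to Delete instead, the same scale-down would remove the PVCs along with the pods.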
When the desired number of Solr Pods that should be run (SolrCloud.Spec.Replicas) is increased, the SolrCloud.spec.scaling.populatePodsOnScaleUp option determines whether the Solr Operator should move replicas onto the pods that have been created because of the scale-up.
If the scaling.populatePodsOnScaleUp option is not enabled, then whenever SolrCloud.Spec.Replicas is increased, the StatefulSet’s replicas will be increased, and no other actions will be taken by the Solr Operator.
This means that the new pods will likely remain empty until the user takes action themselves, for example by creating collections, migrating replicas, or scaling up existing shards/collections.
If the scaling.populatePodsOnScaleUp option is enabled, which it is by default, then the following steps occur:
spec.replicas (number of pods).
The managed scale-up option relies on the BalanceReplicas API in Solr, which was added in Solr 9.3.
Therefore, this option cannot be used with Solr versions < 9.3.
If the scaling.populatePodsOnScaleUp option is enabled and an unsupported version of Solr is used, the cluster lock will be given up after the BalanceReplicas API call fails.
The resulting behavior is very similar to having scaling.populatePodsOnScaleUp disabled.
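A minimal sketch of a configuration that satisfies the version requirement for the managed scale-up might look like the following. The resource name, replica count, and exact image tag are hypothetical; any Solr version at or above 9.3 should meet the requirement stated above.

```yaml
# Sketch only: the name, replica count, and image tag are illustrative assumptions.
apiVersion: solr.apache.org/v1beta1
kind: SolrCloud
metadata:
  name: example                  # hypothetical SolrCloud name
spec:
  replicas: 5                    # raised from a previous, smaller value
  solrImage:
    tag: "9.3.0"                 # BalanceReplicas API requires Solr 9.3 or later
  scaling:
    populatePodsOnScaleUp: true  # default; replicas are balanced onto the new pods
```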