Managed OpenSearch Service Overview: Expectations VS. Reality

Amazon OpenSearch Service vs unmanaged OpenSearch – a complete overview examining available features, capabilities, costs, and limitations.

Last Updated: May 2022

Running search operations can be extremely difficult and many companies struggle to maintain their deployments. To achieve the best results at the lowest possible cost, companies need to efficiently manage both the infrastructure and data layers of operations.

The infrastructure layer of operation includes: Elasticsearch/OpenSearch orchestration, deploying clusters, scaling resources, configuring security, provisioning and more.

The data layer includes: taking care of ingesting data, configuring and maintaining data structure, optimizing shards, avoiding latency, preventing incidents, improving performance and more.

Like many operations running in the cloud, when operating OpenSearch, you can choose to take the managed or unmanaged route. With an unmanaged approach, the customer takes full responsibility for everything, both the infrastructure and data layers (applications and services).

Alternatively, you can employ the Amazon OpenSearch Service, which manages most of the infrastructure tasks for you, thus facilitating the deployment, operation, and scaling of OpenSearch clusters in the AWS cloud.

In this blog post, we’ll explore the managed vs unmanaged route by examining available features, capabilities, costs, and limitations. We’ll also have a look at the expectations users have of the service, versus the reality they meet.

What does Amazon OpenSearch Service cover?

AWS offers an OpenSearch managed service to easily deploy and manage OpenSearch infrastructure.

AWS makes it clear that they do not operate or configure applications for their customers – the focus is on managing AWS infrastructure more efficiently and securely. While AWS can assist with security and some capacity optimization, they do not support the data layer.

Setup and configuration

Responsibilities of the unmanaged route

Those who choose to manage OpenSearch on their own are entirely responsible for setting things up from A to Z, including network infrastructure, OS, disks, JVM, orchestration tools, backup and restore procedures, and security. These tasks can be quite complex, requiring a great deal of time and expertise.

Setting up the necessary services involves tuning many settings, such as memory limits and port mapping. If these settings aren’t tuned correctly, performance can suffer. For instance, if you don’t set up cleanup procedures for your logs and regularly monitor your disk usage, the disk space may run out and cause an outage. Other settings, such as dynamically assigning a host port to multiple containers, can harm mission-critical properties like high availability.
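The disk-space failure mode described above can be caught early with a very small check. The following is a minimal sketch, not a full monitoring setup: the function names and the 80% and 90% thresholds are assumptions chosen for illustration, and in practice the check would run on every node on a schedule.

```python
import shutil


def disk_used_percent(path="/"):
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def disk_status(used_percent, warn_at=80.0, critical_at=90.0):
    """Classify disk usage. The thresholds are illustrative assumptions."""
    if used_percent >= critical_at:
        return "critical"
    if used_percent >= warn_at:
        return "warning"
    return "ok"


if __name__ == "__main__":
    pct = disk_used_percent("/")
    print(f"disk usage: {pct:.1f}% ({disk_status(pct)})")
```

A check like this only reports the problem. Log cleanup and index retention still have to be set up separately.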

Using Amazon OpenSearch Managed Service

One of the greatest advantages of Amazon OpenSearch Service is that it makes setup and configuration quick and simple. You can deploy OpenSearch by selecting the desired number of instances, instance types, and storage options. The service then does the rest: setting up the domain, provisioning infrastructure capacity, and installing the OpenSearch software.

Once the cluster is up and running, Amazon OpenSearch Service manages resources and performance through automated administrative tasks, including hardware provisioning, automatic daily backups, cluster recovery after failure, and version upgrades. Managing resources is simple, with straightforward drop-down menus for adjusting instance size and other parameters.

Amazon OpenSearch Service monitors, visualizes, and analyzes certain key metrics in real-time. However, alerts and events must be built from scratch or set up through CloudWatch.
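As an illustration of what setting up an alert through CloudWatch can involve, the sketch below builds the parameters for a low-disk alarm on the `FreeStorageSpace` metric in the `AWS/ES` namespace. The helper name, the alarm name, the 20480 threshold (assuming the metric is reported in MiB), and the SNS topic are placeholders, not values taken from this article.

```python
def free_storage_alarm_params(domain_name, account_id, sns_topic_arn,
                              threshold_mib=20480):
    """Build keyword arguments for CloudWatch's put_metric_alarm call.

    All argument values are supplied by the caller; the default threshold
    is an illustrative assumption and should be sized to the cluster.
    """
    return {
        "AlarmName": f"{domain_name}-low-free-storage",
        "Namespace": "AWS/ES",
        "MetricName": "FreeStorageSpace",
        "Dimensions": [
            {"Name": "DomainName", "Value": domain_name},
            {"Name": "ClientId", "Value": account_id},
        ],
        # Minimum tracks the node with the least free space.
        "Statistic": "Minimum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": float(threshold_mib),
        "ComparisonOperator": "LessThanOrEqualToThreshold",
        "AlarmActions": [sns_topic_arn],
    }


if __name__ == "__main__":
    params = free_storage_alarm_params(
        "my-domain",                                   # hypothetical domain
        "123456789012",                                # hypothetical account
        "arn:aws:sns:us-east-1:123456789012:ops",      # hypothetical topic
    )
    print(params["AlarmName"])
```

With boto3 installed and credentials configured, the dictionary could be passed on as `boto3.client("cloudwatch").put_metric_alarm(**params)`.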

Setup and configuration drawbacks

Although this service integrates seamlessly with other AWS services, it only supports a limited, predefined set of plugins. Some of the missing plugins are vital for expanding OpenSearch capabilities.

Cost considerations

It’s tricky to compare the overall costs of an unmanaged and managed OpenSearch deployment. Cost is a function of many different factors, for example, the amount of data, backup needs, transfer costs, and the ability to plan ahead, as well as time and effort.

Managed costs

Although the Amazon OpenSearch Service does not require upfront fees or minimum usage requirements, its costs can become quite high. As operations begin to scale, the managed service can become particularly costly. At this stage, you may find yourself in a kind of “vendor lock-in,” making it difficult to transition back to an unmanaged environment.

Moreover, while automated snapshots are stored in Amazon S3 for free, any additional manual snapshots are subject to Amazon S3 pricing. If you can plan ahead, choosing Reserved Instances in advance, as opposed to the on-demand option, can also lower costs significantly.

Unmanaged expenses

At face value, the unmanaged option is far cheaper, costing roughly 30-40% less than the managed option, and sometimes up to 50% less. Without the OpenSearch Service, however, you’ll need to manage the infrastructure on your own: handling provisioning and monitoring, and finding adequate tools for observability and troubleshooting. If your team doesn’t have the resources to set things up correctly, optimize operations, and solve issues quickly, costs could pile up as well.
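One way to reason about this trade-off is to ask how many engineering hours per month the infrastructure savings can absorb before the unmanaged route stops being cheaper. The sketch below is simple arithmetic under stated assumptions: the monthly bill and the hourly engineering cost are hypothetical, and only the 35% figure comes from the 30-40% range quoted above.

```python
def break_even_hours(managed_monthly, unmanaged_discount, engineer_hourly):
    """Engineering hours per month that the unmanaged savings can pay for.

    managed_monthly:    managed-service bill per month (hypothetical input)
    unmanaged_discount: fraction saved by self-managing, e.g. 0.35
    engineer_hourly:    fully loaded hourly engineering cost (hypothetical)
    """
    savings = managed_monthly * unmanaged_discount
    return savings / engineer_hourly


if __name__ == "__main__":
    hours = break_even_hours(10_000, 0.35, 70)
    print(f"break-even: about {hours:.0f} engineering hours per month")
```

If setup, monitoring, and incident response take more hours than this figure, the apparent savings of the unmanaged option are gone.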

What about the data?

Another factor influencing costs is the amount of data. If you have less than 5 GB of data, the service may be quite expensive, especially if you have multiple small clusters for isolation purposes. Each cluster needs three dedicated master nodes to remain stable, and that overhead is the same however little data the cluster holds. As a result, a deployment holding less than 5 GB may not be worth the price. You get more bang for your buck if you have over 1 TB of data.
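The effect of the fixed master-node overhead can be made concrete with a little arithmetic. The sketch below computes what share of a cluster's hourly cost goes to dedicated master nodes. The hourly prices are hypothetical placeholders, not AWS list prices.

```python
def master_overhead_share(data_nodes, data_node_hourly, master_hourly,
                          master_nodes=3):
    """Fraction of the hourly cluster cost spent on dedicated master nodes."""
    master_cost = master_nodes * master_hourly
    total_cost = data_nodes * data_node_hourly + master_cost
    return master_cost / total_cost


if __name__ == "__main__":
    # Hypothetical prices: 0.15 per data-node hour, 0.05 per master-node hour.
    for nodes in (1, 5, 20):
        share = master_overhead_share(nodes, 0.15, 0.05)
        print(f"{nodes:>2} data nodes: {share:.0%} of cost is master overhead")
```

With these illustrative prices, a single-data-node cluster spends about half its budget on master nodes, while a 20-node cluster spends under 5%, which is why many small isolated clusters are comparatively expensive.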

Monitoring, system optimization, and maintenance

Monitoring is a challenging and important issue in both the managed and unmanaged scenarios. Let’s take a look.

On the managed side, Amazon OpenSearch Service offers only limited access to administrative APIs, logs, and metrics. Although it provides aggregate metrics at the cluster level, it lacks some important node-level metrics and query logs. This is incredibly limiting when it comes to troubleshooting and pinpointing the root cause of any issues that arise.

When monitoring OpenSearch on your own, you may have more freedom and greater access to system metrics and logs, but you’ll still be tasked with finding a good monitoring tool and knowing how to make the best use of it. Either way, system visibility is critical for the next step—troubleshooting and performance optimization.

An important element of managing OpenSearch properly is the ability to adjust configurations in order to optimize performance. This points to a disadvantage in the AWS service, which only supports a limited set of operations and configuration changes. Among the functions missing are altering important performance factors (e.g., thread pool and query cache sizes) and basic functions like reindexing from a remote cluster (via reindex.remote.whitelist).

Finally, the OpenSearch Service takes care of updates for you, allowing you to conveniently track progress without having to get involved. However, a significant drawback is AWS’s use of blue-green deployments to do so. In this method, all the data of the cluster is copied to the new nodes, after which the old cluster is destroyed. This process may take days, exceed the cluster’s capacity, or even cause it to crash mid-operation. As a result, upgrades, rollouts, maintenance, and even the tiniest update can become time-consuming and expensive in large deployments.
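A rough back-of-the-envelope estimate shows why a full-copy migration gets expensive at scale. The sketch below assumes that the old and new node sets run side by side during the copy and that data moves at a constant rate. Both the function and the 100 MB/s throughput are illustrative assumptions, and real copy speed depends on shard sizes, throttling, and cluster load.

```python
def blue_green_estimate(data_gb, node_count, copy_mb_per_s):
    """Return (peak_node_count, copy_hours) for a full-copy migration.

    Assumes both node sets exist at once and a constant copy rate.
    """
    peak_nodes = 2 * node_count
    copy_seconds = data_gb * 1024 / copy_mb_per_s
    return peak_nodes, copy_seconds / 3600


if __name__ == "__main__":
    # Hypothetical cluster: 10 TB of data on 12 nodes, copying at 100 MB/s.
    peak, hours = blue_green_estimate(10 * 1024, 12, 100)
    print(f"peak nodes: {peak}, copy time: {hours:.1f} hours")
```

With these illustrative inputs, the copy alone takes a little over a day, and the estimate grows linearly with data volume.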

When managing OpenSearch on your own, these operations are achieved through a rolling restart. This is more complex, but on the whole, takes much less time and is more efficient.
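The rolling restart mentioned above usually follows a fixed sequence for each node: restrict shard allocation, flush, restart the node, re-enable allocation, and wait for the cluster to return to green. The sketch below only generates that plan as a list of REST calls plus a manual step. The function name and the 5-minute timeout are assumptions, and actually sending the requests and restarting nodes is left to your own tooling.

```python
import json


def rolling_restart_plan(node_names):
    """Build an ordered list of (method, target, body) steps, one node at a time."""
    plan = []
    for node in node_names:
        # Keep replicas from being reallocated while the node is down.
        plan.append(("PUT", "/_cluster/settings",
                     {"persistent": {"cluster.routing.allocation.enable": "primaries"}}))
        # Flush so that shard recovery after the restart is faster.
        plan.append(("POST", "/_flush", None))
        # Performed outside the REST API (service manager, orchestrator, ...).
        plan.append(("MANUAL", f"restart node {node}", None))
        # A null value resets the setting to its default (all shards).
        plan.append(("PUT", "/_cluster/settings",
                     {"persistent": {"cluster.routing.allocation.enable": None}}))
        # Do not move on to the next node until the cluster has recovered.
        plan.append(("GET", "/_cluster/health?wait_for_status=green&timeout=5m", None))
    return plan


if __name__ == "__main__":
    for method, target, body in rolling_restart_plan(["node-1", "node-2"]):
        suffix = "" if body is None else " " + json.dumps(body)
        print(f"{method} {target}{suffix}")
```

Because only one node is out of the cluster at any time, no full copy of the data is needed, which is where the time savings over a full-copy approach come from.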

Mission-critical operations: the case of downtime

A built-in disadvantage of managed services is that they increase your dependence on external support teams. When the production environment goes down, you’re completely dependent on AWS support, and it may take days until the issue is resolved.

Another problem with the OpenSearch Service that increases the risk of downtime is that it does not support shard rebalancing. Thus, if a single node runs out of space, the whole cluster will stop ingesting data. Businesses with mission-critical operations should take these limitations into account, and if they decide to go with the managed option, they may be better off purchasing premium AWS support.

It’s important to note that dealing with downtime without a managed service is no less challenging, and there are plenty of examples that demonstrate the difficulty of handling it on your own.

Managed Services – Expectations VS. Reality

Users often start out self-managed before moving to managed services. The reality of managing Elasticsearch/OpenSearch is usually not what they expected.

Managed services structure their business in such a way that they do not take care of the application layer. They don’t provide proactive support, and they take no responsibility for the way you configure your data. This last part makes perfect sense, seeing as it would be out of scope for a managed service to be responsible for the data you want to put in your system. It is not their role to intervene in the way you decide to structure, configure, and manage your data.

Self-Managed – Expectations VS. Reality

Expectation: Ensure that mission-critical applications are stable and running at peak performance.
Reality: Generic monitoring tools are not enough to operate successfully.

Expectation: Keep costs low by avoiding payments to outside providers for support and services.
Reality: Skilled DevOps engineers are hard to find. They then need to develop and maintain internal expertise and tooling for multiple technologies. Hardware and employee costs pile up.