Opster AutoOps Now Listed on Amazon Web Services Marketplace

Opster has announced today that AutoOps is now listed on the Amazon Web Services (AWS) Marketplace.

Opster AutoOps is designed specifically for organizations that rely on Elasticsearch and OpenSearch for their mission-critical applications. AutoOps improves performance, automates routine database operations and reduces the time and cost associated with managing Elasticsearch and OpenSearch.

Being listed on AWS Marketplace opens up new opportunities for customers looking for search solutions and support. This listing makes it easier for organizations to discover, purchase, and manage the solutions they need to optimize their Elasticsearch and OpenSearch databases. AWS customers have a smoother billing process without needing to set up a separate procurement plan, and also have the ability to use committed spend. 

For those using AWS OpenSearch service, the AutoOps listing is an amazing chance to gain additional capabilities, visibility and support for their OpenSearch clusters.

“We’re excited to be part of the AWS Marketplace community,” said Ziv Segal, CEO and Co-Founder of Opster. “As more and more organizations turn to Elasticsearch and OpenSearch for their applications, it’s essential that they have the right tools to manage and optimize their databases. Our listing on AWS Marketplace makes it easier for AWS customers to use Opster, and we’re grateful for the opportunity to help more and more organizations reach their goals.”

To visit Opster’s listing on AWS Marketplace, click here.

Opster’s OpenSearch Kubernetes Operator – New Features!

Opster’s OpenSearch Kubernetes Operator, licensed Apache V2, simplifies spinning up and scaling clusters, optimizing configuration, upgrading versions, security and more. Whether you’re managing your own K8s cluster or going the hosted route, you can deploy the OpenSearch Operator and benefit from its abilities on all cloud environments.

In addition to the many existing features, the Operator has recently gained two new features – the Snapshot Manager and Monitoring for OpenSearch environments. 

Operator Snapshot Manager

The Snapshot Manager is an extremely useful feature that allows users to define snapshot repositories with very simple definitions in the CRD of their OpenSearch cluster. This enables users to centrally configure roles and permissions for the backend cloud, and a scheduled job will be triggered each time a new snapshot is configured or existing snapshot settings are changed.

This feature also spares users the time and effort of having to work with the OpenSearch API or OpenSearch Dashboards in order to configure and manage snapshots. Furthermore, it is compatible with all OpenSearch versions, making it highly convenient to use.

Operator Monitoring 

The new monitoring feature provides built-in monitoring for OpenSearch environments. The monitoring stack is provided out-of-the-box by the Aiven Prometheus exporter plugin for OpenSearch and is also compatible with all OpenSearch versions.

The benefits of using this feature are immense. Firstly, compared to regular installations, users can now easily set up a monitoring stack that is up-to-date with the latest metrics and visualizations. Secondly, the Grafana dashboard provided with this feature is highly customizable and allows users to generate metrics based on their specific requirements and use cases. In addition to the Grafana dashboard, the Aiven plugin also includes a set of PrometheusRules to help manage OpenSearch clusters in real-time.

In other words, with one simple action you can install the full monitoring bundle for every OpenSearch cluster that is managed by the Operator.

Overall, the new features added to the OpenSearch Kubernetes Operator are extremely useful for users who want to make their OpenSearch management as efficient and convenient as possible. Both new features provide users with a great deal of control over their data and allow them to configure and manage their clusters with ease and accuracy.

To get started with the Operator, click here.

Opster’s Q4-2022 Product Feature Highlights & What’s Ahead

As we kick off 2023, we’re happy to share some of Opster’s highlights from the last quarter of 2022 and some of our upcoming features that will be released this year. 

At the end of 2022 we introduced tons of exciting new functionalities that help our customers and answer their needs. If you want to try them out for yourself, you can start a free trial here.

Here are a few of the most exciting features we added:

Q4-2022 Feature Highlights 

  1. Visualize shard activity over time in Shard View
  2. AutoOps advanced analysis capabilities
  3. Easily locate noisy neighbors in multi-tenant clusters
  4. API access for AutoOps
  5. OneLogin integration
  6. Grafana integration for AutoOps
  7. Advanced abilities with Opster Management Console
  8. New AutoOps Reports
  9. Automatic Template Analyzer analysis

Visualize hotspots and resolve bottlenecks thanks to enhancements to Shard View

With Shard View, you can view the system’s activities over time and track how shard activity has changed and affected node load. You can “play” the activity of different time windows and see how the heat map evolved. This helps users see how hotspots formed and resolve bottlenecks.

Drill down issues in your system with the latest events improvements

We improved the drill down of events in AutoOps by taking users directly to the relevant section in Shard View. This way you get a clear picture of the event content and can see the relevant screen already sorted according to the event parameters.

Easily locate noisy neighbors in multi-tenant clusters with sorting of search and indexing latency

This is a completely unique feature that does not exist in any other system outside of Opster. AutoOps detects which indices are experiencing the highest latency and enables you to sort accordingly, so you can instantly see which indices are experiencing the issues. This way you can easily locate noisy neighbors in multi-tenant clusters or identify which application/use case is experiencing the burst.

Easily integrate your own automations and internal systems with API access

We’ve opened API access to get AutoOps insights, events and more. This allows easy integration into your own automations and internal systems, both to get insights and act on the recommendations provided in the API. For the full API documentation and instructions on how to use the tokens, see here.

Improved security and access control with OneLogin integration

Security and access control are very important to our customers, so we added the ability to integrate the OneLogin authentication method into AutoOps. This security feature allows you to control who has access to AutoOps in an easier and simpler fashion.

Coming Soon

Integrate open-source tools with Grafana integration for AutoOps

You will be able to integrate the open-source tools you’re used to using with your Opster tools and dashboards. The Grafana dashboard will provide advanced metrics compiled by AutoOps and the events + insights from the AutoOps analysis.

Advanced cluster management abilities with the Opster Management Console

With the newest version of the OMC, we added:
– Scaling of nodes
– Easy steps to change resourcing
– Easy management of admin credentials
– Support for new OpenSearch versions

More insights, metrics and visuals to see your cluster’s health with the new AutoOps Reports

To show how the system has improved thanks to Opster, we’ve added more insights, metrics and visuals. You can see the full picture of your clusters’ health and the cost of the deployments in the newly redesigned Reports screen in AutoOps, with advanced statistics and KPIs.

No need to run the Template Analyzer manually – AutoOps will do it automatically for you

Introducing: template events in AutoOps. You don’t have to run the Template Analyzer manually anymore – AutoOps does it automatically. We’ve added an event to track the creation of new templates, as well as events to show changes in existing templates that may require your attention. AutoOps will analyze all the templates to identify fields that need to be adjusted in order to improve performance, and a report is compiled and made available to users automatically.

As always, there’s more coming soon.

Here’s to another great year – happy 2023!

Want to start using AutoOps? You can sign up for a free trial here, or contact us here!

OpenSearch Monitoring Tools

Which open-source/free tools should you use for OpenSearch monitoring?

Published on: November 2022

Observability is a critical aspect of operating any system, exposing its inner workings, and facilitating the detection and resolution of problems. Monitoring tools serve as the first and most basic layer in system observability. In OpenSearch, the search engine that powers so many of today’s applications, reliable monitoring is an absolute must and is the primary building block of a successful operation.

OpenSearch infrastructure can be quite complex, requiring the monitoring of many performance parameters that are often interlinked. These include memory, CPU, cluster health, node availability, indexing rates, and JVM metrics (e.g., heap usage, pool size, and garbage collection). There are multiple open-source monitoring tools available for OpenSearch, each with its advantages and limitations. While these tools can be extremely useful, as operations scale, it is common to encounter issues that aren’t easily resolved with the standard tools.
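
For illustration, many of these parameters can be pulled directly from the cluster APIs before any dedicated tool is installed; a minimal sketch, with no tool-specific assumptions:

# overall cluster health: status, node count, unassigned shards
GET _cluster/health

# per-node JVM and OS metrics: heap usage, garbage collection, CPU, memory
GET _nodes/stats/jvm,os

# indexing and search rates across all indices
GET _stats/indexing,search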

This blog post will explore four popular open-source tools for OpenSearch tracking, their defining features, and their key differences. It will also explain where such standard monitoring tools are lacking and how Opster can help you achieve optimal OpenSearch performance.

Monitoring OpenSearch with open source tools

1. Cerebro

An open-source MIT-licensed web admin tool, Cerebro enables OpenSearch users to monitor and manipulate indexes and nodes, while also providing an overall view of cluster health. It has over a million downloads on Docker and 5k stars on GitHub. Cerebro is similar to Kopf, an older monitoring tool that was installed as a plugin on earlier OpenSearch versions. When web applications could no longer run as plugins on OpenSearch, Kopf was discontinued and replaced by Cerebro, a standalone application with similar capabilities and UI.

Built with Scala, AngularJS, the Play Framework, and Bootstrap, Cerebro can be set up easily, in just a few steps. It also boasts built-in capabilities to conveniently track and oversee operations in OpenSearch, including resyncing corrupted shards to another node, a dashboard showing the replication process in real-time, configuring backup using snapshots, and activating a selected index with a single click.

The Cerebro community is relatively small, resulting in less frequent updates and fewer features. Its documentation is sparse and it doesn’t support data from logs. In addition, while it is an excellent tool for tracking real-time processes, Cerebro does not provide graphs with historic/time-based node statistics and, thus, doesn’t offer anomaly detection or troubleshooting capabilities.

2. Prometheus and Grafana

Prometheus is a powerful metric-collection system capable of scraping metrics from OpenSearch. Grafana is a tool that, when coupled with Prometheus, can be used to visualize OpenSearch data. Both Prometheus and Grafana have larger communities and more contributors than Cerebro and, therefore, provide more features and capabilities. Prometheus and Grafana have 46k stars and 53.1k stars on GitHub respectively, and both have over 10 million downloads on Docker.

Able to display data over long periods of time, Grafana features versatile visual capabilities, including flexible charts, heat maps, tables, and graphs. It also provides built-in dashboards that can display information taken from multiple data sources. There are a large number of ready-made dashboards created by the Grafana community, which can be imported and used in your environment. For example, Grafana’s OpenSearch time-based graphs can display meaningful statistics on nodes. These capabilities make Grafana a good solution for visualizing and analyzing metrics, enabling users to add conditional rules to dashboard panels that can trigger notifications.

A major drawback of Grafana is that it doesn’t support full-text data querying. Moreover, it doesn’t support data from logs.

3. Opster Management Console (OMC)

Opster Management Console (OMC) provides the orchestration, monitoring and management capabilities that are offered by managed services, completely for free. By using the OMC, a single interface, users can: upgrade versions automatically, scale cluster resources, manage certificates & back-ups, monitor resources & costs, and more.

In addition, OMC routinely analyzes the connected system and provides alerts when there are signs of performance degradation. It offers recommendations on how to improve configuration & resolve issues, optimize templates, improve search performance & resource utilization, and reduce needed hardware.

OMC easily runs on any Kubernetes environment (on cloud and on-premise) and supports all versions of OpenSearch. Although the tool is relatively new, it has gained popularity among OpenSearch users due to its capabilities and ease of use. You can install the OMC from here.

So which tool should you choose?

Before you go straight for the OpenSearch monitoring tool with the greatest functionality, there are a few things to consider.

First, Cerebro is easy to set up and operate. Nevertheless, it has fewer features, its documentation is sparse, it doesn’t support data from logs, and it does not provide graphs with historic/time-based node statistics.

Second, as generic monitoring tools, Prometheus and Grafana enable you to monitor everything, but they aren’t tailored to OpenSearch specifically. This can be quite limiting. Although users can plot many different kinds of graphs in Grafana, they cannot display which nodes are connected to the cluster and which have been disconnected. In addition, Grafana does not support an index or shard view, making it impossible to see where shards are located or to track the progress of shard relocation.

Finally, Opster Management Console (OMC) is relatively new but shows great promise and has gained popularity among OpenSearch users due to its advanced capabilities and ease of use.

Why standard monitoring tools aren’t enough

When it comes to OpenSearch, even with reliable monitoring tools in place, you may still encounter sudden, unexpected, and serious downtime episodes. Let’s take a closer look at why this is the case.

There are several reasons monitoring tools alone aren’t enough. For starters, it’s bad practice to install a monitoring tool and forget about it. Rather, you should keep up with the latest configuration guidelines and best practices and know how to implement them correctly.

Second, choosing which metrics to monitor and knowing how to analyze them is no small feat, as OpenSearch infrastructure can become quite complex. With so many metrics interacting with each other, even the smallest change can adversely impact performance. A monitoring tool may indicate you’ve run out of memory, for example, but this information alone isn’t enough to identify the underlying cause, let alone to resolve the issue and prevent recurrence.

Also, while traditional commercial monitoring tools are useful for event correlation and providing alerts, they still lack the capabilities needed to truly get to the bottom of your OpenSearch issues. Despite claims of providing root-cause analysis, these solutions generally provide basic event correlation analysis while failing to identify the root cause, which is critical for forecasting and avoiding future issues.

OpenSearch performance is crucial, especially as operations scale or when applications that affect end users are on the line. Successfully operating OpenSearch requires much more than improved monitoring and alerts: Teams must have access to tools with advanced prediction and problem-solving capabilities in order to tackle the complicated issues that may arise.

Conclusion

Ensuring visibility is critical for successfully managing complex systems. While there are many tools available for monitoring OpenSearch, not all are created equal. Most standard tools offer only basic analysis and do not get to the heart of the problem. Given the complexity of OpenSearch, this is inadequate in production, especially for operations at scale or those affecting customer experience.

No matter which monitoring tool you decide to use and how you’re hosting your OpenSearch deployment, you can benefit from Opster’s complete solution for OpenSearch. With an Opster AutoOps subscription, your database administration will be taken care of from start to finish, including advanced monitoring with proactive incident prevention. You’ll benefit from complete resolution of issues in your infrastructure & data layers, end-to-end support, and constant optimization of your clusters. Try Opster AutoOps for free.

How to solve 8 common OpenSearch errors
Last updated: November 2022

Developer forums are riddled with questions about OpenSearch errors and exceptions. Although never a pleasant topic, errors and exceptions can serve as a powerful tool, illuminating deeper issues in your OpenSearch infrastructure that need to be fixed. Getting acquainted with some of the prevalent failures will not only save you time and effort, but also help ensure the overall health of your OpenSearch cluster.

At Opster, we have analyzed a wide range of OpenSearch problems to understand what caused them. In this blog post, we’ll explain why some OpenSearch errors and exceptions occur and how to avoid them, and review some general best practices that can help you identify, minimize, and handle these issues with greater efficiency.

Let’s start by taking a look at some of the recurring errors and exceptions that most OpenSearch users are bound to encounter at one point or another.

1. Mapper_parsing_exception 

OpenSearch relies on mapping, also known as schema definitions, to handle data properly, according to its correct data type. In OpenSearch, mapping defines the fields in a document and specifies their corresponding data types, such as date, long, and string.

In cases where an indexed document contains a new field without a defined data type, OpenSearch uses dynamic mapping to estimate the field’s type, converting it from one type to another when necessary. If OpenSearch fails to perform this conversion, it will throw the “mapper_parsing_exception failed to parse” exception. Too many of these exceptions can decrease indexing throughput, causing delays in viewing fresh data.

To avoid this issue, you can specify the mapping for a type immediately after creating an index. Alternatively, you can add a new mapping with the /_mapping endpoint. Note that while you can add to an existing mapping, you cannot change existing field mappings. This would cause the data that is already indexed to be unsearchable. Rather, to make the change properly, you need to reindex the entire index. 
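
For example, a mapping for a new field can be added to an existing index through the /_mapping endpoint; a minimal sketch, where the index and field names are placeholders:

# add an explicit mapping for a new field to an existing index
PUT /my-index/_mapping
{
  "properties": {
    "session_id": {
      "type": "keyword"
    }
  }
}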

2. BulkIndexError

It’s often more efficient to index large datasets in bulk. For example, instead of using 1,000 index operations, you can execute one bulk operation to index 1,000 docs. This can be done through the bulk API. However, this process is prone to errors and requires you to carefully check for possible problems, such as mismatched data types and nulls.

When it comes to bulk APIs, you need to be extra vigilant, as even if there were hundreds of positive responses, some of the index requests in the bulk may have failed. So, in addition to setting up your bulk API with all the proper conditions ahead of time, go through the list of responses and check each one to make sure that all of your data was indexed as expected.
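
A minimal sketch of a bulk request follows (the index name and documents are placeholders). The response can report success at the HTTP level even when individual actions failed, so the top-level "errors" flag and the per-item results are what need checking:

# two index actions in a single bulk request (NDJSON: action line, then source line)
POST /_bulk
{ "index": { "_index": "my-index", "_id": "1" } }
{ "temperature": 21.4, "status": "ok" }
{ "index": { "_index": "my-index", "_id": "2" } }
{ "temperature": 19.8, "status": "ok" }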

3. Search Timeout Errors: ConnectionTimeout, ReadTimeoutError, RequestTimeout, and More

If a response isn’t received within the specified search time period, the request fails and returns an error message. This is called a search timeout. Search timeouts are common and can occur for many reasons, such as large datasets or memory-intensive queries.

To eliminate search timeouts, you can increase the OpenSearch request timeout (the default is 30 seconds), reduce the number of documents returned per request, reduce the time range, tweak your memory settings, and optimize your query, indices, and shards. You can also enable slow search logs in order to monitor search run time, scan for heavy searches, and more.   
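
For instance, both the request timeout and the number of returned documents can be set per search; a sketch with placeholder index, field, and values:

GET /my-index/_search
{
  "timeout": "10s",
  "size": 100,
  "query": {
    "match": { "message": "error" }
  }
}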

4. All Shards Failed

When searching in OpenSearch, you may encounter an “all shards failed” error message. This happens when a read request fails to get a response from a shard. The request is then sent to a shard copy. After multiple request failures, there may be no available shard copies left. This can happen when the data is not yet searchable because the cluster or node is still in an initial start process, or when the shard is missing or in recovery mode and the cluster is red.

Many issues can cause this: the node may be disconnected or rejoined; the shards being queried may be in recovery and, therefore, not available; the disk may have been corrupted; a search may have been poorly written (for example, referring to a field with the wrong field type); or a configuration error may be causing an operation to fail.

5. Process Memory Locking Failed: “memory locking requested for OpenSearch process but memory is not locked”

For your node to remain healthy, you must ensure that none of the JVM memory is ever swapped out to disk. You can do this by setting bootstrap.memory_lock to true. In addition, ensure that you’ve set up memory locking correctly by consulting the OpenSearch documentation.

If OpenSearch is unable to lock memory, you will encounter this error message: “memory locking requested for OpenSearch process but memory is not locked.” This can happen when the user running OpenSearch doesn’t have the right permissions. These permissions can be granted by running ulimit -l unlimited as root before starting OpenSearch, or by setting memlock to unlimited in /etc/security/limits.conf. Afterward, set MAX_LOCKED_MEMORY to unlimited (when using init scripts) and LimitMEMLOCK to infinity (when using systemd). This will prevent OpenSearch from becoming non-responsive and help avoid large GC pauses.
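
The settings involved, collected as a sketch (the service user name and file locations are assumptions and vary by distribution):

# opensearch.yml – request that the heap is locked in RAM
bootstrap.memory_lock: true

# /etc/security/limits.conf – allow the service user to lock memory
opensearch soft memlock unlimited
opensearch hard memlock unlimited

# systemd unit override – lift the memory-lock limit for the service
LimitMEMLOCK=infinity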

6. OpenSearch Bootstrap Checks Failed

Bootstrap checks inspect various settings and configurations before OpenSearch starts to make sure it will operate safely. If bootstrap checks fail, they can prevent OpenSearch from starting (if you are in production mode) or issue warning logs in development mode. It’s recommended to familiarize yourself with the settings enforced by bootstrap checks, noting that they are different in development and production modes. By setting the system property es.enforce.bootstrap.checks to true, you can force bootstrap checks to run even in development mode.

7. TransportError 

In OpenSearch, the transport module refers to communication between nodes in a cluster and is used for every call that goes from one node to another. Transport errors are generic, and failures can be due to anything from missing shards and conflicting settings to poorly structured content, network failures, and missing headers.

There are different types of transport errors. One error message—“TransportError(403, u’cluster_block_exception’, u’blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];’)”—can occur when indexes become read-only. This can happen when there isn’t enough available disk space for OpenSearch to allocate and relocate shards to and from nodes. To solve this particular issue, you can increase your disk space, delete old data to free up space, or remove the read-only block from the index settings.
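
Once disk space has been freed up, the block can be removed with a settings update; a minimal sketch (the index name is a placeholder):

PUT /my-index/_settings
{
  "index.blocks.read_only_allow_delete": null
}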

Another type of transport error can appear when you try to use an index that was just created, before all the shards were allocated. In this case, you will get a TransportError(503, u''). Transport errors can also be linked to problems with mapping. For example, TransportError(400, u'mapper_parsing_exception') can occur when you attempt to index a field with a data type that is different than its mapping.

8. Initialization/Startup Failures

Sometimes, seemingly trivial issues can prevent OpenSearch from starting. For instance, when using conflicting versions of OpenSearch, you may get error messages such as “OpenSearch java client initialization fails” or “\Common was unexpected at this time.”

How to Minimize Errors and Exceptions: Dealing with the Deeper Issues at Play

If you look beyond tackling one error message at a time, you’ll begin to notice that many errors and exceptions are linked to one of three deeper causes: issues with setup and configuration, indexing new information, or cluster slowness. Let’s take a look at some basic guidelines for tackling these problems.

  • Setup and configuration: It’s easy to set up OpenSearch quickly, but making sure it’s production grade requires mindfully configuring your settings. This can help avoid a broad range of errors and exceptions, such as bootstrap checks failure.
  • Indexing new information: In OpenSearch, you must use templates properly, know the schema structure, and carefully name your variables accordingly. Paying careful attention to these parameters can help you avoid issues like mapping exceptions and bulk index errors. 
  • Cluster slowness: As operations begin to scale, OpenSearch can sometimes slow down unexpectedly, with timeout errors popping up left and right. For this reason, it is crucial that you constantly monitor the activity of your cluster—observing error rate, error logs, and rejected metrics to make sure everything is operating as expected.

Conclusion

Errors and exceptions are bound to arise while operating OpenSearch. Although you can’t avoid them completely, there are some best practices you can employ to help reduce them and to solve problems more efficiently when they do arise. These include paying close attention to your initial setup and configuration and being particularly mindful when indexing new information. In addition, you should have strong monitoring and observability in your system, which is the first basic component of quickly and efficiently getting to the root of complex problems like cluster slowness. In short, instead of dreading their appearance, you can treat errors and exceptions as an opportunity to optimize your OpenSearch infrastructure.  

To easily solve OpenSearch errors, we recommend you try AutoOps for OpenSearch. AutoOps diagnoses issues in OpenSearch based on hundreds of metrics pulled by a lightweight agent. Once diagnosed, the system not only provides root cause analysis, but also resolves the issues. Try it for free.

OpenSearch requirements in production
Last updated on: November 2022

OpenSearch, an open-source, full-text search engine, allows for massive volumes of data to be stored, searched, and analyzed rapidly in near real-time. OpenSearch is employed behind the scenes, integrating with backend infrastructure where it provides the underlying technology that powers applications.

OpenSearch teams have made a tremendous effort in designing OpenSearch so that it can be set up fairly quickly and reliably, without having to invest much thought in its initial configuration. When a new cluster is first created, the scale is usually small, and everything runs smoothly out-of-the-box.

However, unforeseen complications begin to arise once the OpenSearch cluster begins to scale. As the cluster is loaded with more and more data, and indexing and searches are run more frequently, companies begin to experience severe problems such as outages, degraded performance, data loss, and security breaches. Too often, by the time a company realizes that OpenSearch requires additional resources, time, and/or expertise, it has already become a central component of their operations.

At Opster, we’ve seen many potentially disastrous mistakes made when working with OpenSearch. In this blog post, we present five major concerns that should be addressed before your OpenSearch, whether already in production or not, can be considered truly production-ready.   

Neglecting to Look Inside 

It’s enticing to deploy OpenSearch and just forget about its inner workings. But OpenSearch can suddenly slow down, nodes can get disconnected, and systems can even crash unexpectedly. Without proper monitoring and observability, you won’t know why this happened, how it can be fixed, or how to avoid the problem in the future.

Monitoring and observability are critical, not just for when things break down, but also for the relentless optimization required of enterprises that wish to maintain their competitive edge. While monitoring reveals whether or not a system is operating as expected, it can’t improve current performance, and it doesn’t explain why something isn’t working the way it should. This is where observability comes in.

Observability gives an end-to-end view of processes, detecting undesirable behavior (such as downtime, errors, and slow response time) and identifying the root causes of problems.

Observability is achieved using logs, metrics, and traces—three powerful tools that are often referred to as the three pillars of observability.

When complex distributed systems start to malfunction, good visibility is crucial for pinpointing the root of the problem and significantly reducing time to resolution. The OpenSearch community provides free open-source monitoring tools that can help enhance visibility, such as Cerebro.

Misconfigured Circuit Breakers  

In OpenSearch, circuit breakers are used to limit memory usage so that operations do not cause an OutOfMemoryError. Sometimes, a modest adjustment to your circuit breakers can make the difference between high-performing clusters and detrimental downtime. OpenSearch queries, whether initiated directly by users or by applications, can become extremely resource-intensive. While the default circuit breaker settings may be adequate in some cases, often adjusting breaker limits is absolutely necessary to ensure that queries do not impede performance or cause outages due to running out of memory (OOM).
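
As a sketch, breaker limits can be adjusted dynamically through the cluster settings API; the percentages below are illustrative placeholders, not recommendations:

PUT /_cluster/settings
{
  "persistent": {
    "indices.breaker.total.limit": "70%",
    "indices.breaker.request.limit": "50%",
    "indices.breaker.fielddata.limit": "40%"
  }
}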

Poorly Configured Security Settings  

It’s dangerously easy to misconfigure OpenSearch security settings. If you are not proactive about your security settings, your OpenSearch database can be exposed or leaked. Common security oversights include exposing the OpenSearch REST API to the public internet, not changing default passwords, and neglecting to encrypt data in transfer or at rest. These oversights can leave OpenSearch servers vulnerable to malware or ransomware and subject data to theft or corruption.

Even if your OpenSearch is configured properly with optimal security settings, unprotected OpenSearch Dashboards instances can still compromise your data. OpenSearch Dashboards is an open-source project that performs data analytics and visualization of OpenSearch data. The platform performs advanced analytics on data that it pulls from OpenSearch databases, which it presents graphically through charts, tables, and maps. The problem is that OpenSearch Dashboards isn’t equipped with comprehensive built-in security, especially when being used with the free open-source version of OpenSearch.

Disks and Data Loss

Developer forums are filled with confusion about lost data nodes and unassigned shards in OpenSearch. This calls to attention the necessity of handling disks mindfully to avoid losing data. If you’re not careful when selecting disks for your data nodes, you might find that shards are unassigned and that data is lost after restart. Ensure that data and master-eligible nodes are using persistent storage.

In the case of ephemeral disks, however, this is not enough. It is common to select ephemeral disks for their high performance and cost-efficiency; but, without taking the proper precautions, this choice can lead to data loss. When using ephemeral disks, you must have more than one copy of each shard and have a reliable procedure in place to restore data in case all copies are gone.

Neglecting Backup and Restore

Although everyone agrees that backup and restoration are important, many companies do not have sufficient backup and restore strategies in place for their OpenSearch clusters.

There’s a lot to take into account when protecting data in OpenSearch. For starters, you should make sure that all your important information is backed up. This may seem obvious, but, because indices are added constantly, you may not have snapshots of all your vital indices, backup may not run as often as it should, and backup processes may fail silently—oversights that you may only discover after it’s too late. Keep in mind that running backup procedures is resource-intensive, so it should be done when the cluster is less loaded.

Even if your backup appears to be running perfectly, you should periodically execute restore procedures to make sure that the data is truly restorable. This can be very time-consuming, so it is advisable to predetermine the order of restoration, ensuring that the most vital data is taken care of first.   
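
As an illustration, the snapshot APIs cover both sides of this: registering a repository, taking a snapshot, and periodically testing a restore. The repository, snapshot, and index names below are placeholders, and the shared-filesystem repository type is just one option:

# register a repository backed by a shared filesystem path
PUT /_snapshot/my_backup
{
  "type": "fs",
  "settings": { "location": "/mnt/snapshots" }
}

# take a snapshot of selected indices without waiting for completion
PUT /_snapshot/my_backup/snapshot_1?wait_for_completion=false
{
  "indices": "logs-*"
}

# test-restore the most critical index first, under a new name
POST /_snapshot/my_backup/snapshot_1/_restore
{
  "indices": "logs-critical",
  "rename_pattern": "(.+)",
  "rename_replacement": "restored_$1"
}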

Sometimes it’s wiser not to use backup and restore at all. When OpenSearch mirrors another data source, i.e., it is not the single point of truth, it might be advisable to reconstruct the indices from scratch by reindexing data from that other source. This might take longer, depending on the nature of the data, but it can take the load off your OpenSearch backup processes, mitigating costs and reducing storage space.

Summary

OpenSearch is a powerful and widely-used search engine that is at the core of many of today’s technological platforms. It may be easy to manage at first, but as your business scales, you will encounter serious problems if you have not taken some necessary precautions. To ensure that your OpenSearch is fully prepared for production, it’s imperative that you avoid the major pitfalls detailed above.

To detect and resolve OpenSearch errors, we recommend you try the AutoOps platform. AutoOps diagnoses issues in OpenSearch based on hundreds of metrics pulled by a lightweight agent. Once diagnosed, the system not only provides root cause analysis, but also resolves the issues. Try it for free.

How to Improve OpenSearch Search Performance
Last updated: November 2022

If you’re suffering from poor search performance, you should run Opster’s Search Log Analyzer. With Opster’s Analyzer, you can easily locate slow searches and understand what caused them to add additional load to your system. You’ll receive customized recommendations for how to improve your search performance. The tool is free and takes just 2 minutes to run.

14 tips on how to reduce OpenSearch search latency and optimize search performance:

  1. Size parameter

    Assigning a huge value to the size parameter causes OpenSearch to compute vast amounts of hits, which causes severe performance issues. Instead of setting a huge size, you should batch requests in small sizes.

  2. Shards and replicas

    Optimize necessary index settings that play a crucial role in OpenSearch performance, like the number of shards and replicas. In many cases having more replicas helps improve search performance. Please refer to Opster’s guide on shards and replicas to learn more.

  3. Deleted documents

    Having a large number of deleted documents in the OpenSearch index also causes search performance issues. Force merge API can be used to remove a large number of deleted documents and optimize the shards.

  4. Search filters

    Effective use of filters in OpenSearch queries can improve search performance dramatically as the filter clauses are 1) cached, and 2) able to reduce the target documents to be searched in the query clause.

  5. Wildcard queries

    Avoid wildcard queries, especially leading wildcards, which cause the entire OpenSearch index to be scanned.

  6. Regex and parent-child

    Note that regex queries and parent-child queries can cause search latency.

  7. Implementing features

    There are multiple ways to implement a specific feature in OpenSearch. For example, Autocomplete can be implemented in various styles. This documentation gives a 360-degree view of both functional and non-functional features.

  8. Multitude of small shards

    Having many small shards could cause a lot of network calls and threads, which severely impact search performance.

  9. Heavy aggregations

    Avoid heavy aggregations that involve unique IDs.

  10. Timeout and terminate

    The timeout param and the terminate_after param can be useful when executing heavy searches, or when the result data is vast.

  11. Search templates

    Use search templates to achieve better abstraction, meaning without exposing your query syntax to your users. Search templates also help you transfer less data over the network, which is particularly useful when you have large OpenSearch queries.

  12. Multi search API

    Use msearch whenever possible. In most of the applications it’s required to query multiple OpenSearch indices for a single transaction, and sometimes users do so in a serial order even when it’s not required. In both cases, when you need to query multiple indices for the same transaction and when the result of these queries are independent, you should always use msearch to execute the queries in parallel in OpenSearch.

  13. Term queries

    Use term query when you need an exact match and on keywords fields. By default, OpenSearch generates both text and keyword fields for every field that consists of a string value if explicit mapping is not supplied. Users tend to use the match query even on keyword data types like product-ids, which is costly as match query goes through an analysis operation. Always use term query on keyword data types and wherever you need exact searches for better performance.

  14. Source filtering

    _source filtering is a great way to improve the performance of OpenSearch queries when retrieving a large number of documents or documents of large sizes. By default, OpenSearch returns the complete source of matching documents. If you don’t need _source at all or need only values of specific fields, you can achieve this with _source filtering. A combined example applying several of these tips is shown after this list.
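
As a combined illustration of several of the tips above (filters, a modest size, a request timeout, and _source filtering), the sketch below uses placeholder index and field names:

GET /products/_search
{
  "timeout": "5s",
  "size": 50,
  "_source": ["product_id", "name", "price"],
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "wireless headphones" } }
      ],
      "filter": [
        { "term": { "category": "electronics" } },
        { "range": { "price": { "lte": 200 } } }
      ]
    }
  }
}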

Say goodbye to search latency and related downtime – Opster’s Search Gateway transforms the way searches are handled in OpenSearch.

Aside from gaining deep visibility of searches and the option to group data by users and application, the Gateway provides users with the unique ability to block heavy searches and prevent them from degrading performance and breaking clusters. Learn more about the Search Gateway and book a demo to get started.

Improve your OpenSearch Reindex Performance with these Tips

Learn how to reindex OpenSearch more efficiently and improve OpenSearch reindexing performance by following these tips: 

  1. Disable Replicas

    Disable replicas when building a new index from scratch that is not serving the search traffic. Replicas can be changed dynamically later on once re-indexing has been completed.

  2. Disable Refresh Interval

    Disable the refresh interval. Like replicas, it can be changed back once re-indexing has been completed.

  3. Use Bulk API

    Use the bulk API with multiple clients to get the maximum throughput from OpenSearch (benchmark the OpenSearch cluster to avoid any performance issues).

  4. Increase Buffer Size

    Increase index buffer size and tune it.

  5. Use Reindex API

    If the _source field is enabled and you are re-indexing because you changed the analyzer on existing fields (a breaking change), use the Reindex API of OpenSearch (see the sketch after this list).

  6. Disable Merge Throttling

    Disable merge throttling by changing the setting `indices.store.throttle.type` to none. If you have a massive write-heavy index, then you can make it permanent.

  7. Ensure Optimal Scalability Settings

    Choosing the optimal number of primary shards is crucial for scalability, since it can’t be changed later on without reindexing. Refer to Opster’s guide to shards and replicas to understand more. Also, make sure you don’t end up creating “hotspots” in the cluster.
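
A sketch of the flow described in the tips above — dropping replicas and the refresh interval on the target index, running the reindex, and restoring the settings afterwards. Index names and values are placeholders:

# before re-indexing: no replicas, no background refreshes on the new index
PUT /new-index/_settings
{
  "index": { "number_of_replicas": 0, "refresh_interval": "-1" }
}

# copy documents from the old index into the new one
POST /_reindex
{
  "source": { "index": "old-index" },
  "dest": { "index": "new-index" }
}

# after re-indexing: restore replicas and the refresh interval
PUT /new-index/_settings
{
  "index": { "number_of_replicas": 1, "refresh_interval": "1s" }
}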

To easily improve your indexing and search performance, we recommend you try AutoOps for OpenSearch. AutoOps detects issues and improves OpenSearch performance by analyzing shard sizes, threadpools, memory, snapshots, disk watermarks, and more. Try it for free.

OpenSearch Shards and Replicas: Getting Started Guide
Introduction

Published on: November 2022

OpenSearch enhances the power of Lucene by building a distributed system on top of it, and, in doing so, addresses the issues of scalability and fault tolerance. It also exposes a JSON-based REST API, making interoperability with other systems very straightforward.

Distributed systems like OpenSearch can be very complex, with many factors that can affect their performance and stability. Shards and replicas are among the most fundamental concepts in OpenSearch, and understanding how these work will enable you to effectively manage an OpenSearch cluster.

This article explains what shards and replicas are, their impact on an OpenSearch cluster, and what tools exist to tune them to varying demands.

Understanding Shards

Data in an OpenSearch index can grow to massive proportions. In order to keep it manageable, it is split into a number of shards. Each OpenSearch shard is an Apache Lucene index, with each individual Lucene index containing a subset of the documents in the OpenSearch index. Splitting indices in this way keeps resource usage under control. An Apache Lucene index has a limit of 2,147,483,519 documents.

Having shards that are too large is simply inefficient. Moving huge indices across machines is a time- and labor-intensive process. First, the Lucene merges would take longer to complete and would require greater resources. Moreover, moving the shards across the nodes for rebalancing would also take longer, and recovery time would be extended. Thus, by splitting the data and spreading it across a number of machines, it can be kept in manageable chunks, minimizing risk.

Having the right number of shards is important for performance. It is thus wise to plan in advance. When queries are run across different shards in parallel, they execute faster than an index composed of a single shard, but only if each shard is located on a different node and there are sufficient nodes in the cluster. At the same time, however, shards consume memory and disk space, both in terms of indexed data and cluster metadata. Having too many shards can slow down queries, indexing requests, and management operations, and so maintaining the right balance is critical.

It is when an index is created that the number of shards is set, and this cannot be changed later without reindexing the data. When creating an index, you can set the number of shards and replicas as properties of the index:

PUT /sensor
{
    "settings" : {
        "index" : {
            "number_of_shards" : 6, 
            "number_of_replicas" : 2 
        }
    }
}

The ideal number of shards should be determined based on the amount of data in an index. Generally, an optimal shard should hold 30-50GB of data. For example, if you expect to accumulate around 300GB of application logs in a day, having around 10 shards in that index would be reasonable.

During their lifetime, shards can go through a number of states, including:

  • Initializing: An initial state before the shard can be used.

  • Started: A state in which the shard is active and can receive requests.

  • Relocating: A state that occurs when shards are in the process of being moved to a different node. This may be necessary under certain conditions, for example, when the node they are on is running out of disk space.

  • Unassigned: The state of a shard that has failed to be assigned. A reason is provided when this happens, for example, if the node hosting the shard is no longer in the cluster (NODE_LEFT) or due to restoring into a closed index (EXISTING_INDEX_RESTORED).

In order to view all shards, their states, and other metadata, use the following request:

GET _cat/shards

To view shards for a specific index, append the name of the index to the URL, for example sensor:

GET _cat/shards/sensor

This command produces output, such as in the following example. By default, the columns shown include the name of the index, the name (i.e. number) of the shard, whether it is a primary shard or a replica, its state, the number of documents, the size on disk, the IP address, and the node ID.

sensor 5 p STARTED    0  283b 127.0.0.1 ziap
sensor 5 r UNASSIGNED                   
sensor 2 p STARTED    1 3.7kb 127.0.0.1 ziap
sensor 2 r UNASSIGNED                   
sensor 3 p STARTED    3 7.2kb 127.0.0.1 ziap
sensor 3 r UNASSIGNED                   
sensor 1 p STARTED    1 3.7kb 127.0.0.1 ziap
sensor 1 r UNASSIGNED                   
sensor 4 p STARTED    2 3.8kb 127.0.0.1 ziap
sensor 4 r UNASSIGNED                   
sensor 0 p STARTED    0  283b 127.0.0.1 ziap
sensor 0 r UNASSIGNED

Understanding Replicas

While each shard contains a single copy of the data, an index can contain multiple copies of the shard. There are thus two types of shard, the primary shard and a copy, or replica. Each replica of the shard is always located on a different node, which ensures access to your data in the event of a node failure. In addition to redundancy and their role in preventing data loss and downtime, replicas can also help boost search performance by allowing queries to be processed in parallel with the primary shard, and therefore faster.

There are some important differences in how primary and replica shards behave. While both are capable of processing queries, indexing requests must first go through primary shards before they can be replicated to the replica shards. As noted above, if a primary shard becomes unavailable—for example, due to a node disconnection or hardware failure—a replica is promoted to take over its role.

While replicas can help in the case of a node failure, replicas use up memory and disk space, as do primary shards. They also consume compute power when indexing, so it is also important not to have too many. Another difference between the primary shards and replicas is that while the number of primary shards cannot be changed after the index has been created, the number of replicas can be altered at any time.
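
For example, the replica count of the sensor index created earlier could later be changed with a simple settings update:

PUT /sensor/_settings
{
  "index": {
    "number_of_replicas": 1
  }
}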

Another factor to consider with replicas is the number of nodes available. Replicas are always placed on different nodes from the primary shard, since two copies of the same data on the same node would add no protection if the node were to fail. As a result, for a system to support n replicas, there need to be at least n + 1 nodes in the cluster. For instance, if there are two nodes in a system and an index is configured with six replicas, only one replica will be allocated. On the other hand, a system with seven nodes is perfectly capable of handling one primary shard and six replicas.

Optimizing Shards and Replicas

Even after an index with the right balance of shards and replicas has been created, these need to be monitored, as the dynamics around an index change over time. For instance, when dealing with time series data, indices with recent data are generally more active than older ones. Without tuning these indices, they would all consume the same amount of resources, despite their very different requirements.

The rollover index API can be used to separate newer and older indices. It can be set to automatically create a new index once a certain threshold—an index’s size on the disk, number of documents, or age—is reached. This API is also useful for keeping shard sizes under control. Because the number of shards cannot be easily changed after index creation, if no rollover conditions are met, shards will continue to accumulate data.
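
A sketch of a rollover request follows; the write alias name and the thresholds are placeholders, and the alias is assumed to already point at the current index:

POST /sensor-write/_rollover
{
  "conditions": {
    "max_age": "7d",
    "max_docs": 50000000,
    "max_size": "50gb"
  }
}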

For older indices that only require infrequent access, shrinking and force merging an index are both ways to reduce their memory and disk footprints. The former reduces the number of shards in an index, while the latter reduces the number of Lucene segments and frees up space used by documents that have been deleted.
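
Both operations are exposed as index APIs; a sketch with placeholder index names (before shrinking, the source index must be made read-only and all of its shards must reside on a single node):

# shrink an older index down to a single primary shard
POST /logs-2022-10/_shrink/logs-2022-10-shrunk
{
  "settings": { "index.number_of_shards": 1 }
}

# reclaim disk space held by deleted documents
POST /logs-2022-10/_forcemerge?only_expunge_deletes=true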

Shards and Replicas As the Foundation of OpenSearch

OpenSearch has built a strong reputation as a distributed storage, search, and analytics platform for huge volumes of data. When operating at such scale, however, challenges will inevitably arise. This is why understanding shards and replicas is so important and fundamental to OpenSearch, as this can help to optimize the reliability and performance of the platform.

Knowing how they work and how to optimize them is critical for achieving a more robust and performant OpenSearch cluster. If you are experiencing sluggish query responses or outages on a regular basis, this knowledge may be the key to overcoming these obstacles.

To easily optimize your shards & replicas and resolve other issues, we recommend you try AutoOps for OpenSearch. AutoOps will detect issues and improve your OpenSearch performance by analyzing your shard sizes, threadpools, memory, snapshots, disk watermarks and more. Try it for free.
