Invalid unicode character code, is a surrogate code – How to solve this Elasticsearch error

Opster Team

March-22, Version: 1.7-8.0

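This guide will help you check for common problems that cause the log “Invalid unicode character code, is a surrogate code” to appear. The error is raised when a query contains a unicode escape whose hex code falls in the UTF-16 surrogate range U+D800 to U+DFFF, which cannot be decoded as a character on its own. To understand the issues related to this log, read the explanation below about the following Elasticsearch concepts: plugin, parser.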

Log Context

The log “Invalid unicode character code, [{}] is a surrogate code” comes from the class AbstractBuilder.java. We extracted the following from the Elasticsearch source code for those seeking in-depth context:

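// Excerpt from AbstractBuilder.java; Source and ParsingException are Elasticsearch types
private static String hexToUnicode(Source source, String hex) {
    try {
        int code = Integer.parseInt(hex, 16);
        // U+D800-U+DFFF can only be used as surrogate pairs and therefore are not valid character codes
        if (code >= 0xD800 && code <= 0xDFFF) {
            throw new ParsingException(source, "Invalid unicode character code, [{}] is a surrogate code", hex);
        }
        // Supplementary code points (above U+FFFF) become a two-char surrogate pair here
        return String.valueOf(Character.toChars(code));
    } catch (IllegalArgumentException e) {
        // Bad hex (NumberFormatException) or an out-of-range code point rejected by Character.toChars
        throw new ParsingException(source, "Invalid unicode character code [{}]", hex);
    }
}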
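To see the check in isolation, here is a minimal standalone Java sketch that mirrors hexToUnicode. It assumes IllegalArgumentException as a stand-in for Elasticsearch's ParsingException, and toCodePointHex is a hypothetical helper, not an Elasticsearch API. The helper illustrates the usual way out of this error: a character outside the Basic Multilingual Plane (such as an emoji) is stored in UTF-16 as a high/low surrogate pair, so escaping each half separately trips the surrogate check, whereas escaping the single combined code point passes it.

```java
import java.util.Locale;

/**
 * Standalone sketch mirroring AbstractBuilder#hexToUnicode.
 * Assumption: IllegalArgumentException stands in for Elasticsearch's ParsingException.
 */
public class SurrogateCheck {

    // Parse a hex code, reject lone surrogate halves, and build the string.
    static String hexToUnicode(String hex) {
        int code;
        try {
            code = Integer.parseInt(hex, 16);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Invalid unicode character code [" + hex + "]");
        }
        // U+D800-U+DFFF are reserved for UTF-16 surrogate halves, not characters
        if (code >= 0xD800 && code <= 0xDFFF) {
            throw new IllegalArgumentException(
                "Invalid unicode character code, [" + hex + "] is a surrogate code");
        }
        try {
            return String.valueOf(Character.toChars(code));
        } catch (IllegalArgumentException e) {
            // Character.toChars rejects negative values and values above U+10FFFF
            throw new IllegalArgumentException("Invalid unicode character code [" + hex + "]");
        }
    }

    // Hypothetical helper: combine a high/low surrogate pair into one code point, as hex.
    static String toCodePointHex(String highHex, String lowHex) {
        char high = (char) Integer.parseInt(highHex, 16);
        char low = (char) Integer.parseInt(lowHex, 16);
        if (!Character.isSurrogatePair(high, low)) {
            throw new IllegalArgumentException("not a valid surrogate pair");
        }
        return Integer.toHexString(Character.toCodePoint(high, low)).toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(hexToUnicode("41"));             // A
        System.out.println(toCodePointHex("D83D", "DE00")); // 1F600
        try {
            hexToUnicode("D83D"); // a lone high surrogate is rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running main prints A, then 1F600, then the surrogate error message for D83D. Applied to a query, the idea is to replace an escaped pair such as D83D followed by DE00 with the single code 1F600. The exact escape syntax depends on the query language (for EQL we believe it is the `\u{...}` form), so check the documentation for your version.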

 
