Sep 20, 2021 · 8 min read

[Elasticsearch 101] Basic, Mapping & Examples

Within a search engine, mapping defines how a document and the fields it contains are indexed and stored. We can compare mapping to a database schema in how it describes the fields and properties that documents hold, the datatype of each field (e.g., string, integer, or date), and how those fields should be indexed and stored by Lucene. It is very important to define the mapping when we create an index, before any documents are indexed: an inappropriate preliminary definition and mapping may result in the wrong search results.

TL;DR:

About mapping
Elasticsearch GET mapping requests
Deprecations of Mapping Types
Data-Types field
Meta Fields
Two Mapping Types
When to use Text or Keyword Data Types
Prevent Elasticsearch Mapping Explosions

About mapping

Mapping is intended to define the structure and field types as required based on the answers to certain questions. For example:

Which string fields should be full text and which should be numbers or dates (and in which formats)?
When should you use the _all field, which concatenates multiple fields to a single string and helps with analyzing and indexing?
What custom rules should be set to update new field types automatically as they are added (e.g., the dynamic mapping type, which we will discuss further later on)?

Elasticsearch GET mapping requests

The basic request is rather simple, with an optional parameter focusing the GET _mapping request at a specific index:

GET /_mapping

GET /<index>/_mapping

You can also retrieve Elasticsearch mapping for multiple indices at once rather easily:

GET /<index1>,<index2>/_mapping

As a more concrete example, say you’re looking for particular parts of speech in your NLP indices:

GET /verbs,nouns/_mapping

An example of the JSON returned by an Elasticsearch GET _mapping request (showing only the verbs index) would be:

{
        "verbs" : {
                "mappings" : {
                        "properties" : {
                                "go" : {
                                        "type" : "text"
                                },
                                "walk" : {
                                        "type" : "text"
                                }
                        }
                }
        }
}
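As a rough sketch of how these requests and responses might be handled in client code, the Python below builds the request path for one or more indices and flattens a response into a field-to-type lookup. The helper names mapping_path and field_types are hypothetical, and the sketch assumes the typeless response shape in which fields sit directly under "properties".

```python
import json


def mapping_path(*indices):
    """Build the path for a GET _mapping request; no indices means all indices."""
    if not indices:
        return "/_mapping"
    return "/" + ",".join(indices) + "/_mapping"


def field_types(response):
    """Flatten a GET _mapping response into {index: {field: type}}."""
    result = {}
    for index, body in response.items():
        properties = body.get("mappings", {}).get("properties", {})
        # Object fields carry no explicit "type", so fall back to "object".
        result[index] = {
            name: spec.get("type", "object") for name, spec in properties.items()
        }
    return result


# Sample response, assumed to follow the typeless (7.x and later) layout.
sample = json.loads(
    '{"verbs": {"mappings": {"properties": '
    '{"go": {"type": "text"}, "walk": {"type": "text"}}}}}'
)

print(mapping_path("verbs", "nouns"))
print(field_types(sample))
```

The path can be passed to whichever HTTP client you already use; nothing here contacts a cluster.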

Deprecations of Mapping Types

Mapping types were deprecated in Elasticsearch 7.0.0, after indices created in Elasticsearch 6.0.0 or later had already been limited to a single mapping type. However, knowing how they worked can help you understand current versions of Elasticsearch, as well as aid in dealing with earlier versions. Each index had one or more mapping types that were used to divide documents into logical groups. Basically, a type in Elasticsearch represented a class of similar documents and had a name such as customer or item. Lucene has no concept of document data types, so Elasticsearch would store the type name of each document in a metadata field of a document called _type. When searching for documents within a particular type, Elasticsearch simply used a filter on the _type field to restrict the search.

In addition, mappings are the layer that Elasticsearch still uses to map complex JSON documents into the simple flat documents that Lucene expects to receive. Each mapping type had its own fields, or properties, defined by meta-fields and various data types.

Hence, mapping types would appear as such:

curl -X PUT 'http://localhost:9200/students' -H 'Content-Type: application/json' -d '{
   "mappings": {
     "student": {
       "properties": {
         "name":        { "type": "keyword" },
         "degree":      { "type": "keyword" },
         "age":         { "type": "integer" },
         "performance": { "type": "keyword" }
       }
     }
   }
 }'

Elasticsearch combined the _type field with the _id field of each document to generate a _uid field, so documents of different types could share the same _id within a single index.

So to index a new student, for example, we would use:

curl -X PUT 'http://localhost:9200/students/student/1' -H 'Content-Type: application/json' -d '
{
  "name": "Isaac Newton",
  "age": 14,
  "performance": "honor student"
}'

And when querying Elasticsearch for a student, we would use the mapping type by including it in the URL:


curl -X GET 'http://localhost:9200/students/student/_search' -H 'Content-Type: application/json' -d '
{
  "query": {
    "match": {
      "name": "Isaac Newton"
    }
  }
}'

Elasticsearch Mapping Types Alternatives

Two main alternatives to mapping types are recommended: 1) to index per document type, OR 2) to create a custom type field.

Index per document type

First, let’s look at indexing according to document type. Indices are independent from one another, so the same field name can have a different type in each index without conflict. As per the explanation a few paragraphs above, you lose the _uid field, but retain the _type and _id fields. If you are indexing comments on an e-commerce page, you would index comments and users in separate indices rather than combining them in a single index. This also has the added advantages of 1) more accurate term statistics, because all documents in an index represent a single entity, AND 2) a better fit with Lucene’s dense data storage strategy, which applies to blocks holding between 4,096 and 65,535 documents (65,535 being a block’s capacity).

Custom type field

Implement a custom type field that operates in a similar manner to the deprecated _type field.

Of course, there is a limit to how many primary shards can exist in a cluster, so you may not want to waste an entire shard on a collection of only a few thousand documents. In this case, you can store several kinds of documents in one index and tell them apart with your own custom type field, which works in a similar way to the old _type.
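To make the custom type field idea concrete, here is a minimal Python sketch that builds the two request bodies involved: a typeless mapping with an explicit keyword field that plays the role of the old _type, and a search that filters on it. The field names (type, name, content) and the helper names are illustrative assumptions, not part of any Elasticsearch API.

```python
def custom_type_mapping():
    """Mapping body for an index that holds several kinds of documents."""
    return {
        "mappings": {
            "properties": {
                # Hypothetical discriminator field standing in for the old _type.
                "type": {"type": "keyword"},
                "name": {"type": "text"},
                "content": {"type": "text"},
            }
        }
    }


def search_by_type(doc_type, field, text):
    """Full-text match on one field, restricted to one kind of document."""
    return {
        "query": {
            "bool": {
                "must": {"match": {field: text}},
                # A term filter on the keyword field replaces the type in the URL.
                "filter": {"term": {"type": doc_type}},
            }
        }
    }


print(search_by_type("comment", "content", "great product"))
```

Each document then carries its own "type" value when it is indexed, and the filter clause does the job that the type segment of the URL used to do.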

Data-Types field

When we create a mapping, each mapping type will be a combination of multiple fields or lists with various types. For example, a “user” type may contain fields for title, first name, last name, and gender whereas an “address” type might contain fields for city, state, and zip code.

Elasticsearch supports a number of different data types for the fields in a document:

Core data types: String (replaced by text and keyword as of Elasticsearch 5.0), Date, Numeric (long, integer, short, byte, double, and float), Boolean, Binary
Complex data types:
- Array: Array support does not require a dedicated type
- Object: object for single JSON objects
- Nested: nested for arrays of JSON objects
Geo data types:
- Geo-point: geo_point for latitude/longitude points
- Geo-shape: geo_shape for complex shapes such as polygons
Specialized data types:
- IPv4: ip for IPv4 addresses
- Completion: completion to provide autocomplete suggestions
- Token count: token_count to count the number of tokens in a string
- Attachment: Mapper-attachments plugin which supports indexing attachments in formats such as Microsoft Office, Open Document, ePub, and HTML, into an attachment datatype
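As an illustration of how several of these data types might appear together, the Python sketch below defines a typeless mapping body for a hypothetical store index and collects the types it uses. All field names are invented for the example, and the type names are the ones used by recent Elasticsearch versions (text and keyword rather than string).

```python
# Hypothetical mapping body mixing core, complex, geo, and specialized types.
store_mapping = {
    "properties": {
        "title": {"type": "text"},
        "opened_on": {"type": "date"},
        "employees": {"type": "integer"},
        "is_open": {"type": "boolean"},
        "location": {"type": "geo_point"},
        "server_ip": {"type": "ip"},
        "reviews": {
            "type": "nested",
            "properties": {
                "author": {"type": "keyword"},
                "stars": {"type": "byte"},
            },
        },
    }
}


def types_used(mapping):
    """Collect every data type referenced in a mapping, including inner fields."""
    found = set()
    for spec in mapping.get("properties", {}).values():
        # A field with sub-properties but no explicit type is an object.
        found.add(spec.get("type", "object"))
        found |= types_used(spec)
    return found


print(sorted(types_used(store_mapping)))
```

Arrays need no entry of their own: any of these fields can hold a list of values of its declared type.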

Note: In versions 2.0 to 2.3, dots were not permitted in field names. Elasticsearch 2.4.0 adds a system property called mapper.allow_dots_in_name that disables the check for dots in field names.

Meta Fields

Meta fields customize how a document’s associated metadata is treated. Each document has associated metadata such as the _index, mapping _type, and _id meta-fields. The behavior of some of these meta-fields could be customized when a mapping type was created.

Identity meta-fields
- _index: The index to which the document belongs.
- _uid: A composite field consisting of the _type and the _id.
- _type: The document’s mapping type.
- _id: The document’s ID.
Document source meta-fields
- _source: The original JSON representing the body of the document.
- _size: The size of the _source field in bytes, provided by the mapper-size plugin.
Indexing meta-fields
- _all: A catch-all field that indexes the values of all other fields.
- _field_names: All fields in the document that contain non-null values.
- _timestamp: A timestamp associated with the document, either specified manually or auto-generated.
- _ttl: How long a document should live before it is automatically deleted.
Routing meta-fields
- _parent: Used to create a parent-child relationship between two mapping types.
- _routing: A custom routing value that routes a document to a particular shard.
Other meta-field
- _meta: Application specific metadata.

Example

To create a mapping, you can use the Put Mapping API, which sets a specific mapping definition for a specific type, or you can add multiple mappings when you create an index.

An example of mapping creation using the Mapping API:

PUT 'Server_URL/Index_Name/_mapping/Mapping_Name'
{
  "type_1" : {
    "properties" : {
      "field1" : {"type" : "string"}
    }
  }
}

In the above code:

Index_Name: The name of the index to which the mapping is added
Mapping_Name: Provides the mapping name
type_1 : Defines the mapping type
Properties: Defines the various properties and document fields
{“type”}: Defines the data type of the property or field

Below is an example of mapping creation using an index API:

PUT /index_name

{
  "mappings":{
    "type_1":{
      "_all" : {"enabled" : true},
      "properties":{
        "field_1":{ "type":"string"},
        "field_2":{ "type":"long"}
      }
    },
    "type_2":{
      "properties":{
        "field_3":{ "type":"string"},
        "field_4":{ "type":"date"}
      }
    }
  }
}

In the above code:

Index_Name: The name of the index to be created
type_1: Defines the mapping type
_all: The configuration metafield parameter. If “true,” the values of all other fields are concatenated into a single catch-all field that can be searched
Properties: Defines the various properties and document fields
{“type”}: Defines the data type of the property or field

Two Mapping Types

Elasticsearch supports two types of mappings: “Static Mapping” and “Dynamic Mapping.” We use Static Mapping to define the index and data types. However, we still need ongoing flexibility so that documents can store extra attributes. To handle such cases, Elasticsearch comes with the dynamic mapping option that was mentioned at the beginning of this article.

Static Mapping

In a normal scenario, we know well in advance which kind of data the document will store, so we can easily define the fields and their types when creating the index. Below is an example in which we are going to index employee data into an index named “company” under the type “employeeinfo.”

Sample document data:

{
"name" : {"first" :"Alice","last":"John"},
"age" : 26,
"joiningDate" : "2015-10-15"
}

Example:

PUT /company

{
  "mappings":{
    "employeeinfo":{
      "_all" : {"enabled" : true},
      "properties":{
        "name":{
          "type":"object",
          "properties":{
            "first":{
              "type":"string"
            },
            "last":{
              "type":"string"
            }
          }
        },
        "age":{
          "type":"long"
        },
        "joiningDate":{
          "type":"date"
        }
      }
    }
  }
}

In the above API:

employeeinfo: Defines the mapping type name
_all: The configuration metafield parameter. If “true,” the values of all other fields are concatenated into a single catch-all field that can be searched
Properties: Defines various properties and document fields
{“type”}: Defines the data type of the property or field
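Because recent Elasticsearch versions no longer accept a mapping type name, the _all meta-field, or the string data type, a legacy body like this one has to be adapted before it can be used there. The Python sketch below shows one possible conversion under simplifying assumptions: the helper to_typeless is hypothetical, it only handles bodies with a single mapping type, and it maps every "string" to "text" even though fields used for exact matching would be better served by "keyword".

```python
import copy


def _retype(node):
    """Recursively replace the legacy "string" type with "text" (a simplification)."""
    if node.get("type") == "string":
        node["type"] = "text"
    for child in node.get("properties", {}).values():
        _retype(child)


def to_typeless(legacy_body):
    """Convert a single-type legacy index body into a typeless one."""
    types = legacy_body["mappings"]
    if len(types) != 1:
        raise ValueError("only single-type mappings can be converted directly")
    (type_body,) = types.values()
    body = copy.deepcopy(type_body)  # leave the caller's dictionary untouched
    body.pop("_all", None)           # _all is not available in recent versions
    _retype(body)
    return {"mappings": body}


legacy = {
    "mappings": {
        "employeeinfo": {
            "_all": {"enabled": True},
            "properties": {
                "name": {
                    "type": "object",
                    "properties": {
                        "first": {"type": "string"},
                        "last": {"type": "string"},
                    },
                },
                "age": {"type": "long"},
                "joiningDate": {"type": "date"},
            },
        }
    }
}

print(to_typeless(legacy))
```

The result keeps the same field layout, with "properties" sitting directly under "mappings".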

Dynamic Mapping

Thanks to dynamic mapping, you do not always need to configure the field names and types before you index a document. Instead, these will be added automatically by Elasticsearch using any predefined custom rules. New fields can be added both to the top-level mapping type and to inner objects and nested fields. In addition, dynamic mapping rules can be configured to customize the existing mapping.

Custom rules help to identify the right data types for unknown fields, such as mapping true/false in JSON to boolean, while integer in JSON maps to long in Elasticsearch. Rules can be configured using dynamic field mapping or a dynamic template. When Elasticsearch encounters an unknown field in a document, it uses dynamic mapping to determine the data type of the field and automatically adds the new field to the type mapping.
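The default detection rules can be approximated in a few lines of Python. This is a simplified model for illustration only, not Elasticsearch's implementation: guess_dynamic_type is a hypothetical helper, the date check is a bare yyyy-MM-dd pattern standing in for the real date detection formats, and numeric detection on strings (off by default) is ignored.

```python
import re

# Simplified stand-in for date detection; the real formats are broader.
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def guess_dynamic_type(value):
    """Approximate the data type dynamic mapping would pick for a JSON value."""
    if value is None:
        return None  # null values do not add a field
    if isinstance(value, bool):  # must come before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # Arrays take the type of their first non-null element.
        for item in value:
            if item is not None:
                return guess_dynamic_type(item)
        return None
    if isinstance(value, str):
        # Non-date strings become text (with a keyword sub-field by default).
        return "date" if DATE_PATTERN.match(value) else "text"
    raise TypeError(f"unsupported JSON value: {value!r}")


doc = {"name": "Alice", "age": 26, "active": True, "joiningDate": "2015-10-15"}
print({field: guess_dynamic_type(value) for field, value in doc.items()})
```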

However, there will be cases when this will not be your preferred option. Perhaps you do not know what fields will be added to your documents later and are happy for them to be indexed automatically, which is the default behavior. Perhaps you just want to ignore them. Or, especially if you are using Elasticsearch as a primary data store, maybe you want unknown fields to raise an exception that alerts you to the problem. Fortunately, you can control this behavior with the dynamic setting, which accepts the following options:

true: Add new fields dynamically — this is the default
false: Ignore new fields
strict: Throw an exception if it encounters an unknown field

Example:

PUT /my_index

{
  "mappings": {
    "my_type": {
      "dynamic": "strict",
      "properties": {
        "title":  { "type":"string"},
        "stash":  {
          "type": "object",
          "dynamic":  true
        }
      }
    }
  }
}

In the above API:

my_index – creates an index with this name
my_type – defines the mapping type name
“dynamic”: “strict” – the “my_type” object will throw an exception if an unknown field is encountered
“dynamic”: true – the “stash” object will create new fields dynamically
properties – defines the various properties and document fields
{“type”} – defines the data type of the property or field

With dynamic mapping, you can add new searchable fields into the stash object:

Example:

PUT /my_index/my_type/1

{
   "title": "This doc adds a new field",
   "stash": { "new_field": "Success!" }
}

But trying to do the same at the top level will fail:

PUT /my_index/my_type/1

{
   "title": "This throws a StrictDynamicMappingException",
   "new_field": "Fail!"
}
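The difference between the two requests can be modeled in plain Python. The sketch below is only an illustration of the rule, not how Elasticsearch implements it: unknown_fields is a hypothetical helper that walks a document alongside a typeless mapping body, applies the dynamic setting of the nearest enclosing object (inner objects inherit it from their parent), and reports the fields a strict object would reject.

```python
def unknown_fields(document, mapping, inherited="true", path=""):
    """Return the dotted names of fields that a strict object would reject."""
    # The setting may be given as a boolean or a string, so normalize it.
    dynamic = str(mapping.get("dynamic", inherited)).lower()
    properties = mapping.get("properties", {})
    rejected = []
    for name, value in document.items():
        full_name = path + name
        if name not in properties:
            if dynamic == "strict":
                rejected.append(full_name)
            # With "true" or "false" the field is added or ignored, not rejected.
            continue
        if isinstance(value, dict):
            rejected += unknown_fields(value, properties[name], dynamic, full_name + ".")
    return rejected


mapping = {
    "dynamic": "strict",
    "properties": {
        "title": {"type": "text"},
        "stash": {"type": "object", "dynamic": True},
    },
}

ok_doc = {"title": "This doc adds a new field", "stash": {"new_field": "Success!"}}
bad_doc = {"title": "This throws a StrictDynamicMappingException", "new_field": "Fail!"}

print(unknown_fields(ok_doc, mapping))
print(unknown_fields(bad_doc, mapping))
```

An empty result corresponds to a request that is accepted; any listed field corresponds to the exception raised for the second request.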

When to use Text or Keyword Data Types

As a practical tip, here is a rule of thumb for choosing between the text and keyword data types in an Elasticsearch mapping:

Text data types – Use when you require full-text search for particular fields such as the bodies of e-mails or product descriptions
Keyword data types – Use when you require an exact-value search, particularly when filtering (“Find me all products where status is available”), sorting, or using aggregations. Keyword fields are only searchable by their exact value. Use keyword data types when you have fields like email addresses, hostnames, status codes, zip codes, or tags.
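The rule of thumb might translate into request bodies like the ones in this Python sketch. The product fields (description, status, name) and the helper names are hypothetical; the name field uses a multi-field so that the same value is available both as analyzed text and as an exact keyword for sorting.

```python
# Hypothetical product index: text for searching, keyword for exact values.
product_mapping = {
    "mappings": {
        "properties": {
            "description": {"type": "text"},
            "status": {"type": "keyword"},
            # Multi-field: full-text search on "name", sorting on "name.raw".
            "name": {"type": "text", "fields": {"raw": {"type": "keyword"}}},
        }
    }
}


def full_text_query(text):
    """Analyzed search over the text field."""
    return {"query": {"match": {"description": text}}}


def available_products(status="available"):
    """Exact-value filter on a keyword field, sorted by the keyword sub-field."""
    return {
        "query": {"bool": {"filter": {"term": {"status": status}}}},
        "sort": [{"name.raw": "asc"}],
    }


print(full_text_query("waterproof hiking boots"))
print(available_products())
```

Sorting or aggregating on the text field itself would not work by default, which is why the query sorts on the keyword sub-field instead.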

Prevent Elasticsearch Mapping Explosions

Defining too many fields in an index can lead to a mapping explosion, which can cause out-of-memory errors that are difficult to recover from. These settings will help:

index.mapping.total_fields.limit – The max number of fields in an index, which is set to 1000 by default but might not be enough. If you need to go above that, you should also increase the indices.query.bool.max_clause_count setting in kind, which limits the maximum number of boolean clauses in a query.
index.mapping.depth.limit – The max depth for a field, measured as the number of inner objects: fields defined at the root level have a depth of 1, and each level of object mapping adds 1 (default 20).
index.mapping.nested_fields.limit – The max number of nested mappings in an index (default 50).
index.mapping.nested_objects.limit – The max number of nested JSON objects within a single document across all nested types (default 10000).

Elastic’s docs also recommend looking at this setting, even though it has little to do with stemming a mapping explosion:

index.mapping.field_name_length.limit – The max length of a field name. The default value is Long.MAX_VALUE (no limit).
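To get a feel for how close a mapping is to these limits, you could measure it before creating the index. The Python sketch below is an approximation under stated assumptions: count_fields, max_depth, and limit_settings are hypothetical helpers, the count treats every property, inner object, and multi-field as one field, and the exact accounting in Elasticsearch may differ by version.

```python
def count_fields(mapping):
    """Approximate the number of fields counted toward total_fields.limit."""
    total = 0
    for spec in mapping.get("properties", {}).values():
        # One for the field itself, plus its multi-fields and inner fields.
        total += 1 + len(spec.get("fields", {})) + count_fields(spec)
    return total


def max_depth(mapping):
    """Depth as described above: root-level fields are 1, each object level adds 1."""
    properties = mapping.get("properties", {})
    if not properties:
        return 0
    return 1 + max(max_depth(spec) for spec in properties.values())


def limit_settings(total_fields=1000, depth=20):
    """Index settings body that sets the two limits (defaults match the text above)."""
    return {
        "settings": {
            "index.mapping.total_fields.limit": total_fields,
            "index.mapping.depth.limit": depth,
        }
    }


company = {
    "properties": {
        "name": {
            "type": "object",
            "properties": {"first": {"type": "text"}, "last": {"type": "text"}},
        },
        "age": {"type": "long"},
        "joiningDate": {"type": "date"},
    }
}

print(count_fields(company), max_depth(company))
print(limit_settings(total_fields=2000))
```

Raising a limit is rarely the first thing to try; a count that keeps growing usually points to dynamic mapping adding fields that were never meant to be indexed.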