
Outstanding techniques for boosting performance with MongoDB indexes

MongoDB, a popular NoSQL database, offers flexibility and scalability for handling large volumes of unstructured data, and indexes are one of its key performance features. As your application grows, using indexes effectively becomes crucial for maintaining optimal performance. This article explores outstanding techniques for leveraging MongoDB indexes to boost performance, accompanied by practical examples.

Vinod Pal

Why do we need indexes in MongoDB?

  1. Swift Data Retrieval: Indexes provide a structured roadmap for quick document location, avoiding the need for time-consuming collection scans.
  2. Reduced Response Times: Quick access to data means faster response times, crucial for applications with large datasets or high user loads.
  3. Efficient Sorting and Filtering: Indexes streamline sorting and filtering processes, enhancing responsiveness and resource efficiency.
  4. Aggregation Pipeline Optimization: Indexes expedite complex operations within aggregation pipelines, contributing to quicker and more resource-efficient outcomes.
  5. Unique Constraints Support: Indexes enforce unique constraints, maintaining data integrity and accelerating queries reliant on unique identifiers (see the short sketch after this list).
  6. Array and Text Search Efficiency: Indexing arrays and using text indexes streamline querying of fields with array values and support full-text search operations.
  7. Reduced Disk I/O with Covered Queries: Indexes enable covered queries, reducing the need to load entire documents from disk and minimizing disk I/O operations for improved system performance.
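
As a quick illustration of the unique-constraints point, here is a minimal sketch; the users collection and email field are assumptions for the example, not part of this article's dataset:

db.users.createIndex({ email: 1 }, { unique: true })   // inserts with a duplicate email will be rejected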

Adding indexes to a MongoDB collection

Let's understand how to add indexes to a MongoDB collection. But before delving into the process, let's first look at some essential prerequisites.

Prerequisites

  1. A MongoDB database server or a MongoDB Atlas account.
  2. mongosh (the MongoDB Shell), which we will use to interact with the database.
  3. Basic knowledge of MongoDB.

How indexes work in MongoDB

Indexes in MongoDB serve as optimized data structures, specifically utilizing a B-tree (balanced tree) mechanism. These structures store a subset of the collection's data in a manner that enhances traversal efficiency.

When creating an index, MongoDB captures and organizes the values of designated fields or a combination of fields. The key feature of these indexes lies in their ordering, as they arrange the stored values based on the inherent order of the field values. This ordering enables swift and efficient operations such as equality matches and range queries.

Moreover, MongoDB can leverage the ordered nature of indexes to optimize query performance. When a query involves sorting, the database can utilize the existing index order to deliver sorted results more efficiently, reducing the need for additional processing.
In essence, MongoDB's index architecture significantly enhances query performance by pre-sorting and organizing a subset of the collection's data, facilitating rapid data retrieval and improving the overall efficiency of database operations.
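
As a small, hypothetical illustration (the products collection and price field are assumptions here): if a field is indexed, a query that sorts on it can return results in index order, which you can verify because the explain() plan shows an IXSCAN and no separate blocking SORT stage.

db.products.createIndex({ price: 1 })
// Results are read from the index already in price order, so the winning plan
// should contain an IXSCAN and no in-memory SORT stage.
db.products.find({}).sort({ price: 1 }).explain("queryPlanner")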

Analyze query performance

As the saying goes, “What gets measured gets improved.” So before jumping into optimization, we first need to measure our current performance; then, after applying the optimization techniques, we can measure again and compare.

To enhance query performance in MongoDB, it's crucial to measure and analyze the current execution using the explain() method. This process helps identify areas for optimization. Let's examine a scenario with a collection named "orders."

testdb> db.orders.find({customer_name: 'Customer_18'}).explain("executionStats")

{
  explainVersion: '1',
  queryPlanner: {
    namespace: 'testdb.orders',
    indexFilterSet: false,
    parsedQuery: { customer_name: { '$eq': 'Customer_18' } },
    queryHash: '7F83C870',
    planCacheKey: '7F83C870',
    maxIndexedOrSolutionsReached: false,
    maxIndexedAndSolutionsReached: false,
    maxScansToExplodeReached: false,
    winningPlan: {
      stage: 'COLLSCAN',
      filter: { customer_name: { '$eq': 'Customer_18' } },
      direction: 'forward'
    },
    rejectedPlans: []
  },
  executionStats: {
    executionSuccess: true,
    nReturned: 1,
    executionTimeMillis: 0,
    totalKeysExamined: 0,
    totalDocsExamined: 101,
    executionStages: {
      stage: 'COLLSCAN',
      filter: { customer_name: { '$eq': 'Customer_18' } },
      nReturned: 1,
      executionTimeMillisEstimate: 0,
      works: 102,
      advanced: 1,
      needTime: 100,
      needYield: 0,
      saveState: 0,
      restoreState: 0,
      isEOF: 1,
      direction: 'forward',
      docsExamined: 101
    }
  },
  command: {
    find: 'orders',
    filter: { customer_name: 'Customer_18' },
    '$db': 'testdb'
  },
  serverInfo: {
    host: '9b9537b16dce',
    port: 27017,
    version: '7.0.5',
    gitVersion: '7809d71e84e314b497f282ea8aa06d7ded3eb205'
  },
  serverParameters: {
    internalQueryFacetBufferSizeBytes: 104857600,
    internalQueryFacetMaxOutputDocSizeBytes: 104857600,
    internalLookupStageIntermediateDocumentMaxSizeBytes: 104857600,
    internalDocumentSourceGroupMaxMemoryBytes: 104857600,
    internalQueryMaxBlockingSortMemoryUsageBytes: 104857600,
    internalQueryProhibitBlockingMergeOnMongoS: 0,
    internalQueryMaxAddToSetBytes: 104857600,
    internalDocumentSourceSetWindowFieldsMaxMemoryBytes: 104857600,
    internalQueryFrameworkControl: 'trySbeRestricted'
  },
  ok: 1
}

The query results provide various metrics for assessing the performance of our MongoDB collection. In this specific case, the collection contains 101 documents.

executionStats: {
    executionSuccess: true,
    nReturned: 1,
    executionTimeMillis: 0,
    totalKeysExamined: 0,
    totalDocsExamined: 101,

Upon examining the execution statistics, it is evident that the "totalDocsExamined" metric is 101. This indicates that when searching the collection by "customer_name," the system scans through all 101 documents to filter and retrieve the requested record.
Now that we have the baseline performance metrics, we can begin the optimization process and later check these metrics again to see whether there is any improvement. Let's create an index and find out.

Create an index in MongoDB

To create an index in MongoDB, run the following command in mongosh:

db.orders.createIndex( { customer_name: -1 } )

This command creates an index on the customer_name field of the orders collection. Here -1 signifies descending order; indexes in MongoDB can be either ascending (1) or descending (-1).
To validate that the index was created correctly, run the command below, which lists all the indexes in the collection:

db.orders.getIndexes()

This command returns the indexes on the orders collection:

[
  { v: 2, key: { _id: 1 }, name: '_id_' },
  { v: 2, key: { customer_name: -1 }, name: 'customer_name_-1' }
]

Notice that there are two indexes: the first is the default index that MongoDB creates on the _id field, and the second is the index we have just created.
If we now analyze performance again using explain(), we can see an improvement:

db.orders.find({customer_name: 'Customer_18'}).explain("executionStats")
 executionStats: {
    executionSuccess: true,
    nReturned: 1,
    executionTimeMillis: 0,
    totalKeysExamined: 1,
    totalDocsExamined: 1,

Here, we observe "totalDocsExamined: 1", indicating that when searching for an order by a specific customer name, the query now examines only one document instead of scanning through all of them.
Now that we have seen the power of a simple index, let's look at compound indexes.

Create a compound index

Compound indexes in MongoDB allow you to efficiently organize and sort data based on multiple fields within each document. By grouping data first by one field and then by subsequent fields, you enhance query performance.
For instance, in an e-commerce application, imagine a scenario where the store manager frequently queries low-stock items by name and quantity. Creating a compound index on both the item and quantity fields can significantly improve the speed of such queries. This optimization increases the likelihood of covered queries, where the index alone satisfies the query without needing to inspect the actual documents, resulting in better overall performance.
Here we are creating an index on both itemName and quantity.

db.inventory.createIndex({
  itemName: 1,   // 1 for ascending order
  quantity: -1   // -1 for descending order
})

Now we can query efficiently with something like this:

db.inventory.find({
  itemName: "exampleItem",
  quantity: { $lte: 10 }
}).sort({
  itemName: 1,   // Sorting by 'itemName' (ascending) for better efficiency
  quantity: -1   // Sorting by 'quantity' (descending)
}).explain("executionStats");

In this example, the compound index on itemName and quantity is created using the createIndex command. The query then looks for items with a specific name and a quantity less than or equal to 10, benefiting from the compound index. The explain("executionStats") call provides execution statistics, allowing you to verify the index usage and the performance improvement.
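
Building on the covered-query point above, here is a minimal sketch (assuming the { itemName: 1, quantity: -1 } index created earlier): if the projection is limited to the indexed fields and _id is excluded, the index alone can answer the query, and executionStats should report totalDocsExamined: 0.

db.inventory.find(
  { itemName: "exampleItem", quantity: { $lte: 10 } },
  { _id: 0, itemName: 1, quantity: 1 }   // project only indexed fields so the query is covered
).explain("executionStats")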

Equality, Sort, Range (The ESR) rule

Now that we understand compound indexes, let's see how the ESR rule helps organize index keys and significantly improve the efficiency of compound indexes.
Equality: "Equality" involves exact matches on a single value. For an e-commerce example, consider searching for products in a specific category:

db.products.find({ category: "Electronics" })

To optimize this query, place fields requiring exact matches, such as category, first in the index. MongoDB allows flexibility in the order of index keys for equality matches but prioritizes fields with high selectivity to minimize scanned keys.
Sort: "Sort" determines the result order. In an e-commerce context, sorting products by price after an equality match:

db.products.find({ category: "Clothing" }).sort({ price: 1 })

To improve performance, create an index on category and price:

db.products.createIndex({ category: 1, price: 1 })

Ensure the indexed fields are a subset of the query and include equality conditions for all prefix keys before sort keys.

Range: "Range" involves filters without exact matches. For an e-commerce application example, if we want to search for products within a price range then we can search the records based on the range like this:

db.products.find({ price: { $gte: 50, $lte: 200 } })

In the above query, we are searching for all products whose prices lie in the range of 50 to 200. Make range bounds as tight as possible, and place range filters after the sort predicates so the index sort remains non-blocking. This ensures optimal query efficiency.
Additional considerations: Inequality operators like $ne or $nin are treated as range operators. For e-commerce scenarios, queries involving product discounts or excluding specific brands fall into this category.
Consider an e-commerce query searching for high-end laptops from a specific brand with prices above $1000, sorted by model:

db.products.find({
  brand: 'TechCo',
  price: { $gt: 1000 },
  category: 'Laptops'
}).sort({ model: 1 })


Following the ESR Rule, the ideal index would be:

{ brand: 1, model: 1, price: 1 }
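
As a short, hedged sketch, you could create that index and confirm the planner picks it up (the collection and field names follow the example above):

db.products.createIndex({ brand: 1, model: 1, price: 1 })
// The winning plan should now show an IXSCAN on this index, with the sort on
// model satisfied by the index order instead of an in-memory SORT stage.
db.products.find({
  brand: 'TechCo',
  price: { $gt: 1000 },
  category: 'Laptops'
}).sort({ model: 1 }).explain("queryPlanner")
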
Choosing the right type of index

  1. Compound Index: When queries involve a combination of keys, creating a compound index is more effective than having separate single-key indexes. This approach allows MongoDB to utilize the compound index for multiple queries. For example, if you frequently query on both "category" and "item," create a compound index:

    db.products.createIndex({ "category": 1, "item": 1 })

This compound index supports queries on "category" alone and queries on both "category" and "item." It's important to note that a single compound index can support queries that search a "prefix" subset of its fields.

  2. Prefix Indexes:

Index prefixes are the beginning subsets of a compound index's fields. An index like { x: 1, y: 1, z: 1 } can support the queries that the indexes { x: 1 } and { x: 1, y: 1 } would support, so you often do not need those separate indexes. A query on only x and z, however, is not covered by a prefix, and a dedicated { x: 1, z: 1 } index might provide better performance in that scenario.

db.products.createIndex({ "type": 1, "brand": 1, "model": 1 })

// This index supports queries like:
db.products.find({ "type": "laptop" })  // Prefix { "type": 1 }

db.products.find({ "type": "laptop", "brand": "XYZ" })  // Prefix { "type": 1, "brand": 1 }


  3. Indexes for Text Search: For text search, MongoDB Atlas offers Atlas Search indexes. For self-managed deployments, MongoDB provides a text index type supporting string content search. To support string comparisons, operations must specify the same collation as the index.

For instance, if we know that users search product descriptions for keywords such as "fresh," we can create a text index on the description field:

db.products.createIndex({ "description": "text" })


  4. Collation Considerations: Collation settings are crucial when using indexes for string comparisons. Indexes with collation keys may have larger sizes, and the collation must match between the index and the query operation. Compound indexes with non-string keys can still be utilized for comparisons on the index prefix keys with different collations.

Suppose your e-commerce application supports multiple languages, and you want to create an index for case-insensitive sorting on product names:

db.products.createIndex({ "name": 1 }, { collation: { locale: "en", strength: 2 } })

Now, queries with collation settings matching the index can be performed:

db.products.find({ "name": "laptop" }).collation({ locale: "en", strength: 2 })
  5. Index Use and Sorting: Compound indexes can support both queries and sorting operations. Consider a compound index on "price" and "rating" for sorting operations:

    db.products.createIndex({ "price": 1, "rating": 1 })

    Now, you can efficiently query and sort products:

    db.products.find({ "category": "electronics" }).sort({ "price": 1 })


    Here the { price: 1, rating: 1 } index can satisfy the sort on price, letting MongoDB avoid a blocking in-memory sort while the category filter is applied to the fetched documents. In summary, understanding your query patterns and optimizing your indexes accordingly can significantly boost the performance of your MongoDB database.

Optimizing performance through full RAM utilization

To elevate the performance of your MongoDB system, make sure your indexes fit within the available RAM. This avoids the need for the system to resort to disk reads when accessing the index.
You can use this command to see the total size of your indexes in bytes:

db.products.totalIndexSize()
3754974720
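
If you want a per-index breakdown rather than just the total, the collection stats expose one. A quick sketch (the exact output shape can vary by MongoDB version):

db.products.stats().indexSizes   // returns an object mapping each index name to its size in bytes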

In the above example, the index size stands at approximately 3.5 GB. To ensure optimal performance, it is essential not only to have sufficient RAM available but also to allocate adequate space for the remaining components of the working set.
For scenarios involving multiple collections, factor in the cumulative size of indexes across all collections. The simultaneous accommodation of both indexes and the working set within memory is crucial.
While there are instances where indexes may not necessarily need to fit entirely into memory, these are limited cases. Refer to the MongoDB documentation section on indexes that hold only recent values in RAM for further insights.

Apply indexes on the correct fields

Efficient MongoDB queries are crucial for optimization, so align your queries with the appropriate indexes or create relevant ones. Take, for instance, the "products" collection, where there's a field named "isActive" containing either "true" or "false." Because this field has only two possible values, indexing it alone is not selective: a query on it would still have to examine roughly half of the documents.
To enhance query performance, consider creating a compound index, as we've previously discussed. In this case, you can establish a compound index on the "ProductType" and "isActive" fields. By doing so, you instruct MongoDB to specifically search for a particular product type with an active status.

db.products.createIndex({"ProductType": 1, "isActive": 1})

Optimizing MongoDB queries involves aligning them with appropriate indexes. Creating a compound index, such as on "ProductType" and "isActive," significantly improves query efficiency by specifying the fields for targeted searches, leading to enhanced overall performance.

Conclusion

In conclusion, we explored fundamental techniques to enhance MongoDB performance by leveraging indexes effectively. We began by understanding the pivotal role indexes play in optimizing data retrieval and response times, then demonstrated the creation of single-key and compound indexes, with practical examples showcasing their impact on query efficiency.
Our journey extended to implementing the Equality, Sort, Range (ESR) rule, providing a strategic framework for organizing index keys based on specific query patterns. We then delved into considerations for choosing the right type of index, emphasizing scenarios where single-key, compound, prefix, text search, and collation-based indexes are most appropriate.
Furthermore, we emphasized the significance of full RAM utilization and of aligning queries with the correct indexes, illustrating how these practices mitigate disk reads and enhance overall system performance. By adopting these techniques, MongoDB users can optimize their databases, ensuring scalability and efficiency as their datasets evolve.

Resources


1. MongoDB docs
2. MongoDB Indexes, detailed documentation of MongoDB indexes