MongoDB Essentials

Section 1

Introduction

MongoDB is a popular document-oriented NoSQL database that offers flexible schema design. It stores data in JSON-like documents, provides a rich query language, and supports client drivers in multiple programming languages. This Refcard covers MongoDB v4.4 through v6.0. It is intended to help you get the most out of MongoDB and assumes that you already know the basics.

If you're just starting out, explore these resources first:

For installation notes, see https://www.mongodb.com/docs/manual/installation
For a quick tutorial on basic operations, see https://www.mongodb.com/docs/manual/tutorial/getting-started
For a list of supported languages, see https://www.mongodb.com/languages

Section 2

Configuration Options

In this section, we cover configuration options for MongoDB.

Setup Options

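Startup options for MongoDB can be set on the command line or in a configuration file. The two use different syntax: command-line options are flags, while the configuration file uses YAML, with settings nested under sections such as storage and security (shown here in dotted form):

Table 1

Command Line	Config File Setting
`--dbpath <path>`	`storage.dbPath: <path>`
`--auth`	`security.authorization: enabled`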

Run mongod --help for a full list of options. Here are some of the most useful:

Table 2

Option	Description
`--config <filename>`	File for runtime configuration options
`--dbpath <path>`	The directory where the `mongod` instance stores its data
`--logpath <path>`	The path where MongoDB creates a file to send diagnostic logging information
`--logappend`	Appends new entries to the end of the existing log file when the `mongod` instance restarts
`--fork`	Enables daemon mode that runs the `mongod` process in the background
`--auth`	Enables authorization to control user access to database resources and operations
`--keyFile <path>`	Path to a shared secret that enables authentication on replica sets and sharding
`--bind_ip <options>`	Specifies where `mongod` should listen for client connections; this could be hostnames, IP addresses, and/or full Unix domain socket paths

View Options

If you started mongod with a bunch of options six months ago, how can you see which options you used? The shell has a helper:

    Shell
   
 
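> db.serverCmdLineOpts()
{ "argv" : [ "./mongod", "--port", "30000" ], "parsed" : { "net" : { "port" : 30000 } }, "ok" : 1 }

The argv field lists the command-line arguments exactly as they were given. The parsed field holds the resulting options, including any that were read from a config file.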
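Because the parsed field is a nested document, it can be easier to scan once flattened into dotted setting names. The following is a minimal sketch in plain JavaScript; the function name flattenOptions and the sample document are illustrative assumptions, not part of MongoDB.

```javascript
// Flatten a nested options document into dotted setting names.
// `parsed` is assumed to look like the "parsed" field of db.serverCmdLineOpts().
function flattenOptions(parsed, prefix = "") {
  const flat = {};
  for (const [key, value] of Object.entries(parsed)) {
    const name = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      // Recurse into nested sections such as net or storage.
      Object.assign(flat, flattenOptions(value, name));
    } else {
      flat[name] = value;
    }
  }
  return flat;
}

// Hypothetical sample; in the shell you would pass db.serverCmdLineOpts().parsed
const sampleParsed = { net: { port: 30000 }, storage: { dbPath: "/data/db" } };
console.log(flattenOptions(sampleParsed));
```

In the shell, the same function could be defined in .mongoshrc.js and called on the live helper output.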

Section 3

Using the Shell

This section covers topics around MongoDB shell usage. Note that the mongo shell was deprecated in MongoDB v5.0 and removed from MongoDB v6.0. The replacement shell is mongosh.

Shell Help

There are a number of functions that give you a little help if you forget a command:

    Shell
   
 
> // basic help
> help

  Shell Help:

    use       Set current database
    show      'show databases'/'show dbs': Print a list of all available databases.
              'show collections'/'show tables': Print a list of all collections for current database.
              'show profile': Prints system.profile information.
              'show users': Print a list of all users for current database.
              'show roles': Print a list of all roles for current database.
              'show log <type>': log for current connection, if type is not set uses 'global'
              'show logs': Print all logs.

    exit      Quit the MongoDB shell with exit/exit()/.exit
    quit      Quit the MongoDB shell with quit/quit()

    ...

Note that there are separate help functions for databases, collections, replica sets, sharding, administration, and more. Although not listed explicitly, there is also help for cursors:

    Shell
   
 
> // list common cursor functions
> db.foo.find().help()

You can use these functions and helpers as built-in cheat sheets. You can find the full list here: https://www.mongodb.com/docs/mongodb-shell/reference/access-mdb-shell-help

Seeing Function Definitions

If you don't understand what a function is doing, you can run it without the parentheses in the shell to see its source code:

    Shell
   
 
> // run the function
> db.serverCmdLineOpts()
{ "argv" : [ "./mongod" ], "parsed" : { }, "ok" : 1 }
> // see its source
> db.serverCmdLineOpts

This can be helpful for seeing what arguments the function expects or what errors it can throw, as well as how to run it from another language.

Using Edit

The shell has limited multi-line support, so it can be difficult to program in. The shell helper edit makes this easier by opening a text editor in which you can edit variables. For example:

    Shell
   
 
> x = function() { /* some function we're going to fill in */ }
> edit x
<opens emacs with the contents of x>

Modify the variable in your editor, then save and exit. The variable will be set in the shell.

Either the EDITOR environment variable or a MongoDB shell variable EDITOR must be set to use edit. You can set it in the MongoDB shell as follows:

    Shell
   
> EDITOR="/usr/bin/emacs"

Note that edit is not available from JavaScript scripts, only in the interactive shell.

Using .mongoshrc.js

If a .mongoshrc.js file exists in your home directory, it will run on shell startup automatically. Use it to initialize any helper functions you use regularly and remove functions you don't want to accidentally use. Use the --norc option to prevent .mongoshrc.js from being loaded.

For example, if you would prefer to not have dropDatabase() available by default, you could add the following lines to your .mongoshrc.js file:

    Shell
   
 
DB.prototype.dropDatabase = function() {
    print("No dropping DBs!");
}
db.dropDatabase = DB.prototype.dropDatabase;

The example above will change the dropDatabase() helper to only print a message, not to drop databases. Note: This technique should not be used for security because a determined user can still drop a database without the helper. However, removing dangerous admin commands, as shown in the example above, can help prevent fat-fingering.

Here are some helpers you may want to disable in your .mongoshrc.js:

DB.prototype.shutdownServer
DBCollection.prototype.drop
DBCollection.prototype.ensureIndex
DBCollection.prototype.reIndex
DBCollection.prototype.dropIndexes
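The helpers listed above could be disabled in one pass rather than one assignment at a time. This is a minimal sketch, assuming a generic function (disableHelpers, a hypothetical name) that is handed a prototype object and a list of method names. Whether a given prototype is exposed depends on your shell version, so the example runs against a stand-in object.

```javascript
// Replace each named method on `proto` with a stub that only prints a warning.
// Returns the names that were actually found and replaced.
function disableHelpers(proto, names) {
  const disabled = [];
  for (const name of names) {
    if (typeof proto[name] !== "function") continue; // skip helpers this shell lacks
    proto[name] = function () {
      console.log(`${name}() is disabled in this shell session`);
    };
    disabled.push(name);
  }
  return disabled;
}

// Stand-in object for illustration; in .mongoshrc.js you would pass the real prototype.
const fakeCollectionProto = {
  drop() { return true; },
  reIndex() { return true; },
};
console.log(disableHelpers(fakeCollectionProto, ["drop", "reIndex", "dropIndexes"]));
```

As with the dropDatabase() example, this only guards against accidents and is not a security measure.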

Prompt Changes

The shell prompt can be customized by setting the prompt variable to a function that returns a string:

    Shell
   
 
prompt = function() { 
    try {
        db.getLastError();
    }
    catch (e) {
        print(e);
    }
    return (new Date())+"$";
}

If you set a prompt, it will be executed each time the prompt is drawn (thus, the example above would give you the time the last operation completed).

Try to include the db.getLastError() function call in your prompt. This is included in the default prompt and takes care of server reconnection and returning errors from writes. Also, always put any code that could throw an exception in a try/catch block, as shown in the example above. It's annoying to have your prompt turn into an exception!

Section 4

Diagnosing What's Happening

This section covers how to get detailed information on operations, index usage, replication status, and more.

Viewing and Killing Operations

You can see current operations with the currentOp function:

    Shell
   
 
> db.currentOp()
{
    "inprog" : [
        {
            "opid" : 123,
            "active" : false,
            "locktype" : "write",
            "waitingForLock" : false,
            "secs_running" : 200,
            "op" : "query",
            "ns" : "foo.bar",
            "query" : {
            }
            ...
        },
        ...
    ]
}

Using the opid field from above, you can kill operations:

    Shell
   
> db.killOp(123)

Not all operations can be killed or will be killed immediately. In general, operations that are waiting for a lock cannot be killed until they acquire the lock.
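When the inprog array is long, it can help to filter it before deciding what to kill. The following is a minimal sketch in plain JavaScript; findLongRunningOps is a hypothetical helper, and the sample document only mimics the fields shown in the output above.

```javascript
// Return a short summary of active operations running for at least `minSeconds`.
// `currentOpResult` is assumed to have the shape returned by db.currentOp().
function findLongRunningOps(currentOpResult, minSeconds) {
  return currentOpResult.inprog
    .filter((op) => op.active && op.secs_running >= minSeconds)
    .map((op) => ({
      opid: op.opid,
      op: op.op,
      ns: op.ns,
      secs_running: op.secs_running,
    }));
}

// Hypothetical sample data for illustration.
const sampleCurrentOp = {
  inprog: [
    { opid: 123, active: true, secs_running: 200, op: "query", ns: "foo.bar" },
    { opid: 124, active: true, secs_running: 2, op: "insert", ns: "foo.baz" },
    { opid: 125, active: false, op: "none", ns: "" },
  ],
};
console.log(findLongRunningOps(sampleCurrentOp, 60));

// In the shell, the result could be reviewed and then passed to db.killOp():
//   findLongRunningOps(db.currentOp(), 60).forEach((op) => db.killOp(op.opid));
```

Reviewing the list before killing anything is the safer habit, since some operations should be left to finish.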

Index Usage

Use explain() to see which index MongoDB is using for a query. The verbosity option specifies the mode, which determines how much information is returned. Possible modes are queryPlanner, executionStats, and allPlansExecution (the default for the explain command).

    Shell
   
 
> db.runCommand({
  explain: {
    count: "users",
    query: { age: { $gt: 30 } },
  },
  verbosity: "queryPlanner",
});

{
  explainVersion: '1',
  queryPlanner: {
    namespace: 'test.users',
    indexFilterSet: false,
    maxIndexedOrSolutionsReached: false,
    maxIndexedAndSolutionsReached: false,
    maxScansToExplodeReached: false,
    winningPlan: { stage: 'COUNT', inputStage: { stage: 'EOF' } },
    rejectedPlans: []
  },
  command: { count: 'users', query: { age: { '$gt': 30 } }, '$db': 'test' },
  serverInfo: {
    host: 'bdc9e348c602',
    port: 27017,
    version: '7.0.4',
    gitVersion: '38f3e37057a43d2e9f41a39142681a76062d582e'
  },
  serverParameters: {
    internalQueryFacetBufferSizeBytes: 104857600,
    internalQueryFacetMaxOutputDocSizeBytes: 104857600,
    internalLookupStageIntermediateDocumentMaxSizeBytes: 104857600,
    internalDocumentSourceGroupMaxMemoryBytes: 104857600,
    internalQueryMaxBlockingSortMemoryUsageBytes: 104857600,
    internalQueryProhibitBlockingMergeOnMongoS: 0,
    internalQueryMaxAddToSetBytes: 104857600,
    internalDocumentSourceSetWindowFieldsMaxMemoryBytes: 104857600,
    internalQueryFrameworkControl: 'trySbeEngine'
  },
  ok: 1
}

There are several important fields in the output of explain():

explainVersion is the output format version.
command is the command being explained.
queryPlanner provides information about the selected and rejected plans by the query optimizer.
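executionStats provides execution details of the winning plan (and of the rejected plans in allPlansExecution mode). It appears only when verbosity is executionStats or allPlansExecution.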
serverInfo provides information about the MongoDB instance.
serverParameters provides details about the internal parameters.
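The winningPlan document is a tree of stages, so checking for a collection scan means walking it. The sketch below is one way to do that in plain JavaScript. The helper names are hypothetical, and it assumes the plan uses the stage, inputStage, and inputStages fields; plans reported in other shapes (for example, nested under another field by newer query engines) are not handled.

```javascript
// Collect the stage names of a plan tree, parent first.
function collectStages(plan) {
  if (!plan || typeof plan !== "object") return [];
  const children = [];
  if (plan.inputStage) children.push(plan.inputStage);
  if (Array.isArray(plan.inputStages)) children.push(...plan.inputStages);
  return [plan.stage, ...children.flatMap(collectStages)];
}

// True if any stage of the winning plan is a full collection scan.
function usesCollectionScan(explainOutput) {
  return collectStages(explainOutput.queryPlanner.winningPlan).includes("COLLSCAN");
}

// The winning plan from the output above, plus a hypothetical indexed plan.
const countPlan = { stage: "COUNT", inputStage: { stage: "EOF" } };
const indexedPlan = { stage: "FETCH", inputStage: { stage: "IXSCAN", indexName: "age_1" } };
console.log(collectStages(countPlan));
console.log(collectStages(indexedPlan));
```

A COLLSCAN stage on a large collection is usually the first thing to look for when a query is slow.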

Types of Cursors

Here are some common cursor types in MongoDB:

Standard cursor is the default type returned by db.collection.find(). It iterates over query results in batches, retrieving data on demand from the server.
Change Stream cursor is a real-time data monitor, notifying you whenever a document in a collection is inserted, updated, deleted, or replaced.
Tailable cursor is a cursor for a capped collection that remains open after the client exhausts the results in the initial cursor.
Backup cursor is a type of tailable cursor that points to a list of backup files. Backup cursors are for internal use only.
Orphaned cursor is a cursor that is not correctly closed or iterated over in your application code. It can cause performance issues.

Hinting

Use hint() to force a particular index to be used for a query:

    Shell
   
> db.foo.find().hint({x:1})

System Profiling

You can turn on system profiling to see operations currently happening on a database. Note that there is a performance penalty to profiling, but it can help isolate slow queries.

    Shell
   
 
> db.setProfilingLevel(2) // profile all operations
> db.setProfilingLevel(1) // profile operations that take longer than 100ms
> db.setProfilingLevel(1, 500) // profile operations that take longer than 500ms
> db.setProfilingLevel(0) // turn off profiling
> db.getProfilingLevel() // see current profiling setting

Profile entries are stored in a capped collection called system.profile within the database in which profiling was enabled. Profiling can be turned on and off for each database.
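Once profiling has collected some entries, a quick summary can point to the worst offenders. The following is a minimal sketch in plain JavaScript; slowestByNamespace is a hypothetical helper, and it assumes each entry carries the ns, op, and millis fields found in system.profile documents.

```javascript
// Reduce profile entries to the slowest operation seen per namespace.
function slowestByNamespace(entries) {
  const slowest = {};
  for (const { ns, op, millis } of entries) {
    if (!(ns in slowest) || millis > slowest[ns].millis) {
      slowest[ns] = { op, millis };
    }
  }
  return slowest;
}

// Hypothetical entries; in the shell you might pass
// db.system.profile.find().toArray() instead.
const sampleProfile = [
  { ns: "test.users", op: "query", millis: 120 },
  { ns: "test.users", op: "update", millis: 640 },
  { ns: "test.orders", op: "query", millis: 510 },
];
console.log(slowestByNamespace(sampleProfile));
```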

Replica Sets

To find replication lag information for each secondary node, connect to the primary node of the replica set and run this command:

    Shell
   
 
> rs.printSecondaryReplicationInfo()
source: m1.demo.net:27002
    syncedTo: Mon Feb 01 2023 10:20:40 GMT-0800 (PST)
    20 secs (0 hrs) behind the primary

The above command prints a formatted report of how far each secondary lags behind the primary. You can also use db.printReplicationInfo() to print a summary of the member's oplog, including its size and the time range it covers. Its output is identical to that of rs.printReplicationInfo().

To see a member's view of the entire set, connect to it and run the following command:

    Shell
   
> rs.status()

This command returns a structured JSON output and shows you what it thinks the state and status of the other members are. Running rs.status() on a secondary node will show you which node the secondary is syncing from in the syncSourceHost field.
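Since rs.status() returns a structured document, replication lag can also be computed from it directly. This is a minimal sketch, assuming each member document carries the name, stateStr, and optimeDate fields; secondaryLagSeconds is a hypothetical helper, and the sample status is made up for illustration.

```javascript
// Compute how many seconds each secondary's last applied operation trails the primary's.
// Returns null if this member's view of the set contains no primary.
function secondaryLagSeconds(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) return null;
  return status.members
    .filter((m) => m.stateStr === "SECONDARY")
    .map((m) => ({
      name: m.name,
      // Subtracting two Date objects yields milliseconds.
      lagSeconds: (primary.optimeDate - m.optimeDate) / 1000,
    }));
}

// Hypothetical status document; in the shell you would pass rs.status().
const sampleStatus = {
  members: [
    { name: "m0.demo.net:27000", stateStr: "PRIMARY", optimeDate: new Date("2023-02-01T18:21:00Z") },
    { name: "m1.demo.net:27002", stateStr: "SECONDARY", optimeDate: new Date("2023-02-01T18:20:40Z") },
  ],
};
console.log(secondaryLagSeconds(sampleStatus));
```

The result is only as current as the status document it was computed from, so it is best treated as a rough indicator.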

Sharding

To see your cluster's metadata (shards, databases, chunks, etc.), execute the following command from the MongoDB shell (mongosh) connected to a mongos instance:

    Shell
   
> db.printShardingStatus()

If verbosity is set to true, it displays full details of the chunk distribution across shards along with the number of chunks on each shard:

    Shell
   
> db.printShardingStatus(true)

sh.status can also be executed on a mongos instance to fetch sharding configuration. Its output is the same as that of printShardingStatus:

    Shell
   
> sh.status()

You can also connect to the mongos and see data about your shards, databases, collections, or chunks by using use config, then querying the relevant collections:

    Shell
   
 
> use config
switched to db config
> show collections
changelog
chunks
collections
csrs.indexes
databases
migrationCoordinators
mongos
rangeDeletions
settings
shards
tags
version

Always connect to a mongos to get sharding information. Never connect or write directly to a config server; always use sharding commands and helpers.

After maintenance, sometimes mongos processes that were not actually performing the maintenance will not have an updated version of the config. Either bouncing these servers or running the flushRouterConfig command is generally a quick fix to this issue:

    Shell
   
 
> use admin
> db.runCommand({flushRouterConfig:1})

Often this problem will manifest as setShardVersion failed errors. Don't worry about setShardVersion errors in the logs, but they should not trickle up to your application. Note that you shouldn't get the errors from a driver unless the mongos it's connecting to cannot reach any config servers.

Section 5

Index Options

The table below provides several index options. For a complete list, refer to: https://www.mongodb.com/docs/manual/reference/method/db.collection.createIndex

Table 3

Index Option	Description
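`unique`	If set to `true`, the index rejects documents whose value for the indexed field duplicates an existing value; it is `false` by default.
`name`	The name of the index. If not specified, MongoDB generates the index name by concatenating the names of indexed fields and the sort order.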
`partialFilterExpression`	If specified, the index will only reference documents that match the provided filter expression.
`sparse`	If set to `true`, the index will only reference documents with the specified field; it is `false` by default.
`expireAfterSeconds`	Time to live (in seconds) that controls how long MongoDB retains documents in this collection.
`hidden`	Controls whether the index is hidden from the query planner.
`storageEngine`	Specifies storage engine during index creation.
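Index options are passed as the second argument to createIndex(). The sketch below keeps a list of index definitions as data and applies them in a loop. The field names, the applyIndexes helper, and the recording stand-in collection are all illustrative assumptions; in the shell you would pass a real collection such as db.users.

```javascript
// Index definitions: keys plus options from Table 3 (field names are hypothetical).
const indexSpecs = [
  { keys: { email: 1 }, options: { unique: true, name: "email_unique" } },
  { keys: { lastSeen: 1 }, options: { expireAfterSeconds: 3600 } },
  { keys: { status: 1 }, options: { partialFilterExpression: { status: "active" } } },
];

// Call createIndex(keys, options) once per definition and collect the return values.
function applyIndexes(collection, specs) {
  return specs.map(({ keys, options }) => collection.createIndex(keys, options));
}

// Stand-in collection that records calls and mimics the default naming scheme
// (field names and sort orders joined with underscores).
const recordingCollection = {
  calls: [],
  createIndex(keys, options) {
    this.calls.push({ keys, options });
    const generated = Object.entries(keys).map(([f, d]) => `${f}_${d}`).join("_");
    return options.name || generated;
  },
};
console.log(applyIndexes(recordingCollection, indexSpecs));
```

Keeping definitions in one place makes it easier to review which options each index uses.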

Section 6

Query Operators

Queries are generally of the form: {key : {$op : value}}

For example: {age : {$gte : 18}}

There are three exceptions to this rule ($and, $or, and $nor), which are all top level: {$or : [{age : {$gte : 18}}, {age : {$lt : 18}, parentalConsent : true}]}

Updates are generally of the form: {$mod : {key : value}}

For example: {$inc : {age : 1}}

The symbols in Table 4 indicate the following:

✓ = matches
X = does not match

Table 4

Operator	Example Query	Result
`$gt`, `$gte`, `$lt`, `$lte`, `$ne`	`{numSold : {$lt:3}}`	✓ `{numSold: 1}` X `{numSold: "hello"}` X `{x : 1}`
`$in`, `$nin`	`{age : {$in : [10, 14, 21]}}`	✓ `{age: 21}` ✓ `{age: [9, 10, 11]}` X `{age: 9}`
`$all`	`{hand : {$all : ["10","J","Q","K","A"]}}`	✓ `{hand: ["7", "8", "9", "10", "J", "Q", "K", "A"]}` X `{hand:["J","Q","K"]}`
`$nor`	`{ $nor: [{ status: "active" }, { age: { $gte: 65 } }] }`	✓ `{ "status": "inactive", "age": 45 }` X `{ "status": "active", "age": 70 }`
`$mod`	`{age : {$mod : [10, 0]}}`	✓ `{age: 50}` X `{age: 42}`
`$exists`	`{phone: {$exists: true}}`	✓ `{phone: "555-555-5555"}` X `{phones: ["555-555-5555", "1-800-555-5555"]}`
`$type*`	`{age : {$type : 2}}`	✓ `{age : "42"}` X `{age : 42}`
`$size`	`{"top-three":{$size:3}}`	✓ `{"top-three":["gold","silver","bronze"]}` X `{"top-three":["blue ribbon"]}`
`$regex`	`{role: /admin./i}` or `{role: {$regex: 'admin.', $options: 'i'}}`	✓ `{role: "Administrator"}` X `{role: "user"}`
`$all`	`{ genres: { $all: ["fiction", "mystery"] } }`	✓ `{"title": "The Da Vinci Code", "genres": ["fiction", "mystery", "thriller"]}` X `{"title": "Harry Potter and the Sorcerer's Stone", "genres": ["fantasy", "adventure"]}`
`$size`	`{ players: { $size: 5 } }`	✓ `{"team_name": "Red Team", "players": ["John", "Emma", "Sarah", "Michael", "David"]}` X `{"team_name": "Blue Team", "players": ["Alice", "Bob", "Charlie"]}`

Section 7

Update Operators

Table 5 includes commonly used MongoDB update operations:

Table 5

Modifier	Start Doc	Example Mod	Result
`$set`	`{x:"foo"}`	`{$set:{x:[1,2,3]}}`	`{x:[1,2,3]}`
`$unset`	`{x:"foo"}`	`{$unset:{x:true}}`	`{}`
`$inc`	`{countdown:5}`	`{$inc:{countdown:-1}}`	`{countdown:4}`
`$push`	`{votes:[-1,-1,1]}`	`{$push:{votes:-1}}`	`{votes:[-1,-1,1,-1]}`
`$pull`, `$pullAll`	`{blacklist:["ip1","ip2","ip3"]}`	`{$pull:{blacklist:"ip2"}}`	`{blacklist:["ip1","ip3"]}`
`$pop`	`{queue:["1pm","3pm","8pm"]}`	`{$pop:{queue:-1}}`	`{queue:["3pm","8pm"]}`
`$addToSet`, `$each`	`{ints:[0,1,3,4]}`	`{$addToSet:{ints:{$each:[1,2,3]}}}`	`{ints:[0,1,3,4,2]}`
`$rename`	`{nmae:"sam"}`	`{$rename:{nmae:"name"}}`	`{name:"sam"}`
`$bit`	`{permission:6}`	`{$bit:{permission:{or:1}}}`	`{permission:7}`
`$min`	`{"temp":25}`	`{$min: { temp:20}}`	`{"temp":20}`
`$setOnInsert`	No matching document (upsert with filter `{"name":"bob"}`)	`{$setOnInsert: {resetPassword: true}}`	`{"name": "bob", "resetPassword": true}`
`$sort` (with `$push` and `$each`)	`{ "scores": [5, 8, 3, 9] }`	`{ $push: { scores: { $each: [], $sort: 1 } } }`	`{ "scores": [3, 5, 8, 9] }`

Section 8

Aggregation Pipeline Operators

The aggregation framework can be used to perform everything from simple queries to complex aggregations. To use it, pass the aggregate() function a pipeline of aggregation stages:

    Shell
   
 
> db.collection.aggregate([
... {$match:{x:1}},
... {$limit:10},
... {$group:{_id : "$age"}}
... ])

Table 6 contains a list of operators for the available stages:

Table 6

Operator	Description
`{$project : projection}`	Includes, excludes, renames, and munges fields
`{$match : match}`	Queries and takes an argument identical to that passed to `find()`
`{$limit : num}`	Limits results to `num`
`{$skip : num}`	Skips `num` results
`{$sort : sort}`	Sorts results by the given fields
`{$group : group}`	Groups results using the expressions given (see Table 7)
`{$unwind : field}`	Explodes an embedded array into its own top-level documents

To refer to a field, use the syntax $fieldName. For example, this projection would return the existing time field with a new name, "time since epoch": {$project: {"time since epoch": "$time"}}

$project and $group can both take expressions, which can use the $fieldName syntax as shown below:

Table 7

Expression OP Example	Description
`$add : ["$age", 1]`	Adds `1` to the `age` field.
`$divide : ["$sum", "$count"]`	Divides the `sum` field by `count`.
`$mod : ["$sum", "$count"]`	The remainder of dividing `sum` by `count`.
`$multiply : ["$mph", 24, 365]`	Multiplies `mph` by 24*365.
`$subtract : ["$price", "$discount"]`	Subtracts `discount` from `price`.
`$strcasecmp : ["ZZ", "$name"]`	`1` if name is less than `ZZ`, `0` if name is `ZZ`, `-1` if name is greater than `ZZ`.
`$substr : ["$phone", 0, 3]`	Gets the area code (first three characters) of `phone`.
`$toLower : "$str"`	Converts `str` to all lowercase.
`$toUpper : "$str"`	Converts `str` to all uppercase.
`$ifNull : ["$mightExist", {$add : ["$doesExist", 1]}]`	If `mightExist` is not `null`, it returns `mightExist`. Otherwise, it returns the result of the second expression.
`$cond : [exp1, exp2, exp3]`	If `exp1` evaluates to `true`, it returns `exp2`. Otherwise, it returns `exp3`.
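To show how stages and expressions fit together, here is a minimal sketch of a pipeline built as a plain JavaScript array. The collection fields (status, region, price, discount) and the function name buildRegionTotals are hypothetical. The $sum accumulator used in the $group stage is a standard one, though it is not listed in the tables above.

```javascript
// Build a pipeline that totals net revenue per region for shipped orders.
function buildRegionTotals(limit) {
  return [
    { $match: { status: "shipped" } },
    {
      $project: {
        region: { $toUpper: "$region" },
        // Treat a missing discount as 0 before subtracting.
        net: { $subtract: ["$price", { $ifNull: ["$discount", 0] }] },
      },
    },
    { $group: { _id: "$region", total: { $sum: "$net" } } },
    { $sort: { total: -1 } },
    { $limit: limit },
  ];
}

const regionPipeline = buildRegionTotals(10);
console.log(JSON.stringify(regionPipeline, null, 2));

// In the shell, the array is passed as a single argument:
//   db.orders.aggregate(regionPipeline)
```

Placing $match first lets the server discard documents early, before the more expensive stages run.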

Section 9

Making Backups

One of the ways to back up MongoDB data is to make a copy of the database files while they are in a consistent state (i.e., not in the middle of being written to).

1. Use the fsyncLock() command, which flushes all in-flight writes to disk and prevents new ones:

    Shell
   
 
> db.fsyncLock()
{
  info: 'now locked against writes, use db.fsyncUnlock() to unlock',
  lockCount: Long('1'),
  seeAlso: 'http://dochub.mongodb.org/core/fsynccommand',
  ok: 1
}

2. Copy data files to a new location.

3. Use the fsyncUnlock() command to unlock the database:

    Shell
   
 
> db.fsyncUnlock()
{ info: 'fsyncUnlock completed', lockCount: Long('0'), ok: 1 }

Note: To restore from this backup, copy the files to the correct server's dbpath and start the mongod.
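The main risk in the lock, copy, unlock sequence is leaving the server locked if the copy step fails. The following is a minimal sketch of guarding against that with try/finally. The backupWithLock function and the copyFiles callback are hypothetical; the example runs against a stand-in object that only records the calls, where in the shell you would pass the real db.

```javascript
// Lock, run the copy step, and always unlock afterwards.
function backupWithLock(database, copyFiles) {
  database.fsyncLock();
  try {
    return copyFiles();
  } finally {
    database.fsyncUnlock(); // runs even if copyFiles() throws
  }
}

// Stand-in for db that records the order of calls.
const backupEvents = [];
const stubDb = {
  fsyncLock() { backupEvents.push("lock"); },
  fsyncUnlock() { backupEvents.push("unlock"); },
};

backupWithLock(stubDb, () => { backupEvents.push("copy"); });
console.log(backupEvents.join(" -> ")); // prints "lock -> copy -> unlock"
```

The copy itself typically happens outside the shell, so treat the callback as a placeholder for however you trigger it.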

Alternatively, if your filesystem supports snapshots, your journal is on the same volume as your data, and your data files are not striped across multiple volumes with RAID, you can take a snapshot without locking. In this case, when you restore, the journal will replay operations to make the data files consistent.

There are several other options for backing up your MongoDB data:

Percona Backup for MongoDB (PBM) – An open-source and distributed solution for making consistent backups and restoring MongoDB sharded clusters and replica sets. You can either use the command-line interface for backups on a running server or manage backups from a web interface with PBM and Percona Monitoring and Management.
mongodump and mongorestore – mongodump is used to create a binary export of MongoDB data, while mongorestore is used to import this data back into a MongoDB instance.
Filesystem snapshots – These capture a consistent state of MongoDB data files at a point in time for fast and efficient backups.

Section 10

Replica Set Maintenance

Replica sets allow a MongoDB deployment to remain available during the majority of a maintenance window.

Keeping Members From Being Elected

To permanently stop a member from being elected, change its priority to 0:

    Shell
   
 
> var config = rs.config()
> config.members[2].priority = 0
> rs.reconfig(config)
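Picking a member by array index is easy to get wrong on a larger set. As a minimal sketch, the hypothetical helper below selects the member by its host string instead and returns a modified copy of the configuration. The sample configuration and host names are made up for illustration.

```javascript
// Return a copy of `config` in which the member with the given host has priority 0.
function withPriorityZero(config, host) {
  if (!config.members.some((m) => m.host === host)) {
    throw new Error(`no replica set member with host ${host}`);
  }
  const members = config.members.map((m) =>
    m.host === host ? { ...m, priority: 0 } : m
  );
  return { ...config, members };
}

// Hypothetical configuration; in the shell you would start from rs.config().
const sampleConfig = {
  _id: "setName",
  members: [
    { _id: 0, host: "m0.demo.net:27000", priority: 1 },
    { _id: 1, host: "m1.demo.net:27002", priority: 1 },
  ],
};
console.log(withPriorityZero(sampleConfig, "m1.demo.net:27002").members);

// In the shell:
//   rs.reconfig(withPriorityZero(rs.config(), "m1.demo.net:27002"))
```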

To prevent a secondary from being elected temporarily, connect to it and issue the freeze command:

    Shell
   
> rs.freeze(10*60) // # of seconds to not become primary

The freeze command can be handy if you don't want to change priorities permanently but need to do maintenance on the primary node.

Demoting a Member

If a member is currently primary and you don't want it to be, use stepDown:

    Shell
   
> rs.stepDown(10*60) // # of seconds to not try to become primary again

Starting a Member as a Stand-Alone Server

During maintenance, it is often desirable to start up a secondary and be able to write to it (e.g., for building indexes). To accomplish this, you can temporarily start up a secondary as a stand-alone mongod.

If the secondary was originally started with the following arguments:

    Shell
   
$ mongod --dbpath /data/db --replSet setName --port 30000

Then shut it down cleanly and restart it with:

    Shell
   
$ mongod --dbpath /data/db --port 30001

Note that the dbpath does not change but the port does, and the replSet option is removed (all other options can remain the same). This mongod will come up as a stand-alone server. The rest of the replica set will be looking for a member on port 30000, not 30001, so it will just appear to be "down" to the rest of the set.

When you are finished with maintenance, restart the server with the original arguments.

Section 11

User Management

To check current user privileges:

    Shell
   
 
> db.runCommand(
...   {
...     usersInfo:"manager",
...     showPrivileges:true
...   }
... )

To create a user administrator for a database:

    Shell
   
 
> use sensors
switched to db sensors
> db.createUser(
...   {
...     user: "sensorsUserAdmin",
...     pwd: "password",
...     roles:
...     [
...       {
...         role: "userAdmin",
...         db: "sensors"
...       }
...     ]
...   }
... )

To view user roles:

    Shell
   
 
> use sensors
switched to db sensors
> db.getUser("sensorsUserAdmin")
{
        "_id" : "sensors.sensorsUserAdmin",
        "user" : "sensorsUserAdmin",
        "db" : "sensors",
        "roles" : [
                {
                        "role" : "userAdmin",
                        "db" : "sensors"
                }
        ]
}

To show role privileges:

    Shell
   
> db.getRole( "userAdmin", { showPrivileges: true } )

To grant a role:

    Shell
   
 
> db.grantRolesToUser(
...     "sensorsUserAdmin",
...     [
...       { role: "read", db: "admin" }
...     ]
... )

To revoke a role:

    
  
> db.revokeRolesFromUser(
...     "sensorsUserAdmin",
...     [
...       { role: "userAdmin", db: "sensors" }
...     ]
... )

Section 12

MongoDB Restrictions

Below are common limitations in MongoDB. For a full list, see https://www.mongodb.com/docs/manual/reference/limits/.

The maximum document size is 16 megabytes.
The index entry total size must be less than 1,024 bytes (for version 4.0 or earlier).
A collection can have up to 64 indexes.
The index name (namespace included) cannot be longer than 127 bytes (for version 4.0 or earlier).
A replica set can have up to 50 members.
A shard key can have 512 bytes at most (for version 4.2 or earlier).
A shard key is always immutable (for version 4.2 or earlier).
A sort on a non-indexed field returns results only if the operation uses no more than 100 megabytes of memory (32 megabytes for version 4.2 or earlier).
Aggregation pipeline stages are limited to 100 megabytes of RAM. When the limit is exceeded, an error is thrown. The allowDiskUse option allows aggregation pipeline stages to use temporary files for processing.
A single batch in a bulk operation is limited to 100,000 write operations.
Database names are case-sensitive and must be shorter than 64 characters. They cannot contain ., $, or \0 (the null character), and can only contain characters that are valid in filenames on your filesystem. The names admin, config, and local are reserved. (Note that you can store your own data in these databases, but you should never drop them.)
Collection names cannot contain $ or the null character, start with the system. prefix, or be an empty string. Names prefixed with system. are reserved by MongoDB and cannot be dropped, even if you created the collection. Periods are often used for organization in collection names, but they have no semantic importance.
Field names cannot contain the null character.