What are Embedded Databases and when to use them

September 4, 2022


What is a Database and a DBMS.

A database is an organized collection of structured information, or data, typically stored electronically in a computer system. A database is usually controlled by a database management system (DBMS). Together, the data and the DBMS, along with the applications that are associated with them, are referred to as a database system, often shortened to just database.

Different Types of Databases (DBMS).

Databases can be classified along several axes: SQL or NoSQL, in-memory or persisted, embedded or server-based.

The type most of us are familiar with is the server database, such as MySQL, PostgreSQL, and MongoDB.

These databases run as a separate process on top of the OS, whether on a VM, on bare metal, or in the cloud. The database listens on a particular port, and clients on the same machine or anywhere else communicate with it through that port. In this architecture the database can live on a separate host from the user.

While server-based databases are the most common, there is another type called embedded databases, which are really interesting and very useful.

server based database architecture

In this post we are going to discuss:

  • What is an embedded database.
  • Creating a simple embedded db.
  • Benefits of Embedded Databases.
  • Use cases for Embedded Databases.
  • Popular Embedded Databases.

What is an embedded database.

There are two definitions of an embedded database:

  • A database for embedded systems, such as mobile or consumer devices. It needs a small footprint and must provide adequate performance in an environment with limited resources like CPU and memory.

  • A database embedded in an application. The application doesn’t need to communicate with the database through a server, as the database lives in the application itself.

In both definitions, an embedded database is a set of libraries that provides built-in database functionality without a separate database process running.

embedded database architecture

Creating a simple embedded db.

Let’s imagine an e-commerce monolith with a products service. Recently you have noticed that some of its queries are expensive. To mitigate this, you decide to cache results on the fly. For some hypothetical reason you can’t use a centralized cache like Redis.

So you create an in-memory hash map (call it ProductsCache) as a cache. The first time a query runs, we store the query and its result; subsequent reads return the value from the hash map.

# A naive in-memory cache in Python.

from typing import Any

class ProductsCache:

    def __init__(self):
        self._cache = {}  # maps query string -> cached result

    def get(self, query: str) -> Any:
        # Return the cached result, or None if the query is not cached.
        return self._cache.get(query)

    def put(self, query: str, result: Any) -> None:
        self._cache[query] = result

Our ProductsCache is an example of a simple embedded database. It lives in the same process as the application, so we can talk to it directly, and nothing outside the program can access it. In a sense it is confined to our program, or embedded in it.
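Here is a minimal sketch of how the products service might use such a cache. The names run_expensive_query and find_products, and the db_calls counter, are made up for this illustration; the real service would call its actual query code instead.

```python
from typing import Any, Optional


class ProductsCache:
    """In-process cache: a dict wrapped in get/put methods."""

    def __init__(self) -> None:
        self._cache = {}

    def get(self, query: str) -> Optional[Any]:
        return self._cache.get(query)

    def put(self, query: str, result: Any) -> None:
        self._cache[query] = result


# Hypothetical stand-in for the expensive database query.
db_calls = 0


def run_expensive_query(query: str) -> list:
    global db_calls
    db_calls += 1  # count how often we really hit the "database"
    return ["result for " + query]


cache = ProductsCache()


def find_products(query: str) -> Any:
    """Serve from the cache when possible, otherwise query and remember."""
    result = cache.get(query)
    if result is None:
        result = run_expensive_query(query)
        cache.put(query, result)
    return result


first = find_products("SELECT * FROM products WHERE price < 10")
second = find_products("SELECT * FROM products WHERE price < 10")
print(first == second, db_calls)
```

The second call never reaches run_expensive_query: the lookup is a plain method call inside the same process, with no port and no network hop involved.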

Note: An embedded database doesn’t need to be in-memory; it can also persist data to disk.

Our ProductsCache is a good example of a simple embedded database, but production will bring a lot of heat. Sadly, our naive cache won’t survive requirements like persistence, concurrent reads and writes, and a bunch of other things.

Rather than implementing these things ourselves, we should use a time-tested existing solution (don’t reinvent the wheel).
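As a rough sketch of what that could look like, here is the same get/put interface backed by SQLite (one of the embedded databases covered below) through Python’s standard sqlite3 module. The class name PersistentProductsCache, the table layout, and the choice of JSON for serializing results are assumptions made for this example, not a recommended design.

```python
import json
import os
import sqlite3
import tempfile
from typing import Any, Optional


class PersistentProductsCache:
    """Same get/put interface as ProductsCache, stored in a SQLite file."""

    def __init__(self, path: str) -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            "query TEXT PRIMARY KEY, result TEXT NOT NULL)"
        )
        self._conn.commit()

    def get(self, query: str) -> Optional[Any]:
        row = self._conn.execute(
            "SELECT result FROM cache WHERE query = ?", (query,)
        ).fetchone()
        return json.loads(row[0]) if row else None

    def put(self, query: str, result: Any) -> None:
        # "with conn" commits on success and rolls back on an exception.
        with self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO cache (query, result) VALUES (?, ?)",
                (query, json.dumps(result)),
            )

    def close(self) -> None:
        self._conn.close()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "products_cache.db")

    cache = PersistentProductsCache(path)
    cache.put("cheap products", [{"id": 1, "name": "pen"}])
    cache.close()

    # Reopen the file, as a restarted application would.
    reopened = PersistentProductsCache(path)
    survived = reopened.get("cheap products")
    missing = reopened.get("unknown query")
    reopened.close()
```

The cached entry survives closing and reopening the database file, which the in-memory dict cannot do, and the library takes care of the file format and locking for us.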

Embedded databases have been around for a long time and are interesting for the features and use cases they offer.

Let’s look at some of the popular ones:

  • SQLite: SQLite is a C-language library that implements a small, fast, self-contained, highly reliable, full-featured SQL database engine. It is built into all mobile phones. It is a relational database and uses SQL to query data.

    [Figure: sqlite3 architecture]

  • LevelDB: Although SQLite works well in the majority of situations, it has a notable limitation: it allows only one writer at a time. Multiple readers can work concurrently, but writes are serialized, so workloads with heavy concurrent write throughput can slow down considerably.

    LevelDB, developed by Google, is a better fit for such write-heavy workloads, and a single LevelDB instance can safely be shared by multiple threads within one process.

    It is a fast key-value storage library that provides an ordered mapping from string keys to string values.

  • RocksDB: a fork of LevelDB, developed by Meta and optimized for flash storage and memory.

    [Figure: RocksDB details]
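To get a feel for what “an ordered mapping from string keys to string values” buys you, here is a toy in-memory sketch. It is not LevelDB’s API or storage format; the OrderedKV class and its method names are invented for illustration. The point is only that keeping keys sorted makes range scans over a key prefix straightforward.

```python
import bisect
from typing import Iterator, Optional, Tuple


class OrderedKV:
    """Toy ordered string-to-string map; NOT LevelDB's API or file format."""

    def __init__(self) -> None:
        self._keys = []    # list of keys, always kept sorted
        self._values = {}  # key -> value

    def put(self, key: str, value: str) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._values.get(key)

    def delete(self, key: str) -> None:
        if key in self._values:
            del self._values[key]
            del self._keys[bisect.bisect_left(self._keys, key)]

    def scan(self, start: str, stop: str) -> Iterator[Tuple[str, str]]:
        """Yield (key, value) pairs with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._values[key]


store = OrderedKV()
store.put("product:2", "notebook")
store.put("user:1", "alice")
store.put("product:1", "pen")
store.put("product:3", "eraser")
store.delete("product:3")

# ";" is the character right after ":" in ASCII, so this range covers
# every key that starts with "product:".
products = list(store.scan("product:", "product;"))
print(products)
```

A plain hash map like our ProductsCache can answer "give me this key" but has no cheap way to answer "give me all keys in this range", which is why ordered stores are popular as building blocks for larger systems.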

Most of these embedded databases are written in C or C++ (for performance), but wrappers exist for most programming languages.
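Python, for example, ships a sqlite3 module in its standard library that wraps the SQLite C library. The short sketch below runs real SQL without starting any database server; the products table and its rows are made up for the example.

```python
import sqlite3

# ":memory:" keeps the database in RAM; pass a file path to persist it.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO products (name, price) VALUES (?, ?)",
    [("pen", 1.5), ("notebook", 4.0), ("backpack", 35.0)],
)
conn.commit()

# A parameterized query, executed entirely inside this process.
cheap = conn.execute(
    "SELECT name, price FROM products WHERE price < ? ORDER BY price",
    (10,),
).fetchall()
print(cheap)

conn.close()
```

There is no host, port, or credentials to configure here: connect() just opens a library handle inside the current process.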

Benefits of Embedded Databases.

Because of their architecture, and because they don’t need a server, embedded databases provide some nice benefits:

  • High Performance :- Embedded databases have a simple architecture and don’t need a bulky server module to run. Most communication happens within the same process, so latency is very low and write throughput is high, which makes these databases very performant for particular tasks.

       # Latency Comparisons 
       L1 cache reference ......................... 0.5 ns
       Branch mispredict ............................ 5 ns
       L2 cache reference ........................... 7 ns
       Mutex lock/unlock ........................... 25 ns
       Main memory reference ...................... 100 ns             
       Compress 1K bytes with Zippy ............. 3,000 ns  =   3 µs
       Send 2K bytes over 1 Gbps network ....... 20,000 ns  =  20 µs
       SSD random read ........................ 150,000 ns  = 150 µs
       Read 1 MB sequentially from memory ..... 250,000 ns  = 250 µs
       Round trip within same datacenter ...... 500,000 ns  = 0.5 ms
       Read 1 MB sequentially from SSD* ..... 1,000,000 ns  =   1 ms
       Disk seek ........................... 10,000,000 ns  =  10 ms
       Read 1 MB sequentially from disk .... 20,000,000 ns  =  20 ms
       Send packet CA->Netherlands->CA .... 150,000,000 ns  = 150 ms
    - credits :
  • Low resource consumption :- Embedded databases have a small footprint and can be as small as 1 MB. This can be a game changer in resource-scarce environments like the browser, IoT devices, and mobile phones.

  • No Administration Overhead :- There is no separate database server to install, configure, or monitor; the database ships as part of the application.
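To put the latency table from the High Performance point in perspective, here is a back-of-the-envelope calculation. The three figures are copied from that table; the workload of 20 sequential lookups per request is a made-up assumption, and a real lookup costs more than a single memory reference or disk read, so treat the results as rough lower bounds.

```python
# Figures copied from the latency table, in nanoseconds.
MAIN_MEMORY_REFERENCE_NS = 100
SSD_RANDOM_READ_NS = 150_000
DATACENTER_ROUND_TRIP_NS = 500_000

# Hypothetical workload: a request that performs 20 lookups one after another.
LOOKUPS_PER_REQUEST = 20


def total_ms(per_lookup_ns: int, lookups: int = LOOKUPS_PER_REQUEST) -> float:
    """Total time in milliseconds for a number of sequential lookups."""
    return per_lookup_ns * lookups / 1_000_000


in_memory_ms = total_ms(MAIN_MEMORY_REFERENCE_NS)  # embedded, in-memory
local_ssd_ms = total_ms(SSD_RANDOM_READ_NS)        # embedded, on local SSD
remote_ms = total_ms(DATACENTER_ROUND_TRIP_NS)     # server, network round trips

# How many memory references fit into one datacenter round trip.
round_trip_vs_memory = DATACENTER_ROUND_TRIP_NS // MAIN_MEMORY_REFERENCE_NS

print(in_memory_ms, local_ssd_ms, remote_ms, round_trip_vs_memory)
```

Under these assumptions the network round trips alone add about 10 ms per request, before the database server does any work, while the in-process lookups stay in the microsecond range.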

Use cases for Embedded Databases.

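Despite being fast and performant, most embedded databases lack features that are common in server databases, such as sharding and replication. Many embedded key-value stores also lack secondary indexes and full ACID transactions (SQLite, which is transactional and supports indexes, is an exception).

So embedded databases have particular niche use cases, like: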

  • When you need very low latency but don’t need features like indexing or replication, for example a persistent or in-memory cache.

  • When storing data on embedded systems or in mobile applications, where it is safe to store data locally.

  • Storing local data in browsers using IndexedDB.
