An Introduction to Azure Cosmos DB
Hello everyone. For a long time I have been thinking of writing a blog post series about one of the most popular services in the Azure ecosystem, Cosmos DB. I had the opportunity to work with this database in one of my earlier projects, so here I am sharing everything I have learned about it so far.
Cosmos DB is Microsoft’s massively scalable PaaS No-SQL database service. If you have worked only with relational databases your entire career (like me), you might be wondering what “No-SQL” means. I had the same doubt initially, so let’s start with the classic question.
Why does the world need No-SQL?
No-SQL is a term for databases built to handle “Big Data”, which is usually described by three Vs: Volume, Velocity and Variety. Relational databases like our old SQL Server were not designed with Big Data in mind. They are ideal for relatively small amounts of data.
But when the Volume of the data to be managed grows, we have to add RAM, add storage, or move to a faster processor. This is called Scaling-Up. Eventually we hit a ceiling where we can no longer increase the physical capability of an individual server. The only alternative is to spread the workload across a number of servers instead of increasing a single server’s computational power. With this approach we no longer have to worry about the power of a single machine; we can simply keep adding more machines as the volume of data grows. This is called Scaling-Out, and it is the idea at the root of all the No-SQL buzz.
This also helps with the Velocity aspect of Big Data, which is about retrieving millions of records per second. With many servers sharing the load, this objective is achievable too.
Let’s look at some of the main characteristics of Cosmos DB:
- Up to 99.999% availability SLA (for accounts replicated across multiple regions)
- Automatic horizontal partitioning — as the volume of the data increases, servers are automatically added to handle the storage and throughput (velocity)
- Globally distributed (with a simple click your db is replicated to multiple regions in the world)
- Supports multiple data models and APIs, such as the SQL API for JSON documents, MongoDB, Cassandra, Gremlin (graph) and Table
Okay, enough theory. Let’s dive in and create our first Cosmos DB database in the Azure portal.
Click on Azure Cosmos DB on the home page and select your preferred API from the list. Here I have chosen the SQL API, which offers a SQL-like query language tailored to JSON documents.
After filling in the usual details, just hit Create, and your first Cosmos DB account is ready.
Let’s go to the overview page.
Here, from the “Keys” option, we can copy the primary connection string, which can be used in our dev environment.
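To make the connection string a bit less mysterious, here is a minimal sketch that splits one into its parts. A Cosmos DB connection string has the form `AccountEndpoint=...;AccountKey=...;`. The function name `parse_connection_string` and the sample values are made up for illustration; the account name and key below are placeholders, not real credentials.

```python
def parse_connection_string(conn_str: str) -> dict:
    """Split 'Name=Value;Name=Value;' pairs into a dictionary."""
    parts = {}
    for segment in conn_str.split(";"):
        if not segment.strip():
            continue  # skip the empty piece after the trailing ';'
        # Split on the first '=' only: base64 keys usually end in '=' padding.
        name, _, value = segment.partition("=")
        parts[name.strip()] = value.strip()
    return parts


# Placeholder values (assumption): not a real account or key.
sample = (
    "AccountEndpoint=https://my-account.documents.azure.com:443/;"
    "AccountKey=ZmFrZS1rZXk=;"
)
settings = parse_connection_string(sample)
print(settings["AccountEndpoint"])
print(settings["AccountKey"])
```

In a real project you would normally not parse the string yourself. If you use the `azure-cosmos` Python package, `CosmosClient.from_connection_string` accepts the whole string directly.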
To create data in our new Cosmos DB account, we need to head over to the “Data Explorer” option in the menu. Then click the “New Container” button, and a pop-up will open. A container is roughly equivalent to a table in a SQL Server database.
As you can see from the above screenshot, I have created a database named “CosmosWorld” and a new container in it called “Students”, with zipCode as the partition key. The partition key is a critical concept. Partitioning is how Cosmos DB decides which documents are physically stored together, based on the partition key we choose. That is, even if there are millions of students in the container, Cosmos DB will keep all students who have the same zip code in the same logical partition, which is stored on a single physical partition.
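The grouping idea can be sketched in a few lines of Python. This is only a simplified model: the hash function Cosmos DB actually uses and the number of physical partitions are internal details, so `sha256`, the count of 3, and the sample students are all assumptions made for this illustration.

```python
import hashlib
from collections import defaultdict

PHYSICAL_PARTITIONS = 3  # assumed count, chosen only for this illustration


def physical_partition_for(partition_key: str) -> int:
    """Map a partition key value to a physical partition via a stable hash."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % PHYSICAL_PARTITIONS


# Hypothetical sample documents.
students = [
    {"id": "1", "name": "Asha", "zipCode": "12345"},
    {"id": "2", "name": "Ben", "zipCode": "12345"},
    {"id": "3", "name": "Cara", "zipCode": "67890"},
]

# Group student ids by the physical partition their zip code hashes to.
placement = defaultdict(list)
for student in students:
    placement[physical_partition_for(student["zipCode"])].append(student["id"])

for partition, ids in sorted(placement.items()):
    print(partition, ids)
```

Whatever the hash turns out to be, students 1 and 2 always land together because they share a zip code. That is also why a good partition key has many distinct values: if most students shared one zip code, one partition would end up doing most of the work.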
Click OK and we will see the newly created container in action. Currently there are no items.
So click “New Item” and you will see an editor for adding a JSON document.
Here I entered the first student’s details. When I hit “Save”, Cosmos DB adds some additional system properties to the document, such as _rid, _etag and _ts.
The most important property is “id”. The id combined with the partition key value uniquely identifies each document.
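Here is a small in-memory sketch of that uniqueness rule. `TinyContainer` is a made-up stand-in written for this post, not part of any Cosmos DB SDK, and the student data is hypothetical.

```python
class TinyContainer:
    """In-memory stand-in for a container; not the Cosmos DB SDK."""

    def __init__(self, partition_key_field: str):
        self.partition_key_field = partition_key_field
        self._items = {}  # (partition key value, id) -> document

    def create_item(self, doc: dict) -> None:
        key = (doc[self.partition_key_field], doc["id"])
        if key in self._items:
            raise ValueError(
                f"duplicate id {doc['id']!r} in partition {key[0]!r}"
            )
        self._items[key] = doc

    def __len__(self) -> int:
        return len(self._items)


container = TinyContainer("zipCode")
container.create_item({"id": "1", "name": "Asha", "zipCode": "12345"})
# Same id but a different partition key value: accepted.
container.create_item({"id": "1", "name": "Ben", "zipCode": "67890"})

try:
    # Same id and same partition key value: rejected.
    container.create_item({"id": "1", "name": "Cara", "zipCode": "12345"})
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

print(len(container), duplicate_rejected)  # prints: 2 True
```

The real service behaves in a similar way: creating a second document with the same id in the same logical partition fails with a 409 Conflict error.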
If we look closely, we can see some striking differences between the traditional SQL world and the new No-SQL world. For example, in a relational database, to store the parents’ details or hobby details of a student we would have to create separate tables and store the student’s primary key as a foreign key in those tables. Here everything is stored in a single document.
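To see the difference side by side, the sketch below starts from three relational-style "tables" and folds them into one document. The table layouts, field names and sample values are all assumptions made up for this example.

```python
import json

# Hypothetical relational rows: three tables linked by student_id.
students = [{"student_id": 1, "name": "Asha", "zip_code": "12345"}]
parents = [
    {"student_id": 1, "name": "Ravi", "relation": "father"},
    {"student_id": 1, "name": "Mira", "relation": "mother"},
]
hobbies = [
    {"student_id": 1, "hobby": "chess"},
    {"student_id": 1, "hobby": "football"},
]


def to_document(student: dict) -> dict:
    """Fold the child rows into the student record as nested arrays."""
    sid = student["student_id"]
    return {
        "id": str(sid),
        "name": student["name"],
        "zipCode": student["zip_code"],
        "parents": [
            {"name": p["name"], "relation": p["relation"]}
            for p in parents
            if p["student_id"] == sid
        ],
        "hobbies": [h["hobby"] for h in hobbies if h["student_id"] == sid],
    }


document = to_document(students[0])
print(json.dumps(document, indent=2))
```

Notice that the foreign keys disappear: the parents and hobbies live inside the student, so reading one student needs no joins.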
Also, we are not restricted to a strict schema. In the next student record, I can add a whole new property like “Pets” and put some values into it. In other words, our No-SQL database is schema-free: we are not forced to keep the same schema for every document. Doesn’t that look cool?
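The flip side of a schema-free store is that the code reading the documents has to cope with missing properties. A minimal sketch, using two hypothetical student documents of different shapes:

```python
# Hypothetical documents in the same container with different properties.
students = [
    {"id": "1", "name": "Asha", "zipCode": "12345", "Hobbies": ["chess"]},
    {"id": "2", "name": "Ben", "zipCode": "12345", "Pets": ["dog", "parrot"]},
]

for student in students:
    # .get() supplies a default when a document lacks the property.
    pets = student.get("Pets", [])
    print(student["name"], "has", len(pets), "pets")

# Collect every property name that appears in at least one document.
all_fields = set().union(*(s.keys() for s in students))
print(sorted(all_fields))
```

Nothing in the database complains about the two shapes. It is up to the application to decide what a missing “Pets” property means.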
As you can see, I have added three student records. Let’s look at how we can query them from the portal.
Click on the highlighted option to open a new query window. I have typed a simple SELECT query to retrieve all the students who have the zip code ‘12345’. As you can see, it returned 2 records from the database.
So give Cosmos DB a try, and let me know how you felt about it.
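For readers who cannot see the screenshot, the query would look roughly like the string below. The exact text is an assumption, as are the three sample students. The Python filter next to it only mimics on in-memory data what the query returns; it does not talk to Cosmos DB.

```python
# Assumed query text; "c" is just an alias for the container's items.
QUERY = "SELECT * FROM c WHERE c.zipCode = '12345'"

# Hypothetical stand-ins for the three student documents.
students = [
    {"id": "1", "name": "Asha", "zipCode": "12345"},
    {"id": "2", "name": "Ben", "zipCode": "12345"},
    {"id": "3", "name": "Cara", "zipCode": "67890"},
]


def where_zip_code(items: list, zip_code: str) -> list:
    """In-memory equivalent of the WHERE clause above."""
    return [item for item in items if item.get("zipCode") == zip_code]


matches = where_zip_code(students, "12345")
print(QUERY)
print([m["name"] for m in matches])  # prints: ['Asha', 'Ben']
```

Because the filter is on the partition key, a query like this only needs to look at one logical partition. From application code, the `azure-cosmos` Python package exposes `query_items` on a container for running the same kind of query.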
Thanks for reading!