Simple application for ingestion and data access – real time (SAIDAR)

SAIDAR is a simple application that addresses a common business use case: storing continuously streamed data and making it accessible in real time. The data considered here has a fairly simple format, such as events generated by a web server, a storage appliance, or application code. This data is primarily subjected to time series analysis.

The solution offered here is nothing new in terms of architecture or accomplishment. A variety of third-party tools, both open source and proprietary, already exist. A good example is .

The intent of this solution is to let developers deploy something quickly without having to ask their IT groups for specialized data storage and management software. It is custom code that can be extended.

In Part 1 of this series, I explain the architecture. Part 2, covering code-level details, will follow sometime in the future. The first version of the tool will work only in standalone mode. The next revision will enable multi-node deployment, so that the benefits of cluster-based operation can be fully exploited.

Saidar Architecture Depiction

The architecture has two types of components: Control Center and Services.

Control Center

Scheduler – Manages scheduled tasks such as polling for streamed data and updating the in-memory index
Configurator – Controls the settings of the various features
Monitoring – Tracks metrics and health
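
To make the Scheduler's role concrete, here is a minimal sketch of an interval-based task runner. It is not Saidar's actual code: the class name `Scheduler`, the methods `every` and `run_due`, and the idea of passing the current time in explicitly (which keeps the example easy to test) are all assumptions made for illustration.

```python
import heapq
import itertools
from typing import Callable, List, Tuple


class Scheduler:
    """Hypothetical scheduler: runs registered tasks at fixed intervals.

    The caller supplies the current time, so no real clock or thread is needed.
    """

    def __init__(self) -> None:
        # Heap entries: (due_time, sequence_number, interval, task).
        self._queue: List[Tuple[float, int, float, Callable[[], None]]] = []
        # Sequence numbers break ties so that callables are never compared.
        self._counter = itertools.count()

    def every(self, interval: float, task: Callable[[], None], start: float = 0.0) -> None:
        """Register `task` to run every `interval` seconds, first at start + interval."""
        if interval <= 0:
            raise ValueError("interval must be positive")
        heapq.heappush(self._queue, (start + interval, next(self._counter), interval, task))

    def run_due(self, now: float) -> int:
        """Run every task whose due time is <= now and return how many ran."""
        ran = 0
        while self._queue and self._queue[0][0] <= now:
            due, _, interval, task = heapq.heappop(self._queue)
            task()
            ran += 1
            # Re-queue the task for its next slot.
            heapq.heappush(self._queue, (due + interval, next(self._counter), interval, task))
        return ran
```

A caller could register a one-second polling task and a five-second index refresh with `every(1.0, poll)` and `every(5.0, refresh_index)`, then call `run_due(time.monotonic())` from a loop. The function names `poll` and `refresh_index` are placeholders.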

Services

Ingestion Service – Handles entry of data into the tool via HTTP, file, or DB channels. It also provides user extension points to help convert incoming data into a Saidar-compliant format
Indexer Service – Data received through the Ingestion Service is first persisted as an index, and the raw data is sent to a different service. An in-memory copy of the index is updated at regular intervals. This service supports an index scan feature for client searches. It also ranks the nodes of the in-memory tree and promotes or demotes them based on access patterns
Data Storage Service – Data that needs to be persisted to disk is packaged so that retrieval of information minimizes the number of disk seeks
Query Service – Interprets the client query, searches both the index and the data storage, and finally aggregates the response based on the user query. It supports user-defined code, since aggregation is very specific to each implementation
Access Service – The client entry point to the tool, exposed through a RESTful web service interface and also via TCP/IP
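
The flow through these services can be sketched in a few lines. The example below is only an illustration under stated assumptions: it collapses ingestion, indexing, storage, and querying into one in-memory class, and every name in it (`Event`, `parse_csv_line`, `TimeIndex`, `Store`, `sum_by_source`) is hypothetical rather than part of Saidar. It shows a user-supplied converter on the way in, a sorted time index that supports range scans, and a user-defined aggregation on the way out.

```python
import bisect
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Event:
    """Hypothetical normalized record: epoch seconds, source name, numeric value."""
    timestamp: int
    source: str
    value: float


def parse_csv_line(line: str) -> Event:
    """Example user-supplied converter: 'timestamp,source,value' -> Event."""
    ts, source, value = line.strip().split(",")
    return Event(int(ts), source, float(value))


class TimeIndex:
    """Keeps (timestamp, position) pairs sorted so a range scan is a binary search."""

    def __init__(self) -> None:
        self._entries: List[Tuple[int, int]] = []

    def add(self, timestamp: int, position: int) -> None:
        bisect.insort(self._entries, (timestamp, position))

    def scan(self, start: int, end: int) -> List[int]:
        """Return storage positions of events with start <= timestamp < end."""
        # Positions are never negative, so (t, -1) sorts before every entry at time t.
        lo = bisect.bisect_left(self._entries, (start, -1))
        hi = bisect.bisect_left(self._entries, (end, -1))
        return [pos for _, pos in self._entries[lo:hi]]


class Store:
    """In-memory stand-in for the ingestion, indexer, storage, and query services."""

    def __init__(self) -> None:
        self._events: List[Event] = []
        self._index = TimeIndex()

    def ingest(self, raw_items: Iterable[str], convert: Callable[[str], Event]) -> None:
        for item in raw_items:
            event = convert(item)          # user extension point
            position = len(self._events)
            self._events.append(event)     # "data storage"
            self._index.add(event.timestamp, position)

    def query(self, start: int, end: int, aggregate: Callable[[List[Event]], T]) -> T:
        events = [self._events[pos] for pos in self._index.scan(start, end)]
        return aggregate(events)           # user-defined aggregation


def sum_by_source(events: List[Event]) -> Dict[str, float]:
    """Example user-defined aggregation: total value per source."""
    totals: Dict[str, float] = defaultdict(float)
    for event in events:
        totals[event.source] += event.value
    return dict(totals)
```

A client would call `store.ingest(lines, parse_csv_line)` and then `store.query(start, end, sum_by_source)`. A real deployment would persist the index and data to disk and expose the query through the Access Service, which this sketch leaves out.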

In subsequent posts, I will explain more about the technical components and their performance characteristics. The goal is to keep the design simple and easy to deploy and use. The challenge will be to determine how well this tool can handle large datasets.
