The ELI infrastructure of GRICAD

Introduction

Eli is a platform built on dedicated servers that makes it easy to create and share multiple high-performance instances of the Elasticsearch/Logstash/Kibana (ELK) stack, both for research projects and for the GRICAD monitoring infrastructure. Other big-data products are being considered, but the platform has been optimized primarily for the ELK stack.

With Eli, any user can request a dedicated ELK instance (or another data analysis tool) for their project. Data ingestion is set up with the help of a GRICAD system operator, and the user gets personal access to the Elasticsearch API and a Kibana frontend.

Eli_layout_simple.png

Hardware infrastructure

As of November 2021, the hardware is:

  • 3 Dell R640 nodes:
    • 2 x Intel(R) Xeon(R) Silver 4214R CPU @ 2.40GHz (12 cores)
    • 192 GB RAM
    • 10 HDDs / RAID 5: 2.4 TB, 10,000 rpm, SAS 12 Gbit/s

“Elasticsearch / Kibana / Logstash” (alias ELK)

Elastic provides big-data tools such as Elasticsearch, Kibana and Logstash. Elasticsearch is a NoSQL database allowing fast “index” creation and search. It provides a RESTful API, which is not secured by default. It is the central tool of this stack. Kibana is a web tool designed to create or explore visualizations of the Elasticsearch indices. Logstash is a tool for easy data ingestion into Elasticsearch.

High-performance Elasticsearch can be achieved thanks to clustering: it can run across several nodes, each node having a separate role (master, client, data, ingest). GRICAD is able to provide you with a virtual ELK cluster delivering high performance across a partition of the physical hardware described above, secured by a proxy.

Data volumes and indexing

The provided instance can read input data from our Bettik (BeeGFS) distributed storage. The indexed data is then stored on persistent volumes belonging to your personal Eli instance. There are countless ways of getting the data indexed: from a live stream, from a file, from an application, from network queries, … That is why you will have to work with a GRICAD system operator, who will bring the skills needed to set up the best configuration for your needs, before you can do the actual data analysis. The configuration will generally result in a fine-tuned Logstash instance that ingests the data either as a one-shot operation or as a regularly recurring job.

What does an Eli instance actually look like?

From the user's point of view, you get 2 URLs: https://eli.univ-grenoble-alpes.fr/kibana-<instance_name> and https://eli.univ-grenoble-alpes.fr/elastic-<instance_name>. Both URLs are password protected. You can browse the first one directly to start using Kibana on the indices of your Eli instance. The second URL is the Elasticsearch API, which can be used in scripts or programs.
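As an illustration of how the second URL might be used from a script, here is a minimal sketch that builds an authenticated search request with the Python standard library only. It assumes that the password protection is HTTP Basic authentication (the text above only says the URLs are password protected), and the instance name, index name and credentials are placeholders, not real values. The `_search` endpoint and the `match_all` query are standard Elasticsearch.

```python
import base64
import json
import urllib.request

BASE_URL = "https://eli.univ-grenoble-alpes.fr"


def elastic_url(instance_name, path=""):
    """Return the Elasticsearch API URL of an Eli instance, plus an optional path."""
    return f"{BASE_URL}/elastic-{instance_name}/{path.lstrip('/')}"


def build_search_request(instance_name, index, query, user, password):
    """Build (without sending) a POST <index>/_search request.

    Assumption: the proxy accepts HTTP Basic authentication.
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        elastic_url(instance_name, f"{index}/_search"),
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A call such as `send(build_search_request("<instance_name>", "<index>", {"match_all": {}}, "<user>", "<password>"))` would then return the search response as a Python dictionary, with the matching documents under `["hits"]["hits"]`.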

Here is a sample Python program, linked to the chronosheep instance, that searches for bin intervals: elastic_moutons_search.py

Here is a sample Kibana screenshot:

Eli_kibana_screenshot.png

Here is a sample API query:

Eli_api_screenshot.png

WARNING: Data ingestion through the instance’s Logstash cannot be managed directly by users. The Logstash configuration needs fine tuning and specific skills to set up the filters. It is also sometimes necessary to create the index beforehand so that values are cast to the correct types. Please request help from a GRICAD sysop.
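To give an idea of what preparing an index for correct value casting can involve, the sketch below builds the JSON body of an index-creation request with explicit field types. It assumes Elasticsearch 7 or later (typeless mappings), and the field names are purely hypothetical. It is not the configuration used on Eli, which is set up with a GRICAD sysop.

```python
import json


def build_index_body(field_types):
    """Return the JSON body for creating an index with explicit field types.

    field_types maps a field name to an Elasticsearch field type
    (for example "date", "keyword", "float").
    """
    properties = {name: {"type": es_type} for name, es_type in field_types.items()}
    return json.dumps({"mappings": {"properties": properties}}, indent=2)


# Hypothetical fields, for illustration only.
body = build_index_body(
    {"timestamp": "date", "animal_id": "keyword", "weight_kg": "float"}
)
print(body)
```

Such a body is sent with a `PUT` on the index name before any document is ingested. Without it, Elasticsearch infers the type of each field from the first values it receives, which may not be the intended type.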

Other (No)SQL databases

Other database systems may be available (Postgres, MongoDB, …). Feel free to ask.