Diploma thesis on "Time Series Databases"

Introduction (3P)
1. Why Time-Series Databases? (1P)
2. Use Cases of Time-Series Databases (2P)
Overview of Popular Time-Series Databases (7P)
1. Comparison (3P)
How Time-Series Databases Work (2P)
1. Data Ingestion (0.5P)
2. Data Storage (0.5P)
3. Querying Mechanisms (1P)
Time-Series Databases & Cloud (2P)
1. Advantages/Disadvantages of Cloud Hosting
2. Hybrid Hosting
Optimization Techniques for Time-Series Databases (25P)
1. Compression Techniques (5P)
2. Indexing Techniques (5P)
3. Partitioning Techniques (5P)
4. Data Retention Strategies (5P)
5. Advantages of Low-Level Languages for TSDBs (5P)
Performance Measurement Overview (1P)
Benchmarking and Evaluation of Time-Series Databases (5P)
1. Benchmarking Methodology (2P)
2. Benchmarking Results (3P)
Discussion of Benchmark Results (2P)
Time-Series Databases & AI/Machine Learning (3P)
Conclusion & Usage in Our Project (1P)

Introduction

The amount of data being generated is growing exponentially. Among the most widely used databases are Oracle and MySQL. These databases are well suited to storing large amounts of data. What they lack is the ability to efficiently query huge amounts of data spanning long time periods. Time-series databases fill this gap: they are optimized to store and query large volumes of timestamped data over long time periods.

Why Time-Series Databases?

Conventional databases like Postgres are commonly used to handle large amounts of user data such as chats, accounts, and relationships. For even larger datasets, NoSQL databases like MongoDB are often scaled horizontally to accommodate nearly unlimited growth.

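However, consider a scenario in which sensor logs are collected, stored, and analyzed. Assume 1000 sensors, each recording a new observation every second. One (non-leap) year has $365 \cdot 24 \cdot 3600 = 31\,536\,000$ seconds, so the number of observations collected per year is

$$1000 \cdot 31\,536\,000 = 31\,536\,000\,000$$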

That is roughly 31.5 billion observations for just one year. Even if this data does not take up much storage space, querying it can still lead to serious performance issues. Time-series databases can provide a substantial performance boost in this scenario.
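To give a rough sense of scale, the following Python sketch reproduces the observation count and adds a storage estimate. The 16 bytes per observation (an 8-byte timestamp plus an 8-byte floating-point value) is an assumption made purely for illustration. Row-oriented databases add per-row and index overhead, while time-series databases typically compress the data, so real sizes can differ considerably.

```python
# Back-of-the-envelope estimate for the sensor scenario described above.
# Assumption (not from the source): each observation is stored as an
# 8-byte timestamp plus an 8-byte float value, with no compression,
# indexes, or per-row storage overhead.

SENSORS = 1000
SAMPLES_PER_SECOND = 1               # one observation per sensor per second
SECONDS_PER_YEAR = 365 * 24 * 3600   # non-leap year: 31,536,000 seconds

BYTES_PER_OBSERVATION = 8 + 8        # hypothetical: int64 timestamp + float64 value


def observations_per_year(sensors: int, rate_hz: int) -> int:
    """Total number of observations produced by all sensors in one year."""
    return sensors * rate_hz * SECONDS_PER_YEAR


def raw_size_gib(observations: int, bytes_per_obs: int) -> float:
    """Uncompressed payload size in GiB (ignores storage-engine overhead)."""
    return observations * bytes_per_obs / 2**30


if __name__ == "__main__":
    n = observations_per_year(SENSORS, SAMPLES_PER_SECOND)
    print(f"observations per year: {n:,}")
    print(f"raw payload: {raw_size_gib(n, BYTES_PER_OBSERVATION):.1f} GiB")
```

Running the script prints 31,536,000,000 observations and about 469.9 GiB of raw payload. Under these assumptions the raw volume alone fits on a single disk; the harder part is querying it efficiently, which is the problem time-series databases are designed to address.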

Time-series databases are databases specialized and optimized for timestamped data that spans long time ranges. For example, according to its developers, TimescaleDB, a PostgreSQL extension, delivers up to 350 times faster queries, 44% faster ingestion, and 95% storage savings on time-series data compared with plain PostgreSQL. @https://www.tigerdata.com/blog/postgresql-timescaledb-1000x-faster-queries-90-data-compression-and-much-more