Diploma Thesis on "Time Series Databases"
Table of Contents
- Introduction (3P)
  - Why Time-Series Databases? (1P)
  - Use Cases of Time-Series Databases (2P)
- Overview of Popular Time-Series Databases (7P)
  - Comparison (3P)
- How Time-Series Databases Work (2P)
  - Data Ingestion (0.5P)
  - Data Storage (0.5P)
  - Querying Mechanisms (1P)
- Time-Series Databases & Cloud (2P)
  - Advantages/Disadvantages of Cloud Hosting
  - Hybrid
- Optimization Techniques for Time-Series Databases (25P)
  - Compression Techniques (5P)
  - Indexing Techniques (5P)
  - Partitioning Techniques (5P)
  - Data Retention Strategies (5P)
  - Advantages of Low-Level Languages for TSDBs (5P)
- Performance Measurement Overview (1P)
- Benchmarking and Evaluation of Time-Series Databases (5P)
  - Benchmarking Methodology (2P)
  - Benchmarking Results (3P)
- Discussion of Benchmark Results (2P)
- Time-Series Databases & AI/Machine Learning (3P)
- Conclusion & Usage in Our Project (1P)
Introduction
The amount of data being generated is growing exponentially. Among the most widely used databases are Oracle and MySQL. These databases are good at storing large amounts of data. What they lack is the ability to efficiently query huge amounts of data spanning long time periods. Time-series databases fill this gap: they are optimized to store and query large volumes of timestamped data over long time periods.
Why Time-Series Databases?
Conventional databases like Postgres are commonly used to handle large amounts of user data such as chats, accounts, and relationships. For even larger datasets, NoSQL databases like MongoDB are often scaled horizontally to accommodate nearly unlimited growth.
However, consider a scenario in which measurements are logged, stored, and analyzed:
- 1000 sensors
- each sensor records a new observation every second
One year has 365 × 24 × 60 × 60 = 31,536,000 seconds.
That amounts to 1000 × 31,536,000 ≈ 31.5 billion observations in just one year. Even if this data does not take up much space, querying it can still cause performance problems. Time-series databases can provide a significant performance boost in this scenario.
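The arithmetic of this scenario can be checked with a few lines of Python. The sketch below also estimates the raw storage footprint under a hypothetical assumption of 16 bytes per observation (an 8-byte timestamp plus an 8-byte floating-point value, no compression, no index or metadata overhead); the constant and function names are illustrative only.

```python
# Back-of-the-envelope estimate for the sensor scenario.
# Assumption (hypothetical): each observation is stored uncompressed as an
# 8-byte timestamp plus an 8-byte float value, i.e. 16 bytes in total.

SENSORS = 1_000
SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # ignores leap years
BYTES_PER_OBSERVATION = 8 + 8           # assumed: int64 timestamp + float64 value


def observations_per_year(sensors: int, interval_s: int = 1) -> int:
    """Observations produced by `sensors` sensors sampling every `interval_s` seconds."""
    return sensors * (SECONDS_PER_YEAR // interval_s)


def raw_size_bytes(observations: int, bytes_per_obs: int = BYTES_PER_OBSERVATION) -> int:
    """Uncompressed storage size in bytes."""
    return observations * bytes_per_obs


if __name__ == "__main__":
    n = observations_per_year(SENSORS)
    size = raw_size_bytes(n)
    print(f"{n:,} observations per year")
    print(f"~{size / 1e9:.1f} GB uncompressed")
```

Under these assumptions the script prints 31,536,000,000 observations and roughly 504.6 GB of uncompressed data per year, which shows why both query performance and compression matter at this scale.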
Time-series databases are specialized and optimized for timestamped data spanning long time frames. For example, TimescaleDB, a Postgres extension, is reported to achieve up to 350 times faster queries, 44% faster ingestion, and 95% storage savings on time-series data compared to plain PostgreSQL. @https://www.tigerdata.com/blog/postgresql-timescaledb-1000x-faster-queries-90-data-compression-and-much-more
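One general idea behind "optimized for timestamps" (stated here as a general principle, not as the internals of any particular product) is keeping data ordered by time, so that a time-range query can locate its boundaries by binary search instead of scanning every row. A minimal Python sketch; the function name `range_query` and the in-memory list layout are hypothetical:

```python
from bisect import bisect_left
from typing import List, Tuple


def range_query(timestamps: List[int], values: List[float],
                start: int, end: int) -> List[Tuple[int, float]]:
    """Return all (timestamp, value) pairs with start <= timestamp < end.

    `timestamps` must be sorted ascending and `values` aligned with it by index.
    Both boundaries are located by binary search in O(log n), so only the
    matching slice is read instead of scanning the whole series.
    """
    lo = bisect_left(timestamps, start)
    hi = bisect_left(timestamps, end)
    return list(zip(timestamps[lo:hi], values[lo:hi]))


if __name__ == "__main__":
    ts = list(range(0, 100, 10))          # 0, 10, 20, up to 90 (sorted by time)
    vals = [float(t) for t in ts]
    print(range_query(ts, vals, 25, 60))  # [(30, 30.0), (40, 40.0), (50, 50.0)]
```

The end bound is exclusive, a convention chosen for this sketch so that adjacent time ranges do not overlap.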
What is time series data?
Time-series data is a sequence of data points recorded at successive points in time. Any record with a timestamp can be part of a time series. Common examples include:
- IoT sensor readings
- stock market data
- logs, error rates, or request counts
- weather patterns/climate data
- sales or demand data (e.g. for forecasting)
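A time series can be modelled as a list of (timestamp, value) observations ordered by time. The following minimal sketch uses a hypothetical `Point` dataclass and a helper `is_time_series` that checks the ordering; requiring strictly increasing timestamps within one series is an assumption of this sketch, and real systems differ in how they handle duplicate timestamps.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class Point:
    """One observation of a time series: when it was measured and what was measured."""
    timestamp: datetime
    value: float


def is_time_series(points: List[Point]) -> bool:
    """True if the points are ordered by strictly increasing timestamp (assumption of this sketch)."""
    return all(a.timestamp < b.timestamp for a, b in zip(points, points[1:]))


if __name__ == "__main__":
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    # Hypothetical temperature readings taken once per second.
    series = [Point(start + timedelta(seconds=i), 20.0 + 0.5 * i) for i in range(3)]
    print(is_time_series(series))                  # True
    print(is_time_series(list(reversed(series))))  # False
```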
Comparison of the most popular time series databases
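::: {#tab:placeholder}
| Name        | Description        | Pros | Cons |
|-------------|--------------------|------|------|
| InfluxDB    |                    |      |      |
| TimescaleDB | Postgres extension |      |      |
| RaimaDB     |                    |      |      |
| Redis       |                    |      |      |

: Caption
:::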
TimescaleDB
TimescaleDB is a Postgres extension. It benefits from Postgres's rich, mature, and well-tested feature set.
TODO: other tsdbs
Evaluating performance
Time-series databases can be evaluated in several respects:
Storage Efficiency
Time-series data is often highly compressible because consecutive values tend to be similar or to repeat. With a suitable compression algorithm, storage requirements can often be reduced by more than 90%.
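Compression methods:
- delta encoding: stores only the difference (delta) between each value and its predecessor. This is highly efficient when consecutive values repeat or change only slightly, because the resulting deltas are small (or zero) and can be stored in fewer bits.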
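Delta encoding can be sketched in a few lines of Python. This minimal version assumes integer values (for example Unix timestamps or fixed-point sensor readings); the names `delta_encode` and `delta_decode` are hypothetical. It only produces the deltas; a real storage engine would additionally pack the small numbers into fewer bits, which is omitted here.

```python
from itertools import accumulate
from typing import List


def delta_encode(values: List[int]) -> List[int]:
    """Keep the first value, then store each value as the difference to its predecessor."""
    if not values:
        return []
    return [values[0]] + [cur - prev for prev, cur in zip(values, values[1:])]


def delta_decode(deltas: List[int]) -> List[int]:
    """Reverse delta_encode by summing the differences back up (running total)."""
    return list(accumulate(deltas))


if __name__ == "__main__":
    # One timestamp per second: after the first value every delta is 1.
    timestamps = [1_700_000_000 + i for i in range(5)]
    print(delta_encode(timestamps))   # [1700000000, 1, 1, 1, 1]

    # Repeating sensor readings: unchanged values become 0.
    readings = [42, 42, 42, 43, 43]
    encoded = delta_encode(readings)
    print(encoded)                    # [42, 0, 0, 1, 0]
    assert delta_decode(encoded) == readings  # lossless round trip
```

Timestamps taken every second turn into the first timestamp followed by a run of 1s, and repeated readings turn into zeros; both are cheap to store once bit-packed.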