Tim Blog

Heaven or Hell

Hive

Hive is a data warehouse built on Hadoop. It maps structured data files to database tables and offers SQL-style querying. It is suited to static batch processing, has high latency, and does not support adding or changing data ...
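A rough local analogy for "mapping a structured data file to a table and querying it with SQL", assuming a tab-delimited file with hypothetical columns id, name and age. Python's built-in sqlite3 is only a stand-in for Hive here: it copies the rows into its own storage, whereas Hive leaves the files in HDFS and applies the table schema when the data is read.

```python
import csv
import io
import sqlite3

# Hypothetical tab-delimited file contents (stands in for a file stored in HDFS).
RAW = "1\talice\t34\n2\tbob\t27\n3\tcarol\t41\n"


def load_as_table(raw: str) -> sqlite3.Connection:
    """Impose a table schema on delimited text, then expose it to SQL.

    sqlite3 is a local stand-in for Hive, not Hive's own API.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT, age INTEGER)")
    rows = csv.reader(io.StringIO(raw), delimiter="\t")
    # INTEGER column affinity converts the text fields "34", "27", ... to integers.
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)
    return conn


conn = load_as_table(RAW)
older = conn.execute("SELECT name FROM users WHERE age > 30 ORDER BY name").fetchall()
print(older)  # [('alice',), ('carol',)]
```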

HBase

HBase is a NoSQL database that stores huge amounts of data while still offering random access to it. It is built on Hadoop and provides read/write access (all files are stored in HDFS). HBase...
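To make "random access to huge tables" more concrete, here is a toy in-memory model of the HBase data layout: cells are addressed by row key, column ("family:qualifier") and timestamp, rows are read in sorted row-key order, and a few versions per cell are kept. The class and method names are hypothetical and this is not the HBase client API; it only sketches the data model.

```python
from collections import defaultdict


class ToyHBaseTable:
    """Toy model of HBase's data layout (hypothetical, not the HBase API)."""

    def __init__(self, max_versions: int = 3):
        self.max_versions = max_versions
        # row key -> column ("family:qualifier") -> [(timestamp, value)], newest first
        self.rows = defaultdict(lambda: defaultdict(list))

    def put(self, row: str, column: str, value: str, ts: int) -> None:
        versions = self.rows[row][column]
        versions.append((ts, value))
        versions.sort(reverse=True)        # newest version first
        del versions[self.max_versions:]   # drop versions beyond the limit

    def get(self, row: str, column: str):
        """Random read: latest version of one cell, or None if absent."""
        versions = self.rows.get(row, {}).get(column)
        return versions[0][1] if versions else None

    def scan(self, start: str, stop: str):
        """Range scan over row keys in sorted order, start inclusive, stop exclusive."""
        for key in sorted(self.rows):
            if start <= key < stop:
                yield key, {col: vs[0][1] for col, vs in self.rows[key].items()}


table = ToyHBaseTable(max_versions=2)
table.put("user#001", "info:name", "alice", ts=1)
table.put("user#001", "info:name", "alicia", ts=2)
table.put("user#001", "info:name", "ali", ts=3)   # oldest version (ts=1) is dropped
table.put("user#002", "info:name", "bob", ts=1)
print(table.get("user#001", "info:name"))  # ali
```

Keeping row keys sorted is what lets HBase answer both point lookups and range scans efficiently, which is why row-key design matters so much in practice.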

Hadoop

The core components of Hadoop are YARN, HDFS and MapReduce. HDFS: HDFS is a file system that manages files distributed across multiple machines. It offers streaming data access and large dat...
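The MapReduce programming model can be sketched in plain Python with the classic word-count example: the map phase emits (key, value) pairs, the framework groups them by key (the shuffle), and the reduce phase aggregates each group. This runs on a single machine and the function names are hypothetical; Hadoop distributes the same three steps across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> Iterable[Tuple[str, int]]:
    # Map: emit (word, 1) for every word in one input line.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group all intermediate values by key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: combine all values that share one key.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


counts = word_count(["HDFS stores data", "YARN schedules jobs", "MapReduce processes data"])
print(counts["data"])  # 2
```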

Flume

Apache Flume is an open-source, powerful, reliable and flexible system used to collect, aggregate and move large amounts of unstructured data from multiple data sources into HDFS/HBase (for example...
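A Flume agent is built from three parts: a source that receives events, a channel that buffers them, and a sink that writes them to the destination. The sketch below mimics that flow with a standard-library queue as the channel and a plain list standing in for HDFS/HBase; the function names and the batch size are hypothetical, not Flume configuration.

```python
import queue
from typing import Iterable, List


def source(events: Iterable[str], channel: "queue.Queue[str]") -> None:
    # Source: accept events from an external producer and put them on the channel.
    for event in events:
        channel.put(event)


def sink(channel: "queue.Queue[str]", store: List[str], batch_size: int = 2) -> None:
    # Sink: drain the channel in batches and write to the destination.
    batch: List[str] = []
    while True:
        try:
            batch.append(channel.get_nowait())
        except queue.Empty:
            break
        if len(batch) == batch_size:
            store.extend(batch)
            batch.clear()
    store.extend(batch)  # flush the last partial batch


channel: "queue.Queue[str]" = queue.Queue(maxsize=100)  # bounded buffer between source and sink
hdfs_stand_in: List[str] = []
source(["log line 1", "log line 2", "log line 3"], channel)
sink(channel, hdfs_stand_in)
print(hdfs_stand_in)  # ['log line 1', 'log line 2', 'log line 3']
```

The channel decouples the speed of the producers from the speed of the destination, which is the main reason Flume puts a buffer between source and sink.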

FastAPI

FastAPI is a high-performance web framework for Python 3.6+. It is similar to Flask but more advanced. GitHub page: https://github.com/tiangolo/fastapi Advantages: Asy...

Elasticsearch

Elasticsearch is an open-source search engine that is used not only for log analysis but also for other data search and collection scenarios. It is built on the Lucene library and provi...
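Lucene's central data structure is the inverted index, which maps each term to the documents that contain it. A minimal sketch, assuming a very simple analyzer (lowercase, split on non-alphanumeric characters) and hypothetical log documents; real Lucene analyzers and scoring are far richer.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    # Tiny analyzer: lowercase, then split on anything that is not a letter or digit.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    # Inverted index: term -> ids of the documents that contain the term.
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_all(index: Dict[str, Set[int]], query: str) -> List[int]:
    # AND query: ids of documents containing every query term.
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)


docs = {
    1: "ERROR disk full on node-1",
    2: "INFO job finished on node-2",
    3: "ERROR timeout on node-2",
}
index = build_index(docs)
print(search_all(index, "error node-2"))  # [3]
```

A query only touches the posting sets of its own terms instead of scanning every document, which is why search stays fast as the collection grows.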

ETL

ETL is an abbreviation of Extraction, Transformation and Loading, which describes the process of moving data from a source to a destination. ETL extracts data from distributed, heterogeneous data sources to a middle la...
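The three ETL stages can be sketched as three small functions, assuming a hypothetical CSV export as the source and an in-memory SQLite database as the destination. The cleaning rules (upper-casing country codes, dropping rows without an amount) are illustrative choices, not rules from any particular pipeline.

```python
import csv
import io
import sqlite3
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical raw export from a source system (mixed casing, a blank amount).
RAW_CSV = """order_id,country,amount
1,de,10.50
2,US,
3,us,7.25
"""


def extract(raw: str) -> Iterator[Dict[str, str]]:
    # Extract: read records from the source.
    return csv.DictReader(io.StringIO(raw))


def transform(rows: Iterable[Dict[str, str]]) -> List[Tuple[int, str, float]]:
    # Transform: normalize values and drop records with no amount.
    out = []
    for row in rows:
        if not row["amount"]:
            continue
        out.append((int(row["order_id"]), row["country"].upper(), float(row["amount"])))
    return out


def load(records: List[Tuple[int, str, float]], conn: sqlite3.Connection) -> None:
    # Load: write the cleaned records into the destination table.
    conn.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER, country TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
result = conn.execute(
    "SELECT country, SUM(amount) FROM orders GROUP BY country ORDER BY country"
).fetchall()
print(result)  # [('DE', 10.5), ('US', 7.25)]
```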

Airflow

Airflow is a workflow scheduling tool written in Python. It uses a DAG to define the whole workflow. DAG (Directed Acyclic Graph): all tasks in the same DAG share the same schedule. DAG run: when a DAG is trigger...
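What "directed acyclic" buys a scheduler can be shown with Python's standard graphlib module (Python 3.9+): because the dependency graph has no cycles, there is always an order in which every task runs after its upstream tasks. This is not Airflow's API (in Airflow the same dependencies are declared between task objects, for example with the `>>` operator), and the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
dag = {
    "extract": set(),
    "transform": {"extract"},
    "validate": {"extract"},
    "load": {"transform", "validate"},
}


def run_order(graph):
    # A valid execution order exists only because the graph is acyclic;
    # graphlib raises graphlib.CycleError if a cycle is present.
    return list(TopologicalSorter(graph).static_order())


order = run_order(dag)
print(order)  # "extract" first, "load" last; "transform" and "validate" in between
```

"transform" and "validate" do not depend on each other, so a scheduler is free to run them in parallel once "extract" has finished.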

Docker

Dockerfile tips. Caching: to use the build cache efficiently, place the instructions for layers that change frequently after those that change less often. Application’s dep...

Python testing

Mutation test: mutation testing algorithmically modifies source code and checks whether any “mutants” survive the test suite. #angle.py def hours_hand(hour, minutes): base = (hour % 12 ) * (360 // 12) ...
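The idea of a "surviving mutant" fits in a few lines. The function and both mutants below are hypothetical and written by hand (tools such as mutmut generate mutants automatically); a mutant survives when the test suite still passes against it, which points to a missing test.

```python
from typing import Callable, List


def is_adult(age: int) -> bool:
    # Hypothetical function under test.
    return age >= 18


def mutant_gt(age: int) -> bool:
    # Mutant 1: ">=" replaced by ">".
    return age > 18


def mutant_const(age: int) -> bool:
    # Mutant 2: constant 18 replaced by 19.
    return age >= 19


def weak_suite(f: Callable[[int], bool]) -> bool:
    # Passes if all checks pass; never tests the boundary value 18.
    return f(30) is True and f(5) is False


def strong_suite(f: Callable[[int], bool]) -> bool:
    # Adds boundary checks around 18.
    return weak_suite(f) and f(18) is True and f(17) is False


def surviving(mutants: List[Callable[[int], bool]], suite) -> List[str]:
    # A mutant survives if the suite still passes when run against it.
    return [m.__name__ for m in mutants if suite(m)]


mutants = [mutant_gt, mutant_const]
assert weak_suite(is_adult) and strong_suite(is_adult)  # original code passes both
print(surviving(mutants, weak_suite))    # ['mutant_gt', 'mutant_const']
print(surviving(mutants, strong_suite))  # []
```

Both mutants survive the weak suite because it never exercises the boundary; adding the two boundary checks kills them.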