Tim Blog

Himmel oder Hölle

tornado

Tornado is a lightweight Python web application framework based on the MVC (Model-View-Controller) model. It is built specifically to handle highly concurrent asynchronous workloads. By using no...

tez

tez is a lightweight wrapper on top of PyTorch that simplifies training pipeline logic. Tez splits map and reduce tasks into smaller tasks; these smaller tasks can be combined more...

sqoop

Sqoop (SQL-to-Hadoop) is a tool for transferring data between relational databases and Hadoop. graph TD; raw_data-->MapReduce_clean; MapReduce_clean-->Hbase; Hbase-->Hive; Hive-->...

sqlite

SQLite is a relational database management system based on SQL. It is serverless, lightweight, and requires zero configuration. It reads and writes directly to a disk file that can be easily copied...
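As a minimal sketch of the "single disk file" point, the snippet below uses Python's standard `sqlite3` module to create a database file, copy it with an ordinary file copy, and read the copy back. The file names and the `notes` table are hypothetical, chosen only for illustration.

```python
import os
import shutil
import sqlite3
import tempfile


def demo_copy_database():
    """Create a SQLite file, copy it, and read the rows back from the copy."""
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "notes.db")       # hypothetical file name
    dst = os.path.join(workdir, "notes_copy.db")  # hypothetical file name

    # Connecting creates the file; no server process or config file is involved.
    conn = sqlite3.connect(src)
    conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
    conn.execute("INSERT INTO notes (body) VALUES (?)", ("hello",))
    conn.commit()
    conn.close()

    # The whole database lives in one ordinary file, so a plain copy suffices.
    shutil.copy(src, dst)

    copy_conn = sqlite3.connect(dst)
    rows = copy_conn.execute("SELECT id, body FROM notes").fetchall()
    copy_conn.close()

    shutil.rmtree(workdir)
    return rows
```

The copy is taken only after the connection is closed; copying a file while another process is writing to it is a different situation and is not covered by this sketch.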

spark

data processing: data skewness. Data skew primarily refers to a non-uniform distribution in a dataset. It causes one task to take much longer than the others when data is shuffled by key. Exact ...
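The effect can be illustrated without a cluster. The sketch below is plain Python, not Spark code: `toy_hash` is a deliberately simple stand-in for a hash partitioner, and the key names and partition count are made up. It counts how many records land in each partition when shuffling by key, then repeats the count after appending a small "salt" to each key. Salting is one common mitigation and is an assumption here, not something taken from the excerpt.

```python
def toy_hash(key):
    """Deterministic stand-in for a hash partitioner (sum of byte values)."""
    return sum(key.encode("utf-8"))


def partition_loads(keys, num_partitions):
    """Count how many records each partition receives when shuffling by key."""
    loads = [0] * num_partitions
    for key in keys:
        loads[toy_hash(key) % num_partitions] += 1
    return loads


def salt_keys(keys, num_salts):
    """Append a rotating salt so that one hot key becomes several distinct keys."""
    return [f"{key}_{i % num_salts}" for i, key in enumerate(keys)]


# Hypothetical skewed dataset: one hot key dominates the records.
records = ["hot"] * 90 + [f"k{i}" for i in range(10)]

plain = partition_loads(records, num_partitions=4)
salted = partition_loads(salt_keys(records, num_salts=8), num_partitions=4)
```

In `plain`, every `"hot"` record hashes to the same partition, so that one task does most of the work; in `salted`, the hot key is spread over several partitions. An aggregation over salted keys would need a second step that strips the salt and merges the partial results.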

search engine

A search engine mainly involves four steps: web scraping, indexing, searching the index database, and ordering the search results. During searching, the TERM operator queries the inverted list of every word that occurs, A...
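The indexing, searching, and ordering steps can be sketched with a tiny in-memory inverted index. This is a toy illustration: the function names are hypothetical, and ordering by raw occurrence count is an assumed ranking rule, not necessarily the one the post describes.

```python
from collections import defaultdict


def build_index(docs):
    """Indexing: map each word to {doc_id: occurrence count} (its inverted list)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word][doc_id] = index[word].get(doc_id, 0) + 1
    return index


def term(index, word):
    """TERM lookup: return the inverted list for a single word."""
    return index.get(word.lower(), {})


def search(index, word):
    """Searching + ordering: doc ids for `word`, most occurrences first."""
    postings = term(index, word)
    # Ties are broken by doc id so the result order is deterministic.
    return sorted(postings, key=lambda doc_id: (-postings[doc_id], doc_id))


# Hypothetical documents standing in for scraped pages.
docs = {1: "spark spark hadoop", 2: "hadoop pig", 3: "spark pig pig"}
index = build_index(docs)
```

With these documents, `search(index, "spark")` returns `[1, 3]` because document 1 contains the word twice and document 3 once; a word that never occurs returns an empty list.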

pig

Pig is an ad-hoc scripting language for big data analysis. Components: Pig Latin, a language to query and analyze big data. Data is organized in this way: field (data block) -> tuple (ordered set of fields) ->...

mesos

Mesos is a cluster management platform that allocates resources to distributed frameworks such as Spark, Storm, Hadoop, and Marathon. Mesos can provide isolated resources for different frameworks to in...

kubernetes

Kubernetes is a container orchestration platform used to manage deployment, scheduling, and scaling across clusters of nodes. # to deploy a pod on Kubernetes using kind: Deployment kubectl...

kafka

Kafka is an open-source messaging system that provides unified, high-volume, low-latency data delivery. Kafka classifies messages by topic and then stores them. Features: S...
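The "classify by topic, then store" idea can be pictured with a toy in-memory model. The class below is not the Kafka API and does not talk to a broker; `TopicLog`, `publish`, and `read` are hypothetical names used only to show messages being grouped per topic and kept in arrival order.

```python
from collections import defaultdict


class TopicLog:
    """Toy model: messages are grouped by topic and stored in arrival order."""

    def __init__(self):
        self._topics = defaultdict(list)

    def publish(self, topic, message):
        """Append a message to its topic and return its position (offset)."""
        log = self._topics[topic]
        log.append(message)
        return len(log) - 1

    def read(self, topic, offset=0):
        """Return the stored messages of a topic, starting at `offset`."""
        return list(self._topics.get(topic, [])[offset:])


# Hypothetical topics and payloads.
log = TopicLog()
first_order = log.publish("orders", "a")
first_click = log.publish("clicks", "x")
second_order = log.publish("orders", "b")
```

Because stored messages stay in place after being read, a reader can start again from any earlier offset, for example `log.read("orders", 1)`.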