Tim Blog
Heaven or Hell
tornado
Tornado is a lightweight Python web application framework based on the MVC (Model-View-Controller) model. It is built specifically to handle highly concurrent asynchronous workloads. By using no...
Posted by neverset on September 5, 2020
tez
tez is a lightweight wrapper on top of PyTorch that simplifies training pipeline logic. Tez splits map and reduce tasks into smaller tasks; these smaller tasks can be combined more...
Posted by neverset on September 5, 2020
sqoop
Sqoop (SQL-to-Hadoop) is a tool for transferring data between relational databases and Hadoop. graph TD; raw_data-->MapReduce_clean; MapReduce_clean-->Hbase; Hbase-->Hive; Hive--&...
Posted by neverset on September 5, 2020
sqlite
SQLite is a relational database management system based on SQL. It is serverless, lightweight, and requires zero configuration. It reads and writes directly to a disk file that can be easily copied...
Posted by neverset on September 5, 2020
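The SQLite excerpt above says the database lives in an ordinary disk file that can simply be copied. A minimal sketch of that idea, assuming Python's standard `sqlite3` module; the file names and the `notes` table are hypothetical and not taken from the post:

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def write_and_copy(workdir: Path) -> list[tuple[int, str]]:
    """Create a database file, copy it, and read the rows back from the copy."""
    original = workdir / "notes.db"        # hypothetical file name
    duplicate = workdir / "notes_copy.db"  # hypothetical file name

    # No server process is involved: connect() opens (or creates) a file.
    conn = sqlite3.connect(original)
    conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
    conn.execute("INSERT INTO notes (body) VALUES (?)", ("hello",))
    conn.commit()
    conn.close()

    # The whole database is one regular file, so a plain file copy duplicates it.
    shutil.copy(original, duplicate)

    conn = sqlite3.connect(duplicate)
    rows = conn.execute("SELECT id, body FROM notes").fetchall()
    conn.close()
    return rows


with tempfile.TemporaryDirectory() as tmp:
    rows = write_and_copy(Path(tmp))
```

The copy is made only after the connection is closed, so no write is in flight while the file is duplicated.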
spark
Data processing: data skewness. Data skew primarily refers to a non-uniform distribution in a dataset. It causes one task to take much longer than the others when the data is shuffled by key. Exact ...
Posted by neverset on September 5, 2020
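The skew described in the Spark excerpt above can be illustrated without a cluster. The sketch below is plain Python, not Spark code: it assigns records to partitions by hashing their key, which is an assumed stand-in for a shuffle by key, and counts how many records land in each partition. The key names and partition count are made up for illustration.

```python
import zlib
from collections import Counter


def partition_sizes(keys, num_partitions):
    """Count records per partition when records are routed by a hash of their key."""
    sizes = Counter()
    for key in keys:
        # crc32 gives a stable hash across runs (the built-in hash() of str is salted).
        partition = zlib.crc32(key.encode("utf-8")) % num_partitions
        sizes[partition] += 1
    return [sizes[p] for p in range(num_partitions)]


# A skewed dataset: one "hot" key dominates, three other keys appear once each.
keys = ["hot"] * 97 + ["a", "b", "c"]
sizes = partition_sizes(keys, num_partitions=4)
```

Every record with the key `hot` ends up in the same partition, so the task that processes that partition handles at least 97 of the 100 records while the others are nearly idle.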
search engine
A search engine mainly involves four steps: web scraping, indexing, searching the index database, and ranking the search results. During searching, the TERM operator queries the inverted list of each word that occurs, A...
Posted by neverset on September 5, 2020
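The indexing, searching, and ranking steps named in the search engine excerpt above can be sketched with a tiny in-memory inverted index. This is a toy under stated assumptions: the documents are invented, the function names `build_index` and `search` are hypothetical, and ranking by raw term count is only one simple choice of ordering, not necessarily the one the post describes.

```python
from collections import defaultdict


def build_index(docs):
    """Map each word to {doc_id: number of occurrences} (an inverted list per word)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word][doc_id] = index[word].get(doc_id, 0) + 1
    return index


def search(index, term):
    """Look up one term and return doc ids, most occurrences first, ties by id."""
    postings = index.get(term.lower(), {})
    return sorted(postings, key=lambda doc_id: (-postings[doc_id], doc_id))


docs = {
    1: "spark and kafka",
    2: "kafka kafka streams",
    3: "sqlite on disk",
}
index = build_index(docs)
results = search(index, "kafka")
```

Here `results` lists document 2 before document 1, because the term occurs twice in document 2 and once in document 1.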
pig
Pig is an ad-hoc scripting language for big data analysis. Components: Pig Latin, a language to query and analyze big data. Data is organized in this way: field (data block) -> tuple (ordered set of fields)-&...
Posted by neverset on September 5, 2020
mesos
Mesos is a cluster management platform that allocates resources to distributed frameworks such as Spark, Storm, Hadoop, and Marathon. Mesos can provide isolated resources for different frameworks to in...
Posted by neverset on September 5, 2020
kubernetes
Kubernetes is a container orchestration platform used to manage deployment, scheduling, and scaling across clusters of nodes. # to deploy a pod on Kubernetes using kind: Deployment kubectl...
Posted by neverset on September 5, 2020
kafka
Kafka is an open-source messaging system that provides a unified, high-throughput, low-latency data feed. Kafka classifies messages by topic and then stores them. Features: S...
Posted by neverset on September 5, 2020
FEATURED TAGS
machine learning
tensorflow
python
big data
docker
data engineering
NLP
visualization
database
linux
frontend
deep learning
java
react
pytorch
ABOUT ME
The goal is heaven
✉️ neverset123@aliyun.com