Tim Blog

Himmel oder Hölle

time series

Time series to supervised learning: from pandas import DataFrame from pandas import concat def series_to_supervised(data, n_in=1, n_out=1, dropnan=True): """ Reframe a time series as a supervised learning dataset. Arguments: data: sequence of observations, ...
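The excerpt above is cut off, so here is a minimal sketch of the underlying idea only, in plain Python rather than pandas. The function name `sliding_window_pairs` is hypothetical; `n_in` and `n_out` are borrowed from the signature in the excerpt, and skipping incomplete rows is assumed to play the role of `dropnan=True`.

```python
def sliding_window_pairs(series, n_in=1, n_out=1):
    """Turn a 1-D sequence into (inputs, outputs) rows.

    Each row uses n_in past observations as inputs and the next n_out
    observations as targets. Rows that would need values outside the
    series are skipped (assumed analogue of dropping NaN rows).
    """
    rows = []
    for t in range(n_in, len(series) - n_out + 1):
        inputs = list(series[t - n_in:t])    # lagged observations
        outputs = list(series[t:t + n_out])  # values to predict
        rows.append((inputs, outputs))
    return rows


pairs = sliding_window_pairs([10, 20, 30, 40, 50], n_in=2, n_out=1)
for inputs, outputs in pairs:
    print(inputs, "->", outputs)
```

With two lags and one target, a five-point series yields three training rows.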

pytorch tips

Hooks: get the feature map and gradient map of a selected layer with register_forward_hook and register_full_backward_hook. class SaveValues(): def __init__(self, layer): self.model = None ...

parquet

The Parquet format is a compressed and efficient way to store data. Get metadata: get row/column/schema. import pyarrow as pa import pyarrow.parquet as pq import os ts=pq.read_metadata(first_pq) ts.num_row...

sedona

Sedona’s architecture

gcn

Using one node to represent a group of nodes in a graph. Example: https://colab.research.google.com/drive/1Ksca_p4XrZjeN0A6jT5aYN6ARvwFVSbY?usp=sharing

hierarchical clustering

The hierarchical tree is composed of small trees, each small tree represents a class, and the height of the small tree is the distance between two points or two classes, so the closer the distance ...
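To make the "height equals distance" idea concrete, here is a small sketch of agglomerative clustering in plain Python. It assumes single linkage (the distance between two clusters is the distance between their closest points), which the excerpt does not specify; the function name `single_linkage` and the sample points are made up for illustration.

```python
import math


def single_linkage(points):
    """Merge the two closest clusters until one remains.

    Returns a list of (cluster_a, cluster_b, height), where height is the
    distance at which the two clusters were joined.
    """
    clusters = [[i] for i in range(len(points))]  # start: one point per cluster
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage: closest pair of points across the two clusters
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0)]
merges = single_linkage(points)
for left, right, height in merges:
    print(left, "+", right, "at height", height)
```

The two nearby points join first at a low height, and the distant point joins the tree much higher up.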

distance algorithm

Euclidean distance. Drawbacks: it does not fit well for dimensions higher than 3D; different features have different units and are not standardized. from scipy.spatial import distance distance.euclidean...
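The units drawback is easy to see with numbers. The sketch below uses the standard library's `math.dist` instead of scipy, and z-score scaling as one common mitigation; the scaling step is my assumption, not something the excerpt prescribes. The helper `standardize_columns` and the height/salary rows are hypothetical.

```python
import math
import statistics


def standardize_columns(rows):
    """Rescale every column to zero mean and unit (population) std deviation."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    stdevs = [statistics.pstdev(c) for c in columns]
    return [
        tuple((v - m) / s if s else 0.0 for v, m, s in zip(row, means, stdevs))
        for row in rows
    ]


# Hypothetical rows: (height in metres, salary in dollars)
rows = [(1.70, 60000.0), (1.80, 61000.0), (1.60, 60000.0)]

# Raw distance is dominated by the salary column because of its larger unit.
raw = math.dist(rows[0], rows[1])

scaled_rows = standardize_columns(rows)
scaled = math.dist(scaled_rows[0], scaled_rows[1])

print("raw:", raw)
print("scaled:", scaled)
```

After scaling, both features contribute on a comparable footing instead of the salary swamping the height.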

file operation

There are libraries that are more efficient than the with open function for dealing with files in the file system. Path.open: from pathlib2 import Path example_path = Path('./info.csv') with example_path.open() as...
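A self-contained sketch of the same pattern, assuming the standard library's `pathlib` (Python 3) in place of the `pathlib2` backport used in the excerpt. The temporary directory and the CSV contents are made up so the example can run anywhere.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Build the path with the / operator instead of string concatenation
    example_path = Path(tmp) / "info.csv"
    example_path.write_text("name,score\nada,90\n", encoding="utf-8")

    # Path.open works like the built-in open, but starts from the path object
    with example_path.open(encoding="utf-8") as f:
        header = f.readline().strip()

    # One-call helpers cover the common read/inspect cases
    contents = example_path.read_text(encoding="utf-8")
    suffix = example_path.suffix

print(header)
print(suffix)
```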

data sampling

SMOTE: SMOTE oversamples by synthesizing new samples that are close in distance to existing ones within the same class. from imblearn.over_sampling import SMOTE smote = SMOTE() # initializing X_trai...
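To show what "synthesizing new samples close to existing ones" means, here is a rough plain-Python sketch of the interpolation step. It is not imblearn's implementation; the function name `smote_like_sample`, the parameter `k`, and the sample points are all hypothetical.

```python
import math
import random


def smote_like_sample(minority, k=2, rng=None):
    """Create one synthetic point between a minority sample and a near neighbor."""
    rng = rng or random.Random(0)
    base_idx = rng.randrange(len(minority))
    base = minority[base_idx]
    others = [p for i, p in enumerate(minority) if i != base_idx]
    # k nearest neighbors of the base point within the same class
    neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
    neighbor = rng.choice(neighbors)
    gap = rng.random()  # position along the segment, in [0, 1)
    synthetic = tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
    return base, neighbor, synthetic


minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0)]
base, neighbor, synthetic = smote_like_sample(minority, k=2, rng=random.Random(42))
print(base, neighbor, synthetic)
```

The synthetic point always lies on the line segment between the chosen sample and one of its nearest same-class neighbors.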

itertools

Usage: count(start[, step]), cycle(p), repeat(elem [,n]), chain(*iterables): chain multiple iterables (list, tuple, set, generator) into one iterable
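A quick sketch of the four functions listed above. `islice` is not in the list; it is added here only to cut the infinite iterators `count` and `cycle` down to a few items.

```python
from itertools import chain, count, cycle, islice, repeat

# count(start, step): endless arithmetic sequence, so take only 4 values
first_counts = list(islice(count(10, 5), 4))

# cycle(p): loops over the iterable forever, so take only 5 values
cycled = list(islice(cycle("ab"), 5))

# repeat(elem, n): the same element n times
repeated = list(repeat("x", 3))

# chain(*iterables): list, tuple, set and generator read as one iterable
chained = list(chain([1, 2], (3, 4), {5}, (n * n for n in range(3))))

print(first_counts)  # [10, 15, 20, 25]
print(cycled)        # ['a', 'b', 'a', 'b', 'a']
print(repeated)      # ['x', 'x', 'x']
print(chained)       # [1, 2, 3, 4, 5, 0, 1, 4]
```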