Commit Graph

5 Commits

Author SHA1 Message Date
Gea-Suan Lin
18fbfa7292 Rename tokenize to tokenizer. 2024-02-09 11:47:13 +08:00
Gea-Suan Lin
ce79d2b245 Implement tokenize(). 2024-02-09 11:46:19 +08:00
Gea-Suan Lin
a5b6a3c7a1 Rewrite splitter.
Merge all English characters (like "apple", not "ap" "pp" "pl" "le"),
but keep splitting on Chinese words.
2024-02-09 11:25:26 +08:00
Gea-Suan Lin
9e455bb15a Implement gram-related functions. 2024-01-31 09:43:04 +08:00
Gea-Suan Lin
86bf78c762 Read artifact. 2024-01-31 09:42:42 +08:00
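
The splitter behaviour described in commit a5b6a3c7a1 can be illustrated with a minimal sketch. This is not the repository's code: the function name `split_text`, the regular expression, the character ranges, and the default gram size of 2 (inferred from the "ap" "pp" "pl" "le" example) are all assumptions made for illustration. It keeps a run of English letters or digits as one token and splits a run of Chinese characters into overlapping grams.

```python
import re

# Hypothetical pattern: a run of ASCII letters/digits, or a run of
# CJK Unified Ideographs (U+4E00..U+9FFF). Everything else is a separator.
_RUN = re.compile(r"[A-Za-z0-9]+|[\u4e00-\u9fff]+")


def split_text(text, n=2):
    """Split text into tokens: whole English words, n-grams for Chinese runs."""
    tokens = []
    for match in _RUN.finditer(text):
        run = match.group()
        if run[0].isascii():
            # English run: keep it whole ("apple", not "ap" "pp" "pl" "le").
            tokens.append(run)
        elif len(run) < n:
            # Chinese run shorter than one gram: keep it as-is.
            tokens.append(run)
        else:
            # Chinese run: emit overlapping n-grams.
            tokens.extend(run[i:i + n] for i in range(len(run) - n + 1))
    return tokens


if __name__ == "__main__":
    print(split_text("apple 蘋果很好吃"))
```

Under these assumptions, "apple" stays a single token while the Chinese run is still split into overlapping two-character grams, which matches the intent stated in the commit message.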