8 Commits

Author        SHA1        Message                                        Date
Gea-Suan Lin  4a1e2b9c5e  Fix test function naming and the actual test.  2024-02-19 10:42:23 +08:00
Gea-Suan Lin  57c153a6c3  Add more test about bigram.                    2024-02-16 20:55:23 +08:00
Gea-Suan Lin  55ad14e790  Add test cases for bigram.                     2024-02-16 20:54:28 +08:00
Gea-Suan Lin  8c3985c386  Use testify.                                   2024-02-16 20:49:00 +08:00
Gea-Suan Lin  6247ed36cd  Add a simple test case.                        2024-02-16 20:43:26 +08:00
Gea-Suan Lin  ce79d2b245  Implement tokenize().                          2024-02-09 11:46:19 +08:00
Gea-Suan Lin  a5b6a3c7a1  Rewrite splitter.                              2024-02-09 11:25:26 +08:00
                          Merge all english characters (like "apple", not "ap" "pp" "pl" "le"),
                          but keep splitting on Chinese words.
Gea-Suan Lin  9e455bb15a  Implement gram-related functions.              2024-01-31 09:43:04 +08:00
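The splitter behaviour described in commit a5b6a3c7a1 can be illustrated with a minimal Go sketch. It is not the repository's code. The function names `splitTokens` and `hanBigrams` are hypothetical. The sketch also makes several assumptions: "English characters" means ASCII letters and digits, "Chinese" means characters in `unicode.Han`, Chinese runs are split into overlapping bigrams (inferred from the bigram-related commits), and every other character acts as a separator.

```go
package main

import (
	"fmt"
	"unicode"
)

// isWordRune reports whether r belongs to an "English" run.
// Assumption: ASCII letters and digits are merged into one token.
func isWordRune(r rune) bool {
	return r <= unicode.MaxASCII && (unicode.IsLetter(r) || unicode.IsDigit(r))
}

// isHan reports whether r is a Han (Chinese) character.
func isHan(r rune) bool {
	return unicode.Is(unicode.Han, r)
}

// hanBigrams returns the overlapping bigrams of a Han run.
// A run of a single character is returned unchanged.
func hanBigrams(run []rune) []string {
	if len(run) == 1 {
		return []string{string(run)}
	}
	grams := make([]string, 0, len(run)-1)
	for k := 0; k+1 < len(run); k++ {
		grams = append(grams, string(run[k:k+2]))
	}
	return grams
}

// splitTokens is a hypothetical splitter: ASCII words stay whole,
// Han runs are split into bigrams, and anything else separates tokens.
func splitTokens(s string) []string {
	var tokens []string
	runes := []rune(s)
	for i := 0; i < len(runes); {
		switch {
		case isWordRune(runes[i]):
			j := i
			for j < len(runes) && isWordRune(runes[j]) {
				j++
			}
			tokens = append(tokens, string(runes[i:j])) // keep "apple" whole
			i = j
		case isHan(runes[i]):
			j := i
			for j < len(runes) && isHan(runes[j]) {
				j++
			}
			tokens = append(tokens, hanBigrams(runes[i:j])...)
			i = j
		default:
			i++ // separator: space, punctuation, etc.
		}
	}
	return tokens
}

func main() {
	fmt.Printf("%q\n", splitTokens("apple 蘋果汁"))
}
```

With these assumptions, `splitTokens("apple 蘋果汁")` yields `["apple" "蘋果" "果汁"]`. The English word stays whole instead of becoming "ap" "pp" "pl" "le", while the Chinese run is still split into bigrams.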