Gea-Suan Lin | 1de46569e8 | Add a simple test case for tokenizer. | 2024-02-16 20:59:15 +08:00 |
Gea-Suan Lin | 57c153a6c3 | Add more tests for bigram. | 2024-02-16 20:55:23 +08:00 |
Gea-Suan Lin | 55ad14e790 | Add test cases for bigram. | 2024-02-16 20:54:28 +08:00 |
Gea-Suan Lin | 8c3985c386 | Use testify. | 2024-02-16 20:49:00 +08:00 |
Gea-Suan Lin | 6247ed36cd | Add a simple test case. | 2024-02-16 20:43:26 +08:00 |
Gea-Suan Lin | 18fbfa7292 | Rename tokenize to tokenizer. | 2024-02-09 11:47:13 +08:00 |
Gea-Suan Lin | ce79d2b245 | Implement tokenize(). | 2024-02-09 11:46:19 +08:00 |
Gea-Suan Lin | a5b6a3c7a1 | Rewrite splitter. Merge all English characters (like "apple", not "ap" "pp" "pl" "le"), but keep splitting on Chinese words. | 2024-02-09 11:25:26 +08:00 |
Gea-Suan Lin | 9e455bb15a | Implement gram-related functions. | 2024-01-31 09:43:04 +08:00 |
Gea-Suan Lin | 86bf78c762 | Read artifact. | 2024-01-31 09:42:42 +08:00 |
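
Commit a5b6a3c7a1 describes the rewritten splitter only in prose: runs of English characters stay together as one token, while Chinese text is still split. The following is a minimal Go sketch of that behaviour, not the repository's actual code. The names `tokenize` and `isWordRune` are hypothetical, and splitting Chinese runs into overlapping bigrams is an assumption inferred from the bigram test commits.

```go
package main

import (
	"fmt"
	"unicode"
)

// isWordRune reports whether r is an ASCII letter or digit.
// (Hypothetical helper; the real splitter may classify runes differently.)
func isWordRune(r rune) bool {
	return r <= unicode.MaxASCII && (unicode.IsLetter(r) || unicode.IsDigit(r))
}

// tokenize keeps each run of ASCII letters/digits as a single token and
// splits each run of Han characters into overlapping bigrams.
// A lone Han character is emitted as a one-character token.
// Any other rune (spaces, punctuation) only acts as a separator.
func tokenize(s string) []string {
	var tokens []string
	var word []rune // current run of ASCII letters/digits
	var han []rune  // current run of Han characters

	flushWord := func() {
		if len(word) > 0 {
			tokens = append(tokens, string(word))
			word = word[:0]
		}
	}
	flushHan := func() {
		if len(han) == 1 {
			tokens = append(tokens, string(han))
		}
		for i := 0; i+1 < len(han); i++ {
			tokens = append(tokens, string(han[i:i+2]))
		}
		han = han[:0]
	}

	for _, r := range s {
		switch {
		case isWordRune(r):
			flushHan()
			word = append(word, r)
		case unicode.Is(unicode.Han, r):
			flushWord()
			han = append(han, r)
		default:
			flushWord()
			flushHan()
		}
	}
	flushWord()
	flushHan()
	return tokens
}

func main() {
	fmt.Printf("%q\n", tokenize("apple 蘋果汁"))
	// prints ["apple" "蘋果" "果汁"]
}
```

Under these assumptions, "apple" comes out as one token instead of the bigrams "ap" "pp" "pl" "le", while the three-character Chinese run still yields two overlapping bigrams.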