16 Commits

Author        SHA1        Date                        Message
Gea-Suan Lin  ce79d2b245  2024-02-09 11:46:19 +08:00  Implement tokenize().
Gea-Suan Lin  b133273065  2024-02-09 11:29:49 +08:00  Add more data.
Gea-Suan Lin  f6469cc843  2024-02-09 11:28:16 +08:00  Say it's experimental.
Gea-Suan Lin  d5d696c8ad  2024-02-09 11:27:21 +08:00  Add license section.
Gea-Suan Lin  a5b6a3c7a1  2024-02-09 11:25:26 +08:00  Rewrite splitter.
                                                      Merge all english characters (like "apple", not "ap" "pp" "pl" "le"),
                                                      but keep splitting on Chinese words.
Gea-Suan Lin  28c1df566d  2024-01-31 09:43:30 +08:00  Implement the first part of tfidf.
Gea-Suan Lin  9e455bb15a  2024-01-31 09:43:04 +08:00  Implement gram-related functions.
Gea-Suan Lin  86bf78c762  2024-01-31 09:42:42 +08:00  Read artifact.
Gea-Suan Lin  07ebff32f8  2024-01-31 09:03:11 +08:00  Set internal/** as dependencies.
Gea-Suan Lin  043a95631b  2024-01-31 09:01:37 +08:00  Add one more entry including some English.
Gea-Suan Lin  093e3d65fd  2024-01-29 05:12:30 +08:00  Import article data.
Gea-Suan Lin  f3da8c3be3  2024-01-29 00:55:28 +08:00  Add LICENSE file.
Gea-Suan Lin  a51f716a23  2024-01-29 00:50:23 +08:00  Add skeleton of ir-bm25.
Gea-Suan Lin  6ee597dc7f  2024-01-29 00:49:19 +08:00  Add a skeleton of ir-tfidf and its related settings.
Gea-Suan Lin  ee7ccd6887  2024-01-29 00:46:20 +08:00  Run go mod init.
Gea-Suan Lin  6d8fb3d837  2024-01-29 00:46:01 +08:00  Init.