Hacker News

What tools/data do you use for POS tagging? I'm guessing it has to be fast enough to run without a Google data center :)


I'm using RDRPOSTagger [1], though I've optimized the code a bit so that it's not just algorithmically efficient but also uses the language in ways that run fast. It isn't perfect, but it's good enough to be useful.

Language detection and sentence splitting are the other two slow bits of processing.

[1] https://github.com/datquocnguyen/RDRPOSTagger
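For readers unfamiliar with how RDRPOSTagger works: it first assigns each word an initial tag from a lexicon, then corrects those tags with a Single Classification Ripple Down Rules (SCRDR) tree of exception rules. Here is a minimal Python sketch of that idea, assuming a toy hand-written lexicon and a single hand-written rule. The names `Node`, `evaluate`, `tag`, `LEXICON` and `ROOT` are hypothetical; this is not RDRPOSTagger's actual code or API.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A rule condition looks at (words, initial_tags, index) and returns True/False.
Condition = Callable[[list, list, int], bool]


@dataclass
class Node:
    condition: Condition
    conclusion: Optional[str]               # None means "keep the initial tag"
    except_child: Optional["Node"] = None   # tried when this node's condition holds
    if_not_child: Optional["Node"] = None   # tried when it does not hold


def evaluate(root: Node, words: list, tags: list, i: int) -> Optional[str]:
    """Walk the SCRDR tree; the conclusion of the last node that fired wins."""
    result, node = None, root
    while node is not None:
        if node.condition(words, tags, i):
            result = node.conclusion
            node = node.except_child
        else:
            node = node.if_not_child
    return result


def tag(words: list, lexicon: dict, root: Node, default: str = "NN") -> list:
    # Initial tagger: most frequent tag from the lexicon, default for unknown words.
    initial = [lexicon.get(w.lower(), default) for w in words]
    out = []
    for i in range(len(words)):
        fixed = evaluate(root, words, initial, i)
        out.append(initial[i] if fixed is None else fixed)
    return out


# Hypothetical toy data: "can" is most often a modal, but after a determiner
# it is a noun, so one exception rule corrects MD -> NN in that context.
LEXICON = {"the": "DT", "can": "MD", "rusted": "VBD"}
ROOT = Node(
    condition=lambda w, t, i: True,  # default rule: always fires
    conclusion=None,
    except_child=Node(
        condition=lambda w, t, i: t[i] == "MD" and i > 0 and t[i - 1] == "DT",
        conclusion="NN",
    ),
)

print(tag(["The", "can", "rusted"], LEXICON, ROOT))  # ['DT', 'NN', 'VBD']
```

Because each word only walks one path down the tree and the initial tagger is a dictionary lookup, tagging stays cheap per token, which fits the "algorithmically efficient" point above.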
