Khmer Word Segmentation Tool / Demo


NIPTICT has released the Khmer Word Segmentation Tool (including model and script). These resources are useful for Khmer Natural Language Processing (NLP). The tool is available under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License.

Prerequisites

Usage

  1. Prepare your Khmer text in a plain text file encoded in UTF-8 (example: khmer.txt).
  2. Download this tool and extract it. In a terminal, go to the directory km-5tag-seg-1.0 and run the following command:
       $ ./km-5tag-seg-test.sh model/km-5tag-seg-model sample/khmer.txt sample-out/
  3. You will see two output files, khmer.txt.c and khmer.txt.w, in the directory sample-out:

    • khmer.txt.c : segmented output containing compound words (read this paper for the definition of a compound word)
    • khmer.txt.w : segmented output containing word forms only
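
Because step 1 requires UTF-8 input, it can help to check the encoding before running the tool. The following is a minimal sketch, not part of the released package: the helper names is_utf8 and segment_file are hypothetical, and it assumes that the iconv utility is installed and that the script is run from inside km-5tag-seg-1.0. The segmenter is called with the same three arguments as the sample command in step 2.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with the tool).

# Return 0 if the file decodes cleanly as UTF-8.
# iconv exits with a non-zero status when it meets an invalid byte sequence.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Check the encoding of one input file, then run the segmenter on it.
# $1 = input text file, $2 = output directory (e.g. sample-out/)
segment_file() {
    input=$1
    outdir=$2
    if ! is_utf8 "$input"; then
        echo "error: $input is not valid UTF-8" >&2
        return 1
    fi
    # Same argument order as the sample command: model, input file, output directory.
    ./km-5tag-seg-test.sh model/km-5tag-seg-model "$input" "$outdir"
}
```

With these definitions loaded, calling segment_file sample/khmer.txt sample-out/ would refuse to run on a file that is not valid UTF-8 and otherwise behave like the command in step 2.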

30 Aug., 2016