Install SRILM on Ubuntu

Following up on my post about installing SRILM on Windows, I have now successfully installed SRILM on Ubuntu, which turned out to be much simpler :))

  • Download the latest version of SRILM (currently srilm-1.7.1) and move the downloaded file to your home folder (“/Home”).
  • Open a Terminal and type the commands below. I use “/usr/share/srilm” as the installation directory; if you want a different one, replace it with your own path:

    mkdir /usr/share/srilm
    mv srilm-1.7.1.tar.gz /usr/share/srilm/
    cd /usr/share/srilm
    tar xvf srilm-1.7.1.tar.gz

  • Open the Makefile:

    sudo gedit Makefile

  • Edit the Makefile: around line 7, find the line “# SRILM = /home/speech/stolcke/project/srilm/devel”. Remove the “#” and change the line to:

    SRILM = /usr/share/srilm

  • Save and close the file. Go back to the Terminal and run the following commands with superuser permission. If you get the error “tcsh: command not found”, run “sudo apt-get install tcsh” and try again. On 32-bit Ubuntu:

    sudo tcsh
    sudo make NO_TCL=1 MACHINE_TYPE=i686-gcc4 World
    sudo ./bin/i686-gcc4/ngram-count -help

  • On 64-bit Ubuntu:

    sudo tcsh
    sudo make NO_TCL=1 MACHINE_TYPE=i686-m64 World
    sudo ./bin/i686-m64/ngram-count -help
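
The Makefile edit and the choice of MACHINE_TYPE described in the steps above can also be scripted. This is only a minimal sketch, not part of the original instructions: the function names `set_srilm_path` and `pick_machine_type` are made up for illustration, and the mapping from `uname -m` to a machine type covers only the two cases mentioned in this post.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not part of SRILM.

# Point the SRILM variable of a Makefile at the installation directory.
# Matches the commented-out default ("# SRILM = ...") as well as a line
# that has already been set ("SRILM = ...").
set_srilm_path() {
    makefile=$1
    srilm_dir=$2
    tmp_file=$(mktemp) || return 1
    # Rewrite the whole SRILM line; all other lines pass through unchanged.
    sed "s|^#* *SRILM *=.*|SRILM = $srilm_dir|" "$makefile" > "$tmp_file" || return 1
    # Write back with cat (not mv) so the Makefile keeps its owner and mode.
    cat "$tmp_file" > "$makefile"
    rm -f "$tmp_file"
}

# Print the MACHINE_TYPE value used in this post for a given architecture.
# The architecture is an optional argument; it defaults to `uname -m`.
pick_machine_type() {
    arch=${1:-$(uname -m)}
    case $arch in
        x86_64) echo i686-m64 ;;
        i?86)   echo i686-gcc4 ;;
        *)      echo "unsupported architecture: $arch" >&2; return 1 ;;
    esac
}
```

With these defined, `set_srilm_path Makefile /usr/share/srilm` replaces the manual edit, and the build command becomes `make NO_TCL=1 MACHINE_TYPE=$(pick_machine_type) World` (run both with sudo if the directory is not writable by your user).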

Running SRILM:

Follow this link to download the corpus and related code. Copy the two files corpus.txt and vocab.txt (the list of words in the corpus) into “/usr/share/srilm/bin/i686-gcc4” (32-bit) or “/usr/share/srilm/bin/i686-m64” (64-bit). Copying them by hand in the file manager may not work; in that case, open a Terminal, go to the folder containing the two files, and type:

sudo cp vocab.txt '/usr/share/srilm/bin/i686-m64'
sudo cp corpus.txt '/usr/share/srilm/bin/i686-m64'
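
If you want to try your own corpus instead of the downloaded one, you also need a matching vocab.txt. The sketch below shows one possible way to build it; it assumes the corpus is already split into words by whitespace, and `make_vocab` is a hypothetical helper name, not an SRILM tool.

```shell
#!/bin/sh
# Hypothetical helper: build a vocabulary file (one word per line)
# from a whitespace-tokenized corpus.
make_vocab() {
    corpus=$1
    vocab=$2
    # Put every token on its own line, drop empty lines,
    # then keep each distinct word once (byte-order sort).
    tr -s '[:space:]' '\n' < "$corpus" | sed '/^$/d' | LC_ALL=C sort -u > "$vocab"
}
```

For example, `make_vocab corpus.txt vocab.txt` writes the sorted list of distinct words found in corpus.txt.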

Now change to that folder and run the program:

cd '/usr/share/srilm/bin/i686-m64'
sudo ./ngram-count -vocab vocab.txt -text corpus.txt -order 3 -write count.txt -unk

sudo ./ngram-count -vocab vocab.txt -read count.txt -order 3 -lm lm.lm -gt1min 3 -gt1max 7 -gt2min 3 -gt2max 7 -gt3min 3 -gt3max 7

If everything works, you will get two new files named “count.txt” and “lm.lm”. Open them and have a look :))
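
One quick way to look at count.txt is to list the most frequent n-grams of a given order. The following is a rough sketch, assuming the usual count-file layout of one n-gram per line with the words separated by spaces and the count after a tab; `top_ngrams` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the N most frequent n-grams of a given order
# from a count file whose lines look like "w1 w2 ...<TAB>count".
# Output lines have the form "count<TAB>n-gram".
top_ngrams() {
    counts=$1
    order=$2
    n=$3
    awk -F '\t' -v order="$order" '
        # Split the n-gram on whitespace to see how many words it has.
        { if (split($1, words, " ") == order) print $2 "\t" $1 }
    ' "$counts" | LC_ALL=C sort -k1,1nr -k2 | head -n "$n"
}
```

For example, `top_ngrams count.txt 3 10` prints the ten most frequent trigrams with their counts.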

Note:

  • If a command fails with “Permission denied”, add “sudo” in front of it to run it with superuser permission. For example, if the fourth command of the first block (the tar line) fails, change it to “sudo tar xvf srilm-1.7.1.tar.gz”.
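
Instead of waiting for the error, you can check in advance whether a directory is writable by your user. This is just a small sketch; `sudo_prefix` is a hypothetical helper name used here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print "sudo" if the given directory is not writable
# by the current user (or does not exist), and nothing otherwise.
sudo_prefix() {
    dir=$1
    if [ -w "$dir" ]; then
        echo ""
    else
        echo "sudo"
    fi
}
```

For example, `$(sudo_prefix /usr/share/srilm) tar xvf srilm-1.7.1.tar.gz` runs tar directly when the directory is writable and through sudo when it is not.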


 


Comments

  1. Adnan Ali · June 19, 2017

    How do I create a language model in SRILM for a language other than English (Urdu)? I have installed it, but I don't know what to do next. Can you please help me with this?

    • hoxuanvinh · June 19, 2017

      Hi Adnan, it’s been a while since I last used SRILM, but I will try my best to help you.
      Can you please specify what you need now?
      As I understand it, you are asking what to do with a model once it has been trained by SRILM? To the best of my knowledge, you can read the output file and use it for other tasks, as demonstrated with MOSES on my blog for machine translation. Or, if you want to use it as a preprocessing step in a pipeline, I suggest using gensim or another Python library.
