A Guide to Using the HPC (High Performance Computing) or NSCC (National Supercomputing Centre) Servers for Students and Researchers in Singapore

Thanks to Nguyen Quoc Phong, Duong Hai Long, and the admins of the two servers for their useful comments when I first approached HPC and NSCC.


CONTENT

  • HPC
  • NSCC
  • Sunfire
  • Use case of the Keras package

Recently, I had a small project requiring a vast amount of computation, which my old laptop could not handle. So I decided to roll up my sleeves and try the SoC (School of Computing) servers, namely Sunfire. Unfortunately, it runs a Unix-based OS, which is a little difficult to get used to, and it is old (according to the information on the website). If you know how to run jobs on Sunfire or any other server, do not hesitate to contact me and I will update this post.

Update 01: I have found a way to use Sunfire and added it to this post. I also realized why performance was lower than expected: I had not submitted my job properly. You should also pay attention to the computational limits of the server you choose, which can make a big difference, for instance between servers with and without GPUs.

Update 02: Thanks to the helpful instruction guide from NSCC, I figured out how to submit a job. However, several of its examples fail because of typos or missing arguments, so I made a modified version, which can be found here. I reported the issue to them and hope it will be fixed soon. I also included an extra GPU code sample for illustration.

I asked my senior and he recommended using HPC (High Performance Computing) instead. After a couple of hours spent requesting access and installing the required software, I got it working, and thought some of you might find this helpful.

High Performance Computing

This server is only for NUS students (see below for an alternative).
1) Register for an account via the HPC Portal: https://nusit.nus.edu.sg/services/hpc/getting-started/introductory-guide-for-new-hpc-users/

2) It took about 1-2 hours for my application to be accepted. The result is sent to your NUS email.

3) Follow the instructions in the email. I tried a few things and found that the setting below worked best for me (note that I could access the server from Prince George President’s Park Residences and only needed to transfer files and run a terminal):

SSH

Open SSH Secure Shell, enter the hostname (server name) and your username, click Connect, and enter your password.

4) For my project I wanted to use the keras library, but I had no right to install anything (installing requires superuser rights, etc.). You have the following options:

  • Email the admin and ask them to install it for you.
  • Use pip or virtualenv. This did not work for me at the time: both either required “admin rights” or failed because I could not append to PYTHONPATH.
  • Download the package from GitHub, upload it to the server, and install it manually.

Load Python 2.7 or Python 3.5. After that, you can run them as python or python3, respectively.

module load python2.7
module load python3.5

Install a package by extracting the downloaded archive, moving into its folder, and running setup.py with the “--user” flag (two dashes immediately followed by user).

python setup.py install --user
python3 setup.py install --user
  • Uninstalling packages:

1. If you installed a package with --user, you can find it under the .local/lib directory in your home directory. You shouldn’t have any issue with “pip uninstall” if the package was installed with --user. Alternatively, you can just find the directory and delete it yourself.
2. I don’t think you can install pip with --user, because it requires access to an admin directory.
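To see where packages installed with --user end up for a given interpreter, a small sketch like the one below may help. It uses only the standard-library site module; the function name user_install_locations is just an illustrative choice, and the exact paths will differ between servers and Python versions.

```python
import site


def user_install_locations():
    """Report where this interpreter looks for packages installed with --user."""
    return {
        "user_base": site.getuserbase(),          # typically ~/.local on Linux
        "user_site": site.getusersitepackages(),  # the per-user site-packages directory
        "enabled": site.ENABLE_USER_SITE,         # False/None means the directory is ignored
    }


if __name__ == "__main__":
    for name, value in user_install_locations().items():
        print("{}: {}".format(name, value))
```

Run it with the same python or python3 you loaded through module load, since each interpreter has its own per-user directory.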

Later, the admin told me that I could not run Tensorflow (another Python library) on GPU there, and suggested using NSCC resources instead. So I also tried to figure out how that one works.



National Supercomputing Centre Singapore

This server is public, and registration is easy if you belong to one of its stakeholders: A*STAR, NTU, NUS, and SUTD. The good news is that other researchers living in Singapore can also apply. One more thing: rumor has it that, because the system is still in its beta phase, stability is not guaranteed. When training, you should save checkpoints, because the server may limit how long you can use a resource and end your task without notice.
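The checkpoint advice can be sketched without any deep learning library. The example below is a toy stand-in, not the code I ran on NSCC: the function names, the JSON file format, and the placeholder "weights" update are all assumptions made for illustration. The idea is simply to save progress after every epoch and resume from the last saved epoch when the job is restarted.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to a temp file first, then rename, so a kill mid-write cannot corrupt it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as handle:
            return json.load(handle)
    return {"epoch": 0, "weights": [0.0]}


def train(path, total_epochs, stop_after=None):
    """Toy loop standing in for real training; resumes from the last saved epoch."""
    state = load_checkpoint(path)
    for epoch in range(state["epoch"], total_epochs):
        if stop_after is not None and epoch >= stop_after:
            break  # simulates the scheduler ending the job early
        state["weights"] = [w + 0.5 for w in state["weights"]]  # placeholder update
        state["epoch"] = epoch + 1
        save_checkpoint(path, state)
    return state
```

Calling train a second time with the same path continues from where the interrupted run stopped instead of starting again from epoch 0.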

Accessing NSCC will not be painful if you have already managed to access HPC. Here is what I did:

1) Download the User Enrollment Guide (the second option) from https://help.nscc.sg/user-guide/. Basically, you sign in to the website with your account, confirm your identity, and set a new password.

2) Follow the instructions at https://help.nscc.sg/vpnmicrosoft/.

3) Install PuTTY or an SSH client to connect.

4) Not surprisingly, we still have to email the admin to ask for new packages to be installed. I have not tried installing packages the way I did on HPC, because the ones I needed were already available. Feel free to share any installation problems with me.


Sunfire

This server is for School of Computing staff and students only. I haven’t used it for keras tasks, but it is very handy if you want to set up your own academic web page to advertise yourself.

1) Instruction link: http://www.comp.nus.edu.sg/~cs1101x/3_ca/labs/lab0/unix_intro.html.

I also downloaded another extremely useful file and shared it in my Drive.

2) Register for an SoC account and remember your alias, as it is used as your username when you log in. For example, mine is hxvinh.

3) Install PuTTY or an SSH client to connect.


Use Case of the Keras Package

For my project I wanted to install keras, and this is what I actually did to get it working. I chose python3 to run it.

HPC

Problem 1:
1.1) I imported three things: “tensorflow”, “theano”, and “keras”. Only the last two worked.
module load python3
python3
import tensorflow
import theano
import keras
So I edited “~/.keras/keras.json” to change the default keras backend from “tensorflow” to “theano”.
vim ~/.keras/keras.json
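If you prefer not to edit the file by hand, the same change can be scripted. This is a minimal sketch, assuming the config file is JSON with a "backend" key (which is how keras stores it); the helper name set_keras_backend and its config_path parameter are my own illustrative choices, and other keys in the file are left untouched.

```python
import json
import os


def set_keras_backend(backend, config_path=None):
    """Set the "backend" entry in keras.json, keeping any other settings as they are."""
    if config_path is None:
        config_path = os.path.join(os.path.expanduser("~"), ".keras", "keras.json")
    config = {}
    if os.path.exists(config_path):
        with open(config_path) as handle:
            config = json.load(handle)
    config["backend"] = backend
    directory = os.path.dirname(config_path)
    if directory:
        os.makedirs(directory, exist_ok=True)  # create ~/.keras if it does not exist yet
    with open(config_path, "w") as handle:
        json.dump(config, handle, indent=4)
    return config
```

For example, set_keras_backend("theano") rewrites the default file in your home directory.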
1.2) Now I could “import keras” without error. However, when I ran a simple multilayer perceptron that I had written myself, saving the best weights after each epoch to an HDF5 file, it failed with “can not import h5py”. I installed the latest version of h5py manually and reran the program, only to hit a new problem: when the network tried to save its weights, the error on screen was “segmentation fault (core dumped)”. Hmm, I had not seen this in Python before, only in C. I suspected it was because theano uses some C code. Dead end.
Problem 2:
2.1) OK, I switched to a new strategy and created a virtual environment with venv.
python3 -m venv virtual_keras
I repeated steps 1.1 and 1.2. In 1.2, I could not install h5py because the environment was not in PYTHONPATH, so I ran:
export PYTHONPATH=$PYTHONPATH:$HOME/virtual_keras
No use; same error.
Problem 3:
3.1) Then I noticed that while installing “h5py”, the terminal said “cannot find Cython”. So I installed the Cython package and reinstalled h5py, but I still could not import it, this time because of “undefined symbol: H5DOwrite_chunk”. Then I switched to an older version, “h5py_2.4.0”, and it worked like a charm.
3.2) I reran my code with no problems.
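The import checks from the steps above can be run in one go before submitting a job. The sketch below is only a convenience helper (the name check_imports is hypothetical); it reports which modules import cleanly and which raise an error. Note that it cannot catch a segmentation fault, which kills the interpreter outright.

```python
import importlib


def check_imports(names):
    """Try importing each module; map its name to None on success or to the error text."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as error:  # ImportError, or e.g. OSError from a broken C extension
            results[name] = "{}: {}".format(type(error).__name__, error)
    return results


# Example call for the packages discussed in this post:
# check_imports(["tensorflow", "theano", "keras", "h5py", "Cython"])
```

A missing shared-library symbol such as the H5DOwrite_chunk case shows up here as an ImportError message rather than a crash, which makes it easier to tell which package to reinstall.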

NSCC

I used python2 here. Everything ran smoothly with no problems; one small note is that the command to load Python is slightly different.

module avail
module load python
python execute.py

One last comment from my other senior: “for NUS HPC, you can try theano as back-end for keras. tensorflow can also be run but not with GPU because their GPU is too old. the GPU queue for NUS HPC is almost always free, whereas NSCC is almost always full”.

I hope you find this useful. I would be glad to receive any comments or to hear about problems you run into with these servers.

5 comments

  1. Anonymous · September 17, 2017

    Thanks a lot. Very useful post!

  2. Anonymous · December 12

    Many thanks for your information. I am also running Tensorflow on NSCC. However, the bad news is that as a personal user, NSCC only gives us a quota of 100000 CPU hours.

    • hoxuanvinh · December 13

      Agree, as a first time user, I found it quite limited as well. However, I’m positive that not many people have used up that much time if you convert to hour unit. Apart from that, the NSCC support team is very active and I believe they are willing to help us if this happens.
      It’s worth noting you tried GPU node rather than CPU for task relating to tensorflow. I have updated an instruction guide above for this matter.

      • Anonymous · December 13

        Found it! A lot of thanks! Actually, I have tried the GPU node before but failed. Your .pbs file is really helpful.
