Computer

Setting up Windows

Installing WSL

  • Install Windows Subsystem for Linux and the VcXsrv X11 server.
  • Install the following programs
    • gnuplot - a plotting program
    • ImageMagick - a software suite to create, edit, compose, or convert bitmap images

Open a WSL terminal and run:

sudo apt update
sudo apt install gnuplot imagemagick

Installing common GUI tools

Setting up ssh-agent

  • Generate a key pair following the first bullet point under SSH-Agent, then skip to the last bullet point and install weasel-pageant.
  • Open a WSL terminal and follow “On Ubuntu or Debian” in GPG-Agent.
  • Install the Windows version of GPG-agent following “On Windows”.
  • Add the lines under “Useful common settings”.

Setting up .ssh/config

Follow the steps in Customizing .ssh/config and set up Sharing sessions over a single connection and Host alias. Leave Multi-hop for later when you need it, but do keep this option in mind.

Windows Subsystem for Linux

Windows Subsystem for Linux (WSL) is a compatibility layer for running Linux binary executables (in ELF format) natively on Windows 10. This means you can download an executable compiled for Linux and run it unmodified under Windows. You can also use `apt install` to access the full repertoire of Ubuntu packages, or install a different distribution such as Fedora or openSUSE via the Windows Store.

X11 and DBus

Since WSL does not currently support X11 and Unix sockets, which DBus uses by default, we need to do the following:

  • Install VcXsrv or Xming.
  • export DISPLAY=:0.0. You can add this to your shell init script:
echo "export DISPLAY=:0.0" >> ~/.bashrc
  • In /etc/dbus-1/session.conf, replace unix:tmpdir=/tmp with tcp:host=localhost,port=0. If the file or the line does not exist, simply add it.
  • Suppress a few other benign warnings:
echo "export NO_AT_BRIDGE=1" >> ~/.bashrc
sudo sh -c 'dbus-uuidgen > /etc/machine-id'
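
To verify that the X server connection works (a quick check, assuming the x11-apps package, which provides xeyes and xclock):

sudo apt install x11-apps
xeyes &  # a window should appear on the Windows desktop via VcXsrv/Xming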

Cygwin

Install packages from the command line

  • For simple usage, you can run Cygwin setup.exe directly:
setup-x86.exe -q -P packagename1,packagename2

Set different permissions for user and group

Different permissions for user and group cannot be set if the group is None. To fix this, simply assign a valid group:

chown -R :Users directories_or_files
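
With a valid group assigned, user and group permissions can now differ; for example (the permission bits below are just an illustration):

chmod -R u=rwX,g=rX,o= directories_or_files
ls -l directories_or_files  # user and group columns now show different permissions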

Install SSHD daemon

  • Use Cygwin setup.exe to install OpenSSH and cygrunsrv;
  • Start a Cygwin bash shell as an Administrator;
  • Run ssh-host-config [-y], answer yes to all questions except the one that asks if you want to use a different user name instead of "cyg_server";
  • Enter "ntsec", "tty ntsec", or "binmode ntsec" at the prompt "Enter the value of CYGWIN for the daemon"; this enables communication with the NT authentication database;
  • Run the following commands to change permissions of relevant files:
chown cyg_server:root /etc/ssh*
chmod go-rwx /etc/ssh*_key
chown -R cyg_server:root /var/empty
chmod 755 /var/empty
chown cyg_server:root /var/log/sshd.log
chmod 644 /var/log/sshd.log
  • Open Computer Management, go to Services and Applications -> Services, right-click CYGWIN sshd and select Properties. In the Properties dialog box, go to the Log On tab, select Log on as this account, and specify the domain/user name and password. Click Apply.
  • The SSHD daemon will now start at boot. You can start it right away with cygrunsrv -S sshd, net start sshd, or /usr/sbin/sshd;
  • Add a rule in Advanced Windows Firewall settings to allow C:\cygwin64\usr\sbin\sshd.exe. You can limit Services to "Apply to services only" and Local port to "Specific Ports" (e.g., 22).
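
To confirm that the daemon is installed and reachable (a quick sanity check, assuming the default port 22):

cygrunsrv -Q sshd  # query the service status
ssh localhost      # should prompt for your Windows password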

Install cron daemon

  • Use Cygwin setup.exe to install cron and cygrunsrv;
  • Start a Cygwin bash shell as an Administrator;
  • Add cron as a Windows service and start it:
cygrunsrv -I cron -p /usr/sbin/cron -a -n
cygrunsrv -S cron # or "net start cron" or "/usr/sbin/cron -a -n"
  • As usual, use crontab to add tasks; maybe something like updating DDNS services every hour if you don't have static IP addresses:
0 * * * * ip=$(curl -s "http://freedns.afraid.org/dynamic/update.php?SPECIAL_STRING"|/usr/bin/gawk 'NF==6{print $3}NF==7{print $4}'); curl "https://www.dtdns.com/api/autodns.cfm?id=yourname.suroot.com&pw=PASSWORD&ip=$ip" &> /dev/null; curl --user your@email.com:PASSWORD "https://www.dnsdynamic.org/api/?hostname=yourname.dnsd.info&myip=$ip" &> /dev/null; curl --user USERNAME:PASSWORD "https://members.dyndns.org/nic/update?hostname=yourname.dyndns.org&myip=$ip" &> /dev/null

Install zeromq / pyzmq

This software is needed if, for example, you want to pip install ipython[notebook]. As of 2015/01/08, you have to first install zeromq separately. Follow the "To build on UNIX-like systems" section, but you don't need the build-essential and uuid/e2fsprogs packages.

To compile zeromq, you need to apply the hacks here. Basically, force libzmq.la to compile as a C++ library:

mv tools/curve_keygen.c tools/curve_keygen.cpp
sed -i 's/\.c\>/&pp/' tools/Makefile.am
rm -f tools/.deps/curve_keygen.Po
./autogen.sh
./configure --prefix=/usr/local # supply your own prefix; this becomes $ARTIFACT below
make

Then you will need to create a shared library manually following the suggestions here:

export ARTIFACT=/usr/local # the path you have supplied to ./configure --prefix=$ARTIFACT

# shared libraries broken with 0MQ on Cygwin64
# manual shared library link and install from static library

gcc -shared -o cygzmq.dll -Wl,--out-implib=libzmq.dll.a -Wl,--export-all-symbols -Wl,--enable-auto-import -Wl,--whole-archive src/.libs/libzmq.a -Wl,--no-whole-archive -lstdc++

mkdir -p $ARTIFACT/bin
install cygzmq.dll $ARTIFACT/bin

mkdir -p $ARTIFACT/lib
install libzmq.dll.a $ARTIFACT/lib

Then you can install ipython or do pip install pyzmq.

Linux Backup Commands

  • GNU cp
    • Backup that preserves hard links: cp -av /source/directory /target/directory
    • Hard-link backup: cp -avl /source/directory /target/directory
  • rsync
    • Backup that preserves hard links: rsync -avzPH --stats --delete --delete-excluded /source/directory /target/directory
    • Hard-link backup: N/A
  • cpio
    • Backup that preserves hard links: find /source/directory -print -depth|cpio -pdum /target/directory
    • Hard-link backup: find /source/directory -print -depth|cpio -pduml /target/directory
  • tar
    • Backup that preserves hard links: tar -cSf - /source/directory|tar -xvSpf - -C /target/directory
    • Hard-link backup: N/A
  • dump
    • Backup that preserves hard links: (dump -0uanf - /source/directory|restore -xyvf - /target/directory) >& log
    • Hard-link backup: N/A
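
The hard-link variants are handy for rotating snapshot-style backups: unchanged files in the snapshot share disk blocks with the main copy. A minimal sketch combining the commands above (paths and the retention scheme are placeholders):

# turn the current backup into a dated, hard-linked snapshot
cp -avl /target/directory /target/directory.$(date '+%Y%m%d')
# then refresh the main copy; files rsync replaces no longer share blocks with the snapshot
rsync -avH --delete /source/directory/ /target/directory/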

Tar select files in a directory tree

find . -name 'test.*' | tar czvf test.tar.gz -T -
find . -name test.\* -print0 | xargs -0 tar czvf test.tar.gz
find . -name "test.*" -exec tar -rvf test.tar '{}' \;
# find . -name "test.*" -exec tar -rvf test.tar '{}' +  # not tested
find . -name "test.*" | cpio -o -H ustar | gzip -c > test.tar.gz

Remote Access

If you ever want to work from home, connecting remotely to your computer is a must. In order for things to work the way you expect, you need to keep a few things in mind.

SSH

X Forwarding

  1. To connect to your office desktop (MSI and College of Science and Engineering Labs also offer SSH connection to their Linux machines), you will need an SSH client on your home computer or laptop. If you are working from a Mac or a Linux machine, you already have one built in. Just open a terminal and go. If you are working from a Windows PC, you can use MobaXterm (with X11 server and support for RDP, VNC, SSH, telnet, rsh, FTP, SFTP and XDMCP) or PuTTY.
  2. To run your X11 programs, you need some kind of X emulator to mimic the X Window System. If you work from a Mac or a Linux machine at home, you already have one built in. If you work from a Windows PC and the client program you are using does not bundle one, then you need VcXsrv or Xming (see the example after this list).
  3. Be mindful of the firewall. Your desktop is set to deny access from off-campus IP addresses. To get around this, you can either first connect to one of MSI's machines and then connect to your computer, or allow your home IP address by configuring the TCP wrapper.
  4. To avoid the trouble of getting SSH, X11 forwarding, and other Unix tools to work under Microsoft Windows without dual-booting, you can use Windows Subsystem for Linux or Cygwin, or install a Linux distribution inside Windows using VirtualBox.
  5. If you just want to access the library, then either VPN or SSH Tunneling should be sufficient.
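
Putting items 1 and 2 together, a typical X-forwarded session from a Mac or Linux machine looks like the following (the username and hostname are placeholders):

ssh -X your_user_name@your_machine.dept.univ.edu  # use -Y (trusted forwarding) if -X gives errors
xterm &                                           # any X11 program now displays on your local screen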

SSH Tunneling

You can set up machines to which you have SSH access as an Internet proxy via SSH Tunneling. This is useful for journal/database services which are restricted to university IP addresses.

  • Establish an SSH connection with dynamic port forwarding: under Linux or Mac OS X, just open a terminal and enter: ssh -D 2001 -fN your_user_name@your_machine.dept.univ.edu. If you use PuTTY on Windows, load a saved session or create a new one, then in the left panel of the Configuration window, under Connection -> SSH -> Tunnels, enter 2001 as Source port, check Dynamic, click Add, and finally click Open to connect. You can save this back to the session if you like.
  • Direct the application to use the SOCKS proxy: set up your browser to use SOCKS v4 proxy at host 127.0.0.1 on port 2001. For Firefox, the setting is under Options -> Advanced -> Network -> Connection Settings. For Chrome, it is under Settings -> Show advanced settings -> Network -> Change proxy settings. You can also use add-ons such as FoxyProxy to easily switch between different proxy settings.
  • Check your IP address at, e.g. https://ipleak.net. It should show the IP address of the remote machine now.
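
You can also verify the tunnel from the command line; a quick check, assuming curl and a plain-text IP echo service such as ifconfig.me:

curl https://ifconfig.me                                   # your normal public IP
curl --socks5-hostname 127.0.0.1:2001 https://ifconfig.me  # should print the remote machine's IP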

SSH-Agent/Keychain/Pageant

To avoid having to enter a password every time you SSH, you can use ssh-agent, pageant from PuTTY or gpg-agent.

  • First generate a key pair and copy the public key to the target remote system. Make sure to provide a passphrase.
ssh-keygen -t rsa -b 2048 -f ~/.ssh/id_rsa
scp ~/.ssh/id_rsa.pub user@remote.system:.
ssh user@remote.system 'cat id_rsa.pub >> .ssh/authorized_keys && rm id_rsa.pub'
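
If ssh-copy-id is available, it performs the copy-and-append in a single step (an equivalent shortcut to the last two commands above):

ssh-copy-id -i ~/.ssh/id_rsa.pub user@remote.system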

On Windows, simply use PuTTYgen from PuTTY to generate an SSH-2 RSA key.

  • Do the following every time you begin a new terminal session:
eval $(ssh-agent)
ssh-add ~/.ssh/id_rsa

This will export a few environment variables so that tools such as ssh know where to request keys.

On Windows, use Pageant to serve keys. Many other software tools (e.g., WinSCP) know how to use PuTTY keys too. You can also create a shortcut to launch PuTTY immediately after adding a key:

C:\PuTTY\pageant.exe C:\path\to\id_rsa.ppk -c C:\PuTTY\putty.exe
  • A nuisance with the approach so far is that the relevant environment variables are not preserved between shell sessions, so you end up starting many ssh-agent instances. To locate an existing ssh-agent session (and start a new one if none is found), use keychain (a single-file shell script). If it is not provided by your package manager, simply download and unzip it somewhere. Then run the following, or add it to ~/.profile:
eval $(/path/to/keychain --eval ~/.ssh/id_rsa)
eval $(/mnt/c/path/to/weasel-pageant/weasel-pageant -r) # for weasel-pageant
eval $(ssh-pageant -r -a "/tmp/.ssh-pageant-$USER") # for ssh-pageant

The latter two commands (for weasel-pageant and ssh-pageant) replace ssh-agent + keychain and require Pageant to be running.

GPG-Agent

GnuPG is an encryption and digital-signing software package used for email (e.g., Mailvelope or Mymail-Crypt for Gmail, and Enigmail for Thunderbird), signing code, etc. It provides its own key daemon, which can be a drop-in replacement for ssh-agent + keychain and Pageant.

  • Useful common settings in gpg-agent.conf

The following specifies that a passphrase is remembered for 57600 seconds (16 hours) from its last usage so you don't need to enter it again if you are actively working and making connections. The passphrase will however expire at least once every 999999 seconds (~ 11.6 days).

default-cache-ttl 57600
default-cache-ttl-ssh 57600
max-cache-ttl 999999
max-cache-ttl-ssh 999999
  • On Ubuntu or Debian:
sudo apt install gnupg-agent dbus-user-session
echo enable-ssh-support >> ~/.gnupg/gpg-agent.conf

and comment out “use-ssh-agent” from /etc/X11/Xsession.options to disable ssh-agent.

Then start gpg-agent and import existing SSH keys via ssh-add:

gpg-agent --daemon
ssh-add ~/.ssh/id_rsa

It will first ask you for the passphrase to decrypt the private key, and immediately ask you again for a passphrase to encrypt and store it with gpg-agent. This step needs to be run only once, and gpg-agent will remember the authentication keys across sessions. In addition, gpg-agent itself will be started automatically due to /etc/X11/Xsession.d/90gpg-agent. When a program needs to authenticate, gpg-agent will prompt you for the passphrase if the key has not been cached.
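
To check that the agent is serving your keys (this works for plain ssh-agent as well as gpg-agent with enable-ssh-support):

echo $SSH_AUTH_SOCK  # should point at the agent's socket
ssh-add -l           # lists the fingerprints of the cached keys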

  • On MacOS:
brew install gpg pinentry-mac
echo 'pinentry-program /usr/local/bin/pinentry-mac' >> ~/.gnupg/gpg-agent.conf
echo enable-ssh-support >> ~/.gnupg/gpg-agent.conf

then start gpg-agent and import existing keys as above. Unlike on Linux, the environment variables are not set up automatically, so you need to run the following, or add it to ~/.profile:

export SSH_AUTH_SOCK="$(gpgconf --list-dirs agent-ssh-socket)"
  • On Windows, use the installer provided by GnuPG, and add enable-putty-support to %AppData%\gnupg\gpg-agent.conf. You can create new authentication-only keys for gpg-agent, but it is not clear how to import existing SSH keys (like we can do with ssh-add under Linux or MacOS). You can, however, copy ~/.gnupg/sshcontrol and relevant keys from .gnupg/private-keys-v1.d from your Linux/Mac machine to %AppData%\gnupg. Optionally, you can create a shortcut to start gpg-agent:
"%ProgramFiles(x86)%\GnuPG\bin\gpg-connect-agent.exe" /bye

PuTTY, WinSCP and other programs should then be able to authenticate as usual.

Customizing .ssh/config

Sharing sessions over a single connection

Add the following to the end of ~/.ssh/config:

Host *
  ServerAliveInterval 300
  ServerAliveCountMax 3
  TCPKeepAlive no
  ControlMaster auto
  ControlPath ~/mux-%L-%r@%h:%p
  ControlPersist 30m
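
With ControlMaster enabled, ssh -O manages the shared connection; for example:

ssh -O check hostA  # is a master connection to hostA running?
ssh -O exit hostA   # close the shared connection (e.g., after editing this config)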

Host alias

If you would like to be able to ssh hostA instead of ssh userX@hostA.dept.univ.edu, add the following to ~/.ssh/config:

Host hostA
  HostName hostA.dept.univ.edu
  User userX

Multi-hop

  • If you need to SSH to a machine via a proxy node, you can do:
ssh -t user1@proxy.system ssh user2@remote.system

This quickly becomes cumbersome, especially if you use scp or rsync too. Automate it in ~/.ssh/config:

Host mesabi ln000? cn0???
  HostName remote.system
  User user2
  ProxyCommand ssh user1@proxy.system nc %h %p

You can then simply do: ssh mesabi.
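
On OpenSSH 7.3 or newer, the ProxyCommand line can be replaced by the simpler ProxyJump user1@proxy.system (or ssh -J user1@proxy.system on the command line); the nc-based form above works with older versions as well.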

  • Logging into some HPC systems (e.g., very.secure.system) requires a hardware token in addition to a password. To avoid entering it every time, you can first set up ~/.ssh/authorized_keys on very.secure.system, and SSH to it from a different, long-running computer:
ssh -L1322:localhost:22 -fN user@very.secure.system

This step needs the hardware token. Now on other computers you use, add to ~/.ssh/config:

Host cascade glogin*
  HostName localhost
  Port 1322
  User user
  ProxyCommand ssh user2@long.running.system nc %h %p

after which you can ssh cascade without needing the hardware token. It will connect to very.secure.system via port 1322 at long.running.system. However, ALCF's machines have disabled authentication by SSH keys, so this method will not work there.

Remote Desktop

If you want remote desktop functionality, there are a few options. NoMachine and TeamViewer are so fast that you can even stream HD videos over the Internet.

  • VNC is a cross-platform solution. A server program needs to be run on the remote end, and a viewer program is used to connect to the server. Mac OS X has built-in VNC server support (System Preferences -> Sharing -> Screen Sharing), but it is not as fast as other implementations (such as RealVNC, TightVNC, and UltraVNC).
  • NoMachine started as an improvement over the X Window System, compressing and transferring only the changed regions of the screen and wrapping the connections in SSH for encryption. Version 4 began to provide remote desktop between any two machines with the software installed, in addition to connecting to specifically set up NoMachine servers. Open-source server implementations (FreeNX, Neatx) exist for Linux. MSI and the College of Science and Engineering Labs provide NX servers, from which you can connect to other HPC machines and your workstation. Sessions are preserved unless you terminate them, so you will not lose ongoing work to dropped connections and can pick up where you left off when moving between office and home.
  • TeamViewer supports remote control, online meeting, and VPN. It runs on Microsoft Windows, Mac OS X, Linux, Android, iOS, Windows RT, and Windows Phone.

Mount ext2/3/4, NTFS file systems

Mac OS X does not have native kernel support for the ext2/3/4 file systems used by most Linux distributions and the NTFS file system used by Windows. MacFUSE, OSXFUSE, and Fuse4X are loadable kernel modules for Mac OS X based on the GPLed Linux module FUSE, which provides a "bridge" to the kernel interfaces and allows file system code to run in user space. fuse-ext2, NTFS-3G for Mac OS X, and NTFS for Mac OS X are file systems built on top of FUSE.

  • First of all, download and install MacFUSE. With 64-bit OS X, you also need to install macfuse-core-10.5-2.1.9.dmg and MacFUSE.prefPane-2.0-64-bit-2009-09-10.zip in sequence. (MacFUSE is no longer maintained by its original author. Alternatively, you can use OSXFUSE which is a drop-in replacement for MacFUSE. You need to enable MacFuse Compatibility Layer during the installation of OSXFUSE.)
  • For ext2/3/4 support, install fuse-ext2. If the hard drive does not auto-mount, mount it manually with something like "mkdir /Volumes/disk2s1 && fuse-ext2 /dev/disk2s1 /Volumes/disk2s1"
  • Download and install NTFS-3G for Mac OS X from here. If needed, mount manually using "mount -t ntfs-3g /dev/disk2s1 /Volumes/disk2s1"

MySQL

Three Ways of Backup

  • Dump (backup) MySQL databases to XML
mysqldump -u $USER --password=$PASSWORD --all-databases --xml > mysql.sql.xml
  • MediaWiki MySQL XML dump
php dumpBackup.php --full > group_wiki.sql.xml
  • Dump MySQL databases using crontab
nice -n 19 mysqldump -u $USER -p$PASSWORD $DATABASE --default-character-set=$CHARSET -c | nice -n 19 gzip -9 > mysql-$DATABASE-$(date '+%Y%m%d').sql.gz
Set CHARSET to binary or utf8, and DATABASE to --all-databases, group_wiki, or group_web.
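
To restore from the third (crontab-style) dump, pipe it back into mysql; a sketch, assuming the same $DATABASE and a dump file named as above (the date portion is whatever the backup used; drop $DATABASE if the dump used --all-databases):

gunzip < mysql-$DATABASE-20180101.sql.gz | mysql -u $USER -p $DATABASE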

Miscellaneous

List User

select * from mysql.user;
select User from mysql.user;

Delete User

delete from mysql.user WHERE User='name';

Show character set

status: look for "Server characterset"

SHOW CREATE TABLE text: look for "DEFAULT CHARSET"

FreeBSD

Upgrade installed software using Packages

  1. ports-mgmt/portmaster: has no dependency but needs an up-to-date ports tree
    • portmaster -P: use packages, but build port if not available;
    • portmaster -PP: fail if no package is available;
    • portmaster -r: build the specified port and all ports that depend on it;
    • portmaster --clean-distfiles: delete out-dated distfiles not referenced by any installed port.
  2. ports-mgmt/portupgrade: needs the ports tree
    • portupgrade -aP: will upgrade all your packages and build those missing in the latest version from the ports tree;
    • portupgrade -aPP: will upgrade only when packages are available;
    • portupgrade -r: act on all those packages depending on the given packages as well;
    • portupgrade -R: act on all those packages required by the given packages as well;
    • portupgrade -aD: delete failed distfiles for all packages.
  3. pkg_upgrade in sysutils/bsdadminscripts: does not need the ports tree
  4. Sample workflow:
    • Security patches:
    freebsd-update fetch
    freebsd-update install

    freebsd-update does not change the patch level shown by "uname -a" (such as from 9.0-RELEASE-p3 to 9.0-RELEASE-p5), unless the kernel is also updated. The file "/var/db/freebsd-update/tag" will always contain the actual patch level information.

    • Major and minor version update:
    freebsd-update -r 8.2-RELEASE-p9 upgrade
    portmaster -f #all third party software needs to be rebuilt and re-installed, as they may depend on libraries which have been removed during the upgrade process
    freebsd-update install #tie up all the loose ends in the upgrade process
    • Update Ports Collection:
    portsnap fetch update
    • Upgrade Ports:

    First read /usr/ports/UPDATING for additional steps users may need to perform when updating a port. Then use either portmaster or portupgrade to perform the actual upgrade.

    portupgrade -aP

    or

    portupgrade -PrR [package_name]

System Settings

Disabling the hardware bell/beep

Type the following command to disable it for the current session:

sysctl hw.syscons.bell=0

To make the setting persist after a reboot, enter:

echo "hw.syscons.bell=0" >> /etc/sysctl.conf

/etc/rc.conf

defaultrouter="111.222.111.254"
hostname="server.dept.univ.edu"
ifconfig_bge0="inet 111.222.111.222  netmask 255.255.255.0"
ifconfig_xl0="inet 10.1.255.127  netmask 255.255.0.0"
nfs_client_enable="YES"
nfs_server_enable="YES"
rpcbind_enable="YES"
sshd_enable="YES"
apache22_enable="YES"
mysql_enable="YES"
fusefs_enable="YES"
zfs_enable="YES"
ipfilter_enable="YES"
ipfilter_rules="/home/admin/scripts/FreeBSD/ipf.rules"
ipmon_enable="YES"
ipmon_flags="-Ds"
#inetd_enable="YES"
#ntpd_enable="YES"
#cvslockd_enable="YES"
#ftpd_enable="YES"

/etc/ntp.conf

restrict default ignore
restrict 127.0.0.1
restrict -6 ::1
restrict 128.101.162.0 mask 255.255.255.0 nomodify notrap
server 127.127.1.0
fudge 127.127.1.0 stratum 10

Rocks cluster

Management of computer clusters

Intelligent Platform Management Interface (IPMI) is usually available for server machines. It can use the dedicated IPMI Ethernet port or share the first LAN port (so make sure the first port is connected to the internal network switch) for remote monitoring and control.

A KVM switch can be used for non-server workstations or older machines.

Please refer to the User Manuals page for details on how to use IPMI or KVM. SuperMicro has a suite of Server Management Utilities to perform health monitoring, power management and firmware maintenance (BIOS and IPMI/BMC firmware upgrade). Rocks also bundles the OpenIPMI console interface.
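
For example, ipmitool (available from the CentOS repositories) can query and power-cycle a node over the network; the BMC address and credentials below are placeholders:

ipmitool -I lanplus -H 10.1.1.100 -U ADMIN -P PASSWORD chassis power status
ipmitool -I lanplus -H 10.1.1.100 -U ADMIN -P PASSWORD sensor list    # temperatures, fans, voltages
ipmitool -I lanplus -H 10.1.1.100 -U ADMIN -P PASSWORD chassis power cycle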

Installation

Follow the Users Guide in the Support and Docs section of Rocks cluster’s web site.

  • Reserve a certain amount of disk space on compute nodes that will not be overwritten when they are reinstalled. 20G seems enough for the operating system and software. Remember: the gateway should be 128.101.162.54!
  • Update the kernel to the latest version, and update again when newer versions come out.
yum --enablerepo base upgrade kernel
yum --enablerepo base upgrade kernel-devel
yum --enablerepo base upgrade kernel-headers
cp /var/cache/yum/base/packages/kernel*.rpm /export/rocks/install/contrib/6.1.1/x86_64/RPMS/
cd /export/rocks/install; rocks create distro
reboot

Check that you indeed have the desired version, then kickstart the nodes.

uname -r
while read cn; do rocks run host $cn '/boot/kickstart/cluster-kickstart'; sleep 3; done < <(rocks list host compute|cut -d ':' -f 1)
  • Create user accounts (see Adding a user) before installing anything else, so that there is less chance that the desired UIDs/GIDs conflict with software-generated accounts, and set disk quotas (see Implementing disk quota) so that a user who inadvertently generates a huge amount of data does not affect the entire system.
  • Install ZFS on Linux (see Using the ZFS file system)
  • Install the most recent Torque roll
rocks add roll /path/to/torque/roll.iso
rocks enable roll torque
cd /export/rocks/install; rocks create distro
rocks run roll torque | sh
reboot

Configuring Environment Modules package

It is recommended that modulefiles are stored in a directory shared among all nodes. For example, create the directory under /share/apps, and add it to /usr/share/Modules/init/.modulespath:

mkdir /share/apps/modulefiles
echo "/share/apps/modulefiles" >> /usr/share/Modules/init/.modulespath

Finally, make sure the .modulespath file is broadcast to all nodes (see how to keep files up to date on all nodes using the 411 Secure Information Service).

Using the ZFS file system

Due to its licensing, ZFS on Linux is supplied as source code only, even if you selected the zfs-linux roll when installing the Rocks cluster. Please refer to zfs-linux Roll: Users Guide for how to build the binaries.

  • Create a zpool for each additional hard drive that is not used as the system disk, and create a ZFS file system for each active user with compression, NFS sharing, and quota turned on. Compression with ZFS carries very little overhead and because of the reduced file size it sometimes even improves IO.
zpool create space raidz2 /dev/sda /dev/sdb ... raidz2 /dev/sdp /dev/sdq ... raidz2 sdx sdy ... spare sdz ...
zfs set atime=off space
zfs set compression=gzip space

for u in active_user1 active_user2 ...; do
  zfs create space/$u
  zfs set compression=lz4 space/$u
  zfs set sharenfs=on space/$u
  zfs set quota=100G space/$u
  chown -R $u:$u /space/$u
done

To make these file systems available as /share/$USER/spaceX, add the following line to the end of /etc/auto.share

* -fstype=autofs,-Dusername=& file:/etc/auto.zfsfs

And create /etc/auto.zfsfs with the following contents, and propagate it using 411.

* -nfsvers=3 cluster.local:/&/${username}

You need to enable the share points on every boot by adding to /etc/rc.d/rc.local the following line:

zfs share -a

For how to enable them automatically, see ZFS Administration, Part XV- iSCSI, NFS and Samba.

NOTE: Sometimes “zfs share -a” does not populate “/var/lib/nfs/etab” or make /share/$USER/space available on other nodes. A work-around is simply to execute “zfs set sharenfs=on space/SOME_USER” for any one user before calling “zfs share -a”.

Automatic backup

ZFS uses copy-on-write and, as a result, snapshots can be created very quickly and cheaply. Create the following script as /etc/cron.daily/zfs-snapshot to keep the last 7 daily, 5 weekly, 12 monthly, and 7 yearly backups.

#!/bin/bash

snapshot() {
  local root=$1
  local prefix=$2
  local keep=$3

  zfs list -t filesystem -o name -H -r "$root" | while read fs; do
    [ "$fs" == "$root" ] && continue

    # echo "zfs snapshot $fs@$prefix-$(date '+%Y%m%d')"
    zfs snapshot "$fs@$prefix-$(date '+%Y%m%d')"

    zfs list -t snapshot -o name -s creation -H -r "$fs" | grep "$prefix" | head -n "-$keep" | while read ss; do
      # echo "zfs destroy $ss"
      zfs destroy $ss
    done
  done
}

snapshot "space" "daily" 7
[ $(date +%w) -eq 0 ] && snapshot "space" "weekly" 5
[ $(date +%-d) -eq 1 ] && snapshot "space" "monthly" 12
[ $(date +%-j) -eq 1 ] && snapshot "space" "yearly" 7
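
To recover files, either copy them out of the hidden .zfs/snapshot directory or roll the file system back; a sketch with a hypothetical user and snapshot name:

zfs list -t snapshot -r space/some_user                       # list that user's snapshots
cp /space/some_user/.zfs/snapshot/daily-20180101/lost_file .  # retrieve a single file
zfs rollback space/some_user@daily-20180101                   # or revert everything (add -r if newer snapshots exist; newer data is lost)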

Periodic error checking

Hard drives can have silent data corruption. ZFS can detect and correct these errors on a live system. Create the following script as /etc/cron.monthly/zfs-scrub (or in /etc/cron.weekly if using cheap commodity disks):

#!/bin/sh

zpool scrub space
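
Scrubbing runs in the background; check its progress and any repaired errors with:

zpool status -v space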

Slurm

Add new queues to /etc/slurm/partitions:

PartitionName=E5_2650v4 DEFAULT=YES STATE=UP TRESBillingWeights="CPU=1.0,Mem=0.25G,GRES/gpu=2.0" DefaultTime=60 DefMemPerCPU=512 nodes=compute-0-[0-139]
PartitionName=4170HE DEFAULT=YES STATE=UP TRESBillingWeights="CPU=1.0,Mem=0.25G,GRES/gpu=2.0" DefaultTime=60 DefMemPerCPU=512 nodes=compute-2-[0-31]

And make the following changes in /etc/slurm/slurm.conf:

AccountingStorageTRES=gres/gpu
AccountingStorageEnforce=all
FairShareDampeningFactor=5
GresTypes=gpu
PriorityType=priority/multifactor
PriorityFlags=FAIR_TREE
PriorityDecayHalfLife=14-0
PriorityUsageResetPeriod=NONE
PriorityFavorSmall=NO
PriorityMaxAge=1-0
PriorityWeightAge=10
PriorityWeightFairshare=10000
PriorityWeightJobSize=0
PriorityWeightPartition=10000
PriorityWeightQOS=0
PriorityWeightTRES=cpu=0,mem=0,gres/gpu=0

SelectType=select/cons_res
SelectTypeParameters=CR_Core
TmpFs=/state/partition1

Finally, update compute node attributes, sync the configuration to all nodes, and set a maximum walltime:

rocks report slurm_hwinfo | sh
rocks sync slurm
sacctmgr modify cluster where cluster=cluster set maxwall=96:00:00

Slurm by default forbids logging in to compute nodes unless the user has jobs running on that node. If this behavior is not desired, disable it by:

rocks set host attr attr=slurm_pam_enable value=false
rocks sync slurm

Reservation

You can use reservations to drain the cluster for maintenance.

scontrol create reservation starttime=2018-07-06T09:00:00 duration=600 user=root flags=maint,ignore_jobs nodes=ALL
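
You can inspect or remove the reservation later; the reservation name below is a placeholder for whatever scontrol reported at creation time:

scontrol show reservation
scontrol delete ReservationName=root_1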

Configuring Torque compute node settings

Edit /var/spool/torque/server_priv/nodes to include node specifications, such as:

compute-0-0 np=8  ntE5-2609 ps2400 E5-26xx
compute-1-0 np=8  ntE5430   ps2660 E54xx
compute-2-0 np=8  ntE5420   ps2500 E54xx
compute-3-0 np=8  ntE5410   ps2330 E54xx
compute-4-0 np=8  ntE5405   ps2000 E54xx
cluster.dept.univ.edu np=4 ntE5405 ps2000 E54xx

after which restart pbs_server by executing “service pbs_server restart”. In this example, the prefixes “nt” and “ps” (configured in maui.cfg) are used to denote node type and processor speed information.

Making your frontend run queued jobs for PBS (Torque/Maui)

If you have installed the Torque roll, issue the following commands as root on the frontend.

The first line, setting $frontend, just ensures that the name matches what /bin/hostname returns (generally the FQDN). They must match, or pbs_mom will refuse to start/work.

The next two lines set the number of cores to be used for running jobs. You probably should reserve a few cores for all the Rocks overhead processes, and for interactive logins, compiling, etc. In this example, we save 4 cores for the overhead and assign the rest for jobs. This is accomplished by setting the “np = $N” (np means number of processors) value.

export frontend=`/bin/hostname`
export N=`cat /proc/cpuinfo | grep processor | wc -l`
export N=`expr $N - 4` # reserve 4 cores
#
qmgr -c "create node $frontend"
qmgr -c "set node $frontend np = $N"
qmgr -c "set node $frontend ntype=cluster"
service pbs_server restart

Alternatively, you can edit /opt/torque/server_priv/nodes by hand, and do “service pbs_server restart” to make it re-read the file. Next, make sure pbs_mom is started on the frontend:

scp compute-0-0:/etc/pbs.conf /etc
chkconfig --add pbs_mom
service pbs_mom start

If you have no compute nodes, you can create /etc/pbs.conf by hand. It should look like this:

pbs_home=/opt/torque
pbs_exec=/opt/torque
start_mom=1
start_sched=0
start_server=0

You should now be able to see the frontend listed in the output of “pbsnodes -a”, and any jobs submitted to the queue will run there.

Creating additional queues in Torque

Run the following commands as root to create two queues, E5-26xx and E54xx, which include only nodes with the corresponding features, as defined in /var/spool/torque/server_priv/nodes (see Configuring Torque compute node settings).

qmgr -c "create queue E5-26xx queue_type=execution,started=true,enabled=true,resources_max.walltime=360:00:00,resources_default.walltime=24:00:00,resources_default.neednodes=E5-26xx"
qmgr -c "create queue E54xx queue_type=execution,started=true,enabled=true,resources_max.walltime=360:00:00,resources_default.walltime=24:00:00,resources_default.neednodes=E54xx"

NOTE: Separate queues are not necessary for requesting jobs to be run on certain machines. A similar effect can be achieved by specifying node features in the submission script, for example:

#PBS -l nodes=1:E5-26xx:ppn=1

Configuring Maui scheduler behavior

Change the settings in /opt/maui/maui.cfg to the following, and add the parameters if not already present. Restart maui to incorporate the changes: service maui restart

# Job Prioritization: http://www.adaptivecomputing.com/resources/docs/maui/5.1jobprioritization.php

QUEUETIMEWEIGHT       1
XFACTORWEIGHT         86400
XFMINWCLIMIT          00:15:00
FSWEIGHT              86400
FSUSERWEIGHT          1

# Fairshare: http://www.adaptivecomputing.com/resources/docs/maui/6.3fairshare.php

FSPOLICY              DEDICATEDPS
FSDEPTH               7
FSINTERVAL            1:00:00:00
FSDECAY               0.80

# Backfill: http://www.adaptivecomputing.com/resources/docs/maui/8.2backfill.php

BACKFILLPOLICY        BESTFIT
BACKFILLMETRIC        PROCSECONDS
RESERVATIONPOLICY     CURRENTHIGHEST

# Node Allocation: http://www.adaptivecomputing.com/resources/docs/maui/5.2nodeallocation.php

NODEALLOCATIONPOLICY  PRIORITY
NODECFG[DEFAULT]      PRIORITYF='-LOAD - 5*USAGE'

# Creds: http://www.adaptivecomputing.com/resources/docs/maui/6.1fairnessoverview.php

USERCFG[DEFAULT]      FSTARGET=25.0

# Node Set: http://www.adaptivecomputing.com/resources/docs/maui/8.3nodesetoverview.php

NODESETDELAY          0:00:00
NODESETPRIORITYTYPE   MINLOSS
NODESETATTRIBUTE      FEATURE
NODESETPOLICY         ONEOF
NODESETLIST           E5-26xx E54xx
NODESETTOLERANCE      0.0

# Node Attributes: http://www.adaptivecomputing.com/resources/docs/maui/12.2nodeattributes.php

FEATURENODETYPEHEADER nt
FEATUREPROCSPEEDHEADER ps$
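
After restarting, Maui's diagnostic commands help confirm that the policies took effect; for example:

showq        # the queue ordered by the resulting priorities
diagnose -p  # per-job priority breakdown (queue time, xfactor, fairshare)
diagnose -f  # fairshare usage versus the FSTARGET above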

HTCondor

Basic settings

To implement a wall-time limit (specify “+WallTime = SECONDS” in the job submission file), set default file system behavior, and ignore console activity, create /opt/condor/etc/config.d/98Rocks.conf with the following contents and propagate it using 411:

DefaultWallTime = 12 * $(HOUR)
EXECUTE = /state/partition1/condor_jobs
FILESYSTEM_DOMAIN = cluster.group.dept.univ.edu
MaxWallTime = 96 * $(HOUR)
NUM_SLOTS = 1
NUM_SLOTS_TYPE_1 = 1
SLOTS_CONNECTED_TO_CONSOLE = 0
SLOTS_CONNECTED_TO_KEYBOARD = 0
SLOT_TYPE_1 = 100%
SLOT_TYPE_1_PARTITIONABLE = TRUE
START = ifThenElse(isUndefined(WallTime), $(DefaultWallTime), WallTime) <= $(MaxWallTime)
SYSTEM_PERIODIC_REMOVE = RemoteUserCpu + RemoteSysCpu > CpusProvisioned * ifThenElse(isUndefined(WallTime), $(DefaultWallTime), WallTime) || \
                         RemoteWallClockTime > ifThenElse(isUndefined(WallTime), $(DefaultWallTime), WallTime)
TRUST_UID_DOMAIN = True
UID_DOMAIN = group.dept.univ.edu

Then create the job directory on all compute nodes:

rocks run host command='mkdir -p /state/partition1/condor_jobs; chmod 755 /state/partition1/condor_jobs'
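
With these settings, a job declares its wall time at submission. A minimal sketch of a submit file with a placeholder executable, 4 cores, and 86400 seconds (24 hours) of wall time:

cat > job.sub <<'EOF'
universe     = vanilla
executable   = my_program
request_cpus = 4
+WallTime    = 86400
queue
EOF
condor_submit job.sub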

MPI jobs

Enable MPI:

rocks set attr Condor_EnableMPI true
rocks sync host condor frontend compute

Put the following two scripts, condor_openmpi.sh and condor_parallel_hosts.sh, in the $MPI_HOME/bin directory. The first is condor_openmpi.sh:

#!/bin/bash

##**************************************************************
## This is a script to run openmpi jobs under the Condor parallel universe.
## Collects the host and job information into $_CONDOR_PARALLEL_HOSTS_FILE
## and executes
##   $MPIRUN --prefix $MPI_HOME --hostfile $_CONDOR_PARALLEL_HOSTS_FILE $@
## command
## The default value of _CONDOR_PARALLEL_HOSTS_FILE is 'parallel_hosts'
##
## The script assumes:
##  On the head node (_CONDOR_PROCNO == 0) :
##    * $MPIRUN points to the mpirun command
##    * condor_parallel_hosts.sh is in $PATH.
##  On all nodes:
##    * openmpi is installed into $MPI_HOME directory
##**************************************************************

#----------------------------
MPIRUN=mpirun
MPI_HOME=$(which $MPIRUN)
MPI_HOME=${MPI_HOME%/bin/$MPIRUN}
_CONDOR_PARALLEL_HOSTS_FILE=parallel_hosts
_CONDOR_TEMP_DIR=/state/partition1
#----------------------------

_CONDOR_LIBEXEC=`condor_config_val libexec`
_CONDOR_PARALLEL_HOSTS=$MPI_HOME/bin/condor_parallel_hosts.sh
_CONDOR_SSH_TO_JOB_WRAPPER=$MPI_HOME/bin/condor_ssh_to_job_wraper.sh

# Creates parallel_hosts file containing contact info for hosts
# Returns on head node only
$_CONDOR_PARALLEL_HOSTS
ret=$?
if [ $ret -ne 0 ]; then
    echo Error: $ret creating $_CONDOR_PARALLEL_HOSTS_FILE
    exit $ret
fi

# Starting mpirun cmd
#exec $MPIRUN --prefix $MPI_HOME --mca orte_rsh_agent $_CONDOR_SSH_TO_JOB_WRAPPER --hostfile $_CONDOR_PARALLEL_HOSTS_FILE $@
exec $MPIRUN --prefix $MPI_HOME --hostfile $_CONDOR_PARALLEL_HOSTS_FILE -map-by core -bind-to core --tmpdir $_CONDOR_TEMP_DIR $@

rm -f $_CONDOR_PARALLEL_HOSTS_FILE

The second is condor_parallel_hosts.sh:

#!/bin/bash

##**************************************************************
## This script collects host and job information about the running parallel job,
## and creates a hostfile including contact info for remote hosts
##**************************************************************

## Helper fn for getting specific machine attributes from $_CONDOR_MACHINE_AD
function CONDOR_GET_MACHINE_ATTR() {
    local attr="$1"
    awk '/^'"$attr"'[[:space:]]+=[[:space:]]+/ \
        { ret=sub(/^'"$attr"'[[:space:]]+=[[:space:]]+/,""); print; } \
        END { exit 1-ret; }' $_CONDOR_MACHINE_AD
    return $?
}

## Helper fn for getting specific job attributes from $_CONDOR_JOB_AD
function CONDOR_GET_JOB_ATTR() {
    local attr="$1"
    awk '/^'"$attr"'[[:space:]]+=[[:space:]]+/ \
        { ret=sub(/^'"$attr"'[[:space:]]+=[[:space:]]+/,""); print; } \
        END { exit 1-ret; }' $_CONDOR_JOB_AD
    return $?
}

## Helper fn for printing the host info
function CONDOR_PRINT_HOSTS() {
    local clusterid=$1
    local procid=$2
    local reqcpu=$3
    local rhosts=$4
    # tr ',"' '\n' <<< $rhosts | /bin/grep -v $hostname | \
    tr ',"' '\n' <<< $rhosts | \
    awk '{ sub(/slot.*@/,""); if ($1 != "") { slots[$1]+='$reqcpu'; subproc[$1]=id++; } } \
        END { for (i in slots) print i" slots="slots[i]" max_slots="slots[i]; }'
        #END { for (i in slots) print i"-CONDOR-"'$clusterid'".1."subproc[i]" slots="slots[i]" max_slots="slots[i]; }'
}

# Defaults for error testing
: ${_CONDOR_PROCNO:=0}
: ${_CONDOR_NPROCS:=1}
: ${_CONDOR_MACHINE_AD:="None"}
: ${_CONDOR_JOB_AD:="None"}

##**************************************************************
## Usage: CONDOR_GET_PARALLEL_HOSTS_INFO [hostfile]
## If hostfile omitted 'parallel_hosts' is used.
## Return:
##   The function returns with error status on main process (_CONDOR_PROCNO==0).
##   The function never returns on the other nodes (sleeping).
## The created file structure:
##   HostName1'-CONDOR-'CLusterID.ProcId.SubProcId 'slots='Allocated_CPUs 'max_slots='Allocated_CPUs
##   HostName2'-CONDOR-'CLusterID.ProcId.SubProcId 'slots='Allocated_CPUs 'max_slots='Allocated_CPUs
##   HostName3'-CONDOR-'CLusterID.ProcId.SubProcId 'slots='Allocated_CPUs 'max_slots='Allocated_CPUs
##   ...
##**************************************************************
#function CONDOR_GET_PARALLEL_HOSTS_INFO() {
    # getting parameters if _CONDOR_PARALLEL_HOSTS_FILE not set
    : ${_CONDOR_PARALLEL_HOSTS_FILE:=$1}
    # setting defaults
    : ${_CONDOR_PARALLEL_HOSTS_FILE:=parallel_hosts}
    #local hostname=`hostname -f`
    if [ $_CONDOR_PROCNO -eq 0 ]; then
    # collecting info on the main proc
        #clusterid=`CONDOR_GET_JOB_ATTR ClusterId`
        #local ret=$?
        #if [ $ret -ne 0 ]; then
        #    echo Error: get_job_attr ClusterId
        #    return 1
        #fi
        #local line=""
        #condor_q -l $clusterid | \
        cat $_CONDOR_JOB_AD | \
        awk '/^ProcId.=/ { ProcId=$3 } \
             /^ClusterId.=/ { ClusterId=$3 } \
             /^RequestCpus.=/ { RequestCpus=$3 } \
             /^RemoteHosts.=/ { RemoteHosts=$3 } \
             END { if (ClusterId != 0) print ClusterId" "ProcId" "RequestCpus" "RemoteHosts  }' | \
        while read line; do
            CONDOR_PRINT_HOSTS $line
        done | sort -d > ${_CONDOR_PARALLEL_HOSTS_FILE}
    else
    # endless loop on the workers
        while true ; do
            sleep 30
        done
    fi
#    return 0
#}

To request a parallel job, add the following to the job submission script:

machine_count = NODES
request_cpus = CORES_PER_NODE
universe = parallel

And use condor_openmpi.sh instead of mpirun for parallel execution.

SGE

Enter

qconf -mconf

and make the following changes:

min_uid                      500
min_gid                      500
execd_params                 ENABLE_ADDGRP_KILL=true
auto_user_fshare             1000
auto_user_delete_time        0

Enter

qconf -msconf

and make the following changes:

job_load_adjustments              NONE
load_adjustment_decay_time        0
weight_tickets_share              10000
weight_ticket                     10000.0

Enter

qconf -mq all.q

and make the following changes:

load_thresholds       NONE
h_rt                  96:00:00

Create a file (say “tmp_share_tree”):

id=0
name=Root
type=0
shares=1
childnodes=1
id=1
name=default
type=0
shares=1000
childnodes=NONE

And use it to create a share tree fair share policy:

qconf -Astree tmp_share_tree && rm tmp_share_tree


Kill zombie jobs

SGE sometimes fails to kill all processes of a job. Use the following script to clean up these zombie processes (as well as rogue sessions by users who directly ssh to compute nodes):

#!/bin/bash

launcher_pid=($(gawk 'NR==FNR{shepherd_pid[$0];next} ($1 in shepherd_pid){print $2}' <(pgrep sge_shepherd) <(ps -eo ppid,pid --no-headers)))
# Assume regular users have UIDs >=600
rogue_pid=($(gawk 'NR==FNR{launcher_pid[$0];next} ($1>=600)&&(!($2 in launcher_pid)){print $3}' <(printf "%s\n" "${launcher_pid[@]}") <(ps -eo uid,sid,pid --no-headers)))

# Do not allow any rogue processes if there are >1 jobs running on the
# same node; if a single job has the entire node, then allow the job
# owner to run unmanaged processes, while making sure that zombie
# processes from this user are still killed; if no jobs are running,
# then allow unmanaged processes (e.g., testing)
[ ${#launcher_pid[@]} -eq 0 ] && exit 0
uid=($(ps -p "$(echo ${launcher_pid[@]})" -o uid= | sort | uniq))
if [ ${#uid[@]} -gt 1 ]; then
  # echo ${rogue_pid[@]}
  kill -9 ${rogue_pid[@]}
elif [ ${#uid[@]} -eq 1 ]; then
  stime=$(gawk '{print $22}' /proc/${launcher_pid[0]}/stat)
  for (( i=0; i<${#rogue_pid[@]}; i++ )); do
    rogue_uid=$(ps -p ${rogue_pid[i]} -o uid=)
    if [ -n "$rogue_uid" ] && { [ $rogue_uid -ne $uid ] || [ $(gawk '{print $22}' /proc/${rogue_pid[i]}/stat) -lt $stime ]; }; then
      # echo ${rogue_pid[i]}
      kill -9 ${rogue_pid[i]}
    fi
  done
fi

It can be enforced as a system cron job by adding the following to extend-compute.xml, between the "<post>" and "</post>" tags:

<file name="/etc/cron.d/kill-zombie-jobs" perms="0600">
*/15 * * * * root /opt/gridengine/util/kill-zombie-jobs.sh
</file>

Remember to escape ampersands, quotes, and less-than characters if you use extend-compute.xml to create this script.

Disabling hyper-threading

Based on some crude benchmarks, Intel Hyper-Threading appears to be detrimental to CPU-intensive workloads. It can be turned off in the BIOS via IPMI, but if there are too many nodes and IPMI does not allow scripting the change, an alternative is to disable the virtual cores by extending compute nodes. First figure out the CPU layout using the lstopo program from hwloc, then add the following between the "<post>" and "</post>" tags in extend-compute.xml (assuming 24--47 are the virtual cores):

<file name="/etc/rc.d/rocksconfig.d/post-89-disable-hyperthreading" perms="0755">
#!/bin/sh
for i in {24..47}; do echo 0 > /sys/devices/system/cpu/cpu$i/online; done
</file>

Installing Software

After installing a new software package, add an entry (either a single file or a directory named some_software) in the directory /share/apps/modules/modulefiles. If multiple files (representing different software versions) exist in that directory, create a file named .version to specify the default version.
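
A minimal sketch of such an entry, assuming the package was installed under /share/apps/some_software and version 1.0 should be the default:

mkdir -p /share/apps/modules/modulefiles/some_software
cat > /share/apps/modules/modulefiles/some_software/1.0 <<'EOF'
#%Module1.0
prepend-path PATH            /share/apps/some_software/bin
prepend-path LD_LIBRARY_PATH /share/apps/some_software/lib
EOF
cat > /share/apps/modules/modulefiles/some_software/.version <<'EOF'
#%Module1.0
set ModulesVersion "1.0"
EOF
module avail some_software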

Using Rocks Rolls

Refer to the Roll Developer’s Guide in the Support and Docs section of Rocks cluster’s web site for how to create your own Rolls.

rocks set host attr localhost roll_install_on_the_fly true shadow=yes # for installing Service Pack Rolls
rocks add roll /path/to/rollname.iso
rocks enable roll rollname
cd /export/rocks/install; rocks create distro
rocks run roll rollname | sh
reboot

After the frontend comes back up, do the following to populate the node list:

rocks sync config

then kickstart all your nodes

while read cn; do rocks run host $cn '/boot/kickstart/cluster-kickstart'; sleep 3; done < <(rocks list host compute|cut -d ':' -f 1)

If installing Service Pack Rolls, it is critical that you run cluster-kickstart-pxe, as it will force the compute nodes to PXE boot. It is important that you PXE boot the nodes for the first install because, with a PXE-boot-based install, the nodes will get their initrd from the frontend, and inside the initrd is a new tracker-client that is compatible with the new tracker-server. After the first install, a new initrd will be on the hard disk of the installed nodes, and then it is safe to run /boot/kickstart/cluster-kickstart.

while read cn; do rocks run host $cn '/boot/kickstart/cluster-kickstart-pxe'; sleep 3; done < <(rocks list host compute|cut -d ':' -f 1)

Using YUM repositories

Several YUM repositories are configured but disabled by default. Add “--enablerepo=REPO_NAME” to yum commands to temporarily enable REPO_NAME.

yum repolist all #Display all configured software repositories
yum clean all #clean cache
yum [--enablerepo=REPO_NAME] check-update #update package information
yum list openmotif* #list packages
yum install openmotif openmotif-devel #requirement for Grace and NEdit

Adding a software package distributed as RPMs

Create a roll first:

cd /export/rocks/install/contrib/5.4/x86_64/RPMS
wget http://url/to/some_software.rpm
cd /export/rocks/install/site-profiles/5.4/nodes
cp skeleton.xml extend-compute.xml

Edit extend-compute.xml, adding a "<package>some_software</package>" line and removing the unused "<package>" example lines.

cd /export/rocks/install; rocks create distro

Now reinstall the compute nodes:

while read cn; do rocks run host $cn '/boot/kickstart/cluster-kickstart-pxe'; sleep 3; done < <(rocks list host compute|cut -d ':' -f 1)

Adding a software application distributed as source code

Install it into the /share/apps/some_software directory. A typical process is shown below:

wget http://url/to/some_software.tar.bz2
mkdir some_software && tar xjf some_software.tar.bz2 -C some_software
cd some_software
./configure --prefix=/share/apps/some_software
make -j 8
sudo make install clean

Uninstalling Software

Removing Rolls

rocks disable roll rollname
rocks remove roll rollname
cd /export/rocks/install; rocks create distro
rocks sync config
while read cn; do rocks run host $cn '/boot/kickstart/cluster-kickstart'; sleep 3; done < <(rocks list host compute|cut -d ':' -f 1)

Upgrade

  • Create an update roll:
rocks create mirror http://mirror.centos.org/centos/6/updates/x86_64/Packages/ rollname=CentOS_6_X_update_$(date '+%Y%m%d')
rocks create mirror http://mirror.centos.org/centos/6/os/x86_64/Packages/  rollname=Centos_6_X

X should be the current minor release number (e.g., X is 10 if the latest stable version of CentOS is 6.10).

Add the created rolls to the installed distribution:

rocks add roll CentOS_6_X_update_$(date '+%Y%m%d')-*.iso
rocks add roll Centos_6_X-*.iso
rocks enable roll Centos_6_X
rocks enable roll CentOS_6_X_update_$(date '+%Y%m%d')
cd /export/rocks/install; rocks create distro
  • Newly installed nodes will automatically get the updated packages. It is wise to test the update on a compute node to verify that the updates did not break anything. To force a node to reinstall, run the command:
rocks run host compute-0-0 /boot/kickstart/cluster-kickstart

If something goes wrong, you can always revert the updates by removing the update rolls:

rocks remove roll CentOS_6_X_update_$(date '+%Y%m%d')
rocks remove roll Centos_6_X
cd /export/rocks/install; rocks create distro
  • After you have tested the update on some nodes in the previous step, you can update the frontend using the standard yum command:
yum update

Updating zfs-linux

Use the opportunity of the kernel update to rebuild and reinstall zfs-linux by following the steps on Users Guide: Updating the zfs-linux Roll:

cd ~/tools
git clone https://github.com/rocksclusters/zfs-linux.git
cd zfs-linux
make binary-roll

rocks remove roll zfs-linux
rocks add roll zfs-linux*.iso
rocks enable roll zfs-linux
cd /export/rocks/install; rocks create distro

zfs umount -a
service zfs stop
rmmod zfs zcommon znvpair zavl zunicode spl zlib_deflate

rocks run roll zfs-linux | sh

Additional notes for Rocks 6

Apache httpd updates on Rocks 6 break the 411 service, which runs over the unencrypted HTTP protocol. Fix it with the following:

echo 'HttpProtocolOptions Unsafe' >> /etc/httpd/conf/httpd.conf
service httpd restart

Backup

Create a Restore Roll that will contain site-specific info and can be used to upgrade or reconfigure the existing cluster quickly.

cd /export/site-roll/rocks/src/roll/restore
make roll

Administration

Adding a user

  • /usr/sbin/useradd -u UID USERNAME creates the home directory in /export/home/USERNAME (based on the settings in /etc/default/useradd) with UID as the user ID. If the desired user ID or the group ID has already been used, change them using:
usermod -u NEW_UID EXISTING_USER
# or
groupmod -g NEW_GID EXISTING_GROUP
  • rocks sync users adjusts all home directories that are listed as /export/home as follows:
  1. edit /etc/passwd, replacing /export/home/ with /home/
  2. add a line to /etc/auto.home pointing to the existing directory in /export/home
  3. 411 is updated, to propagate the changes in /etc/passwd and /etc/auto.home

In the default Rocks configuration, /home/ is an automount directory. By default, directories in an automount directory are not present until an attempt is made to access them, at which point they are (usually NFS) mounted. This means you CANNOT create a directory in /home/ manually! The contents of /home/ are under autofs control. To “see” the directory, it's not enough to do a ls /home as that only accesses the /home directory itself, not its contents. To see the contents, you must ls /home/username.

Implementing disk quota

  • Edit /etc/fstab, look for the partitions you want to have quotas on (“LABEL=” or “UUID=”), and change “defaults” to “grpquota,usrquota,defaults” in that line.
  • Reboot, check quota state and turn on quota:
quotacheck -guvma
quotaon -guva
  • Set up a prototype user quota:
edquota -u PROTOTYPE_USER # -t DAYS to edit the soft time limits
  • Duplicate the quotas of the prototypical user to other users:
edquota -p PROTOTYPE_USER -u user1 user2 ...
  • To get a quota summary for a file system:
repquota /export

Exporting a new directory from the frontend to all the compute nodes

  • Add the directory you want to export to the file /etc/exports.

For example, if you want to export the directory /export/scratch1, add the following to /etc/exports:

/export/scratch1 10.0.0.0/255.0.0.0(rw)

This exports the directory only to nodes that are on the internal network (in the above example, the internal network is configured to be 10.0.0.0)

  • Restart NFS:
/etc/rc.d/init.d/nfs restart
  • Add an entry to /etc/auto.home (or /etc/auto.share).

For example, say you want /export/scratch1 on the frontend machine (named frontend-0) to be mounted as /home/scratch1 (or /share/scratch1) on each compute node. Add the following entry to /etc/auto.home (or /etc/auto.share):

scratch1 frontend-0:/export/scratch1

or

scratch1 frontend-0:/export/&
  • Inform 411 of the change:
make -C /var/411

Now when you log in to any compute node and change your directory to /home/scratch1 (or /share/scratch1), it will be automounted.

Keeping files up to date on all nodes using the 411 Secure Information Service

Add the files to /var/411/Files.mk, and execute the following:

make -C /var/411
rocks run host command="411get --all" #force all nodes to retrieve the latest files from the frontend immediately
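
For example, to keep the /etc/auto.zfsfs map from the ZFS section synchronized (assuming the FILES += convention of the stock Files.mk), append an entry before running the commands above:

echo 'FILES += /etc/auto.zfsfs' >> /var/411/Files.mk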

Removing old log files to prevent /var filling up

Place the following in /etc/cron.daily:

#!/bin/sh

rm -f /var/log/*-20??????
rm -f /var/log/slurm/*.log-*
rm -f /var/lib/ganglia/archives/ganglia-rrds.20??-??-??.tar

Cleaning up temporary directories on compute nodes

Add a system cron job between the "<post>" and "</post>" tags in extend-compute.xml:

<file name="/etc/cron.weekly/clean-scratch" perms="0700">
#!/bin/sh
find /tmp /state/partition1 -mindepth 1 -mtime +7 -type f ! -wholename /state/partition1/condor_jobs -exec rm -f {} \;
find /tmp /state/partition1 -mindepth 1 -depth -mtime +7 -type d ! -wholename /state/partition1/condor_jobs -exec rmdir --ignore-fail-on-non-empty {} \;
</file>

This will be picked up by /etc/anacrontab or /etc/cron.d/0hourly.

Managing firewall

The following rules allow access to the web server from UMN IPs:

rocks remove firewall host=cluster rulename=A40-HTTPS-PUBLIC-LAN
rocks add firewall host=cluster rulename=A40-HTTPS-PUBLIC-LAN service=https protocol=tcp chain=INPUT action=ACCEPT network=public flags='-m state --state NEW --source 128.101.0.0/16,134.84.0.0/16,160.94.0.0/16,131.212.0.0/16,199.17.0.0/16'
rocks remove firewall host=cluster rulename=A40-WWW-PUBLIC-LAN
rocks add firewall host=cluster rulename=A40-WWW-PUBLIC-LAN service=www protocol=tcp chain=INPUT action=ACCEPT network=public flags='-m state --state NEW --source 128.101.0.0/16,134.84.0.0/16,160.94.0.0/16,131.212.0.0/16,199.17.0.0/16'
rocks sync host firewall cluster

These add a few national labs to the allowed IPs for SSH:

rocks remove firewall global rulename=A20-SSH-PUBLIC
rocks add firewall global rulename=A20-SSH-PUBLIC service=ssh protocol=tcp chain=INPUT action=ACCEPT network=public flags='-m state --state NEW --source 128.101.0.0/16,134.84.0.0/16,160.94.0.0/16,131.212.0.0/16,199.17.0.0/16,140.221.69.0/24,130.20.235.0/24,134.9.50.0/24,131.243.2.0/24,128.55.209.0/24,160.91.205.0/24,132.175.108.0/24'
rocks sync host firewall

Alternatively, install DenyHosts, which reads the log file for SSH authentication failures and adds the offending IPs to /etc/hosts.deny.

yum --enablerepo=epel install denyhosts
chkconfig denyhosts on
service denyhosts start
vim /etc/denyhosts.conf # configuration file

Changing the public IP address on the frontend

It is strongly recommended that the Fully-Qualified Host Name (e.g., cluster.dept.univ.edu) be chosen carefully and never be modified after the initial setup, because doing so will break several cluster services (e.g., NFS, AutoFS, and Apache). If you want to change the public IP address, you can do so by:

rocks set host interface ip frontend iface=eth1 ip=xxx.xxx.xxx.xxx
rocks set attr Kickstart_PublicAddress xxx.xxx.xxx.xxx
# Edit the IP address in /etc/hosts, /etc/sysconfig/network-scripts/ifcfg-eth1, /etc/yum.repos.d/rocks-local.repo
# It's important to enter the following commands in one line if you are doing this remotely, as the network interface will be stopped by the first command.
ifdown eth1; ifup eth1
rocks sync config
rocks sync host network