Using the ~orka CLI

~orka CLI provides a command-line interface to the ~orka services. In summary, it allows you to create, manage and destroy Hadoop-YARN clusters and VRE servers on ~okeanos, to move files between the local filesystem, remote servers, Pithos+ and HDFS, and to run reproducible experiments.

Installing ~orka CLI

Set up the user environment to run ~orka (on Debian 8.x)

Installing requirements

You need to have the following prerequisite software installed on your local machine in order to run the ~orka CLI commands:

git: to clone the ~orka source code from GitHub
python: ~orka and synnefo are based on the Python language
python-dev, libffi-dev, libssl-dev: dependencies needed for cryptography
gcc: dependency needed for paramiko, an SSHv2 implementation
pip: to install kamaki from the pip repository
kamaki: the synnefo CLI, used to interact with ~okeanos

Type the following commands to install all of the above requirements:

sudo apt-get update
sudo apt-get install -y git
sudo apt-get install -y python python-dev gcc libffi-dev libssl-dev
wget https://bootstrap.pypa.io/get-pip.py
sudo python get-pip.py
sudo pip install kamaki

Adding [orka] section to kamaki config file

You must open ~/.kamakirc and append these two lines:

[orka]                                                              
base_url = < e-science -IP- or -url address- >

Virtual environment

Installing and using the orka package in a virtual environment is optional but highly recommended:

sudo pip install virtualenv
mkdir .virtualenvs
cd .virtualenvs
virtualenv --system-site-packages orkaenv
. ~/.virtualenvs/orkaenv/bin/activate
(run deactivate from the command line to exit the virtual environment)

The following commands get the source code from github and install orka (either directly or in a virtual environment):

cd
git clone <escience git repo> 
cd e-science/orka
[sudo if not using virtualenv] python setup.py install

Now ~orka CLI commands are usable from anywhere.
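To verify the installation, print the CLI help text:

orka -h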

How to run ~orka commands

The basic format is $ orka [command] "arguments"

Optional arguments for all ~orka commands:

--auth_url="authentication url (default value='https://accounts.okeanos.grnet.gr/identity/v2.0')",
--token="an ~okeanos token (default value read from ~/.kamakirc)",
--server_url="orka web application url (default value read from ~/.kamakirc)",
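For example, the optional arguments can be combined with any command; the values below are illustrative placeholders:

orka images --auth_url=https://accounts.okeanos.grnet.gr/identity/v2.0 --server_url=<e-science_server_url>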

"images" command

The images command has no required positional or optional arguments.

{orka images} command example

example for listing available images and their pithos uuid values:

orka images

"create" command

Required positional arguments for create command:

name: "name of the cluster"
cluster_size: "total number of VMs, including the master node"
cpu_master: "master node: number of CPU cores"
ram_master: "master node: RAM in MB"
disk_master: "master node: hard drive in GB"
cpu_slave: "each slave node: number of CPU cores"
ram_slave: "each slave node: RAM in MB"
disk_slave: "each slave node: hard drive in GB"
disk_template: "Standard or Archipelago"
project_name: "name of an ~okeanos project to pull resources from"

Optional arguments for create command:

--image="Operating System (default value='Debian Base')"
(available images can be found with the orka images command)
--replication_factor="HDFS replication factor. Default is 2"
--dfs_blocksize="HDFS block size (in MB). Default is 128"
--personality="Defines a file that includes a public key to be injected into the master VM"

Create Hadoop cluster from a pre-configured image

Using the --image=Hadoop-2.5.2 argument creates the Hadoop cluster much faster, because it utilises a specially created ~okeanos VM image with Java and YARN pre-installed. Omitting this argument ensures that the latest stable YARN version will be installed, but at the cost of much slower cluster creation.

{orka create} command examples

example for creating a cluster with default optionals (without hadoop_image):

orka create Yarn_Test 2 2 2048 10 2 1024 10 Standard <project_name>

example for creating a cluster with a specific image:

orka create Yarn_Test 2 2 2048 10 2 1024 10 Standard <project_name> --image=<hadoop_image_name>
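example for creating a cluster that also overrides the HDFS defaults, using the optional arguments listed above (the sizes and values here are illustrative):

orka create Yarn_Test 4 4 4096 20 2 2048 10 Standard <project_name> --image=<hadoop_image_name> --replication_factor=2 --dfs_blocksize=64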

During Hadoop cluster creation, the system displays status messages for each step of the process.

"list" command

Optional arguments for list command:

--status="One of: ACTIVE, PENDING, DESTROYED (case insensitive; shows only clusters of that status)"
--verbose (outputs full cluster details. Default: off)

{orka list} command example

example for listing user clusters:

orka list --status=active --verbose

"info" command

Required positional arguments for info command:

cluster_id: "Cluster id in e-science database"

(cluster_id can be found with orka list command)

{orka info} command example

example for cluster info:

orka info <cluster_id>

"hadoop" command

Required positional arguments for hadoop command:

hadoop_status: "START | FORMAT | STOP (case insensitive)"
cluster_id: "Cluster id in e-science database"

(cluster_id can be found with orka list command)

{orka hadoop} command examples

example for hadoop start:

orka hadoop start <cluster_id>

example for hadoop stop:

orka hadoop stop <cluster_id>
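example for hadoop format:

orka hadoop format <cluster_id>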

"destroy" command

Required positional arguments for destroy command:

cluster_id: "Cluster id in e-science database"

(cluster_id can be found with orka list command)

{orka destroy} command example

example for destroy cluster:

orka destroy <cluster_id>

"node" command

The orka node command provides sub-commands for adding a node to, or removing a node from, a Hadoop-YARN cluster.

"node add" command

Required positional arguments for node add command:

cluster_id: "Cluster id in e-science database"

(cluster_id can be found with orka list command)

{orka node add} command example

example for adding node to a cluster:

orka node add <cluster_id>

"node remove" command

Required positional arguments for node remove command:

cluster_id: "Cluster id in e-science database"

(cluster_id can be found with orka list command)

{orka node remove} command example

example for removing node from a cluster:

orka node remove <cluster_id>

"file" command

The orka file command provides sub-commands for putting files into the Hadoop filesystem (HDFS) from local, ftp/http and Pithos+ sources, and for getting files from HDFS to local and Pithos+ destinations.
It also provides a list sub-command that lists Pithos+ files in the URI format expected by the orka CLI.

"file list" command

orka file list returns Pithos+ object paths in the format expected by the source argument of orka file put.

Optional arguments for file list command:

--container="a pithos+ container descriptor"

{orka file list} command example

orka file list --container=/<container_name>

"file get" command

Required positional arguments for file get command:

cluster_id: "Cluster id in e-science database"
source: "Hadoop Filesystem object"
destination: "Local or Pithos+ path"

A Pithos+ destination is denoted by prepending "pithos://" to the object descriptor.

{orka file get} command examples

orka file get <cluster_id> <hdfs_file_path> <local_file_path>

orka file get <cluster_id> <hdfs_file_path> pithos://<pithos_file_path>

"file put" command

Required positional arguments for file put command:

cluster_id: "Cluster id in e-science database"
source: "Local or ftp/http or pithos+ path"
destination: "Hadoop Filesystem object"

Optional arguments for file put command:

--user="remote user for ftp/http authentication if required"
--password="remote user password"

{orka file put} command examples

example for pithos source:

orka file put <cluster_id> pithos://<pithos_file_path> <hdfs_file_path>

(a properly formatted source path can be obtained with the orka file list command)

example for remote server source:

orka file put <cluster_id> <remote_http_or_ftp_url> <hdfs_file_path>
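example for a password-protected remote source, using the optional --user and --password arguments (the placeholders are illustrative):

orka file put <cluster_id> <remote_ftp_url> <hdfs_file_path> --user=<ftp_user> --password=<ftp_password>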

example for local filesystem source:

orka file put <cluster_id> <local_file_path> <hdfs_file_path>

"file mkdir" command

Required positional arguments for file mkdir command:

cluster_id: "Cluster id in e-science database"
directory: "destination directory on HDFS"

Optional arguments for file mkdir command:

-p (recursive folder creation)

{orka file mkdir} command examples

example for HDFS folder creation in HDFS home:

orka file mkdir <cluster_id> <directory>

example for recursive HDFS folder creation:

orka file mkdir -p <cluster_id> <directory_with_non_existent_parent>

VRE commands

orka vre commands are used to manage VRE (Virtual Research Environment) servers: VM appliances that cover a wide range of open-source software stacks needed for everyday research and academic activities. These images are pre-built ~okeanos VMs created from Docker Hub images.

Required positional arguments for vre create command:

name: "name of the VRE server",
cpu: "number of CPU cores of VRE server",
ram: "ram in MiB of VRE server",
disk: "hard drive in GiB of VRE server",
disk_template: "Standard or Archipelago",
project_name: "name of an ~okeanos project, to pull resources from",
image: "name of VRE image"

Optional arguments for vre create command:

--admin_password="Admin password for VRE servers. Default is auto-generated"
--admin_email="Admin email for the VRE DSpace image. Default is admin@dspace.gr"

admin_password must contain only uppercase and lowercase letters and numbers and be at least eight characters long.
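As a quick local sanity check of this rule, a shell one-liner such as the following can be used (a minimal sketch; the sample password is illustrative):

echo "My21PaSswOrd" | grep -Eq '^[A-Za-z0-9]{8,}$' && echo valid || echo invalid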

{orka vre create} command examples

example for orka vre create with Drupal and DSpace images:

orka vre create Drupal_Test 2 2048 20 Standard <project_name> Drupal-7.37 --admin_password=My21PaSswOrd
orka vre create DSpace_Test 2 2048 20 Standard <project_name> DSpace-5.3 --admin_password=sOmEoTheRPassWorD --admin_email=mymail@gmail.com

Required positional arguments for vre destroy command:

server_id: "VRE Server id in e-science database"

(server_id is returned after the creation of the VRE server and will later be added to the output of the orka vre list command)

{orka vre destroy} command examples

example for orka vre destroy:

orka vre destroy <server_id>

The vre images command has no required positional or optional arguments.

{orka vre images} command example

example for listing available VRE images and their pithos uuid values:

orka vre images

Optional arguments for vre list command:

--status="One of: ACTIVE, PENDING, DESTROYED (case insensitive; shows only VRE servers with the specified status)"
--verbose (outputs full VRE server details. Default: off)

The vre list command has no required positional arguments.

{orka vre list} command example

example for listing user VRE servers:

orka vre list --status=active --verbose

VRE images are built using widely used Docker images pulled from the Docker Hub repository (https://hub.docker.com). Components (i.e. Docker containers) inside the VM are not directly accessible from the Linux host's regular filesystem. In general, in order to access a Docker container's bash shell, type:

docker exec -t -i <container_name> bash

For example, to access the mysql container (named db) in the Drupal or Mediawiki image:

docker exec -t -i db bash
mysql -p

and enter the admin_password when prompted.

In order to change the mysql root password, type:

docker exec -t -i db bash -c "mysqladmin -p<old_password> password <new_password>"

then, stop the docker service:

service docker stop

and find the config.json of the corresponding container, open the file, and set the MYSQL_ROOT_PASSWORD variable to the new password.

To find the config.json of the mysql (named db) container:

# extract the container id from the docker inspect output
myvar=$(docker inspect db | grep "Id" | sed 's/[" ,:]//g' | sed 's/Id//g')
# the container's config.json is in its directory under /var/lib/docker/containers
cd /var/lib/docker/containers/$myvar
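The edit can be made in a text editor, or scripted with sed (a minimal sketch, assuming the variable appears as MYSQL_ROOT_PASSWORD=<old_password> inside config.json; replace both placeholders):

sed -i 's/MYSQL_ROOT_PASSWORD=<old_password>/MYSQL_ROOT_PASSWORD=<new_password>/' config.json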

Finally, start docker and the containers, as in the Drupal example below:

service docker start
docker start db
docker start drupal

In case of Redmine image, to access the postgresql database:

docker exec -t -i redmine_postgresql_1 bash
psql -U redmine -d redmine_production -h localhost

and enter the admin_password when prompted.

In order to change the postgresql password, type:

docker exec -t -i redmine_postgresql_1 bash

enter the postgresql prompt:

sudo -u postgres psql -U postgres -d redmine_production

and change the password:

alter user redmine password '<new_password>';

then, stop the docker service:

service docker stop

and find the config.json of the corresponding container, open the file, and set the DB_PASS variable to the new password. The same change must be made in the file /usr/local/redmine/docker-compose.yml.

To find the config.json of the postgresql (named redmine_postgresql_1) container:

myvar=$(docker inspect redmine_postgresql_1 | grep "Id" | sed 's/[" ,:]//g' | sed 's/Id//g')
cd /var/lib/docker/containers/$myvar
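As before, the edit can be scripted (a minimal sketch, assuming DB_PASS appears in the same KEY=value form in both files; replace the placeholders):

sed -i 's/DB_PASS=<old_password>/DB_PASS=<new_password>/' config.json
sed -i 's/DB_PASS=<old_password>/DB_PASS=<new_password>/' /usr/local/redmine/docker-compose.yml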

Finally, start docker and containers:

service docker start
docker start redmine_postgresql_1
docker start redmine_redmine_1

For the DSpace image, the database resides in the same container as DSpace, so to access the postgresql database the following commands are needed:

docker exec -t -i dspace bash
psql -U dspace -d dspace -h localhost

and enter the admin_password. If the postgresql dspace password is changed, it must also be changed in the file /dspace/config/dspace.cfg, which is inside the DSpace container.

The db.password= entry inside the dspace.cfg file must be updated to reflect the new postgresql password. After the change, stop and start the dspace Docker container; it needs around 4 minutes to be up and running before the DSpace URLs can be accessed.
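The dspace.cfg update can be made with an editor inside the container, or scripted from the host (a minimal sketch; the new password placeholder is illustrative):

docker exec -it dspace bash -c "sed -i 's|^db.password=.*|db.password=<new_password>|' /dspace/config/dspace.cfg"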

In the case of the BigBlueButton image, there is no admin account and no database. The recommended minimum hardware requirements are 4 CPUs and 4 GiB of RAM.

VRE image access URLs:

Drupal: VRE server IP
Mediawiki: VRE server IP
Redmine: VRE server IP
DSpace: VRE server IP:8080/jspui and VRE server IP:8080/xmlui
BigBlueButton: VRE server IP

As an example, here is a backup and restore procedure for the DSpace image.

First, determine which directories need to be backed up:

- installation directory
- web deployment directory
    - In our case it resides inside the installation directory.

Next, open bash inside the DSpace container:

docker exec -it dspace bash

DSpace installation folder backup

nano /dspace/config/dspace.cfg
    # find the line: dspace.dir = {{dspace installation folder}}
    # here it is /dspace
cd
    # the backup will be saved in your (root) home directory
tar zcC /dspace . > dspace_installation-backup-$(date +%Y-%m-%d).tar.gz

DSpace installation folder restore

cd
tar zxC / -f dspace_installation-backup-{{select date}}.tar.gz

DSpace db backup

cd
#store password, so that dump doesn't ask for password for each database dumped
    nano .pgpass
        localhost:*:*:dspace:dspace
    chmod 600 .pgpass
pg_dump -Fc dspace -U dspace -h localhost > dspace_db-backup-$(date +%Y-%m-%d).bak

DSpace db restore

cd 
pg_restore -Fc -d dspace -U dspace -h localhost dspace_db-backup-{{select date}}.bak

Docker installation directories

Drupal: /var/www/html
Mediawiki: /var/www/html
Redmine: /home/redmine
DSpace: /dspace
BigBlueButton: /var/lib/tomcat6/webapps

More information about Docker:

https://www.docker.com/

https://docs.docker.com/articles/basics/

https://docs.docker.com/reference/commandline/exec/

Also, with

orka vre -h

helpful information about the orka vre CLI is displayed.

Reproducible Experiments commands

~orka supports the reproduction of an experiment inside a Hadoop cluster. The main parameters of an experiment/algorithm/simulation, as well as the capabilities of the system that executes it, comprise the environment of the experiment. The ~okeanos IaaS facilitates the reproduction of this environment.

In order to fully describe the whole environment, i.e. system capabilities and algorithm, you should define:

cluster information
configuration settings
actions

Cluster information consists of:

Name of the cluster to be created
Size of the Hadoop cluster
Settings for master and slaves (CPU, Memory, Disk size, Disk template)
Pre-stored image to be used
~okeanos project for resources
Personality file for ssh access

Configuration settings:

The size of the blocks in HDFS
The replication factor

If you want to re-use an already existing cluster, then you need to state:

The id of the cluster
The IP of the master VM

Finally, you should define the list of actions. The actions that are currently supported are the following:

- Cluster management
    - Verb: start, stop, or format
    - Arguments: -
    - e.g.  start
            stop
            format
- Add/Remove nodes
    - Verb: node_add or node_remove
    - Arguments: -
    - e.g.  node_add
            node_remove
- Upload/Download files from hdfs
    - Verb: put or get
    - Arguments: source, destination
    - e.g.  put(source, destination)
            get(source, destination)
- Run job
    - Verb: run_job
    - Arguments: hadoop user, job
    - e.g.  run_job(user,job)
    * the job should be enclosed in double quotes ("")
- Local command
    - Verb: local_cmd
    - Arguments: the command
    - e.g.  local_cmd(ls)

The experiment environment is described in a YAML file.

In this example, the cluster is created from scratch:

cluster:
    # cluster to be created
    disk_template: drbd  # drbd corresponds to the 'Standard' disk template
    flavor_master:
    - 4
    - 4096
    - 20
    flavor_slaves:
    - 4
    - 4096
    - 20
    image: Hadoop-2.5.2
    name: 'test'
    personality: /workspace/.ssh/id_rsa.pub
    project_name: escience.grnet.gr
    size: 3
configuration:
    # configuration settings
    dfs_blocksize: '128'
    replication_factor: '2'
actions:
    # list of actions
    - local_cmd (ls)
    - node_add
    - put (source,destination(hdfs))
    - run_job (user, "/usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 2 10000")
    - get (source(hdfs),destination)
    - stop
    - format
    - start
    - node_remove
    - local_cmd (ls)

Re-use of the same cluster:

cluster:
    # cluster information
    cluster_id: 1
    master_IP: 12.345.678.90
actions:
    # list of actions
    - local_cmd (ls)
    - node_add
    - put (source,destination(hdfs))
    - run_job (user, "/usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 2 10000")
    - get (source(hdfs),destination)
    - stop
    - format
    - start
    - node_remove
    - local_cmd (ls)

Getting help

Also, with

orka -h
orka { images | create | vre | destroy | node | list | info | hadoop | file } -h

helpful information about the ~orka CLI is displayed, and

orka -V
orka --version

print the current version.

Uninstall ~orka CLI

Use pip to remove the ~orka CLI from your system:

$ sudo pip uninstall orka

$ sudo rm /usr/local/bin/orka

Renew ~okeanos token

In case your ~okeanos token changes (manually or automatically) you should perform the following two actions:

The second action is only necessary on the clusters you use the Pithos storage system for Hadoop jobs. You can find the ~/.kamakirc file in /home/hduser path, except the Cloudera image, where it is stored in /var/lib/hadoop-hdfs/.