Hi fellows,
I started testing SC's LoadBalancer, but something is not working well.
The LoadBalancer tells me that there are no jobs in the OGE queue.
Here is the full history:
1- Launched a 5-node cluster (Ubuntu HVM) of cc1.4xlarge instances
-> Used mpich2 plugin (mpich2 v1.4.1 native)
2- Submitted an application job to OGE (mympiapp):
$ qsub -N Newaveinth -b y -pe orte 80 -cwd mpiexec -n 80 mympiapp
3- Checked the queue:
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
      2 0.55500 Newaveinth sgeadmin     r     01/23/2013 13:29:48 all.q@master                      80
sgeadmin@master:~/pmo0113$ qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@master                   BIP   0/16/16        3.06     linux-x64
      2 0.55500 Newaveinth sgeadmin     r     01/23/2013 13:29:48    16
---------------------------------------------------------------------------------
all.q@node001                  BIP   0/16/16        2.87     linux-x64
      2 0.55500 Newaveinth sgeadmin     r     01/23/2013 13:29:48    16
---------------------------------------------------------------------------------
all.q@node002                  BIP   0/16/16        2.74     linux-x64
      2 0.55500 Newaveinth sgeadmin     r     01/23/2013 13:29:48    16
---------------------------------------------------------------------------------
all.q@node003                  BIP   0/16/16        2.86     linux-x64
      2 0.55500 Newaveinth sgeadmin     r     01/23/2013 13:29:48    16
---------------------------------------------------------------------------------
all.q@node004                  BIP   0/16/16        2.11     linux-x64
      2 0.55500 Newaveinth sgeadmin     r     01/23/2013 13:29:48    16
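As a sanity check on the listing in step 3, here is a minimal Python sketch that counts jobs per state from plain qstat output. The helper name count_job_states is my own and hypothetical; it is not part of StarCluster or OGE. It assumes the default qstat column layout, where the state is the fifth column and the job name contains no spaces.

```python
from collections import Counter


def count_job_states(qstat_text):
    """Count jobs per state from plain `qstat` output (one job per line).

    Assumes the default column layout: job-ID, prior, name, user, state, ...
    """
    states = Counter()
    for line in qstat_text.splitlines():
        fields = line.split()
        # Job lines start with a numeric job ID; header and separator lines do not.
        if len(fields) >= 5 and fields[0].isdigit():
            states[fields[4]] += 1
    return states


# Sample copied from the qstat listing in this message.
sample = """\
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
      2 0.55500 Newaveinth sgeadmin     r     01/23/2013 13:29:48 all.q@master                      80
"""

states = count_job_states(sample)
print("running:", states["r"], "queued:", states["qw"])
```

For the listing above this reports one job in state r and none in state qw, so from OGE's side the job is running rather than waiting.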
4- Tried to load balance the cluster launched with 5 nodes:
ubuntu@ip-10-112-98-159:~$ starcluster loadbalance mycluster --max_nodes=6
StarCluster - (http://web.mit.edu/starcluster) (v. 0.93.3)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu
>>> Starting load balancer (Use ctrl-c to exit)
Maximum cluster size: 6
Minimum cluster size: 1
Cluster growth rate: 1 nodes/iteration
>>> Loading full job history
Execution hosts: 5
Queued jobs: 0
Avg job duration: 1699 secs
Avg job wait time: 8 secs
Last cluster modification time: 2013-01-23 13:32:33
>>> Cluster was modified less than 180 seconds ago
>>> Waiting for cluster to stabilize...
>>> Sleeping...(looping again in 60 secs)
>>> Loading full job history
Execution hosts: 5
Queued jobs: 0 (<-- Why are there 0 queued jobs ???)
Avg job duration: 1699 secs
Avg job wait time: 8 secs
Last cluster modification time: 2013-01-23 13:32:33
>>> Cluster was modified less than 180 seconds ago
>>> Waiting for cluster to stabilize...
>>> Sleeping...(looping again in 60 secs)
^C (<-- Ctrl-C and a lot of messages...)
Traceback (most recent call last):
  File "/usr/local/bin/starcluster", line 9, in <module>
    load_entry_point('StarCluster==0.93.3', 'console_scripts', 'starcluster')()
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.93.3-py2.7.egg/starcluster/cli.py", line 312, in main
    StarClusterCLI().main()
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.93.3-py2.7.egg/starcluster/cli.py", line 255, in main
    sc.execute(args)
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.93.3-py2.7.egg/starcluster/commands/loadbalance.py", line 90, in execute
    lb.run(cluster)
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.93.3-py2.7.egg/starcluster/balancers/sge/__init__.py", line 619, in run
    time.sleep(self.polling_interval)
KeyboardInterrupt
Exception in thread Thread-1 (most likely raised during interpreter shutdown):
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
  File "/usr/local/lib/python2.7/dist-packages/ssh-1.7.13-py2.7.egg/ssh/transport.py", line 1602, in run
<type 'exceptions.AttributeError'>: 'NoneType' object has no attribute 'error'
All the best,
Sergio
Received on Wed Jan 23 2013 - 08:39:58 EST