Saturday, July 9, 2016

When MongoDB Ops Manager (MMS) is not accessible

Things to do:

MongoDB shell version: 3.0.6
connecting to: mongo-mms.myhost.com:27017/admin

First, have a look at the mongo processes that are running:

ps aux | grep mongo
root      2595  0.5 30.4 38507360 10038044 ?   Sl    2015 2216:05 mongod
root      6576  0.0  0.0  64004   744 ?        S     2015   0:00 /bin/bash /etc/init.d/mongodb-mms start
mongod    6585  1.6  8.8 6388796 2932392 ?     Sl    2015 6163:38 /opt/mongodb/mms/jdk/bin/mms-app -d64 -Xss228k -Xmx4352m -Xms4352m -XX:NewSize=600m -Xmn1500m -XX:ReservedCodeCacheSize=128m -XX:-OmitStackTraceInFastThrow -Dxgen.webServerGzipEnabled=true -Duser.timezone=GMT -Dfile.encoding=UTF-8 -Djava.net.preferIPv4Stack=true -Dsun.net.client.defaultReadTimeout=20000 -Dsun.net.client.defaultConnectTimeout=10000 -Dorg.eclipse.jetty.util.UrlEncoding.charset=UTF-8 -Dorg.eclipse.jetty.server.Request.maxFormContentSize=4194304 -Dserver-env=hosted -Dapp-id=mms -Dbase-port=8080 -Dbase-ssl-port=8443 -Dapp-dir=/opt/mongodb/mms -Dxgen.webServerReuseAddress=true -Dmms.keyfile=/etc/mongodb-mms/gen.key -XX:SurvivorRatio=12 -XX:MaxTenuringThreshold=15 -XX:CMSInitiatingOccupancyFraction=62 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+UseBiasedLocking -XX:+CMSParallelRemarkEnabled -XX:-OmitStackTraceInFastThrow -classpath /opt/mongodb/mms/classes/mms.jar:/opt/mongodb/mms/agent:/opt/mongodb/mms/agent/backup:/opt/mongodb/mms/agent/monitoring:/opt/mongodb/mms/agent/automation:/opt/mongodb/mms/data/unit:/opt/mongodb/mms/conf/:/opt/mongodb/mms/lib/* -Dlog_path=/opt/mongodb/mms/logs/mms0 -Dinstance-id=0 com.xgen.svc.core.ServerMain
root     14319  0.0  0.0  64004   840 pts/0    S    14:44   0:00 /bin/bash /opt/mongodb/mms/bin/mongodb-backup-http start
mongod   14328 35.7  3.0 3790304 1006852 pts/0 Sl   14:44   1:36 /opt/mongodb/mms/jdk/bin/mms-app -d64 -Xss228k -Xmx2048m -Xms2048m -XX:NewSize=512m -Xmn786m -XX:ReservedCodeCacheSize=128m -XX:-OmitStackTraceInFastThrow -Duser.timezone=GMT -Dfile.encoding=UTF-8 -Djava.net.preferIPv4Stack=true -Dsun.net.client.defaultReadTimeout=20000 -Dsun.net.client.defaultConnectTimeout=10000 -Dorg.eclipse.jetty.util.UrlEncoding.charset=UTF-8 -Dorg.eclipse.jetty.server.Request.maxFormContentSize=4194304 -Dserver-env=hosted -Dapp-id=bslurp -Dbase-port=8081 -DBSLURP.DEBUG.PORT=8091 -Dbase-ssl-port=8444 -Dapp-dir=/opt/mongodb/mms -Dmms.extraPropFile=conf-mms.properties -Dxgen.webServerReuseAddress=true -Dmms.backup.enableBlockstoreSharding=false -Dmms.keyfile=/etc/mongodb-mms/gen.key -XX:SurvivorRatio=32 -XX:TargetSurvivorRatio=60 -XX:MaxTenuringThreshold=15 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+UseBiasedLocking -XX:MaxGCPauseMillis=10 -XX:+CMSParallelRemarkEnabled -XX:-OmitStackTraceInFastThrow -classpath /opt/mongodb/mms/classes/mms.jar:/opt/mongodb/mms/agent:/opt/mongodb/mms/backup-agent-go:/opt/mongodb/mms/data/unit:/opt/mongodb/mms/conf/:/opt/mongodb/mms/lib/* -Dlog_path=/opt/mongodb/mms/logs/backup-http-server0 -Dinstance-id=0 com.xgen.svc.core.ServerMain
root     16474  0.0  0.0  61224   792 pts/0    S+   14:48   0:00 grep mongo
root     22491  0.0  0.0 137316  2432 ?        S     2015   0:00 su -s /bin/bash mongod -c /usr/bin/mongodb-mms-monitoring-agent -conf /etc/mongodb-mms/monitoring-agent.config
mongod   22492  0.1  0.0 334468 22980 ?        Ssl   2015 577:27 /usr/bin/mongodb-mms-monitoring-agent -conf /etc/mongodb-mms/monitoring-agent.config
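
The two mms-app lines in that listing are the ones that matter later in this post. Here is a minimal sketch for pulling their PIDs out of ps aux output. It assumes the standard ps aux column layout (PID in column 2, command in column 11), and the function name mms_pids is my own, not something Ops Manager ships:

```shell
# Print the PID (2nd column of `ps aux`) of every process whose command
# is the Ops Manager JVM launcher, .../jdk/bin/mms-app.
# Reads ps output on stdin, so it also works on a saved listing.
mms_pids() {
    awk '$11 ~ /\/mms-app$/ { print $2 }'
}

# Usage on a live host:
#   ps aux | mms_pids
```

Matching on the command column rather than grepping the whole line keeps the grep process itself and the bash wrapper scripts out of the result.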

Check the ownership and permissions of /opt/mongodb/mms/ and /opt/mongodb/mms/tmp/ (with ls -la) so we can rule out any permission issues.

[root@mongo-mms bin]# ls -la /opt/mongodb/mms/
total 636
drwxr-xr-x 11 mongod mongod   4096 Oct  8  2015 .
drwxr-xr-x  3 root   root     4096 Oct  8  2015 ..
drwxr-xr-x  5 mongod mongod   4096 Oct  8  2015 agent
drwxr-xr-x  2 mongod mongod   4096 Oct  8  2015 bin
drwxr-xr-x  2 mongod mongod   4096 Oct  8  2015 classes
drwxr-xr-x  2 mongod mongod   4096 Oct 20  2015 conf
drwxr-xr-x  8 mongod mongod   4096 Dec 17  2014 jdk
drwxr-xr-x  2 mongod mongod  12288 Oct  8  2015 lib
drwxr-xr-x  2 mongod mongod   4096 Jul  8 14:44 logs
-rw-r--r--  1 mongod mongod   5507 Jun 22  2015 MMS-MONGODB-MIB.txt
drwxr-xr-x  2 mongod mongod   4096 Oct  8  2015 mongodb-releases
-rw-r--r--  1 mongod mongod     79 Jun 22  2015 README
-rw-r--r--  1 mongod mongod 575800 Jun 22  2015 THIRD-PARTY-NOTICES
drwxr-xr-x  4 mongod mongod   4096 Jul  8 14:44 tmp
-rw-r--r--  1 mongod mongod     50 Jun 22  2015 VERSION

[root@mongo-mms bin]# ls -la /opt/mongodb/mms/tmp/
total 24
drwxr-xr-x  4 mongod mongod 4096 Jul  8 14:44 .
drwxr-xr-x 11 mongod mongod 4096 Oct  8  2015 ..
-rw-r--r--  1 mongod mongod    5 Jul  8 14:44 bslurp-0.pid
drwxr-xr-x  4 mongod mongod 4096 Jul  8 14:44 bslurp-jetty-tmp-0
-rw-r--r--  1 mongod mongod    5 Jul  8 14:42 mms-0.pid
drwxr-xr-x  4 mongod mongod 4096 Jul  8 14:43 mms-jetty-tmp-0
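
Reading ls output by eye works for two directories, but find can do the same check for the whole tree. This is only a sketch: the helper name not_owned_by is made up for this post, and it assumes that everything under /opt/mongodb/mms is meant to be owned by mongod, which is what the listings above show.

```shell
# List everything under a directory that is NOT owned by the given user.
# No output means the ownership is consistent.
not_owned_by() {
    dir=$1
    owner=$2
    find "$dir" ! -user "$owner" -print
}

# Usage on the Ops Manager host (path and user taken from the listings above):
#   not_owned_by /opt/mongodb/mms mongod
```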

Check the contents of /opt/mongodb/mms/conf/ so we can rule out any configuration issue.

Check the Ops Manager log, /opt/mongodb/mms/logs/mms0.log, so we can see if there's further information.

Please NOTE that the April, May and June log files were missing, probably because MMS was down:
-rw-r--r-- 1 mongod mongod     26122 Mar 22 17:00 mms0-access.20160322.log.gz
-rw-r--r-- 1 mongod mongod      1813 Mar 22 17:04 mms0.20160322.log.gz
-rw-r--r-- 1 mongod mongod     26428 Mar 23 17:00 mms0-access.20160323.log.gz
-rw-r--r-- 1 mongod mongod      1808 Mar 23 17:04 mms0.20160323.log.gz
-rw-r--r-- 1 mongod mongod     12313 Mar 24 04:26 backup-http-server0-access.20160228.log.gz
-rw-r--r-- 1 mongod mongod     52349 Mar 24 04:33 backup-http-server0-access.log
-rw-r--r-- 1 mongod mongod     30433 Mar 24 17:00 mms0-access.20160324.log.gz
-rw-r--r-- 1 mongod mongod      1809 Mar 24 17:04 mms0.20160324.log.gz
-rw-r--r-- 1 mongod mongod     25351 Mar 25 17:00 mms0-access.20160325.log.gz
-rw-r--r-- 1 mongod mongod      1813 Mar 25 17:04 mms0.20160325.log.gz
-rw-r--r-- 1 mongod mongod     25182 Mar 26 17:00 mms0-access.20160326.log.gz
-rw-r--r-- 1 mongod mongod      1827 Mar 26 17:04 mms0.20160326.log.gz
-rw-r--r-- 1 mongod mongod     25240 Mar 27 17:00 mms0-access.20160327.log.gz
-rw-r--r-- 1 mongod mongod      1815 Mar 27 17:04 mms0.20160327.log.gz
-rw-r--r-- 1 mongod mongod     25780 Mar 28 17:00 mms0-access.20160328.log.gz
-rw-r--r-- 1 mongod mongod      1813 Mar 28 17:04 mms0.20160328.log.gz
-rw-r--r-- 1 mongod mongod     26001 Mar 29 17:00 mms0-access.20160329.log.gz
-rw-r--r-- 1 mongod mongod      2790 Mar 29 17:04 mms0.20160329.log.gz
-rw-r--r-- 1 mongod mongod       457 Mar 30 01:33 backup-http-server0.20151020.log.gz
-rw-r--r-- 1 mongod root      127672 Jul  8 14:42 mms-migration.log
-rw-r--r-- 1 mongod mongod      9242 Jul  8 14:44 mms0-access.20160330.log.gz
-rw-r--r-- 1 mongod mongod 306260056 Jul  8 14:44 mms0.log
-rw-r--r-- 1 mongod mongod       396 Jul  8 14:44 mms0-access.log
-rw-r--r-- 1 mongod mongod      4746 Jul  8 14:44 backup-http-server0.log
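
At roughly 300 MB, mms0.log is too big to page through comfortably. A small sketch for showing only the most recent ERROR lines follows. The function name recent_errors is my own, and it assumes the log level appears as the word ERROR with a space on each side, as in the log excerpt quoted later in this post:

```shell
# Print the last N lines (default 20) of a log file that contain
# the ERROR level marker.
recent_errors() {
    logfile=$1
    count=${2:-20}
    grep ' ERROR ' "$logfile" | tail -n "$count"
}

# Usage:
#   recent_errors /opt/mongodb/mms/logs/mms0.log 50
```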

Check the status of MMS and the MMS monitoring agent:

[root@mongo-mms bin]# /etc/init.d/mongodb-mms-monitoring-agent status
mongodb-mms-monitoring-agent is running

[root@mongo-mms bin]# /etc/init.d/mongod status
mongod (pid 2595) is running...

[root@mongo-mms bin]# /etc/init.d/mongodb-mms status
Check MMS status
   Probing instance 0...                                   [FAILED]
     PID file not found: /opt/mongodb/mms/tmp/mms-0.pid
Check Backup HTTP Server status
   Probing instance 0...                                   [FAILED]
     PID file not found: /opt/mongodb/mms/tmp/bslurp-0.pid
[root@mongo-mms bin]# /etc/init.d/mongodb-mms start
Starting pre-flight checks
Successfully finished pre-flight checks

Migrate MMS data
   Running migrations...                                   [  OK  ]
Start MMS server
   Instance 0 starting....................                 [  OK  ]
Start Backup HTTP Server
   Instance 0 starting...........                          [  OK  ]

[root@mongo-mms bin]# /etc/init.d/mongodb-mms status
Check MMS status
   Probing instance 0...                                   [FAILED]
      The instance is not running.
Check Backup HTTP Server status
   Probing instance 0...                                   [  OK  ]
      The instance is running.
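
The two status runs fail differently: first the PID file is missing, then it exists but the instance is reported as not running. The same kind of check can be done by hand, as sketched below. I am assuming the PID files contain just a process ID (their 5-byte size in the tmp listing suggests so). The function name pidfile_status is my own, and this is not necessarily how the init script does its probing.

```shell
# Report whether the PID recorded in a PID file belongs to a live process.
# Exit status: 0 = running, 1 = stale or missing PID file.
pidfile_status() {
    pidfile=$1
    if [ ! -r "$pidfile" ]; then
        echo "missing: $pidfile"
        return 1
    fi
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only tests whether the PID exists.
    # It also fails on a permission error, so run this as root.
    if kill -0 "$pid" 2>/dev/null; then
        echo "running: $pid"
    else
        echo "stale: $pid"
        return 1
    fi
}

# Usage:
#   pidfile_status /opt/mongodb/mms/tmp/mms-0.pid
#   pidfile_status /opt/mongodb/mms/tmp/bslurp-0.pid
```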


Let's see if we can start it up:

      [root@mongo-mms bin]# /etc/init.d/mongodb-mms start
      Starting pre-flight checks
      Successfully finished pre-flight checks
     
      Migrate MMS data
         Running migrations...                                   [  OK  ]
      Start MMS server
         Instance 0 starting...................                  [  OK  ]
      Start Backup HTTP Server
         Instance 0 is already running                           [FAILED]
      [root@mongo-mms bin]# /etc/init.d/mongodb-mms status
      Check MMS status
         Probing instance 0...                                   [FAILED]
            The instance is not running.
      Check Backup HTTP Server status
         Probing instance 0...                                   [  OK  ]
            The instance is running.

Looking at the Ops Manager logs, it looks like Ops Manager won't start because it thinks it's already running:

2016-07-08T19:16:44.348+0000 [main] ERROR com.xgen.svc.mms.svc.snmp.SnmpTrapAgentSvcProvider [get:30] - com.xgen.mms:class=Snmp,protocol=Adaptor,port=11611
javax.management.InstanceAlreadyExistsException: com.xgen.mms:class=Snmp,protocol=Adaptor,port=11611

The ps output appears to confirm this.

If Ops Manager is already running and it's working, then there's no need to do anything else. To see if this is true, please look at the file /opt/mongodb/mms/conf/conf-mms.properties. You'll see a line that says

mms.centralUrl=http://mongo-mms.myhost.com:8080
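
If you would rather not open the properties file by hand, the value can be pulled out with sed. This is a sketch with a made-up function name (central_url), and it assumes the property is written as a plain key=value line like the one above:

```shell
# Print the value of mms.centralUrl from a properties file.
central_url() {
    sed -n 's/^mms\.centralUrl=//p' "$1"
}

# Usage (curl exits non-zero if the connection is refused):
#   url=$(central_url /opt/mongodb/mms/conf/conf-mms.properties)
#   curl -sS -o /dev/null -w '%{http_code}\n' "$url"
```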

If you open that URL in your browser and see the message below,

This site can’t be reached

mongo-mms.myhost.com refused to connect.
Try:
Reloading the page
Checking the connection
Checking the proxy and the firewall

then try the following. It worked for me.

sudo kill 6585 14319 14328

(Note that these are the process IDs we collected from the ps output at the start of this post.)

Then start the service again:
sudo service mongodb-mms start

and open http://mongo-mms.myhost.com:8080 in your browser again.

Friday, July 1, 2016

Before Cassandra Installation

Prerequisites for a DSE 4.8.6 Cassandra installation

1. A newer version of Java is needed.

Details:

If you have:
cassand01 ~]$ java -version
java version "1.7.0_65"
OpenJDK Runtime Environment (rhel-2.5.1.2.el6_5-x86_64 u65-b17)
OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)

The version needed is:
java version "1.8.0_74"
Java(TM) SE Runtime Environment (build 1.8.0_74-b02)
Java HotSpot(TM) 64-Bit Server VM (build 25.74-b02, mixed mode)
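
The comparison can be scripted. This is a sketch only: the function name java_major is my own, and it assumes the first line of the java -version output looks like the two samples above (it also copes with the newer numbering, such as "11.0.2").

```shell
# Read `java -version` output on stdin and print the major version:
#   "1.7.0_65" -> 7,  "1.8.0_74" -> 8,  "11.0.2" -> 11
java_major() {
    sed -n \
        -e 's/^.* version "1\.\([0-9][0-9]*\)\..*$/\1/p' \
        -e 's/^.* version "\([0-9][0-9]*\)[."].*$/\1/p' | head -n 1
}

# Usage (java -version writes to stderr, hence the 2>&1):
#   [ "$(java -version 2>&1 | java_major)" -ge 8 ] || echo "Java is too old"
```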



2. Check /etc/hosts. It shouldn't contain garbage entries, so make sure this file is clean.

3. Make sure there are no firewall issues.

[root@cassan02 .cct]# telnet cassan03.myblog.com 9042
Trying 10.xx.xx.xx ...

telnet: connect to address 10.xx.xx.xx: No route to host

Work with your system admin to resolve the firewall issue.

Details on ports: https://docs.datastax.com/en/datastax_enterprise/4.8/datastax_enterprise/sec/secConfFirePort.html
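
telnet is not always installed, and it needs a manual exit when the connection does succeed. The sketch below does the same test non-interactively. It relies on bash's /dev/tcp feature and the coreutils timeout command, and the function name port_open is my own. Keep in mind that the test also fails when nothing is listening on the port yet, so a failure before Cassandra is installed does not prove a firewall problem on its own.

```shell
# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# Uses bash's /dev/tcp pseudo-device, so bash must be installed.
port_open() {
    host=$1
    port=$2
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Usage, with the default inter-node (7000), JMX (7199) and CQL (9042) ports:
#   for port in 7000 7199 9042; do
#       port_open cassan03.myblog.com "$port" || echo "cannot reach port $port"
#   done
```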

4. Make sure all the dependencies (RPMs) are downloaded and installed:

libselinux-2.0.94-5.8.el6.x86_64
libselinux-utils-2.0.94-5.8.el6.x86_64
selinux-policy-targeted-3.7.19-260.el6.noarch
libselinux-python-2.0.94-5.8.el6.x86_64
util-linux-ng-2.17.2-12.18.el6.x86_64
selinux-policy-3.7.19-260.el6.noarch

NOTE: There can be more. The exact list varies with the OS version and flavor.

5. Set swappiness -> /sbin/sysctl vm.swappiness=0

6. Set zone_reclaim_mode -> echo 0 > /proc/sys/vm/zone_reclaim_mode

7. Set vm.max_map_count -> /sbin/sysctl vm.max_map_count=131072
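
Settings made this way do not survive a reboot unless they are also added to /etc/sysctl.conf, so it is worth re-checking them later. Here is a sketch of such a check. The function name check_vm_settings is my own, and the expected values are simply the ones from steps 5 to 7:

```shell
# Read "key = value" lines (the format sysctl prints) on stdin and
# report any of the three kernel settings that differ from steps 5-7.
# Exit status is 1 if at least one setting is off, 0 otherwise.
check_vm_settings() {
    awk -F' *= *' '
        BEGIN {
            want["vm.swappiness"] = 0
            want["vm.zone_reclaim_mode"] = 0
            want["vm.max_map_count"] = 131072
        }
        ($1 in want) && ($2 != want[$1]) {
            printf "%s is %s, expected %s\n", $1, $2, want[$1]
            bad = 1
        }
        END { exit bad }
    '
}

# Usage:
#   /sbin/sysctl vm.swappiness vm.zone_reclaim_mode vm.max_map_count | check_vm_settings
```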

Unable to gossip with any seeds - Cassandra Issues

Unable to gossip with any seeds

When this error shows up, it appears that the nodes are not able to connect to each other, so there is probably a firewall/port issue here.

Check:
[root@cassan02 .cct]# telnet cassan03.myblog.com 7199
Trying 10.24.8.36...
telnet: connect to address 10.xx.xx.xx: No route to host
[root@cassan02 .cct]#
[root@cassan02 .cct]# telnet cassan03.myblog.com 9042
Trying 10.xx.xx.xx ...

telnet: connect to address 10.xx.xx.xx: No route to host

Work with your system admin to resolve the firewall issue.
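
The telnet checks above cover the JMX port (7199) and the CQL port (9042). Gossip itself goes over the storage port, which is 7000 by default, so it is worth testing that one against every seed as well. Below is a sketch for listing the seeds a node is configured with. The function name seeds_from_yaml is my own, it assumes the stock quoted form of the seeds line, and the cassandra.yaml path in the usage line is an assumption for a DSE package install that may differ on your hosts.

```shell
# Print one seed address per line from a cassandra.yaml-style file.
# Expects the stock layout:   - seeds: "10.0.0.1,10.0.0.2"
seeds_from_yaml() {
    sed -n 's/^[[:space:]]*-[[:space:]]*seeds:[[:space:]]*"\([^"]*\)".*$/\1/p' "$1" |
        tr ',' '\n' |
        sed 's/^[[:space:]]*//; s/[[:space:]]*$//'
}

# Usage (path is an assumption, adjust to your install):
#   for seed in $(seeds_from_yaml /etc/dse/cassandra/cassandra.yaml); do
#       echo "check: telnet $seed 7000"
#   done
```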

Error messages you will probably see in /var/log/cassandra/system.log:

INFO  [main] 2016-07-01 06:24:54,569  OutboundTcpConnection.java:97 - OutboundTcpConnection using coalescing strategy DISABLED
INFO  [ScheduledTasks:1] 2016-07-01 06:24:57,731  TokenMetadata.java:433 - Updating topology for all endpoints that have changed
ERROR [main] 2016-07-01 06:25:25,581  CassandraDaemon.java:581 - Exception encountered during startup
java.lang.RuntimeException: Unable to gossip with any seeds
at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1345) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:541) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:794) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:726) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:617) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:389) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
at com.datastax.bdp.server.DseDaemon.setup(DseDaemon.java:336) ~[dse-core-4.8.6.jar:4.8.6]
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:564) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
at com.datastax.bdp.DseModule.main(DseModule.java:74) [dse-core-4.8.6.jar:4.8.6]
INFO  [Daemon shutdown] 2016-07-01 06:25:25,582  DseDaemon.java:420 - DSE shutting down...
WARN  [StorageServiceShutdownHook] 2016-07-01 06:25:25,582  Gossiper.java:1462 - No local state or state is in silent shutdown, not announcing shutdown
INFO  [Daemon shutdown] 2016-07-01 06:25:25,582  PluginManager.java:106 - All plugins are stopped.
INFO  [StorageServiceShutdownHook] 2016-07-01 06:25:25,583  MessagingService.java:734 - Waiting for messaging service to quiesce
INFO  [ACCEPT-/10.24.8.36] 2016-07-01 06:25:25,583  MessagingService.java:1018 - MessagingService has terminated the accept() thread
ERROR [Daemon shutdown] 2016-07-01 06:25:25,587  CassandraDaemon.java:229 - Exception in thread Thread[Daemon shutdown,5,main]
java.lang.AssertionError: null
at org.apache.cassandra.gms.Gossiper.addLocalApplicationStateInternal(Gossiper.java:1415) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
at org.apache.cassandra.gms.Gossiper.addLocalApplicationStates(Gossiper.java:1439) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
at org.apache.cassandra.gms.Gossiper.addLocalApplicationState(Gossiper.java:1429) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
at com.datastax.bdp.gms.DseState.setActiveStatusSync(DseState.java:252) ~[dse-core-4.8.6.jar:4.8.6]
at com.datastax.bdp.server.DseDaemon.preStop(DseDaemon.java:428) ~[dse-core-4.8.6.jar:4.8.6]
at com.datastax.bdp.server.DseDaemon.safeStop(DseDaemon.java:438) ~[dse-core-4.8.6.jar:4.8.6]
at com.datastax.bdp.server.DseDaemon$1.run(DseDaemon.java:684) ~[dse-core-4.8.6.jar:4.8.6]
at java.lang.Thread.run(Unknown Source) ~[na:1.8.0_74]