This topic is discussed in the "Grouper Maintenance" training video.

Grouper diagnostics provides a URL on Grouper WS and UI (in Grouper 2.2+) that reports the health of Grouper.  The checks can cover memory in the WS server, the connection to the Grouper Registry DB, whether subject sources can perform queries, and whether Grouper loader jobs are executing successfully.  If everything is OK, a 200 HTTP code is returned, otherwise 500, along with a description of the issue.  The point is that web monitoring software such as Nagios, Big Brother, BMC, etc. can be pointed at this URL.
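
As an illustration of how a monitoring probe might consume the 200/500 contract described above, here is a minimal Python sketch. The function names and the Nagios-style exit codes are assumptions made for this example, not part of Grouper. Connection failures (for instance an unreachable host) are not handled here and would surface as exceptions.

```python
import urllib.error
import urllib.request

# Conventional Nagios plugin exit codes (assumption: the probe runs as a plugin).
NAGIOS_OK = 0
NAGIOS_CRITICAL = 2


def exit_code_for(http_status):
    """Map the status servlet's HTTP code to a Nagios-style exit code."""
    return NAGIOS_OK if http_status == 200 else NAGIOS_CRITICAL


def fetch_status(url, timeout=10):
    """Return (http_status, body) for a diagnostics URL.

    urllib raises HTTPError for a 500 response, so that case is handled
    in the except branch and the error description is still returned.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as error:
        return error.code, error.read().decode("utf-8", "replace")
```

A probe would call fetch_status on one of the status URLs shown on this page, print the body, and exit with exit_code_for(status).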

There is general information displayed on success as well: the server name, the number of WS requests (since the server started), the last error (if recent), etc.

...

Each test can be ignored (so that it does not cause an error) in the grouper-ws.properties (grouper.properties in 2.2+).  In the same file you can also customize the number of minutes within which a SUCCESS must be found for a loader job.

Note, results are cached so that repeated hits do not run the queries each time.

Sample configuration

Code Block

# set to true to ignore a test.  Note, in job names, invalid chars (e.g. colon) need to be replaced with underscore
# invalid chars are anything matching this regex: [^a-zA-Z0-9._-]
ws.diagnostic.ignore.memoryTest = false
ws.diagnostic.ignore.dbTest_grouper = false
ws.diagnostic.ignore.source_jdbc = false
ws.diagnostic.ignore.loader_CHANGE_LOG_changeLogTempToChangeLog = false
ws.diagnostic.ignore.loader_MAINTENANCE__grouperReport = false

# number of minutes that can go by without a success before an error is thrown
ws.diagnostic.minutesSinceLastSuccess.loader_SQL_GROUP_LIST__aStem_aGroup2 = 60
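
The underscore substitution mentioned in the configuration comment can be sketched in Python as follows. The helper name diagnostic_property is hypothetical; only the regex and the property prefixes come from the sample configuration.

```python
import re

# Character class quoted in the sample configuration comment.
INVALID_CHARS = re.compile(r"[^a-zA-Z0-9._-]")


def diagnostic_property(prefix, test_name):
    """Build a property key, replacing each invalid char with an underscore."""
    return prefix + "." + INVALID_CHARS.sub("_", test_name)
```

For example, the job loader_SQL_GROUP_LIST__aStem:aGroup2 with the prefix ws.diagnostic.minutesSinceLastSuccess yields the key used in the sample configuration.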

Exclude/include jobs by URL param

You can restrict the check to specific jobs with the comma-separated includeOnly URL param (2.2.3+ and 2.2.2.api.patch.6)

https://url.to.grouper.edu/grouper/status?diagnosticType=daemonJobsOnly&includeOnly=loader_MAINTENANCE_cleanLogs,loader_CHANGE_LOG_consumer_syncGroups,loader_SQL_SIMPLE__loader:owner__9178d7d636de49d6b271d12ca351dc19

Code Block
SUCCESS loader_CHANGE_LOG_changeLogTempToChangeLog: Loader job CHANGE_LOG_changeLogTempToChangeLog ignored in config since URL param contains includeOnly which doesn't have 'loader_CHANGE_LOG_changeLogTempToChangeLog' (46ms elapsed)
SUCCESS loader_MAINTENANCE_cleanLogs: Not checking, there was a success from before: 2016/01/31 11:45:13.000, expecting one in the last 3120 minutes (46ms elapsed)
SUCCESS loader_CHANGE_LOG_consumer_syncGroups: Not checking, there was a success from before: 2016/01/31 15:14:00.000, expecting one in the last 30 minutes (46ms elapsed)
SUCCESS loader_CHANGE_LOG_consumer_grouperRules: Loader job CHANGE_LOG_consumer_grouperRules ignored in config since URL param contains includeOnly which doesn't have 'loader_CHANGE_LOG_consumer_grouperRules' (46ms elapsed)
SUCCESS loader_SQL_SIMPLE__loader:owner__9178d7d636de49d6b271d12ca351dc19: Not checking, there was a success from before: 2016/01/31 13:40:04.000, expecting one in the last 3120 minutes (46ms elapsed)

You can exclude jobs with the comma-separated exclude URL param (2.2.3+ and 2.2.2.api.patch.6)

https://url.to.grouper.edu/grouper/status?diagnosticType=daemonJobsOnly&exclude=loader_MAINTENANCE_cleanLogs,loader_CHANGE_LOG_consumer_syncGroups,loader_SQL_SIMPLE__loader:owner__9178d7d636de49d6b271d12ca351dc19

Code Block
SUCCESS loader_CHANGE_LOG_changeLogTempToChangeLog: Not checking, there was a success from before: 2016/01/31 15:14:50.000, expecting one in the last 30 minutes (31ms elapsed)
SUCCESS loader_MAINTENANCE_cleanLogs: Loader job MAINTENANCE_cleanLogs ignored in config since URL param contains exclude which has 'loader_MAINTENANCE_cleanLogs' (31ms elapsed)
SUCCESS loader_CHANGE_LOG_consumer_syncGroups: Loader job CHANGE_LOG_consumer_syncGroups ignored in config since URL param contains exclude which has 'loader_CHANGE_LOG_consumer_syncGroups' (31ms elapsed)
SUCCESS loader_CHANGE_LOG_consumer_grouperRules: Not checking, there was a success from before: 2016/01/31 15:14:02.000, expecting one in the last 30 minutes (31ms elapsed)
SUCCESS loader_SQL_SIMPLE__loader:owner__9178d7d636de49d6b271d12ca351dc19: Loader job SQL_SIMPLE__loader:owner__9178d7d636de49d6b271d12ca351dc19 ignored in config since URL param contains exclude which has 'loader_SQL_SIMPLE__loader:owner__9178d7d636de49d6b271d12ca351dc19' (31ms elapsed)
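
If the status URLs are generated by a script, the includeOnly and exclude params can be assembled as in this Python sketch. The function name status_url is an assumption for illustration, and whether Grouper accepts both params in the same request is not stated on this page, so the sketch simply passes through whatever it is given.

```python
from urllib.parse import urlencode


def status_url(base_url, diagnostic_type, include_only=None, exclude=None):
    """Build a diagnostics URL with optional comma-separated job filters."""
    params = {"diagnosticType": diagnostic_type}
    if include_only:
        params["includeOnly"] = ",".join(include_only)
    if exclude:
        params["exclude"] = ",".join(exclude)
    # Keep commas and colons literal so the result looks like the URLs on this page.
    return base_url + "?" + urlencode(params, safe=",:")
```

Calling status_url("https://url.to.grouper.edu/grouper/status", "daemonJobsOnly", exclude=["loader_MAINTENANCE_cleanLogs"]) produces a URL in the same shape as the exclude example.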


Trivial option

Use this for frequent checks, or when there is a cluster, you can use this on all nodes and a deeper check on one node only.

https://url.to.grouper.edu/grouperWs/status?diagnosticType=trivial

Note, this is a success, but since there was an error recently, it is displayed

Code Block
Server: mchyzer-PC, grouperVersion: 1.6.0, up since: 2010/05/17 02:19, 0 requests
SUCCESS memoryTest: Allocating 100000 bytes to an array to make sure not out of memory (11ms elapsed)


Diagnostics errors since start: 3 (11ms elapsed)
Last diagnostics error date: 2010/05/17 02:23:27
Last diagnostics error message:
There was an error in the diagnostic task DiagnosticLoaderJobTest, Loader job CHANGE_LOG_changeLogTempToChangeLog

:Cant find a success since: 2010/05/17 01:38:50.000, expecting one in the last 30 minutes
java.lang.RuntimeException: Cant find a success since: 2010/05/17 01:38:50.000, expecting one in the last 30 minutes
	at edu.internet2.middleware.grouper.ws.status.DiagnosticLoaderJobTest.doTask(DiagnosticLoaderJobTest.java:103)
	at edu.internet2.middleware.grouper.ws.status.DiagnosticTask.executeTask(DiagnosticTask.java:44)
	at edu.internet2.middleware.grouper.ws.status.GrouperStatusServlet.doGet(GrouperStatusServlet.java:129)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
	at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:433)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
	at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
	at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
	at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
	at java.lang.Thread.run(Thread.java:619)
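
The cluster advice for the trivial check (trivial on every node, a deeper check on one node) might be expressed as the following Python sketch. The function name, the choice of the first node as the deep-check node, and the node names in the usage note are all assumptions for illustration.

```python
def plan_checks(nodes, deep_type="all"):
    """Give the first node the deeper diagnosticType and every other node 'trivial'."""
    return {
        node: (deep_type if index == 0 else "trivial")
        for index, node in enumerate(nodes)
    }
```

For hypothetical nodes ws1, ws2 and ws3, this assigns diagnosticType=all to ws1 and diagnosticType=trivial to the other two.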

DB option

This will do a lightweight query to the registry, and the memory test

https://url.to.grouper.edu/grouperWs/status?diagnosticType=db

Code Block
Server: mchyzer-PC, grouperVersion: 1.6.0, up since: 2010/05/17 02:19, 0 requests
SUCCESS memoryTest: Allocating 100000 bytes to an array to make sure not out of memory (20ms elapsed)
SUCCESS dbTest_grouper: Retrieved object from database (28ms elapsed)


Diagnostics errors since start: 3 (28ms elapsed)

Subject sources

This will do a find by ID on all sources, and the DB test, and the memory test.  Note that the sources.xml settings in each source that configure the Grouper startup checks apply here as well, i.e. you can skip a source, or set the ID to search for.

Code Block
     <init-param> 
       <param-name>findSubjectByIdOnCheckConfig</param-name> 
       <param-value>true|false</param-value> 
     </init-param> 
     <init-param> 
       <param-name>subjectIdToFindOnCheckConfig</param-name> 
       <param-value>someSubjectIdWhichMightExistOrWhatever</param-value> 
     </init-param> 
     <init-param> 
       <param-name>findSubjectByIdentifiedOnCheckConfig</param-name> 
       <param-value>true|false</param-value> 
     </init-param> 
     <init-param> 
       <param-name>subjectIdentifierToFindOnCheckConfig</param-name> 
       <param-value>someSubjectIdentifierWhichMightExistOrWhatever</param-value> 
     </init-param> 
     <init-param> 
       <param-name>findSubjectByStringOnCheckConfig</param-name> 
       <param-value>true|false</param-value> 
     </init-param> 
     <init-param> 
       <param-name>stringToFindOnCheckConfig</param-name> 
       <param-value>someStringWhichMightExistOrWhatever</param-value> 
     </init-param>


https://url.to.grouper.edu/grouperWs/status?diagnosticType=sources

Code Block
Server: mchyzer-PC, grouperVersion: 1.6.0, up since: 2010/05/17 02:19, 0 requests
SUCCESS memoryTest: Allocating 100000 bytes to an array to make sure not out of memory (37ms elapsed)
SUCCESS dbTest_grouper: Retrieved object from database (40ms elapsed)
SUCCESS source_g:gsa: Searched for subject by id: grouperTestSubjectByIdOnStartupASDFGHJ (42ms elapsed)
SUCCESS source_jdbc: Searched for subject by id: grouperTestSubjectByIdOnStartupASDFGHJ (45ms elapsed)
SUCCESS source_g:isa: Searched for subject by id: grouperTestSubjectByIdOnStartupASDFGHJ (45ms elapsed)


Diagnostics errors since start: 3 (45ms elapsed)

Daemon jobs

Note: Grouper 2.2.3+ and 2.2.2.api.patch.6 have a diagnostic type of daemonJobsOnly where only daemon (and loader) jobs will be checked.

https://url.to.grouper.edu/grouperWs/status?diagnosticType=daemonJobsOnly

Code Block
Server: mchyzer-pc, grouperVersion: 2.2.2, up since: 2016/01/31 15:14, 0 requests
SUCCESS loader_CHANGE_LOG_changeLogTempToChangeLog: Not checking, there was a success from before: 2016/01/31 15:14:50.000, expecting one in the last 30 minutes (65ms elapsed)
SUCCESS loader_MAINTENANCE_cleanLogs: Not checking, there was a success from before: 2016/01/31 11:45:13.000, expecting one in the last 3120 minutes (65ms elapsed)
SUCCESS loader_CHANGE_LOG_consumer_syncGroups: Not checking, there was a success from before: 2016/01/31 15:14:00.000, expecting one in the last 30 minutes (66ms elapsed)
SUCCESS loader_CHANGE_LOG_consumer_grouperRules: Not checking, there was a success from before: 2016/01/31 15:14:02.000, expecting one in the last 30 minutes (66ms elapsed)
SUCCESS loader_SQL_SIMPLE__loader:owner__9178d7d636de49d6b271d12ca351dc19: Not checking, there was a success from before: 2016/01/31 13:40:04.000, expecting one in the last 3120 minutes (66ms elapsed)


Diagnostics errors since start: 0 (66ms elapsed)
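
Monitoring scripts sometimes want the per-test results rather than only the HTTP code. The Python sketch below parses result lines of the form shown in the sample outputs on this page. The regular expression is inferred from those samples only, so treat it as an assumption rather than a documented format; lines that do not match (the Server line, the error summary, stack traces) are skipped.

```python
import re

# Shape inferred from sample lines such as:
# SUCCESS source_jdbc: Searched for subject by id: ... (45ms elapsed)
RESULT_LINE = re.compile(
    r"^(?P<status>[A-Z]+) (?P<name>\S+): (?P<message>.*) "
    r"\((?P<elapsed_ms>\d+)ms elapsed\)$"
)


def parse_results(body):
    """Return one dict per recognised result line in a status response body."""
    results = []
    for line in body.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match:
            results.append({
                "status": match.group("status"),
                "name": match.group("name"),
                "message": match.group("message"),
                "elapsed_ms": int(match.group("elapsed_ms")),
            })
    return results
```

Test names may themselves contain colons (for example source_g:gsa), which is why the name is matched as a run of non-space characters followed by a colon and a space.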

Loader jobs

The "all" diagnostic type will test all loader jobs (for a success within a certain threshold), do a find by ID on all sources, and run the DB test and the memory test.  By default all loader jobs will look for a success within the last 25 hours.  The exception is change log jobs, which look for a success within the last 30 minutes.  This is configurable in the grouper-ws.properties (grouper.properties in 2.2+)

https://url.to.grouper.edu/grouperWs/status?diagnosticType=all

Code Block

Server: mchyzer-PC, grouperVersion: 1.6.0, up since: 2010/05/17 02:45, 0 requests
SUCCESS memoryTest: Allocating 100000 bytes to an array to make sure not out of memory (6055ms elapsed)
SUCCESS dbTest_grouper: Retrieved object from database (6076ms elapsed)
SUCCESS source_g:gsa: Searched for subject by id: grouperTestSubjectByIdOnStartupASDFGHJ (6077ms elapsed)
SUCCESS source_jdbc: Searched for subject by id: grouperTestSubjectByIdOnStartupASDFGHJ (6091ms elapsed)
SUCCESS source_g:isa: Searched for subject by id: grouperTestSubjectByIdOnStartupASDFGHJ (6091ms elapsed)
SUCCESS loader_CHANGE_LOG_changeLogTempToChangeLog: Loader job CHANGE_LOG_changeLogTempToChangeLog ignored in config (6091ms elapsed)
SUCCESS loader_MAINTENANCE__grouperReport: Loader job MAINTENANCE__grouperReport ignored in config (6091ms elapsed)
SUCCESS loader_MAINTENANCE_cleanLogs: Found the most recent success: 2010/05/17 02:39:00.000, expecting one in the last 1500 minutes (6122ms elapsed)
SUCCESS loader_CHANGE_LOG_consumer_chrisTest: Loader job CHANGE_LOG_consumer_chrisTest ignored in config (6122ms elapsed)
SUCCESS loader_CHANGE_LOG_consumer_chrisTestxmpp: Loader job CHANGE_LOG_consumer_chrisTestxmpp ignored in config (6122ms elapsed)
SUCCESS loader_CHANGE_LOG_consumer_xmpp: Loader job CHANGE_LOG_consumer_xmpp ignored in config (6122ms elapsed)
SUCCESS loader_SQL_GROUP_LIST__aStem:aGroup2__f74068fd47124b079ea0c750354f6935: Found the most recent success: 2010/05/17 02:39:00.000, expecting one in the last 1500 minutes (6125ms elapsed)
SUCCESS loader_SQL_SIMPLE__aStem:aGroup__a186d80e0fe946b78dba45d16a2a1be7: Found the most recent success: 2010/05/17 02:39:00.000, expecting one in the last 1500 minutes (6132ms elapsed)
SUCCESS loader_ATTR_SQL_SIMPLE__penn:community:employee:orgPermissions:orgs__a8c2933dd66945af9755372efa9141b5: Found the most recent success: 2010/05/17 02:39:00.000, expecting one in the last 1500 minutes (6135ms elapsed)


Diagnostics errors since start: 0 (6135ms elapsed)
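
To make the threshold rule concrete (25 hours by default, 30 minutes for change log jobs), here is a Python sketch of the comparison. It is an illustration only, not Grouper's implementation: the function names, the overrides dict, and the assumption that change log jobs can be recognised by a CHANGE_LOG_ name prefix are all hypothetical. Note that the 2.2.2 sample output on this page shows 3120 minutes for some jobs, so the defaults should be treated as configurable.

```python
from datetime import datetime, timedelta

DEFAULT_MINUTES = 25 * 60   # 1500 minutes, the default described for loader jobs
CHANGE_LOG_MINUTES = 30     # the exception described for change log jobs


def allowed_minutes(job_name, overrides=None):
    """Return the success window for a job; overrides mimic minutesSinceLastSuccess."""
    overrides = overrides or {}
    if job_name in overrides:
        return overrides[job_name]
    if job_name.startswith("CHANGE_LOG_"):  # assumption: prefix identifies change log jobs
        return CHANGE_LOG_MINUTES
    return DEFAULT_MINUTES


def has_recent_success(job_name, last_success, now, overrides=None):
    """True if last_success falls within the allowed window ending at now."""
    window = timedelta(minutes=allowed_minutes(job_name, overrides))
    return now - last_success <= window
```

With the timestamps from the error example on this page, a change log job whose last success was at 01:38:50 is out of its 30 minute window when checked at 02:23:27, which corresponds to the "Cant find a success" error.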

Here is an example of an error

Code Block
HTTP Status 500 -

type Exception report

message

description The server encountered an internal error () that prevented it from fulfilling this request.

exception

java.lang.RuntimeException:
There was an error in the diagnostic task DiagnosticLoaderJobTest, Loader job CHANGE_LOG_changeLogTempToChangeLog

:Cant find a success since: 2010/05/17 01:38:50.000, expecting one in the last 30 minutes
	edu.internet2.middleware.grouper.ws.status.GrouperStatusServlet.doGet(GrouperStatusServlet.java:191)
	javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
	javax.servlet.http.HttpServlet.service(HttpServlet.java:717)

root cause

java.lang.RuntimeException: Cant find a success since: 2010/05/17 01:38:50.000, expecting one in the last 30 minutes
	edu.internet2.middleware.grouper.ws.status.DiagnosticLoaderJobTest.doTask(DiagnosticLoaderJobTest.java:103)
	edu.internet2.middleware.grouper.ws.status.DiagnosticTask.executeTask(DiagnosticTask.java:44)
	edu.internet2.middleware.grouper.ws.status.GrouperStatusServlet.doGet(GrouperStatusServlet.java:129)
	javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
	javax.servlet.http.HttpServlet.service(HttpServlet.java:717)

note The full stack trace of the root cause is available in the Apache Tomcat/6.0.20 logs.

If you don't want this protected by authentication

On the demo server, this URL is protected by authentication:

https://grouperdemo.internet2.edu/grouper_v2_3/status

because it sits under this protected URL:

https://grouperdemo.internet2.edu/grouper_v2_3

This server uses Apache in front of Tomcat as a reverse proxy over AJP, so to make the status servlet unprotected, add another mapping in Apache at a path which is not protected:

Code Block
ProxyPass /status_grouper_v2_3/status ajp://localhost:8131/grouper_v2_3/status

That path is not covered by authentication, so the status servlet can be reached without logging in:

https://grouperdemo.internet2.edu/status_grouper_v2_3/status?diagnosticType=all


See Also

Grouper Reports