Grouper

...

Grouper diagnostics provides a URL on the Grouper WS and UI (in Grouper 2.2+) that reports the health of Grouper.  The checks can include memory in the WS server, the connection to the Grouper Registry DB, whether sources can perform queries, and whether Grouper loader jobs are executing successfully.  If everything is ok, an HTTP 200 code is returned; otherwise a 500 is returned along with a description of the issue.  The point is that web monitoring software such as Nagios, Big Brother, BMC, etc. can be pointed at this URL.

General information is displayed on success as well: the server name, the number of WS requests (since the server started), the last error (if recent), etc.
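A monitoring probe built on this contract could look like the following minimal sketch. It relies only on the behavior described above: HTTP 200 means healthy, anything else (500 in particular) means a problem, and the body carries the description. The function names and the timeout are hypothetical and not part of Grouper.

```python
import urllib.error
import urllib.request


def interpret(http_code, body):
    """Map a diagnostics response to (healthy, summary).

    Assumption taken from the text above: 200 means everything is ok;
    any other code is treated as a failure.
    """
    healthy = http_code == 200
    # Use the first non-empty body line as a short summary for the monitor
    lines = [line.strip() for line in body.splitlines() if line.strip()]
    summary = lines[0] if lines else "(empty response)"
    return healthy, summary


def probe(url, timeout_seconds=10):
    """Fetch the status URL and interpret the result (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            body = response.read().decode("utf-8", "replace")
            return interpret(response.status, body)
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx; the body still holds the description
        return interpret(error.code, error.read().decode("utf-8", "replace"))
    except (urllib.error.URLError, OSError) as error:
        return False, "could not reach diagnostics URL: %s" % error
```

A monitoring wrapper would call `probe(...)` with the status URL and raise an alert whenever the first element of the result is `False`.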

...

Note, there is a lot of intelligent caching here so that repeated hits do not do queries each time.

In v4.10+ Grouper diagnostics will report success based on the schedule of the job.  Jobs that run every minute, hour, day, week, month, or year get a threshold of 30 minutes, 150 minutes, 52 hours, 8 days, 33 days, and 367 days respectively (unless there is an override in the config).
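To make the thresholds concrete, here is a small sketch that converts them to minutes (the unit used by the `minutesSinceLastSuccess` settings) and applies them. The schedule labels, the function name, and the choice of a strict "greater than" comparison are assumptions for illustration, not Grouper's actual implementation.

```python
# Default thresholds from the paragraph above, converted to minutes.
DEFAULT_THRESHOLD_MINUTES = {
    "minute": 30,
    "hour": 150,
    "day": 52 * 60,         # 52 hours  = 3120 minutes
    "week": 8 * 24 * 60,    # 8 days    = 11520 minutes
    "month": 33 * 24 * 60,  # 33 days   = 47520 minutes
    "year": 367 * 24 * 60,  # 367 days  = 528480 minutes
}


def is_overdue(schedule, minutes_since_last_success, override_minutes=None):
    """Return True if the job has gone too long without a success.

    override_minutes stands in for a config override; whether the real
    comparison is strict (>) or inclusive (>=) is an assumption here.
    """
    if override_minutes is not None:
        threshold = override_minutes
    else:
        threshold = DEFAULT_THRESHOLD_MINUTES[schedule]
    return minutes_since_last_success > threshold
```

For example, an hourly job whose last success was 151 minutes ago would be reported as overdue, while a daily job last successful 5000 minutes ago would not be if an override of 14400 minutes (10 days) applies.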

Sample configuration

Code Block
# whether to ignore individual tests.  Note: in job names, invalid chars need to be replaced with underscore (e.g. colon),
# i.e. anything matching this regex: [^a-zA-Z0-9._-]
ws.diagnostic.ignore.memoryTest = false
ws.diagnostic.ignore.dbTest_grouper = false
ws.diagnostic.ignore.source_jdbc = false
ws.diagnostic.ignore.loader_CHANGE_LOG_changeLogTempToChangeLog = false
ws.diagnostic.ignore.loader_MAINTENANCE__grouperReport = false

# number of minutes that can go by without a success before an error is thrown
ws.diagnostic.minutesSinceLastSuccess.loader_SQL_GROUP_LIST__aStem_aGroup2 = 60
# list groups which should check the size, in this case, "employee" or "students" in the key name is a variable
# {valueType: "group", required: true, regex: "^ws\\.diagnostic\\.checkGroupSize\\.([a-zA-Z0-9._-]+)\\.groupName$"}
#ws.diagnostic.checkGroupSize.students.groupName = community:students

# min group size of known groups
# {valueType: "integer", required: true, regex: "^ws\\.diagnostic\\.checkGroupSize\\.([a-zA-Z0-9._-]+)\\.minSize$"}
#ws.diagnostic.checkGroupSize.students.minSize = 18000

#if a change log consumer hasn't had a success but it is running and progress is being made, treat as a success
# {valueType: "boolean", required: true}
ws.diagnostic.successIfChangeLogConsumerProgress = true

# usdu daemon minutes since success 10 days
# {valueType: "integer"}
ws.diagnostic.minutesSinceLastSuccess.loader_OTHER_JOB_usduDaemon = 14400

# allow diagnostics from these IP ranges, e.g. 1.2.3.4/32 or 2.3.4.5/24, comma separated, leave blank if available from everywhere
# {valueType: "string", multiple: true}
ws.diagnostic.sourceIpAddresses = 

# if status details should be sent to the client or just logged
# {valueType: "boolean", required: true}
ws.diagnostic.sendDetailsInResponse = true
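The naming rule in the first comment of the configuration can be sketched as follows: every character outside `[a-zA-Z0-9._-]` in the job name becomes an underscore, and the result is prefixed with `loader_` to form the key suffix. The helper names and the sample job name `SQL_GROUP_LIST__aStem:aGroup2` are hypothetical; the example is chosen so that it yields the key shown in the sample configuration.

```python
import re

# Characters that are not allowed in the config key (from the comment above)
INVALID_CHARS = re.compile(r"[^a-zA-Z0-9._-]")


def diagnostic_name(job_name):
    """Turn a loader job name into the name used in diagnostics config keys."""
    return "loader_" + INVALID_CHARS.sub("_", job_name)


def config_keys(job_name):
    """Build the per-job property keys for a loader job (illustrative helper)."""
    name = diagnostic_name(job_name)
    return {
        "ignore": "ws.diagnostic.ignore." + name,
        "minutesSinceLastSuccess": "ws.diagnostic.minutesSinceLastSuccess." + name,
    }


if __name__ == "__main__":
    # The colon in the job name is replaced with an underscore
    for key in config_keys("SQL_GROUP_LIST__aStem:aGroup2").values():
        print(key)
```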

Exclude/include jobs by URL param

...

https://url.to.grouper.edu/grouper/status?diagnosticType=daemonJobsOnly&exclude=loader_MAINTENANCE_cleanLogs,loader_CHANGE_LOG_consumer_syncGroups,loader_SQL_SIMPLE__loader:owner__9178d7d636de49d6b271d12ca351dc19

Code Block
SUCCESS loader_CHANGE_LOG_changeLogTempToChangeLog: Not checking, there was a success from before: 2016/01/31 15:14:50.000, expecting one in the last 30 minutes (31ms elapsed)
SUCCESS loader_MAINTENANCE_cleanLogs: Loader job MAINTENANCE_cleanLogs ignored in config since URL param contains exclude which has 'loader_MAINTENANCE_cleanLogs' (31ms elapsed)
SUCCESS loader_CHANGE_LOG_consumer_syncGroups: Loader job CHANGE_LOG_consumer_syncGroups ignored in config since URL param contains exclude which has 'loader_CHANGE_LOG_consumer_syncGroups' (31ms elapsed)
SUCCESS loader_CHANGE_LOG_consumer_grouperRules: Not checking, there was a success from before: 2016/01/31 15:14:02.000, expecting one in the last 30 minutes (31ms elapsed)
SUCCESS loader_SQL_SIMPLE__loader:owner__9178d7d636de49d6b271d12ca351dc19: Loader job SQL_SIMPLE__loader:owner__9178d7d636de49d6b271d12ca351dc19 ignored in config since URL param contains exclude which has 'loader_SQL_SIMPLE__loader:owner__9178d7d636de49d6b271d12ca351dc19' (31ms elapsed)
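When the exclude list is generated by a script, the URL above can be assembled as in this sketch. The function name is hypothetical; only the `diagnosticType` and `exclude` parameters shown in the example URL are used, and commas and colons are left unescaped to match that example.

```python
from urllib.parse import urlencode


def status_url(base_url, diagnostic_type, exclude=()):
    """Build a diagnostics URL with an optional comma separated exclude list."""
    params = {"diagnosticType": diagnostic_type}
    if exclude:
        params["exclude"] = ",".join(exclude)
    # Keep ',' and ':' literal so the result matches the URL style shown above
    return base_url + "?" + urlencode(params, safe=",:")


if __name__ == "__main__":
    print(status_url(
        "https://url.to.grouper.edu/grouper/status",
        "daemonJobsOnly",
        [
            "loader_MAINTENANCE_cleanLogs",
            "loader_CHANGE_LOG_consumer_syncGroups",
            "loader_SQL_SIMPLE__loader:owner__9178d7d636de49d6b271d12ca351dc19",
        ],
    ))
```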

 


Trivial option

Use this for frequent checks.  In a cluster, you can run this check on all nodes and a deeper check on one node only.

https://url.to.grouper.edu/grouper-ws/status?diagnosticType=trivial

Note, this is a success, but since there was an error recently, it is displayed

...

This runs a lightweight query against the registry, plus the memory test.

https://url.to.grouper.edu/grouper-ws/status?diagnosticType=db

Code Block
Server: mchyzer-PC, grouperVersion: 1.6.0, up since: 2010/05/17 02:19, 0 requests
SUCCESS memoryTest: Allocating 100000 bytes to an array to make sure not out of memory (20ms elapsed)
SUCCESS dbTest_grouper: Retrieved object from database (28ms elapsed)


Diagnostics errors since start: 3 (28ms elapsed)

...

Code Block
     <init-param> 
       <param-name>findSubjectByIdOnCheckConfig</param-name> 
       <param-value>true|false</param-value> 
     </init-param> 
     <init-param> 
       <param-name>subjectIdToFindOnCheckConfig</param-name> 
       <param-value>someSubjectIdWhichMightExistOrWhatever</param-value> 
     </init-param> 
     <init-param> 
       <param-name>findSubjectByIdentifiedOnCheckConfig</param-name> 
       <param-value>true|false</param-value> 
     </init-param> 
     <init-param> 
       <param-name>subjectIdentifierToFindOnCheckConfig</param-name> 
       <param-value>someSubjectIdentifierWhichMightExistOrWhatever</param-value> 
     </init-param> 
     <init-param> 
       <param-name>findSubjectByStringOnCheckConfig</param-name> 
       <param-value>true|false</param-value> 
     </init-param> 
     <init-param> 
       <param-name>stringToFindOnCheckConfig</param-name> 
       <param-value>someStringWhichMightExistOrWhatever</param-value> 
     </init-param>

 


https://url.to.grouper.edu/grouper-ws/status?diagnosticType=sources

Code Block
Server: mchyzer-PC, grouperVersion: 1.6.0, up since: 2010/05/17 02:19, 0 requests
SUCCESS memoryTest: Allocating 100000 bytes to an array to make sure not out of memory (37ms elapsed)
SUCCESS dbTest_grouper: Retrieved object from database (40ms elapsed)
SUCCESS source_g:gsa: Searched for subject by id: grouperTestSubjectByIdOnStartupASDFGHJ (42ms elapsed)
SUCCESS source_jdbc: Searched for subject by id: grouperTestSubjectByIdOnStartupASDFGHJ (45ms elapsed)
SUCCESS source_g:isa: Searched for subject by id: grouperTestSubjectByIdOnStartupASDFGHJ (45ms elapsed)


Diagnostics errors since start: 3 (45ms elapsed)
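If a monitor needs per-check detail rather than just the HTTP code, the output above can be parsed with a sketch like this one. The line layout is inferred from the sample output only. No failing line appears in the samples, so treating every status other than `SUCCESS` as a problem is an assumption, and the function names are hypothetical.

```python
import re

# STATUS name: message (NNms elapsed), as seen in the sample output above.
# Names may contain colons (e.g. source_g:gsa), so the name is any run of
# non-space characters that is followed by ": ".
CHECK_LINE = re.compile(
    r"^(?P<status>[A-Z]+) (?P<name>\S+): (?P<message>.*) \((?P<ms>\d+)ms elapsed\)$"
)


def parse_checks(body):
    """Return one dict per check line; header and footer lines are skipped."""
    checks = []
    for line in body.splitlines():
        match = CHECK_LINE.match(line.strip())
        if match:
            check = match.groupdict()
            check["ms"] = int(check["ms"])
            checks.append(check)
    return checks


def problems(checks):
    """Assumption: any status other than SUCCESS indicates a problem."""
    return [check for check in checks if check["status"] != "SUCCESS"]
```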

...

https://url.to.grouper.edu/grouper-ws/status?diagnosticType=daemonJobsOnly

...

https://url.to.grouper.edu/grouper-ws/status?diagnosticType=all

...

If you don't want this protected by authentication

Note: this works with Apache.  If you are not using Apache (e.g. the default in v5), you cannot do this.

This URL is in the container and is not protected by shib by default:

/status_grouper/status?diagnosticType=db

You can do this yourself.  On the demo server, this URL is protected:

...

https://grouperdemo.internet2.edu/status_grouper_v2_3/status?diagnosticType=all 


See Also

Grouper Report