If you are downloading a huge dataset and need to break it into pages for performance reasons, the legacy UI-style paging can miss some records.
The problem: imagine downloading a list of 30k employees in pages of 1000. When you get toward the last few pages, if an employee is removed from near the beginning of the list, then the next page downloaded will skip someone who is still an employee. The result will not be complete, and that person could get locked out or deprovisioned. Similarly, if an employee is added while paging, you will get a duplicate (which does not have consequences as negative).
Note: there is no cursor held in memory or in the database, but this is still called cursor-based paging, and it is lightweight.
Here is how it works:
- Sort by something ascending (it's better if this field is unique, e.g. the uuid, but it doesn't need to be, e.g. lastUpdated)
- Get the first X records (first page)
- Find the <sort> field value of the last record in that list of X records
- Pass that value back, and get the second X records, after that last <sort> value. Note: if the sort field is unique, then cursorFieldIncludesLastRetrieved is false (strictly greater than that value). If the sort field is not unique, then cursorFieldIncludesLastRetrieved is true (greater than or equal to that value), and you must manually remove duplicates (since the last record's value will be returned again).
- Loop: keep taking the last record's <sort> field value and getting the next cursor page
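The loop above can be sketched as follows. This is a minimal illustration against an in-memory list for the unique sort field case; fetchPage stands in for the real query, and all names here are hypothetical, not the actual WS API:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the cursor-paging loop against an in-memory list.
// fetchPage stands in for the real query; all names here are hypothetical.
public class CursorPagingSketch {

  // Simulated data source: records sorted ascending by a unique ID
  static final List<String> ALL_IDS = List.of("a1", "b2", "c3", "d4", "e5");

  // Return up to pageSize records with ID strictly greater than lastCursorField
  // (null means start at the beginning). ID is unique, so this is the
  // cursorFieldIncludesLastRetrieved=false (strictly greater than) case.
  static List<String> fetchPage(String lastCursorField, int pageSize) {
    List<String> page = new ArrayList<>();
    for (String id : ALL_IDS) {
      if (lastCursorField == null || id.compareTo(lastCursorField) > 0) {
        page.add(id);
        if (page.size() == pageSize) {
          break;
        }
      }
    }
    return page;
  }

  // Loop until a short page signals the last page
  static List<String> fetchAll(int pageSize) {
    List<String> results = new ArrayList<>();
    String lastCursorField = null;
    while (true) {
      List<String> page = fetchPage(lastCursorField, pageSize);
      results.addAll(page);
      if (page.size() < pageSize) {
        return results; // fewer than pageSize records: done
      }
      lastCursorField = page.get(page.size() - 1); // last <sort> value returned
    }
  }
}
```

With a page size of 2 this walks the same pages as the worked example below: (a1, b2), then (c3, d4), then (e5).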
Get the first page of size 2, sorted by ID:

ID   Name
a1   group0
b2   group6

See that the last ID returned is "b2". ID is unique, so get the next page of size 2 sorted by ID, where lastCursorField is "b2" and cursorFieldIncludesLastRetrieved is false:

ID   Name
c3   group2
d4   group8

See that the last ID returned is "d4". ID is unique, so get the next page of size 2 sorted by ID, where lastCursorField is "d4" and cursorFieldIncludesLastRetrieved is false:

ID   Name
e5   group1

See that the number of records returned (1) is less than the pageSize (2), so that is the last page. All 5 records were returned.
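For the non-unique sort field case mentioned above (e.g. lastUpdated), the query uses greater-than-or-equal (cursorFieldIncludesLastRetrieved is true) and the caller removes duplicates manually. A sketch under those assumptions, with hypothetical names and data:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;

// Sketch of the non-unique sort field case (e.g. lastUpdated): the query uses
// greater-than-or-equal (cursorFieldIncludesLastRetrieved = true) and the
// caller removes duplicates manually. Names and data are hypothetical.
public class NonUniqueCursorSketch {

  static final class Row {
    final String id;        // unique, used only to remove duplicates
    final long lastUpdated; // the (non-unique) sort/cursor field
    Row(String id, long lastUpdated) {
      this.id = id;
      this.lastUpdated = lastUpdated;
    }
  }

  // Sorted ascending by lastUpdated; note the ties at 10 and 20
  static final List<Row> SOURCE = List.of(
      new Row("a1", 10), new Row("b2", 10), new Row("c3", 20),
      new Row("d4", 20), new Row("e5", 30));

  // >= comparison, since lastUpdated is not unique
  static List<Row> fetchPage(Long lastCursor, int pageSize) {
    List<Row> page = new ArrayList<>();
    for (Row row : SOURCE) {
      if (lastCursor == null || row.lastUpdated >= lastCursor) {
        page.add(row);
        if (page.size() == pageSize) {
          break;
        }
      }
    }
    return page;
  }

  // Caveat: pageSize must be larger than the number of records sharing one
  // lastUpdated value, otherwise the cursor cannot advance past the ties.
  static List<Row> fetchAll(int pageSize) {
    LinkedHashMap<String, Row> seen = new LinkedHashMap<>();
    Long lastCursor = null;
    while (true) {
      List<Row> page = fetchPage(lastCursor, pageSize);
      for (Row row : page) {
        seen.putIfAbsent(row.id, row); // manual duplicate removal
      }
      if (page.size() < pageSize) {
        return new ArrayList<>(seen.values());
      }
      lastCursor = page.get(page.size() - 1).lastUpdated;
    }
  }
}
```

With a page size of 3 this fetches (a1, b2, c3), then (c3, d4, e5) — c3 comes back again because of the >= comparison — and the dedup map keeps each record once.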
To get all records from the legacy paging method, do this (assume a page size of 1000, and assume fewer than 5 records will change while paging):
- Loop with 5 tries
- Get the first 1100 records
- Get the second page, but with a pageSize of 1095
- If there is no overlap of records, throw out the data and start over in the outer loop
- If you get through to where the last page has fewer than pageSize records, then you are done
- Throw out the overlap, and keep reducing the pageSize so there is always some overlap
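The overlap check can be sketched like this, assuming an offset-style fetch (the real WS only exposes pageNumber and pageSize, which the client adjusts so that (pageNumber - 1) * pageSize lands at the offset it wants, creating the overlap). All names here are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of overlap-checked legacy (offset) paging. The data source and
// method names are hypothetical; the real algorithm is in GcGetMembers.java.
public class OverlapPagingSketch {

  // Simulated stable data source of 25 records
  static final List<String> SOURCE = new ArrayList<>();
  static {
    for (int i = 0; i < 25; i++) {
      SOURCE.add(String.format("rec%02d", i));
    }
  }

  // Offset-based fetch standing in for one legacy page retrieval
  static List<String> fetchRange(int offset, int pageSize) {
    if (offset >= SOURCE.size()) {
      return new ArrayList<>();
    }
    return new ArrayList<>(
        SOURCE.subList(offset, Math.min(offset + pageSize, SOURCE.size())));
  }

  static List<String> fetchAll(int pageSize, int overlap, int maxTries) {
    for (int attempt = 0; attempt < maxTries; attempt++) {
      List<String> results = new ArrayList<>(fetchRange(0, pageSize));
      if (results.size() < pageSize) {
        return results; // one short page: done
      }
      while (true) {
        // start the next page `overlap` records before the end of what we have
        int offset = results.size() - overlap;
        List<String> page = fetchRange(offset, pageSize);
        // the first `overlap` records must match what we already hold;
        // if not, records shifted underneath us: restart in the outer loop
        List<String> expected = results.subList(results.size() - overlap, results.size());
        if (page.size() < overlap || !page.subList(0, overlap).equals(expected)) {
          break;
        }
        results.addAll(page.subList(overlap, page.size())); // throw out the overlap
        if (page.size() < pageSize) {
          return results; // short page: done
        }
      }
    }
    throw new IllegalStateException("data kept changing while paging; gave up");
  }
}
```

If the data never changes, the overlap always matches and the first try succeeds; if a record is added or removed mid-run, the overlap comparison fails and the outer loop retries from scratch.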
There is an algorithm in GcGetMembers.java that does this (e.g. get 70k records with page sizes of approx 100k and an overlap of 50).
You need to cycle through records sequentially; you cannot get multiple pages at once, since you don't know the ID to start after.
Web services changes
- Anywhere (any object or method) where there is "pageSize" and "pageNumber"
- Add: pageIsCursor (String): T|F, defaults to F. If this is T then we are doing cursor paging
- Add: pageLastCursorField (String)
- Add: pageLastCursorFieldType (String): could be: string, int, long, date, timestamp
- Add: pageCursorFieldIncludesLastRetrieved (String): T|F
- Note: the API changes are done; the only changes that should be needed are in the WS project
Convert the cursor field to whatever type it should be
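A sketch of that conversion, keyed off the pageLastCursorFieldType values listed above. The date and timestamp formats below are assumptions for illustration, not the actual WS formats:

```java
import java.sql.Timestamp;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Sketch of converting pageLastCursorField (always sent as a String) to the
// type named by pageLastCursorFieldType before binding it into the query.
// The type names match the WS parameter list; the date/timestamp formats
// below are assumptions, not the actual WS formats.
public class CursorFieldConverter {

  static Object convert(String pageLastCursorField, String pageLastCursorFieldType) {
    if (pageLastCursorField == null) {
      return null;
    }
    String type = pageLastCursorFieldType == null
        ? "string" : pageLastCursorFieldType.toLowerCase();
    switch (type) {
      case "string":
        return pageLastCursorField;
      case "int":
        return Integer.valueOf(pageLastCursorField);
      case "long":
        return Long.valueOf(pageLastCursorField);
      case "date":
        return parse(pageLastCursorField, "yyyy/MM/dd"); // assumed format
      case "timestamp":
        // assumed format
        return new Timestamp(
            parse(pageLastCursorField, "yyyy/MM/dd HH:mm:ss.SSS").getTime());
      default:
        throw new IllegalArgumentException("Invalid pageLastCursorFieldType: " + type);
    }
  }

  static Date parse(String value, String format) {
    try {
      return new SimpleDateFormat(format).parse(value);
    } catch (ParseException pe) {
      throw new IllegalArgumentException("Cannot parse date: " + value, pe);
    }
  }
}
```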
Example of changing a WS method