Best practice for downloading very large datasets
The OData API has reasonable limits to make sure that data extraction over the Internet is as smooth as possible and also to make sure that infrastructure on itslearning side is can always reply to requests.
Currently existing limits are:
- Number of rows accessible per entity table - 2000000 (two million), which means that only latest two million entities can be retrieved. Considering page size (see below) this implies, that maximum value for $skip query option is 1990000.
- Page size - 10000 entities. If the endpoint exposes more for provided query, then data is being paged and the nextLink property is added to response, that allows to browse the data. It is not possible to override the page size limitation using $top or any other query. Tools like Excel or PowerBI might automatically handle paging, displaying all available entities for each query. To browse the data manually, use nextLink property of the response, or use $skip query option.
- "default" date limit on some of the fact tables: by default we return the last 30 days data. This can be changed by adding the following filter to your query (exactly as typed, but dates can of course be customised): $filter=(Date/DateId gt 20180101 and Date/DateId lt 20180201)
Any entity with the DateId field can be filtered. Following entities contain DateId field:
- CourseVisit
- LearningObjectiveAssessment
- LearningObjectiveCourseElement
- LearningObjectiveProgress
- UserCourseSession
- UserElementSession
- UserLogin
- UserSession
Best practice for downloading modified entities periodically
Most of the entities accessible through OData have T_ModifiedDate/T_CreatedDate property attached. These properties indicate the last date of modification of the entity or the date when entity was added to OData. These properties can be used to download only entities, that were modified in specified period of time. For example, to download Courses that were modified after 1th of June 2018 midnight, a following query can be specified:
$filter=T_ModifiedDate gt DateTime'2018-06-01'
Please be aware that transfer of large amounts of data over the Internet can be interrupted by possible time-outs on your ISP, server side or your client side infrastructure. We recommend that reason you should implement your processes to download data incrementally (e.g. data for the last day) and apply a "retry strategy" to make sure data download process results in success.
Example Filters
- Selecting LearningItemId, ElementSessionStartDate, ElementSessionEndDate of UserElementSessions:
/UserElementSessions?$select=LearningItemId,ElementSessionStartDate,ElementSessionEndDate
- Retrieving sessions made to specified LearningItemId before 20th of September 2018
/UserElementSessions?$filter=ElementSessionStartDate lt DateTime'2018-09-20'
- Selecting CourseElementId and ElementTitle of CourseElements
/CourseElements?$select=CourseElementId,ElementTitle
- Filtering by CourseElements that are Obligatory
/CourseElements?$filter=Obligatory eq 1
Date and Time Formats
Fields of Edm.DateTime type are represented in YYYY-MM-DDTHH:MM:SS format.
Fields referencing the Dates endpoint (DateId, MonthlyDateId) are numbers held in YYYYMMDD format.
Fields referencing the Times endpoint (TimeId) are numbers in HMM (9:00 AM or earlier) or HHMM (10:00 AM or later) format.
Note that all timestamps provided by our APIs are in Coordinated Universal Time (UTC). You should apply your local time offset when needed.