Best Practice for Accessing OData

Best practice for downloading very large datasets

The OData API has reasonable limits to make sure that data extraction over the Internet is as smooth as possible and also to make sure that infrastructure on itslearning side is can always reply to requests.

Currently existing limits are:

  • number of rows accessible per entity table - 2000000 (two million), which means that only latest two million entities can be retrieved. Considering page size (see below) this implies, that maximum value for $skip query option is 1990000.

  • a page size of 10000 rows (ten thousand). To access the next page, use $skip query option or use the nextlink property attached to each request that contains paged data

  • "default" date limit on most of the fact tables: by default we return the last 30 days data. This can be changed by adding the filter to your query on Date/DateId field, e.g. like this:

$filter=(Date/DateId gt 20180101 and Date/DateId lt 20180201)

Best practice for downloading modified entities periodically

Most of the entities accessible through OData have T_ModifiedDate/T_CreatedDate property attached. These properties indicate the last date of modification of the entity or the date when entity was added to OData. These properties can be used to download only entities, that were modified in specified period of time. For example, to download Courses that were modified after 1th of June 2018 midnight, a following query can be specified:

$filter=T_ModifiedDate gt DateTime'2018-06-01'

Please be aware that transfer of large amounts of data over the Internet can be interrupted by possible time-outs on your ISP, server side or your client side infrastructure. We recommend that reason you should implement your processes to download data incrementally (e.g. data for the last day) and apply a "retry strategy" to make sure data download process results in success.