API Details

Authentication

The Canvas Data API uses HMAC authentication. Each request must be signed individually: the signature is sent alongside your API key, and it is both salted and signed with your API secret.

The API key and secret can be generated by any user who has the 'Download Flat Files' permission. See the portal page for details.

The scheme to compute the signature is as follows:

HTTP_METHOD\n
Host_Header\n
Content-Type_header\n
Content-MD5_header\n
/path/to/resource\n
alphabetical=query&params=here\n
Date_header\n
API_secret

A base64-encoded SHA-256 HMAC digest of this message, signed with the API secret, is then passed in the Authorization header, together with a Date header, like so:

Authorization: HMACAuth API_KEY:signature
Date: Thu, 25 Jun 2015 08:12:31 GMT

The Date header must be in RFC 7231 format (ISO 8601 should work as well) and must be within 15 minutes of server time.

For example, here is JavaScript (Node.js) code that creates the signature:

var crypto = require('crypto')
var url = require('url')
var HMAC_ALG = 'sha256'
var apiAuth = module.exports = {
  buildMessage: function(secret, timestamp, reqOpts) {
    var urlInfo = url.parse(reqOpts.path, true)
    var sortedParams = Object.keys(urlInfo.query).sort(function(a, b) {
      return a.localeCompare(b)
    })
    var sortedParts = []
    for (var i = 0; i < sortedParams.length; i++) {
      var paramName = sortedParams[i]
      sortedParts.push(paramName + '=' + urlInfo.query[paramName])
    }
    var parts = [
      reqOpts.method.toUpperCase(),
      reqOpts.host || '',
      reqOpts.contentType || '',
      reqOpts.contentMD5 || '',
      urlInfo.pathname,
      sortedParts.join('&') || '',
      timestamp,
      secret
    ]
    return parts.join('\n')
  },
  buildHmacSig: function(secret, timestamp, reqOpts) {
    var message = apiAuth.buildMessage(secret, timestamp, reqOpts)
    // Buffer.from replaces the deprecated `new Buffer(...)` constructor
    var hmac = crypto.createHmac(HMAC_ALG, Buffer.from(secret))
    hmac.update(message)
    return hmac.digest('base64')
  }
}
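
As a rough illustration of how the pieces fit together, the sketch below builds the Date and Authorization headers for a single request. The helper name buildAuthHeaders, the host api.example.com, and the key and secret values are placeholders and are not part of the API. The caller is assumed to pass the path and the already-sorted query string separately.

```javascript
var crypto = require('crypto')

// Hypothetical helper (not part of the Canvas Data API): builds the Date and
// Authorization headers for one request, following the signing scheme above.
function buildAuthHeaders(apiKey, apiSecret, reqOpts, now) {
  // toUTCString() yields an RFC 7231 style date, e.g. "Thu, 25 Jun 2015 08:12:31 GMT"
  var timestamp = (now || new Date()).toUTCString()
  var message = [
    reqOpts.method.toUpperCase(),
    reqOpts.host || '',
    reqOpts.contentType || '',
    reqOpts.contentMD5 || '',
    reqOpts.pathname,
    reqOpts.sortedQuery || '',
    timestamp,
    apiSecret
  ].join('\n')
  var signature = crypto.createHmac('sha256', apiSecret).update(message).digest('base64')
  return {
    Date: timestamp,
    Authorization: 'HMACAuth ' + apiKey + ':' + signature
  }
}

// Example usage with placeholder credentials and a fixed date
var headers = buildAuthHeaders('my_api_key', 'my_api_secret', {
  method: 'get',
  host: 'api.example.com', // placeholder host, an assumption
  pathname: '/api/account/self/dump',
  sortedQuery: 'after=100&limit=10'
}, new Date(Date.UTC(2015, 5, 25, 8, 12, 31)))
console.log(headers)
```

The same Date value has to be used both in the signed message and in the header that is actually sent, which is why the helper returns the two together.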

API Routes

All current API routes are namespaced under /api/account/(:accountId|self). Because API tokens are currently tied to accounts, the accountId parameter is redundant, but this should change in future versions. The 'self' identifier can be used as a placeholder and will tie back to the primary accountId going forward.

GET /api/account/(:accountId|self)/dump

Returns a list of Canvas Data dumps, ordered with the most recent dump first.

Query Params:

after - retrieve dumps only after a given sequence number

limit - the number of dumps to retrieve, defaults to 50

Example Response:

[
  {
    dumpId: "uuid",
    sequence: 1234, // incrementing counter for each new dump, provides a strict ordering of dumps
    accountId: "customer_account_id",
    numFiles: 10, // the number of files/tables associated with the dump
    finished: true, // indicates whether the dump is completed or not
    expires: 12345678, // timestamp of when the dump will be pruned from the database
    updatedAt: "2015-10-24T00:00:00.000Z",
    createdAt: "2015-10-24T00:00:00.000Z",
    schemaVersion: "1.0.1"
  },
  ...
]
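
A minimal sketch of how the request path for this route might be assembled from the two query parameters. The function name buildDumpListPath is hypothetical. The parameters are emitted in alphabetical order so that the same string can be reused when computing the signature.

```javascript
// Hypothetical helper: builds the path for GET /api/account/(:accountId|self)/dump
// with the optional `after` and `limit` query parameters.
function buildDumpListPath(accountId, opts) {
  opts = opts || {}
  var params = []
  // Added in alphabetical order, matching the order used in the signature
  if (opts.after !== undefined) params.push('after=' + encodeURIComponent(opts.after))
  if (opts.limit !== undefined) params.push('limit=' + encodeURIComponent(opts.limit))
  var path = '/api/account/' + encodeURIComponent(accountId || 'self') + '/dump'
  return params.length ? path + '?' + params.join('&') : path
}

var dumpPath = buildDumpListPath('self', { after: 1234, limit: 10 })
console.log(dumpPath)
```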

GET /api/account/(:accountId|self)/file/latest

Retrieves a list of expiring download URLs for the files in the latest dump.

Example Response:

{
  dumpId: "uuid",
  sequence: 1234, // incrementing counter for each new dump, provides a strict ordering of dumps
  accountId: "customer_account_id",
  numFiles: 10, // the number of files/tables associated with the dump
  finished: true, // indicates whether the dump is completed or not
  expires: 12345678, // timestamp of when the dump will be pruned from the database
  updatedAt: "2015-10-24T00:00:00.000Z",
  createdAt: "2015-10-24T00:00:00.000Z",
  artifactsByTable: {
    user_dim: {
      tableName: 'user_dim',
      partial: true, // false if a complete dump, true indicates more files are needed for a complete dataset
      files: [
         {url: 'http://url_to_download/file1.tar.gz', filename: 'file1.tar.gz'}
      ]
    }
  }
}
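
To show how a response of this shape could be consumed, the sketch below flattens artifactsByTable into a single list of downloads. The function name listDownloads and the sample data are illustrative assumptions. Only the field names come from the example response above.

```javascript
// Hypothetical helper: flattens the artifactsByTable map of a dump response
// into one array with an entry per downloadable file.
function listDownloads(dump) {
  var downloads = []
  Object.keys(dump.artifactsByTable).forEach(function(key) {
    var artifact = dump.artifactsByTable[key]
    artifact.files.forEach(function(file) {
      downloads.push({
        table: artifact.tableName,
        partial: artifact.partial,
        filename: file.filename,
        url: file.url
      })
    })
  })
  return downloads
}

// Sample data mirroring the example response (URLs are placeholders)
var latestDump = {
  dumpId: 'uuid',
  artifactsByTable: {
    user_dim: {
      tableName: 'user_dim',
      partial: true,
      files: [{ url: 'http://url_to_download/file1.tar.gz', filename: 'file1.tar.gz' }]
    }
  }
}
var downloads = listDownloads(latestDump)
console.log(downloads)
```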

GET /api/account/(:accountId|self)/file/byDump/:dumpId

Retrieve a list of expiring download URLs for the files in the given dump id (see GET /dump).

Example Response:

See the example above for the 'file/latest' route.

GET /api/account/(:accountId|self)/file/byTable/:tableName

Retrieve a list of expiring URLs for a given table, ordered by the most recent dump first. This is useful for tables that are incremental in nature: you can look back to find the first complete dump and then load only the partial dumps that follow it.

Query Params:

after - retrieve dumps only after a given sequence number

limit - the number of dumps to retrieve, defaults to 50

Example Response:

{
  table: "user_dim",
  history: [
   {
     dumpId: 'uuid',
     sequence: 1234,
     partial: true,
     files: [
        {url: 'http://url_to_download/file1.tar.gz', filename: 'file1.tar.gz'}
     ]
   },
   {
     dumpId: 'uuid',
     sequence: 1233,
     partial: false, // for a complete dataset, only need first complete dump and following partial dumps
     files: [
        {url: 'http://url_to_download/file1.tar.gz', filename: 'file1.tar.gz'}
     ]
   },
   ...
  ]
}
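
The look-back described for this route can be sketched as follows. The function name selectDumpsToLoad is hypothetical, and the sketch assumes that history is ordered most recent first, as in the example response.

```javascript
// Hypothetical helper: walks the history (most recent first) until it reaches
// a complete dump, then returns the entries in load order: the complete dump
// first, followed by the partial dumps that came after it.
function selectDumpsToLoad(history) {
  var selected = []
  var foundComplete = false
  for (var i = 0; i < history.length; i++) {
    selected.push(history[i])
    if (!history[i].partial) {
      foundComplete = true
      break
    }
  }
  // foundComplete is false when no complete dump appears in the history;
  // in that case more history (an earlier page) would be needed
  return { foundComplete: foundComplete, dumps: selected.reverse() }
}

var tableHistory = [
  { dumpId: 'c', sequence: 1234, partial: true },
  { dumpId: 'b', sequence: 1233, partial: false },
  { dumpId: 'a', sequence: 1232, partial: true }
]
var loadPlan = selectDumpsToLoad(tableHistory)
console.log(loadPlan.dumps.map(function(d) { return d.sequence }))
```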

GET /api/account/(:accountId|self)/file/sync

Retrieve a list of files and signed URLs that constitute a complete snapshot of the current data, including all partial dumps up until the last full dump. The API returns filenames that are globally unique but stable, which allows the client to download only the files it doesn't have locally and ignore filenames it has downloaded previously. The incomplete flag is set if there is no backfill for partial tables. By implementing the following algorithm, a client can get a full snapshot using only this API:

- Make a request to this API, for every file:
  - If the filename has been downloaded previously, do not download it
  - If the filename has not yet been downloaded, download it
- After all files have been processed, delete any local file that isn't in the list of files from the API
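
The steps above can be sketched as a pure planning function that decides what to download and what to delete. The name planSync and the sample filenames are assumptions for illustration. Actual downloading and deletion are left to the caller.

```javascript
// Hypothetical helper: given the `files` array from the sync response and the
// filenames already present locally, returns the files to download and the
// local filenames to delete.
function planSync(apiFiles, localFilenames) {
  var local = new Set(localFilenames)
  var remote = new Set()
  var toDownload = []
  apiFiles.forEach(function(file) {
    remote.add(file.filename)
    // Skip anything that was downloaded previously
    if (!local.has(file.filename)) toDownload.push(file)
  })
  // Local files that the API no longer lists are stale
  var toDelete = localFilenames.filter(function(name) { return !remote.has(name) })
  return { toDownload: toDownload, toDelete: toDelete }
}

var syncPlan = planSync(
  [
    { filename: 'course_dim-00000-abcd123.gz', table: 'course_dim' },
    { filename: 'requests-00001-abcd123.gz', table: 'requests' }
  ],
  ['course_dim-00000-abcd123.gz', 'requests-00000-old.gz']
)
console.log(syncPlan)
```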

Example Response:

{
  "schemaVersion": "1.0.0",
  "incomplete": true,
  "files": [
    {"filename": "course_dim-00000-abcd123.gz", table: "course_dim", url: "http://....", "partial": false},
    {"filename": "account_dim-00000-abcd123.gz", table: "account_dim", url: "http://....", "partial": false},
    {"filename": "requests-00000-abcd123.gz", table: "requests", url: "http://....", "partial": false},
    {"filename": "requests-00001-abcd123.gz", table: "requests", url: "http://....", "partial": true}
  ]
}

GET /api/schema

Retrieve a list of the most recent schema versions, with the most recent schema first.

Example Response:

[
 {
   version: "1.0.1",
   createdAt: "2015-10-27T21:27:31.834Z"
 },
 {
   version: "1.0.0",
   createdAt: "2015-10-24T21:24:27.000Z"
 }
]

GET /api/schema/latest

Retrieve the most recent schema and its version.

Example Response:

{
 "version": "1.0.1",
 "schema": {
    "course": {
      "dw_type": "dimension",
      "description": "A course in the canvas system",
      "columns": [
        {
          "type": "bigint",
          "description": "Unique surrogate id for a course",
          "name": "id"
        },
        {
          "type": "bigint",
          "description": "Primary key for this course in the canvas courses table.",
          "name": "canvas_id"
        },
        {
          "type": "bigint",
          "description": "The root account associated with this course.",
          "name": "root_account_id"
        },
        {
          "type": "bigint",
          "description": "The parent account for this course.",
          "name": "account_id",
          "dimension": {
            "name": "account",
            "id": "id",
            "role": "account"
          }
        },
        {
          "type": "timestamp",
          "description": "Timestamp when the course object was created in Canvas",
          "name": "created_at"
        },
        {
          "type": "boolean",
          "description": "True if the course is publicly visible",
          "name": "publicly_visible"
        },
        {
          "description": "Correlated id for the record for this course in the SIS system (assuming SIS integration is configured)",
          "type": "varchar",
          "length": "256",
          "name": "sis_source_id"
        },
        {
          "description": "Workflow status indicating the current state of the course, valid values are: completed, created, deleted, available, claimed",
          "type": "varchar",
          "length": "256",
          "name": "workflow_state"
        }
      ],
      "incremental": false,
      "tableName": "course_dim",
      "hints": {}
   },
   "account_dim": {...
   }
 }
}
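
One possible way to read a table entry from this schema is sketched below. The helper describeColumns is hypothetical and simply summarizes each column as "name type", appending the length when one is present. The sample object is a trimmed copy of the example above.

```javascript
// Hypothetical helper: summarizes the columns of one table entry from the
// "schema" object as "name type" or "name type(length)".
function describeColumns(table) {
  return table.columns.map(function(column) {
    var type = column.length ? column.type + '(' + column.length + ')' : column.type
    return column.name + ' ' + type
  })
}

var courseTable = {
  tableName: 'course_dim',
  columns: [
    { type: 'bigint', name: 'id' },
    { type: 'varchar', length: '256', name: 'sis_source_id' }
  ]
}
var columnSummary = describeColumns(courseTable)
console.log(columnSummary)
```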

GET /api/schema/:version

Retrieve the schema for the given version.

Example Response:

See the example for GET /api/schema/latest.

Notes

Sequence Numbers

The sequence field is an increasing number that indicates the relative ordering of dumps. However, these numbers may not be contiguous, so there may be gaps in the sequence. Additionally, some API calls (such as GET /api/account/(:accountId|self)/file/byTable/:tableName) return results from only a subset of the dumps and may have additional gaps.
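
Because of these gaps, a client should probably compare sequence numbers rather than expect the next dump to be exactly one higher. A minimal sketch, with the hypothetical name newDumpsSince:

```javascript
// Hypothetical helper: returns the dumps newer than the last sequence number
// the client has processed, oldest first. It relies only on ordering, so gaps
// in the sequence numbers do not matter.
function newDumpsSince(dumps, lastSequence) {
  return dumps
    .filter(function(dump) { return dump.sequence > lastSequence })
    .sort(function(a, b) { return a.sequence - b.sequence })
}

var pending = newDumpsSince(
  [{ sequence: 1240 }, { sequence: 1237 }, { sequence: 1234 }],
  1234
)
console.log(pending.map(function(d) { return d.sequence }))
```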