Understand the data model of the Atlassian Data Lake

Each product has its own set of tables and columns in the Atlassian Data Lake. If you choose to include all data for your products, more columns are added to each of those tables.

Only data for Jira Software, Jira Service Management, Jira Work Management, Jira Product Discovery, Confluence, and Opsgenie are available in the Atlassian Data Lake at this time. Data for more products are coming soon.

The product tables capture product data in a star schema, meaning some tables refer to other tables. Because of this, you may need to join multiple queries to get the data you need in Atlassian Analytics.

Data freshness

For most tables, it takes at least half an hour for changes in your products to reflect in the Data Lake. This makes them especially useful for custom analysis, or when having the most up-to-date information is important.

However, for the following tables, it takes about three to six hours for changes to reflect:

jira_issue
jira_issue_cycle_time
jira_issue_field
jira_issue_status_history
jira_project

Dates and timestamps in the Data Lake

Keep in mind that all columns with dates and timestamps (for example, created_at, updated_at, and so on) are in the UTC time zone. To convert these to use a different time zone, either change your workspace time zone in your workspace settings or the individual dashboard’s time zone in its dashboard settings.

Naming conventions

Some tables and columns have certain words in their SQL names that indicate specific purposes.

Table names

_mapping

Any table that has this suffix in its SQL name stores foreign keys to other tables. Its main function is to bridge those tables to combine their data for analysis and insights.

For example, the opsgenie_alert_responder_mapping table in the Opsgenie schema is meant to show which responder type responded to an alert. It has foreign keys to the opsgenie_team, opsgenie_schedule, opsgenie_escalation, and atlassian_account tables.

_history or _history_

Any table with history in its SQL name stores all historical updates of a particular object. Most history tables will have a non-history counterpart table, which only stores the latest information for that object.

For example, the jira_issue_history table captures an update whenever Jira sends us any new data for the same issue (indicated by its ID). This is usually done on some particular event or state change like when the issue transitions to a different status. All of those events for the particular issue are stored in the history table. The non-history table, jira_issue, only stores the latest update to the issue.

Column names

_id

This suffix is reserved for primary keys and foreign keys. Use these columns to join tables together.

For example, in the jira_issue table, issue_id is the primary key for an issue object, and project_id is a foreign key that can be used to join this table to the jira_project table.

_by or account_id or _account_id

This suffixes indicate the column is a foreign key to the account table. The column stores the account identifier for actions performed by the account.

For example, the values in created_by in the confluence_page table are the account IDs of those who created pages.

The only exception is the account_id column in the account table, which is a primary key for that table.

_at or _until

These suffixes indicate the values will be timestamps—for example, created_at in the confluence_page table and snoozed_until in the opsgenie_alert table.

_ref

This suffix indicates the values will be in-product identifiers for a specific object—for example, issue_ref in the jira_issue table. This is not the same as the object’s ID (_id suffix), which is a unique identifier in the Data Lake and should be used for joins.

Was this helpful?

It wasn't accurate

It wasn't clear

It wasn't relevant