Understand the data model of the Atlassian Data Lake

Each app has its own set of tables and columns in the Atlassian Data Lake. If you choose to include all data for your apps, more columns are added to each of those tables.

Only data for Jira, Jira Service Management, Jira Product Discovery, Confluence, Talent, Focus, Goals, and Atlassian Projects are available in the Atlassian Data Lake at this time. Data for more apps are coming soon.

The app tables capture app data in a star schema, meaning some tables refer to other tables. Because of this, you may need to join multiple queries to get the data you need in Atlassian Analytics.

Data model for data shares

Data shares let you connect the Atlassian Data Lake to your organization’s environments or third-party tools. However, the data model for data shares is different from the one that’s available in Atlassian Analytics. Read more about the data model for data shares.

Data freshness

For most tables, changes in your apps take about 90 minutes on average to appear in the Data Lake. This makes these tables especially useful for custom analysis, or when having the most up-to-date information is important.

However, app changes take about 5 to 8 hours to appear in the following tables:

goal_hierarchy
jira_issue
jira_issue_cycle_time
jira_issue_field
jira_issue_status_history
jira_project

Live Jira tables

The Jira tables mentioned above (except for jira_issue_cycle_time) also have corresponding “live” tables, which take up to 90 minutes to reflect changes:

jira_issue_live
jira_issue_field_live
jira_issue_status_history_live
jira_project_live

Keep in mind that while these tables have fresher data, they also have much slower query performance. This is because the “live” tables are calculated whenever the data is queried, unlike the corresponding non-“live” tables, which use a pre-calculated snapshot of the data.
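
The trade-off between freshness and query speed can be captured in a small lookup. The following is a minimal sketch in Python, not part of any Atlassian tooling: the helper pick_table and its need_fresh flag are hypothetical, and only the table names come from the lists above.

```python
# Snapshot tables that have a "live" counterpart (names taken from the lists above).
LIVE_COUNTERPARTS = {
    "jira_issue": "jira_issue_live",
    "jira_issue_field": "jira_issue_field_live",
    "jira_issue_status_history": "jira_issue_status_history_live",
    "jira_project": "jira_project_live",
}


def pick_table(table: str, need_fresh: bool) -> str:
    """Return the live table when fresher data is needed and one exists.

    Hypothetical helper: otherwise it returns the snapshot table,
    which is faster to query because it is pre-calculated.
    """
    if need_fresh:
        return LIVE_COUNTERPARTS.get(table, table)
    return table


print(pick_table("jira_issue", need_fresh=True))
print(pick_table("jira_issue_cycle_time", need_fresh=True))  # no live counterpart
```

Defaulting to the snapshot table and opting in to the live one only when freshness matters keeps most queries fast.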

Dates and timestamps in the Data Lake

Keep in mind that all columns with dates and timestamps (for example, created_at, updated_at, and so on) are in the UTC time zone. To convert these to use a different time zone, either change your workspace time zone in your workspace settings or the individual dashboard’s time zone in its dashboard settings.
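
To make the effect of a time zone change concrete, here is a minimal sketch of the same conversion done by hand in Python, for example on values processed outside Atlassian Analytics. The timestamp is made up, and the fixed UTC+9 offset is only an illustration (it matches Japan Standard Time, which has no daylight saving time); a zone with daylight saving time would need a proper time zone database instead of a fixed offset.

```python
from datetime import datetime, timedelta, timezone

# A created_at value as stored in the Data Lake (UTC). The value is hypothetical.
created_at_utc = datetime(2024, 3, 15, 18, 30, tzinfo=timezone.utc)

# Example target zone: a fixed UTC+9 offset.
target_zone = timezone(timedelta(hours=9))

# astimezone() keeps the same instant and only changes how it is displayed.
created_at_local = created_at_utc.astimezone(target_zone)

print(created_at_utc.isoformat())
print(created_at_local.isoformat())
```

Note that the calendar date can change after conversion, which matters when grouping results by day.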

Naming conventions

Some tables and columns have certain words in their SQL names that indicate specific purposes.

Table names

_mapping

Any table that has this suffix in its SQL name stores foreign keys to other tables. Its main function is to bridge those tables to combine their data for analysis and insights.

For example, the opsgenie_alert_responder_mapping table in the Opsgenie schema is meant to show which responder type responded to an alert. It has foreign keys to the opsgenie_team, opsgenie_schedule, opsgenie_escalation, and atlassian_account tables.
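
The bridging role of a mapping table can be sketched with an in-memory SQLite database in Python. This is an illustration only: the table names come from the example above, but every column name (alert_id, team_id, name) and every row is an assumption, not the actual Data Lake schema.

```python
import sqlite3

# All column names and rows are hypothetical; only the table names come from the text.
db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE opsgenie_team (team_id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE opsgenie_alert_responder_mapping (alert_id TEXT, team_id TEXT);
    INSERT INTO opsgenie_team VALUES ('t1', 'Platform'), ('t2', 'Payments');
    INSERT INTO opsgenie_alert_responder_mapping VALUES
        ('a1', 't1'), ('a1', 't2'), ('a2', 't1');
    """
)

# The mapping table carries only keys; the join pulls in the readable team name.
responders = db.execute(
    """
    SELECT m.alert_id, t.name
    FROM opsgenie_alert_responder_mapping AS m
    JOIN opsgenie_team AS t ON t.team_id = m.team_id
    ORDER BY m.alert_id, t.name
    """
).fetchall()
print(responders)
```

Because one alert can have several responders, the mapping table holds one row per alert and responder pair rather than a single foreign key column on the alert table.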

_history or _history_

Any table with history in its SQL name stores all historical updates of a particular object. Most history tables will have a non-history counterpart table, which only stores the latest information for that object.

For example, the jira_issue_history table captures an update whenever Jira sends us any new data for the same work item (indicated by its ID). This is usually done on some particular event or state change like when the work item transitions to a different status. All of those events for the particular work item are stored in the history table. The non-history table, jira_issue, only stores the latest update to the work item.
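
The relationship between a history table and its non-history counterpart can be sketched as follows, again using an in-memory SQLite database in Python. The columns status and updated_at and all rows are assumptions made for illustration; the real jira_issue_history table may be shaped differently.

```python
import sqlite3

# Hypothetical columns and rows; only the table name comes from the text.
history_db = sqlite3.connect(":memory:")
history_db.execute(
    "CREATE TABLE jira_issue_history (issue_id TEXT, status TEXT, updated_at TEXT)"
)
history_db.executemany(
    "INSERT INTO jira_issue_history VALUES (?, ?, ?)",
    [
        ("1001", "To Do", "2024-01-02T09:00:00Z"),
        ("1001", "In Progress", "2024-01-03T10:00:00Z"),
        ("1001", "Done", "2024-01-05T16:00:00Z"),
        ("1002", "To Do", "2024-01-04T08:00:00Z"),
    ],
)

# Keep only the most recent row per work item, which is what a
# non-history table such as jira_issue would hold.
latest = history_db.execute(
    """
    SELECT h.issue_id, h.status
    FROM jira_issue_history AS h
    JOIN (
        SELECT issue_id, MAX(updated_at) AS updated_at
        FROM jira_issue_history
        GROUP BY issue_id
    ) AS m ON m.issue_id = h.issue_id AND m.updated_at = h.updated_at
    ORDER BY h.issue_id
    """
).fetchall()
print(latest)
```

Query the history table when you need the sequence of changes, and the non-history table when only the current state matters.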

Column names

_id

This suffix is reserved for primary keys and foreign keys. Use these columns to join tables together.

For example, in the jira_issue table, issue_id is the primary key for a work item object, and project_id is a foreign key that can be used to join this table to the jira_project table.
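
A join on these keys might look like the sketch below, which uses an in-memory SQLite database in Python. The issue_id and project_id columns are named in the example above; the name and summary columns and all rows are hypothetical.

```python
import sqlite3

# issue_id and project_id come from the text; other columns and all rows are made up.
jira_db = sqlite3.connect(":memory:")
jira_db.executescript(
    """
    CREATE TABLE jira_project (project_id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE jira_issue (issue_id TEXT PRIMARY KEY, project_id TEXT, summary TEXT);
    INSERT INTO jira_project VALUES ('10', 'Website'), ('20', 'Mobile');
    INSERT INTO jira_issue VALUES
        ('1001', '10', 'Fix login'),
        ('1002', '10', 'Update footer'),
        ('1003', '20', 'Crash on start');
    """
)

# Join on the foreign key project_id, then count work items per project.
rows = jira_db.execute(
    """
    SELECT p.name, COUNT(i.issue_id) AS issue_count
    FROM jira_issue AS i
    JOIN jira_project AS p ON p.project_id = i.project_id
    GROUP BY p.name
    ORDER BY p.name
    """
).fetchall()
print(rows)
```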

_by or account_id or _account_id

These suffixes indicate that the column is a foreign key to the account table. The column stores the account identifier for actions performed by the account.

For example, the values in created_by in the confluence_page table are the account IDs of those who created pages.

The only exception is the account_id column in the account table, which is a primary key for that table.

_at or _until

These suffixes indicate the values will be timestamps—for example, created_at in the confluence_page table and snoozed_until in the opsgenie_alert table.

_ref

This suffix indicates the values will be in-app identifiers for a specific object—for example, issue_ref in the jira_issue table. This is not the same as the object’s ID (_id suffix), which is a unique identifier in the Data Lake and should be used for joins.
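
The column naming conventions above can be summarized as a small classifier. This is a minimal sketch in Python: the function classify_column and its return labels are hypothetical, and it only encodes the suffix rules described in this section, so it cannot tell a primary key from a foreign key and does not handle the account_id exception in the account table.

```python
def classify_column(name: str) -> str:
    """Guess a column's role from its SQL name using the suffix rules above."""
    # Account keys are checked first because account_id also ends in "_id".
    if name == "account_id" or name.endswith("_account_id") or name.endswith("_by"):
        return "account key"
    if name.endswith("_id"):
        return "key"  # primary or foreign key; use it for joins
    if name.endswith("_at") or name.endswith("_until"):
        return "timestamp"
    if name.endswith("_ref"):
        return "in-app reference"  # not a Data Lake ID; avoid joining on it
    return "other"


for column in ["issue_id", "created_by", "snoozed_until", "issue_ref", "summary"]:
    print(column, "->", classify_column(column))
```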
