Backfill User Activity History
Beta
The Historical User Activity feature is currently in beta for the Analytics tool.
Existing data pipelines will continue running normally, but will only import new daily activity. To import your full historical backlog into your target database, you must perform this one-time manual backfill.
Required User Permissions:
Data Connector Access: A verified connection to your destination database (for example, Snowflake, SQL Server, or BigQuery).
Prerequisites:
Time: The initial historical backfill can take anywhere from minutes to several hours, depending on your tenant size and account age. Schedule this operation during an off-peak window.
Compute Sizing: Temporarily scale up your data warehouse or compute instance for the initial load, then scale back down when finished. For example, upgrade a Snowflake warehouse from X-Small to Medium/Large.
Storage Requirements: Ensure the host machine has free disk space equal to or greater than twice the compressed size of the target table, as partition files are downloaded locally before loading to the database.
File Limits (macOS/Linux): Before running the backfill script, raise the file descriptor limit in the same terminal session by executing: ulimit -n 65536. The new limit applies only to that shell session.
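The storage and file-limit prerequisites above can be checked before you start. The following is a minimal sketch, not part of the connector: the function names are hypothetical, and the compressed table size is a value you must supply yourself.

```python
import shutil


def required_free_bytes(compressed_table_bytes, factor=2):
    """Free space needed: twice the compressed table size, per the prerequisite."""
    return factor * compressed_table_bytes


def has_enough_disk(path, compressed_table_bytes):
    """Return True if the volume holding `path` has enough free space."""
    free = shutil.disk_usage(path).free
    return free >= required_free_bytes(compressed_table_bytes)


def fd_limit_ok(required=65536):
    """Return True if the soft file descriptor limit is at least `required`.

    Uses the `resource` module, which is available on macOS/Linux only.
    """
    import resource

    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft == resource.RLIM_INFINITY or soft >= required
```

For example, has_enough_disk(".", 50 * 1024**3) checks the current directory's volume against a hypothetical 50 GiB compressed table.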
Log in to your target data destination interface.
Clear the existing watermark for the User Activity table by executing the corresponding command for your environment:
Snowflake
DELETE FROM <YOUR_DB>.<YOUR_SCHEMA>.PROCESSING_LOG
WHERE TABLE_NAME = 'user_activity';
SQL Server
DELETE FROM [<your_schema>].[processing_log]
WHERE table_name = 'user_activity';
BigQuery
DELETE FROM `<project>.<dataset>.processing_log`
WHERE table_name = 'user_activity';
For cloud storage locations like ADLS, S3, or Fabric Lakehouse, the processing_log file is located next to your data within your managed area. If you cannot locate the file path, check the setup guide for your specific data connector or reach out to Procore Support for additional assistance.
Run your Analytics Cloud Connector CLI script via your terminal or scheduling platform:
# example — replace with the invocation your scheduler uses
python ds_to_snowflake.py --config config.yaml
Monitor the console logs during execution. The CLI compares the partitions available in the share against those already in your destination and logs the result:
APPEND partitioned load for user_activity: N event_date partition(s) in share,
M already in Snowflake, K to load.
Confirm that K matches the number of missing historical days you intend to load.
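If you capture the console output, the N, M, and K values can be extracted programmatically. This is a sketch that assumes the log line has exactly the wording shown above; the function name and the sample numbers are hypothetical, and the check that K equals N minus M is an assumption about how the CLI computes the difference, not something stated by the connector.

```python
import re

# Matches the summary line shown above, tolerating a line wrap after each comma.
LOAD_SUMMARY = re.compile(
    r"APPEND partitioned load for (?P<table>\w+):\s+"
    r"(?P<in_share>\d+) event_date partition\(s\) in share,\s+"
    r"(?P<loaded>\d+) already in (?P<dest>[\w ]+?),\s+"
    r"(?P<to_load>\d+) to load"
)


def parse_load_summary(log_text):
    """Return the N, M, K counts from the log text, or None if no line matches."""
    m = LOAD_SUMMARY.search(log_text)
    if m is None:
        return None
    return {
        "table": m["table"],
        "in_share": int(m["in_share"]),       # N
        "already_loaded": int(m["loaded"]),   # M
        "to_load": int(m["to_load"]),         # K
    }


def matches_expectation(summary, expected_missing_days):
    """Check K against the expected number of days (assumes K == N - M)."""
    consistent = summary["to_load"] == summary["in_share"] - summary["already_loaded"]
    return consistent and summary["to_load"] == expected_missing_days
```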
Once the backfill completes, verify the date range and row count of the loaded data using the following query:
SELECT MIN(event_date), MAX(event_date), COUNT(*)
FROM <YOUR_DB>.<YOUR_SCHEMA>.USER_ACTIVITY;
Note: This query is written for Snowflake. Adapt the table reference for your destination database.
MIN(event_date) should now reflect your account's earliest recorded user activity rather than the start of a rolling 30-day window.
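The result of the verification query can also be checked in code. The sketch below assumes you have already fetched MIN(event_date) and MAX(event_date) as Python date objects; the function name is hypothetical, and the 30-day default is taken from the rolling period mentioned above.

```python
from datetime import date


def looks_backfilled(min_event_date, max_event_date, window_days=30):
    """Return True if the loaded date range extends beyond the rolling window."""
    span_days = (max_event_date - min_event_date).days
    return span_days > window_days
```

A range of only about 30 days suggests the backfill did not load historical partitions, although a very new account may legitimately have less history than that.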