This guide is intended for publishers and marketers who want to upload second-party data into Permutive. This is typically user cohort membership data that comes from an advertising partner or an advertising partner’s DMP. It could also be the publisher's first-party data on their subscribers or other identifiable users.
This type of data import can be used as a one-off data dump or a periodic import.
Self-sourced audience data is sent to a Google Cloud Storage (GCS) bucket. Permutive supports one GCS bucket per customer, and each data source, which Permutive refers to as a data provider or audience set, gets its own subfolder within that bucket.
There are a couple of options depending on your intended upload method:
- Manual file upload: we will grant your Google Workspace group email address access to the GCS bucket so that you can upload files via the Google Cloud Platform Console or a command-line tool. The address must be associated with an active Google Account or Google Apps account. Using a group email address lets you manage user access yourself, so you can add or remove users without contacting us. If you do not have a group email address, please let us know.
- Programmatic file upload: you will need to create a service account within your GCP project, which will be used to perform uploads to Permutive. In this case, please let us know the email address of your service account so that we can grant access.
Note: You can also provide both types of users to have access to the same bucket.
Each Audience Set will have its own taxonomy. The taxonomy defines the audience segment IDs being sent to Permutive, in order to create custom cohorts and view insights in the Permutive Dashboard. More information can be found here.
Note: Please contact Permutive Support with the following information and our team will provide you with a bucket and relevant access:
Google Cloud Project Service Account
To use Permutive’s Audience Data Import, you must be able to create your own Google Cloud Project Service Account. If that is not possible, you must have access to a Google Workspace group email address.
Note: You can also provide both types of users to have access to the same bucket. Please see the overview section above for more information.
Name for your Audience Set/Data Provider
This can be a partner's name, if the data is from a self-sourced data partner, such as bombora_us, or an internal audience set name if the data is first-party, such as subscriptions. Please use just letters, numbers, and underscores. No spaces or other symbols.
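The naming rule can be checked before you send the name to support. The following is a minimal sketch, assuming that "letters, numbers, and underscores" means the ASCII characters A-Z, a-z, 0-9, and _; the function name is hypothetical.

```python
import re

# Assumption: only ASCII letters, digits, and underscores are allowed.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_valid_audience_set_name(name):
    """Return True if the name uses only letters, digits, and underscores."""
    return _NAME_PATTERN.fullmatch(name) is not None
```

For example, `is_valid_audience_set_name("bombora_us")` passes, while a name containing a space or hyphen does not.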
User Identifier Type
Name of the user id tag to be used in matching.
Please make sure the identifier you use is defined in your Identity Management list in the Permutive Dashboard. More info here.
Audience Set Lifetime (days)
Each project/data source has a lifetime, which trickles down to all segment ids defined in that source. Lifetime is the expiry time for users added to segments in this source. By default, the lifetime is 60 days. We do not support segment-level lifetime settings.
Taxonomy .csv file
The taxonomy file defines the various segment IDs being sent into the GCS file path. Taxonomy files must be in either Excel (.xlsx) or CSV format. You can find more information below.
Sending Data Into Permutive
Below is a breakdown of how data should be sent to Permutive, Taxonomy file format, and user matching.
Any audience data item included in your upload must have a user ID associated with it. In order for Permutive to tie imported data with a user landing on your site, the same user ID must be available on-site. This identifier must be defined in your identity management list in the dashboard. This could be a user ID picked up by one of our existing cookie syncs, or it could be your own internal user ID that you are using with permutive.identify. The ID must be a string containing up to 100 characters.
Permutive treats these external user IDs as identifiers, and each identifier has an identity type associated with it.
Example: Below, we've used a sample scenario where you have data collected on subscribers. This will only apply to visitors who have logged in. We assume that when a user logs in you will have an opportunity to execute code and will have some form of internal ID for that user (in the user.id variable). You could then call permutive.identify to associate the login ID with that visitor under the subscriber_id tag. We would then be able to use subscriber_id as the user identifier in the audience data. Please refer to our Identity Framework Guide for more details.
Important: Ensure that the value passed as an id is never empty as these would cause ingestion errors and could make different users on the platform collapse into one.
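On the data-preparation side, the same rules can be enforced before an ID ever reaches an upload file. This is a minimal sketch, not an official validator: it checks the constraints stated in this guide (a non-empty string of up to 100 characters, with no spaces, as the upload file format requires). The function name is hypothetical.

```python
MAX_USER_ID_LENGTH = 100  # stated limit: a string of up to 100 characters


def validate_user_id(user_id):
    """Return the ID unchanged if it is usable, otherwise raise ValueError."""
    if not isinstance(user_id, str):
        raise ValueError("user ID must be a string")
    if not user_id:
        raise ValueError("user ID must not be empty")
    if len(user_id) > MAX_USER_ID_LENGTH:
        raise ValueError("user ID must be at most 100 characters")
    if any(ch.isspace() for ch in user_id):
        raise ValueError("user ID must not contain spaces")
    return user_id
```

Rejecting bad IDs early is preferable to dropping them silently, because an empty ID that slips through could cause different users to collapse into one.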
For in-depth information about taxonomies, please refer to this support page.
A taxonomy defines the segment IDs being brought into the Permutive platform. You can update the taxonomy at any time in the future, adding, removing, or modifying cohorts as required.
There are two ways to update a taxonomy:
- Programmatically: We have an API for updating the taxonomy of an existing project/data source. Please find more information here.
- Via Zendesk Support: Please always send us the full current taxonomy, not just the changes made. This must be in either Excel (.xlsx) or CSV format.
Example: We'll continue our subscribers' data scenario. Let's assume that you have the following data on each user, most of them optional:
- The country a user lives in
- Their income bracket
- List of interests
You would model this as follows, with each possible value as a separate cohort:
| Segment ID | Name | Description | CPM |
|---|---|---|---|
| 0001 | Country - France | Users living in France | 0 |
| 0002 | Country - Spain | Users living in Spain | 0 |
| 0003 | Income < $20,000 | Having an income of less than $20,000 | 0 |
| 0004 | Income $20,000 - $40,000 | Having an income of between $20,000 and $40,000 | 0 |
| 0005 | Interest - Travel | Those who are interested in traveling | 0 |
| 0006 | Interest - Musicals | Those who are interested in musicals | 0 |
| 0007 | Gender - Female | People that identify as Female | 0 |
| 0008 | Gender - Male | People that identify as Male | 0 |
Note: Fields with the asterisk (*) are required, the rest are optional. If you previously used to send in Lifetime (days), it's fine to keep it in the file but it is not necessary any longer.
You can find the example taxonomy file attached at the bottom of the page.
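If you generate the taxonomy from your own systems, a sketch along these lines could write it as CSV. The header names (segment_id, name, description, cpm) are placeholders, not confirmed Permutive column names, so please take the real headers from the attached example file. Only the first four rows of the example taxonomy are included for brevity.

```python
import csv
import io

# Hypothetical header names; replace them with those in the example taxonomy file.
COLUMNS = ["segment_id", "name", "description", "cpm"]

TAXONOMY = [
    ("0001", "Country - France", "Users living in France", 0),
    ("0002", "Country - Spain", "Users living in Spain", 0),
    ("0003", "Income < $20,000", "Having an income of less than $20,000", 0),
    ("0004", "Income $20,000 - $40,000",
     "Having an income of between $20,000 and $40,000", 0),
]


def taxonomy_to_csv(rows):
    """Serialize taxonomy rows to CSV text, quoting fields that contain commas."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buffer.getvalue()
```

Using the csv module rather than joining strings by hand matters here because names such as "Income < $20,000" contain a comma and must be quoted to stay in one column.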
Each row in your file should describe a list of segments for a specific user. The row must be in the following format:
<USER ID><SPACE><SEGMENT IDS COMMA SEPARATED>
For example, an import of four segments against a user ID would appear as a single row in the data file, with a comma-separated list of segment IDs:
<USER ID> <SEGMENT ID 1>,<SEGMENT ID 2>,<SEGMENT ID 3>,<SEGMENT ID 4>
Note: Segment updates for an individual user are incremental. If a user is already a member of other self-sourced audience segments from the same audience set as a result of a previous upload, Permutive appends the new list of segments to the existing segments.
User IDs and segment IDs must not contain spaces. Every line in your input must end with a newline character, and values must not be wrapped in quotes or any other enclosing characters.
In our subscribers' data scenario, we have already modeled our data and are ready to prepare the file for upload.
We start with the following data in our database:
- User 1 - subscriber-id: 76E5F445-1993, from Spain with Income $20,000 - $40,000
- User 2 - subscriber-id: 5E824DCF-2C6D, a male user interested in traveling
- User 3 - subscriber-id: 69E0985B-50C0, a female user from France, interested in musicals and traveling
- User 4 - subscriber-id: 2DABE6C1-07DD, a male user from France, interested in musicals, with Income < $20,000
This would be translated into the following structure, ready for the upload to the Permutive GCS:
76E5F445-1993 0002,0004
5E824DCF-2C6D 0005,0008
69E0985B-50C0 0001,0005
69E0985B-50C0 0007
69E0985B-50C0 0006
2DABE6C1-07DD 0001,0006,0003,0008
Please note that lines 3-5 describe the same user. They will be treated the same as:
69E0985B-50C0 0001,0005,0007,0006
You could also format each line to have a user ID and just one cohort if that is easier for you.
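If your source data is a list of (user ID, segment ID) pairs, the row format can be produced with a short script. The following is a minimal sketch, assuming the pairs fit in memory; the function names are hypothetical. It merges all segments for a user into one row, which is equivalent to sending them on separate rows.

```python
def build_upload_lines(memberships):
    """Turn (user_id, segment_id) pairs into '<USER ID> <ID>,<ID>,...' rows."""
    merged = {}  # dicts keep insertion order, so users stay in first-seen order
    for user_id, segment_id in memberships:
        for value in (user_id, segment_id):
            if not value or any(ch.isspace() for ch in value):
                raise ValueError(
                    f"IDs must be non-empty and contain no spaces: {value!r}"
                )
        segments = merged.setdefault(user_id, [])
        if segment_id not in segments:  # skip duplicate memberships
            segments.append(segment_id)
    return [f"{user_id} {','.join(segments)}" for user_id, segments in merged.items()]


def render_upload_file(lines):
    """Join rows into file content, ending every row with a newline."""
    return "".join(line + "\n" for line in lines)


pairs = [
    ("5E824DCF-2C6D", "0005"),
    ("5E824DCF-2C6D", "0008"),
    ("69E0985B-50C0", "0001"),
    ("69E0985B-50C0", "0005"),
    ("69E0985B-50C0", "0007"),
    ("69E0985B-50C0", "0006"),
]
print(render_upload_file(build_upload_lines(pairs)), end="")
```

With the pairs above, the script prints one row for each of the two users, with no quotes or other wrapping characters.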
Please email the file (or the first 100 lines if the file is larger than 5 MB) to email@example.com for format verification before uploading it to the bucket.
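Preparing the sample for format verification can be scripted. This is a minimal sketch, assuming a UTF-8 text file and reading "5 MB" as 5 × 1024 × 1024 bytes; the function and parameter names are hypothetical.

```python
import os
from itertools import islice

SIZE_THRESHOLD_BYTES = 5 * 1024 * 1024  # assumption: "5 MB" taken as 5 MiB
SAMPLE_LINES = 100


def verification_sample(path, size_threshold=SIZE_THRESHOLD_BYTES,
                        sample_lines=SAMPLE_LINES):
    """Return the whole file if it is small, otherwise only its first lines."""
    with open(path, "r", encoding="utf-8") as handle:
        if os.path.getsize(path) <= size_threshold:
            return handle.read()
        # islice stops after sample_lines rows without reading the rest.
        return "".join(islice(handle, sample_lines))
```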
You can find the example upload file attached at the bottom of the page.
Important: Please ensure that the CPM is listed as '0' when importing self-sourced/first-party audience segments. There is no additional charge to import self-sourced data. However, if there is any number larger than zero listed for the CPM, you will be charged for that amount as part of third-party billing.
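As a sanity check before sending a taxonomy, the following sketch lists any rows whose CPM is not zero. The column name cpm is an assumption; use whatever header your taxonomy file actually has.

```python
import csv
import io


def _is_zero(value):
    """Return True only when the value parses as the number 0."""
    try:
        return float(value) == 0
    except (TypeError, ValueError):
        return False


def rows_with_nonzero_cpm(csv_text, cpm_column="cpm"):
    """Return taxonomy rows whose CPM is missing, non-numeric, or not 0."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if not _is_zero(row.get(cpm_column))]
```

An empty result means every row has a CPM of 0. Missing or non-numeric values are reported as well, so they can be reviewed before the file is sent.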
All data should be uploaded under a subdirectory in your bucket:
You will need a data provider ID for each self-sourced data partner you import data from. Please liaise with our support team to ensure your self-sourced data IDs are set up prior to beginning data uploads and to ensure you know the alias type for your uploads.
The exact URL will be provided after bucket and data provider/audience set creation.
For efficiency, data files should be uploaded in compressed format. Please gzip compress your files prior to uploading. Post compression, file names should end with a .gz extension.
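Compression can be done with any gzip tool. As one option, here is a minimal sketch using Python's standard library; the function name is hypothetical, and the original file is left in place.

```python
import gzip
import shutil


def gzip_file(source_path):
    """Compress a file to '<source_path>.gz' and return the new path."""
    target_path = source_path + ".gz"
    with open(source_path, "rb") as src, gzip.open(target_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # streams in chunks, so large files are fine
    return target_path
```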
To ensure seamless data ingestion, we impose some system limits for each file upload. If you think you will need to exceed these limits, please let us know.
| Limit | Maximum |
|---|---|
| User IDs per file | 10,000,000 |
| Daily data volume | 40 GB |
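One way to stay under the per-file limit is to split rows into batches before writing files. The sketch below is an illustration under one assumption: it counts rows rather than distinct user IDs, which is conservative because a file can never contain more users than rows. It does not track the daily data volume limit.

```python
from itertools import islice

MAX_USER_IDS_PER_FILE = 10_000_000  # per-file limit stated above


def chunk_lines(lines, max_per_file=MAX_USER_IDS_PER_FILE):
    """Yield successive lists of at most max_per_file rows."""
    iterator = iter(lines)
    while True:
        chunk = list(islice(iterator, max_per_file))
        if not chunk:
            return
        yield chunk
```

Each yielded batch can then be written to its own file and compressed separately.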
Note: Please be advised that the turnaround time for setting up your GCS bucket will be between 1 and 2 weeks.
Creating Cohorts with Second-party Data
Once our support team has confirmed the cohorts are available:
1. Navigate to the 'Audience' > 'Custom Cohorts' tab in the Permutive dashboard
2. Select '+ Add Cohort'
4. Set up any first-party rules
5. Choose '+OR/+AND' and then 'Second Party'
6. Search for and select the relevant second-party cohort
7. Save the cohort
If you have any questions, please contact customer support by emailing firstname.lastname@example.org or chat to the Customer Operations Team via the LiveChat icon in the bottom right corner of your screen.