This guide is intended for publishers and marketers who want to upload second-party data into Permutive. This is typically user segment membership data that comes from an advertising partner or an advertising partner's DMP. It could also be the publisher's first-party data on their subscribers or other identifiable users.
This type of data import can be used for a one-off data dump or for periodic imports.
- Sending data into Permutive
- Taxonomy File Format
- Data File Format
- Data Upload
To use Permutive data imports, you'll first need access to a Google Cloud Storage (GCS) bucket for data uploads.
Permutive supports one GCS bucket per customer, with each project or data source (which we refer to as a data provider) stored in its own subfolder.
Please contact Permutive support with the following information and our team will provide you with a bucket and relevant access:
1. The users who should have access to the bucket. There are a couple of options, depending on your intended upload method:
- Manual file upload: we will grant your email address access to the GCS bucket so that you can upload files via the Google Cloud Platform console or command-line tool. The email address must be associated with an active Google Account or Google Apps account.
- Programmatic file upload: you will need to create a service account within your GCP project, which will be used to perform uploads to Permutive. In this case, please let us know the email address of your service account so that we can grant access.
Note: You can also provide both types of users to have access to the same bucket.
2. The name of the project/data source (this could be a partner's name, or an internal project name if it's your own data). The name is arbitrary and will form part of your bucket's path. Please use only letters, numbers and underscores (ie.
3. The name of the user ID alias to be used for matching. If the data comes from an advertising partner, this will most likely be an AppNexus ID. When you are using your own data, users have to be identified via our identity framework; see Sending data into Permutive below.
Sending data into Permutive
Data should be uploaded to your Permutive GCS bucket in the format described in this document. The Permutive platform will detect new uploads made to this bucket and immediately ingest new data into the platform.
Any data item included in your upload must have a user ID associated with it. In order for Permutive to tie imported data with a user landing on your site, the same user ID must be available on-site. This could be a user ID picked up by one of our existing cookie syncs, or it could be your own internal user ID that you are using with permutive.identify. The ID must be a string containing up to 100 characters.
Permutive treats these external user IDs as aliases, and each alias has an alias type associated with it. We typically rely on third-party IDs such as the AppNexus ID to match Permutive users with second- and third-party data sent into the platform; this alias has the type Appnexus. If Permutive is not already picking up AppNexus IDs for your users, or if you'd like to send a different third-party ID in your data imports, please discuss this with our support team.
Below, we use a sample scenario in which you have data collected on subscribers. This only applies to visitors who have logged in. We assume that when a user logs in, you will have an opportunity to execute code and will have some form of internal ID for that user (in the user.id variable). You could then pass that ID to permutive.identify to associate the login ID with the visitor.
We would then be able to use subscriber_id as a user ID alias in the audience data.
Please refer to our Identity Framework Guide for more details.
Note: Ensure that the value passed as an ID is never empty. Empty IDs cause ingestion errors and could make different users on the platform collapse into one.
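Because an empty or malformed ID can break ingestion or merge users, it can help to validate IDs before passing them on-site or writing them to an import file. The following is a minimal Python sketch; the function name validate_user_id is hypothetical, and the whitespace check reflects the data file rule (described below) that user IDs must not contain spaces.

```python
# Constraint stated in this guide: IDs are strings of up to 100 characters.
MAX_ID_LENGTH = 100


def validate_user_id(user_id: object) -> str:
    """Hypothetical helper: return user_id unchanged if it is usable as an alias.

    Raises TypeError for non-strings and ValueError for empty, over-long,
    or whitespace-containing IDs.
    """
    if not isinstance(user_id, str):
        raise TypeError("user ID must be a string")
    if user_id == "":
        # Empty IDs cause ingestion errors and can collapse users together.
        raise ValueError("user ID must not be empty")
    if len(user_id) > MAX_ID_LENGTH:
        raise ValueError(f"user ID must be at most {MAX_ID_LENGTH} characters")
    if any(ch.isspace() for ch in user_id):
        # A space separates the user ID from the segment list in data files.
        raise ValueError("user ID must not contain whitespace")
    return user_id
```

Rejecting bad IDs outright, rather than silently trimming them, keeps the on-site alias and the uploaded alias byte-for-byte identical.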
Taxonomy File Format
Prior to beginning uploads for a new second-party data provider, you must send us, via email, a segment taxonomy that describes the segment IDs you will be sending in your data import files. We are able to receive taxonomy files in either Excel (xlsx) or CSV format.
A taxonomy file should include the following information about each segment:
- ID: A unique identifier for the segment. This can be any alphanumeric string. It is never displayed in the UI, as it's only used internally by our platform. The best practice here would be to have a sequence (ie.
- Name: The display name for the segment, shown in the Permutive dashboard. You may choose to organise your segments into categories, in which case category levels should be delimited by a hyphen (for example, "Country - France"). You will be able to update this value later.
- Description: A description of the segment, shown in the Permutive dashboard.
- CPM: The CPM for the segment. This is displayed in the dashboard but has no effect on how the segment functions. You can leave it as '0' if you do not need to see the value in the dashboard.
- Lifetime: The expiry time for the segment, in days. Users fall out of the segment that many days after the upload date, unless a new data file is uploaded in the interim.
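The lifetime rule is simple date arithmetic. The Python sketch below assumes (our reading of "unless a new data file is uploaded in the interim") that a later upload including the user restarts the clock, so expiry counts from the most recent upload. The function name segment_expiry is hypothetical.

```python
from datetime import date, timedelta
from typing import Iterable


def segment_expiry(upload_dates: Iterable[date], lifetime_days: int) -> date:
    """Hypothetical helper: date on which a user drops out of a segment.

    Assumption: the lifetime counts from the most recent upload that
    included the user, i.e. each new upload restarts the clock.
    """
    dates = list(upload_dates)
    if not dates:
        raise ValueError("at least one upload date is required")
    if lifetime_days < 0:
        raise ValueError("lifetime must be non-negative")
    return max(dates) + timedelta(days=lifetime_days)
```

For example, a segment with a 45-day lifetime uploaded on 1 March 2024 would expire on 15 April 2024 if no newer file refreshes it.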
You can update the taxonomy at any time in the future, adding, removing or modifying segments as required. Please always send us the full current taxonomy, not just the changes you've made.
We'll continue our subscribers' data scenario. Let's assume that you have the following data on each user, most of which is optional:
- The country a user lives in
- Their income bracket
- Their declared gender
- Subscription type (required value)
- List of interests
You would model this as follows, with each possible value as a separate segment:
| ID | Name | Description | CPM (USD) | Lifetime (days) |
|---|---|---|---|---|
| 0001 | Country - France | Users living in France | 0 | 45 |
| 0002 | Country - Spain | Users living in Spain | 0 | 45 |
| 0003 | Country - Germany | Users living in Germany | 0 | 45 |
| 0004 | Country - India | Users living in India | 0 | 45 |
| 0005 | Country - China | Users living in China | 0 | 45 |
| 0006 | Income < $20,000 | Having an income of less than $20,000 | 0 | 60 |
| 0007 | Income $20,000 - $40,000 | Having an income of between $20,000 and $40,000 | 0 | 60 |
| 0008 | Income > $40,000 | Having an income of over $40,000 | 0 | 60 |
| 0009 | Gender - Female | People that identify as Female | 0 | 30 |
| 0010 | Gender - Male | People that identify as Male | 0 | 30 |
| 0011 | Subscriber - Free | Non-paying subscribers | 0 | 30 |
| 0012 | Subscriber - Premium | Paying subscribers | 0 | 30 |
| 0013 | Interest - Cars | Those who are interested in cars | 0 | 30 |
| 0014 | Interest - Travel | Those who are interested in travelling | 0 | 30 |
| 0015 | Interest - Musicals | Those who are interested in musicals | 0 | 30 |
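If you keep the taxonomy in code or a database, you can generate the CSV version rather than maintaining it by hand. The Python sketch below writes an excerpt of the example table using the standard csv module; the header wording is copied from the table above and is an assumption, so confirm the columns your Permutive contact expects. The function name write_taxonomy is hypothetical.

```python
import csv
from typing import Iterable, TextIO, Tuple

# Column headings copied from the example table (assumption: confirm with support).
HEADER = ["ID", "Name", "Description", "CPM (USD)", "Lifetime (days)"]

# Excerpt of the example taxonomy: (ID, name, description, CPM, lifetime in days).
TAXONOMY = [
    ("0001", "Country - France", "Users living in France", 0, 45),
    ("0006", "Income < $20,000", "Having an income of less than $20,000", 0, 60),
    ("0010", "Gender - Male", "People that identify as Male", 0, 30),
    ("0012", "Subscriber - Premium", "Paying subscribers", 0, 30),
    ("0013", "Interest - Cars", "Those who are interested in cars", 0, 30),
]


def write_taxonomy(rows: Iterable[Tuple[str, str, str, float, int]], out: TextIO) -> None:
    """Write the full taxonomy as CSV, rejecting non-alphanumeric or duplicate IDs."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(HEADER)
    seen = set()
    for seg_id, name, description, cpm, lifetime_days in rows:
        if not (seg_id.isascii() and seg_id.isalnum()):
            raise ValueError(f"segment ID {seg_id!r} must be alphanumeric")
        if seg_id in seen:
            raise ValueError(f"duplicate segment ID {seg_id!r}")
        seen.add(seg_id)
        # csv.writer quotes fields containing commas, e.g. "Income < $20,000".
        writer.writerow([seg_id, name, description, cpm, lifetime_days])


# Usage: open a file with newline="" and pass it in, e.g.
# with open("taxonomy.csv", "w", newline="", encoding="utf-8") as f:
#     write_taxonomy(TAXONOMY, f)
```

Because Permutive asks for the full current taxonomy with every change, regenerating the whole file from a single source of truth avoids sending partial updates.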
Data File Format
Each row in your file should describe a list of second-party segments for a specific user. The row must be in the following format:
<USER ID><SPACE><SEGMENT IDS AS CSV>
For example, an import of four segments against a user ID would appear as a single row in the data file, with a comma-separated list of segment IDs:
Note: Segment updates for an individual user are incremental. If a user is already a member of other second-party segments from the same provider as a result of a previous upload, Permutive will append the new list of segments to the existing ones. User IDs and segment IDs must not contain spaces. Every line in your file must be terminated with a newline, and values must not be enclosed in quotes (or any other wrapping characters).
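The row format and its rules can be captured in a pair of small functions. This is a minimal Python sketch; format_row and parse_row are hypothetical names, and the extra rejection of commas inside segment IDs is our own safeguard, since a comma would split an ID in the comma-separated list.

```python
from typing import Iterable, List, Tuple


def _has_whitespace(value: str) -> bool:
    return any(ch.isspace() for ch in value)


def format_row(user_id: str, segment_ids: Iterable[str]) -> str:
    """Build one data-file line: <USER ID><SPACE><SEGMENT IDS AS CSV>, newline-terminated."""
    ids = list(segment_ids)
    if not ids:
        raise ValueError("at least one segment ID is required")
    if not user_id or _has_whitespace(user_id):
        raise ValueError(f"invalid user ID: {user_id!r}")
    for seg_id in ids:
        if not seg_id or _has_whitespace(seg_id) or "," in seg_id:
            raise ValueError(f"invalid segment ID: {seg_id!r}")
    # No quotes or other wrapping characters around the values.
    return f"{user_id} {','.join(ids)}\n"


def parse_row(line: str) -> Tuple[str, List[str]]:
    """Split a data-file line back into (user ID, segment IDs)."""
    user_id, sep, segments = line.rstrip("\n").partition(" ")
    if not sep or not user_id or not segments:
        raise ValueError(f"malformed line: {line!r}")
    return user_id, segments.split(",")
```

Since each line is written independently and updates are incremental, a user can appear on several lines; the parser simply returns each line's segments.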
In our subscribers' data scenario, we have already modelled our data and are ready to prepare the file for upload.
We start with the following data in our database:
User 1 - subscriber-id: 76E5F445-1993, a premium subscriber from Spain with an income of $20,000 - $40,000
User 2 - subscriber-id: 5E824DCF-2C6D, a male user, free account
User 3 - subscriber-id: 69E0985B-50C0, a female user from China, premium account, interested in cars and travelling
User 4 - subscriber-id: 2DABE6C1-07DD, a male user from France, premium account, interested in cars, with an income of over $40,000
This would be translated into the following structure, ready for the upload to the Permutive GCS:
Please note that lines 3-5 describe the same user - they will be treated the same as:
You could also format each line to contain a user ID and just one segment, if that format is easiest for you.
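The translation from subscriber records to data-file lines can be automated. The Python sketch below maps the four example subscribers onto the example taxonomy, producing either one line per user or one segment per line, and shows that merging the per-segment lines gives the same memberships. The record field names and attribute keys (such as "20k-40k") are hypothetical, as are the function names.

```python
from typing import Dict, Iterable, List

# Attribute value -> segment ID, following the example taxonomy.
COUNTRY = {"France": "0001", "Spain": "0002", "Germany": "0003", "India": "0004", "China": "0005"}
INCOME = {"<20k": "0006", "20k-40k": "0007", ">40k": "0008"}
GENDER = {"female": "0009", "male": "0010"}
SUBSCRIPTION = {"free": "0011", "premium": "0012"}
INTEREST = {"cars": "0013", "travel": "0014", "musicals": "0015"}


def segments_for(user: Dict) -> List[str]:
    """Segment IDs for one subscriber record; only subscription is required."""
    ids = []
    if user.get("country"):
        ids.append(COUNTRY[user["country"]])
    if user.get("income"):
        ids.append(INCOME[user["income"]])
    if user.get("gender"):
        ids.append(GENDER[user["gender"]])
    ids.append(SUBSCRIPTION[user["subscription"]])
    ids.extend(INTEREST[i] for i in user.get("interests", []))
    return ids


def build_lines(users: Iterable[Dict], one_segment_per_line: bool = False) -> List[str]:
    lines = []
    for user in users:
        ids = segments_for(user)
        if one_segment_per_line:
            lines.extend(f"{user['subscriber_id']} {seg}\n" for seg in ids)
        else:
            lines.append(f"{user['subscriber_id']} {','.join(ids)}\n")
    return lines


def merge_lines(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Combine lines for the same user, as Permutive does on ingestion."""
    merged: Dict[str, List[str]] = {}
    for line in lines:
        user_id, _, segments = line.rstrip("\n").partition(" ")
        bucket = merged.setdefault(user_id, [])
        for seg_id in segments.split(","):
            if seg_id not in bucket:
                bucket.append(seg_id)
    return merged


USERS = [
    {"subscriber_id": "76E5F445-1993", "subscription": "premium", "country": "Spain", "income": "20k-40k"},
    {"subscriber_id": "5E824DCF-2C6D", "gender": "male", "subscription": "free"},
    {"subscriber_id": "69E0985B-50C0", "gender": "female", "country": "China",
     "subscription": "premium", "interests": ["cars", "travel"]},
    {"subscriber_id": "2DABE6C1-07DD", "gender": "male", "country": "France",
     "subscription": "premium", "interests": ["cars"], "income": ">40k"},
]
```

With this mapping, User 1 becomes the line 76E5F445-1993 0002,0007,0012; the exact ordering of segment IDs within a line does not matter to the format.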
Data Upload
All data should be uploaded under a 2p subdirectory in your bucket:
You will need a data provider ID for each second-party data partner you import data from. Please liaise with our support team to ensure your second-party data IDs are set up prior to beginning data uploads and to ensure you know the alias type for your uploads.
The exact URL will be provided after bucket and data provider creation.
For efficiency, data files should be uploaded in compressed format. Please gzip your files before uploading them; after compression, file names should end with a .gz extension.
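The compression step can be done while writing the file, using Python's standard gzip module in text mode. This is a minimal sketch; the function name and the example file name are hypothetical, and the upload itself is not shown because the exact bucket path is provided by Permutive.

```python
import gzip
from pathlib import Path
from typing import Iterable, Union


def write_gzipped_data_file(lines: Iterable[str], path: Union[str, Path]) -> Path:
    """Write data-file lines straight into a gzip file whose name ends in .gz."""
    path = Path(path)
    if path.suffix != ".gz":
        path = path.with_name(path.name + ".gz")
    # newline="\n" keeps Unix line endings regardless of platform.
    with gzip.open(path, "wt", encoding="utf-8", newline="\n") as f:
        for line in lines:
            # Every line must be newline-terminated.
            f.write(line if line.endswith("\n") else line + "\n")
    return path
```

For example, write_gzipped_data_file(lines, "subscribers_2024-03-01.txt") would produce subscribers_2024-03-01.txt.gz, ready to upload via the console, command-line tool or a service account.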
To ensure seamless data ingestion, we impose some system limits for each file upload. If you think you will need to exceed these limits, please let us know.
- User IDs per file
- Total daily data volume
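Before uploading, you may want to check a day's files against these limits. The Python sketch below takes the limit values as parameters, since they are agreed with Permutive rather than fixed here. It assumes that "User IDs per file" means distinct user IDs (a user can span several lines) and measures volume as uncompressed UTF-8 bytes; both are assumptions to confirm with support. The function name is hypothetical.

```python
from typing import List, Sequence


def check_upload_limits(
    files: Sequence[Sequence[str]],
    max_user_ids_per_file: int,
    max_daily_bytes: int,
) -> List[str]:
    """Return a list of limit violations for one day's files (empty if none)."""
    problems = []
    total_bytes = 0
    for index, lines in enumerate(files):
        # Distinct user IDs: the text before the first space on each line.
        user_ids = {line.partition(" ")[0] for line in lines}
        if len(user_ids) > max_user_ids_per_file:
            problems.append(
                f"file {index}: {len(user_ids)} user IDs exceeds {max_user_ids_per_file}"
            )
        total_bytes += sum(len(line.encode("utf-8")) for line in lines)
    if total_bytes > max_daily_bytes:
        problems.append(f"daily volume {total_bytes} bytes exceeds {max_daily_bytes}")
    return problems
```

If the check reports problems, splitting the data across more files or days, or asking support about higher limits, are the options this guide suggests.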
If you have any questions, please contact customer support by emailing firstname.lastname@example.org or chat to the Customer Operations Team via the LiveChat icon in the bottom right corner of your screen.