Introduction
Permutive’s Content Classification integrations allow you to automatically enrich your first-party data with structured, high-quality classifications. This enables you to categorise content consistently and accurately, helping you build relevant cohorts at scale across all your digital properties.
Permutive provides a growing catalog of classification providers, ensuring you have access to the best solution for your specific content and data requirements. Additionally, you can integrate with custom classification sources, such as in-house models or proprietary taxonomies, giving you flexibility in how you categorise and leverage your content.
With a no-code workflow, these integrations can be enabled and configured at the click of a button, eliminating the need for lengthy development cycles.
Concepts
Definitions
- Dimension Type: The classification dimensions available from a provider. Most providers classify against standard categories, like the IAB content taxonomies. Some providers also offer classifications such as sentiment or the extraction of keywords.
- Taxonomy: Category classifications have an associated taxonomy. The taxonomy defines all available categories and their hierarchy. Each provider implements at least one taxonomy, but some providers might be able to classify against more than one taxonomy system. Common examples include the IAB content taxonomy.
- Provider: All existing integrations are listed in the Catalog as providers. A provider can be a third-party content classification solution or a custom endpoint.
Data Model
All content classification data is stored against a common schema. Below is an overview of this structure:
"contextual": {
  "classifications": {
    "categories": [
      {
        "provider": "Provider Name",
        "value": "201",
        "confidence": 0.72,
        "taxonomy": "iab_2_0",
        "reason": "Classified because …"
      }
    ],
    "concepts": [
      {
        "provider": "Provider Name",
        "value": "Permutive",
        "confidence": 0.41
      }
    ],
    "emotions": [...],
    "entities": [...],
    "keywords": [...],
    "sentiments": [...]
  }
}
- provider and value are always present
- taxonomy is only present (and mandatory) for categories; it can be a standard or custom taxonomy. The dashboard will use this value to display a human-readable name (e.g. show “Fine Art” instead of “201”)
- confidence is optional and a score between 0 and 1
- reason is optional and will contain a short sentence explaining why this classification was made
All dimension types (sentiment, keywords, …) follow the same schema as categories, except that taxonomy applies only to categories. Permutive automatically updates your event schema to include these properties. Only dimension types available through your enabled providers are shown in the cohort builder.
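The field rules above can be expressed as a small check. This is a minimal sketch only: the function name `validate_classification` and the wording of the messages are illustrative assumptions, not part of Permutive's product or API.

```python
from typing import Any


def validate_classification(dimension: str, entry: dict[str, Any]) -> list[str]:
    """Return the problems found in one classification entry (empty if valid).

    Hypothetical helper that mirrors the schema rules described above.
    """
    problems = []
    # provider and value are always present
    for key in ("provider", "value"):
        if not isinstance(entry.get(key), str) or not entry[key]:
            problems.append(f"missing or empty '{key}'")
    # taxonomy is mandatory for categories and does not apply to other types
    if dimension == "categories" and "taxonomy" not in entry:
        problems.append("categories require 'taxonomy'")
    if dimension != "categories" and "taxonomy" in entry:
        problems.append("'taxonomy' only applies to categories")
    # confidence is optional; when present it is a score between 0 and 1
    confidence = entry.get("confidence")
    if confidence is not None and not (
        isinstance(confidence, (int, float)) and 0 <= confidence <= 1
    ):
        problems.append("'confidence' must be between 0 and 1")
    return problems
```

For example, a category entry without a `taxonomy` is reported as invalid, while a concept entry with only `provider`, `value` and `confidence` passes.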
Content Classifications Providers
Catalog
You can find the catalog of available integrations under “Contextual > Catalog” in the Permutive dashboard:
Preview
You can preview how a provider classifies a URL by hovering over the tile and clicking on “Preview”:
After entering a URL and clicking “Classify” you will see a table with classification results. You can filter the results against any of the available dimension types.
The following providers currently support previews:
Provider | Support |
IBM Watson | ⎷ |
TextRazor | ⎷ |
OS Data Solutions | ⎷, if provider is enabled |
Webhook | ⎷, if provider is enabled |
If the provider is enabled for your project, the preview modal will take your settings into account. For example, IBM Watson will use your XPath rules if you have configured them.
Enabling a provider
- To activate a provider included in your Permutive contract, please contact your Customer Success Manager (CSM)
- If you have a direct relationship with a content classification provider, you can use your API key by enabling the below toggle:
Configuration
The following configuration options are available across all providers:
Setting | Description |
Requested Types | Specify which dimension types to request. |
Domains | Optimise your quota usage by restricting classifications to specific domains. The domains shown in this list are based on what you have defined under “Settings > Domains & App names” |
Quota | Maximum number of classifications in a specified timeframe. You can only change this value when using your own API key; otherwise it is managed by Permutive. Once the quota has been reached, no more content will be classified in the current time period. |
Selective Classifications Threshold | Set a minimum traffic threshold to classify only high-traffic URLs, optimising quota usage. |
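To illustrate how the domain, threshold and quota settings might interact, here is a conceptual sketch. It is not Permutive's implementation: `ProviderSettings`, `should_classify`, the use of pageviews as the traffic measure, and exact hostname matching are all assumptions made for the example.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class ProviderSettings:
    """Hypothetical container for the settings described in the table above."""
    domains: set[str]    # allowed domains; an empty set means no restriction (assumption)
    quota: int           # maximum classifications in the current time period
    min_pageviews: int   # selective classification threshold (unit is an assumption)


def should_classify(url: str, pageviews: int, used_quota: int,
                    settings: ProviderSettings) -> bool:
    """Decide whether a URL would be sent for classification under these settings."""
    host = urlsplit(url).hostname or ""
    # Skip URLs outside the configured domains
    if settings.domains and host not in settings.domains:
        return False
    # Skip low-traffic URLs to save quota
    if pageviews < settings.min_pageviews:
        return False
    # Once the quota is reached, nothing more is classified in this period
    return used_quota < settings.quota
```

Under these assumptions, a high-traffic URL on an allowed domain is classified until `used_quota` reaches `quota`.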
If some of your content is behind a paywall, you can configure a URL parameter that is appended to every classification request. You can then configure your CRM to give the crawler full access when this URL parameter is present, so that the classification service can read the full content and provide high-quality classifications. To set this up, click "Settings" on the Catalog page and enter your URL parameter.
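As an illustration of what appending a URL parameter means for the requested address, the following sketch adds a parameter to a URL while preserving any existing query string. The function name and the parameter name `crawler_token` with value `abc` are hypothetical; use whatever parameter you configure in the dashboard.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_crawler_param(url: str, name: str, value: str) -> str:
    """Return `url` with `name=value` appended to its query string."""
    parts = urlsplit(url)
    # Keep existing parameters, including ones with blank values
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((name, value))
    return urlunsplit(parts._replace(query=urlencode(query)))


# "crawler_token" is a made-up parameter name used only for this example
print(with_crawler_param("https://example.com/article?id=7", "crawler_token", "abc"))
# prints "https://example.com/article?id=7&crawler_token=abc"
```

Your paywall logic can then check for this parameter on incoming requests and serve the full article when it is present.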
Using Content Classifications in Cohorts
Contextual Cohorts
Once you have a content classification provider enabled, you will see a new option in the Contextual Cohort builder: “Content Classification”.
Custom Cohorts
In the Custom Cohort builder you need to select “Content Classifications - {Dimension}” as a Pageview property. When selecting “Categories” you need to specify a taxonomy. You can optionally filter by minimum confidence and provider. When specifying a minimum confidence, please make sure your provider supports this for their classifications.
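The filter options above can be pictured as the following matching logic. This is a sketch for intuition only, not how the cohort builder is implemented; `matching_categories` is a hypothetical name, and the treatment of entries without a confidence score (they never satisfy a minimum) is an assumption that shows why your provider needs to supply confidence values.

```python
from typing import Any, Optional


def matching_categories(
    classifications: dict[str, list[dict[str, Any]]],
    taxonomy: str,
    min_confidence: Optional[float] = None,
    provider: Optional[str] = None,
) -> list[str]:
    """Return the category values on a pageview that pass the given filters."""
    matches = []
    for entry in classifications.get("categories", []):
        # A taxonomy must always be specified for categories
        if entry.get("taxonomy") != taxonomy:
            continue
        if provider is not None and entry.get("provider") != provider:
            continue
        if min_confidence is not None:
            confidence = entry.get("confidence")
            # Assumption: entries without a score cannot satisfy a minimum
            if confidence is None or confidence < min_confidence:
                continue
        matches.append(entry["value"])
    return matches
```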
Available Providers
Provider | Status |
IBM Watson | ⎷ Available |
TextRazor | ⎷ Available |
OS Data Solutions | ⎷ Available |
Webhook | ⎷ Available |
Silverbullet 4D | Coming soon |
illuma | Coming soon |
IBM Watson
Overview
IBM Watson Natural Language Understanding uses deep learning to extract meaning and metadata from unstructured text data. Get underneath your data using text analytics to extract categories, classification, entities, keywords, sentiment and emotion.
Supported Types
- Categories
- Concepts
- Emotion
- Entities
- Keywords
- Sentiment
Supported Taxonomies
Provider-Specific Configurations
IBM Watson version | Select which Watson version to use. We recommend using the latest version, which maps to the IAB 2.0 taxonomy. Older versions use a Watson taxonomy. |
XPath | Improve classifications by specifying a selector for the article body. This makes sure other content, like featured articles, doesn’t influence classifications. If the XPath does not return any results for a URL, we issue a second classification request, omitting the XPath selector. This ensures that an invalid XPath doesn’t stop all classifications. |
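The XPath fallback described above can be sketched as follows. The `classify` callable is a stand-in for the actual classification request, and modelling "the XPath does not return any results" as an empty result list is an assumption made for this example.

```python
from typing import Callable, Optional

# Hypothetical signature: (url, xpath or None) -> list of classification dicts
Classifier = Callable[[str, Optional[str]], list[dict]]


def classify_with_fallback(url: str, xpath: Optional[str],
                           classify: Classifier) -> list[dict]:
    """Try the XPath-scoped request first, then retry without the selector."""
    if xpath:
        results = classify(url, xpath)
        if results:
            return results
    # No XPath configured, or the selector yielded nothing: classify the whole page
    return classify(url, None)
```

The design choice here is that an invalid or non-matching selector costs one extra request rather than blocking classification for that URL.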
TextRazor
Overview
TextRazor provides advanced text analytics and content classification using natural language processing and machine learning. It extracts entities, concepts and categories from unstructured text, enabling precise content understanding and contextual targeting.
Supported Types
- Categories
- Entities
- Concepts
Supported Taxonomies
Provider-Specific Configurations
There is no provider-specific configuration.
OS Data Solutions
Overview
OS Data Solutions provides German publishers with advanced content classification based on an independent OVK standard. Their Contextual Classifier analyzes website content to assign IAB-standard categories and keyword-based segments, enabling precise contextual targeting without user data.
Supported Types
- Categories
- Concepts
- Emotion
- Entities
- Keywords
- Sentiment
Supported Taxonomies
- IAB 2.2 (excluding Sensitive Topics)
- BVDW
Provider-Specific Configurations
Private Key | Private key created for a service account to authenticate to GCP using Identity-Aware Proxy |
Client Email | Email address of the account that authenticates |
IAP Client ID | Identity-Aware Proxy identifier of the client accessing OSDS servers |
OSDS Server URL | URL of the OSDS classification service instance |
OS Data Solutions will provide you with more details on the above values.
Webhook
Overview
The webhook integration allows you to connect to a custom content classification source.
Supported Types
The webhook integration generally supports all dimension types, but this ultimately depends on what your endpoint is returning to Permutive.
Supported Taxonomies
Provider-Specific Configurations
Endpoint | The webhook provider will call this endpoint to request classifications and taxonomies. |
Standard Taxonomies | List of standard taxonomies the endpoint might return in classifications. Can be IAB 2.0, IAB 2.2, IAB 3.0 |
The endpoint must be able to respond to two types of requests:
- Classifications
The Permutive Webhook integration makes the below classification request to your endpoint, where “url” is the content that gets classified:
POST / Body: { "type": "classify", "url": "http://example.com" }
We expect a response in this format:
{
  "classifications": [
    {
      "value": "201",
      "type": "categories",
      "confidence": 0.72,
      "taxonomy": "iab_2.0"
    }
    ...
  ]
}
Below is a description of the fields included in the response:
Field | Type | Required | Description |
classifications | Array of objects | Yes | List containing all classifications for the current URL |
classifications[#].value | String | Yes | The classification value. If this is of type categories and you use a standard taxonomy, it has to exactly match the IAB category ID |
classifications[#].type | String | Yes | The dimension type of this classification, for example “categories” |
classifications[#].confidence | Number between 0 and 1 | No | Include this if you have a confidence rating for your classification |
classifications[#].taxonomy | String | Only for categories | Only include this if the type is “categories”. If you select “Standard Taxonomies” in the dashboard (see screenshot below), it has to match “iab_2.0”, “iab_2.2” or “iab_3.0”. Alternatively, it should match the ID of your custom taxonomy. |
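Before connecting an endpoint, it can help to check its classification responses against the field table above. The following is a minimal sketch; `check_classify_response` and its messages are illustrative assumptions. It accepts all three standard taxonomy IDs, whereas in practice only those selected under “Standard Taxonomies” would apply.

```python
from typing import Any

# Standard taxonomy IDs named in the field description above
STANDARD_TAXONOMIES = {"iab_2.0", "iab_2.2", "iab_3.0"}


def check_classify_response(response: dict[str, Any],
                            custom_taxonomy_ids: frozenset = frozenset()) -> list[str]:
    """Return a list of problems in a classification response (empty if none)."""
    items = response.get("classifications")
    if not isinstance(items, list):
        return ["'classifications' must be an array"]
    allowed = STANDARD_TAXONOMIES | set(custom_taxonomy_ids)
    errors = []
    for i, item in enumerate(items):
        # value and type are required strings
        for field in ("value", "type"):
            if not isinstance(item.get(field), str):
                errors.append(f"classifications[{i}].{field} must be a string")
        # confidence is optional, but must be a number between 0 and 1
        confidence = item.get("confidence")
        if confidence is not None and not (
            isinstance(confidence, (int, float)) and 0 <= confidence <= 1
        ):
            errors.append(f"classifications[{i}].confidence must be a number between 0 and 1")
        # taxonomy is required for categories and not allowed elsewhere
        if item.get("type") == "categories":
            if item.get("taxonomy") not in allowed:
                errors.append(f"classifications[{i}].taxonomy is missing or unknown")
        elif "taxonomy" in item:
            errors.append(f"classifications[{i}].taxonomy is only allowed for categories")
    return errors
```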
- Taxonomies
The Permutive Webhook integration makes the below request to retrieve any custom taxonomies. This is used where “taxonomy” in a classification response points to a non-standard (i.e. custom) taxonomy:
POST / Body: { "type": "taxonomies" }
We expect a response with this format:
[
  {
    "id": "my_custom_taxonomy",
    "name": "My Custom Taxonomy",
    "url": "http://example.com",
    "values": [
      {
        "id": "10",
        "name": "Books",
        "parent": null
      },
      {
        "id": "123",
        "name": "Comic Books",
        "parent": "10"
      }
    ]
  }
  ...
]
If you don’t use a custom taxonomy, the request should return an empty array. Below is a description of the fields included in the response:
Field | Type | Required | Description |
id | String | Yes | A unique identifier for your taxonomy |
name | String | Yes | Display name of your taxonomy |
url | String | No | Optional, a URL with more information on your taxonomy |
values | Array of objects | Yes | The list of entries in your taxonomy |
values.id | String | Yes | The ID of an entry in your taxonomy; this must match the “value” you return for categories |
values.name | String | Yes | The display name of the category - this is what will be shown in the dashboard |
values.parent | String | No | Optional. If this is a sub-category, the “id” of the parent category |
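Putting both request types together, a webhook endpoint could look roughly like the sketch below, built on Python's standard `http.server` module. Everything specific here is an assumption for illustration: the placeholder `classify` function, the sample taxonomy data, and the 400 error response (the expected error behaviour is not specified above).

```python
import json
from http.server import BaseHTTPRequestHandler
from typing import Any

# Hypothetical in-house data, standing in for a real taxonomy store
CUSTOM_TAXONOMIES = [
    {
        "id": "my_custom_taxonomy",
        "name": "My Custom Taxonomy",
        "url": "http://example.com",
        "values": [
            {"id": "10", "name": "Books", "parent": None},
            {"id": "123", "name": "Comic Books", "parent": "10"},
        ],
    }
]


def classify(url: str) -> list[dict[str, Any]]:
    """Placeholder classifier; a real endpoint would call its own model here."""
    return [{"value": "123", "type": "categories",
             "confidence": 0.72, "taxonomy": "my_custom_taxonomy"}]


def handle_request(body: dict[str, Any]) -> Any:
    """Map a request body to the response shape for its request type."""
    if body.get("type") == "classify":
        return {"classifications": classify(body["url"])}
    if body.get("type") == "taxonomies":
        # Return an empty list here if you do not use a custom taxonomy
        return CUSTOM_TAXONOMIES
    raise ValueError(f"unsupported request type: {body.get('type')!r}")


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = handle_request(json.loads(self.rfile.read(length)))
            status = 200
        except (ValueError, KeyError) as exc:
            # Error format is an assumption; it is not defined by the integration
            payload, status = {"error": str(exc)}, 400
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)
```

To try it locally, you could serve the handler with `http.server.HTTPServer(("", 8080), WebhookHandler).serve_forever()`; a production endpoint would additionally need HTTPS and authentication suited to your environment.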
Custom Classifications
You can use the Webhook provider to import custom content classifications into Permutive. Please contact your CSM to discuss alternative approaches.
Product Availability
Content Classifications are currently supported on the below platforms:
Web | iOS | Android | AMP | FIA | CTV |
⎷ | Coming soon | Coming soon | 𝘅 | 𝘅 | Depends on environment |