API Recipe: Extract All Tags from All Web Pages
Written by Luiza Gircoveanu
Updated over 8 months ago

Overview

Given an Audit Run, you can extract all the tags that each page initiated. A tag is a network call made from the page that ObservePoint recognizes as a technology in the ObservePoint tag signature database.

Note: More information about each web page is available from the ObservePoint API, but this recipe only covers tags initiated on a page. Using this recipe as a starting point, you can expand your integration to include other information.

Step 1: Get your Audit ID and Run ID

You’ll need an Audit ID and Run ID to get started. There are multiple ways to do this, depending on your needs. The ObservePoint API is flexible, so you can use the approach that works best for you.

Here are two ways to get your Audit ID and Run ID:

  • From a webhook: If you have a webhook configured, the webhook payload will include Audit ID and Run ID. See the Webhooks section above for setting up webhooks.

  • Manually (good for one-time testing): You can find the Audit ID in the ObservePoint application under "Data Sources". Click on the Audit you want, then note the Audit ID and Run ID in the address bar.

Tip: A common scenario is to download and store Audit Run IDs in a database you control. Later, when your code runs, it can first query your database for the most recently processed run ID, and then query the ObservePoint API to find the next run ID that hasn’t yet been downloaded.
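The bookkeeping described in this tip could be sketched as follows. This is a minimal illustration, not an ObservePoint feature: the SQLite table, the function names, and the idea that the caller already holds a list of run IDs ordered oldest first are all assumptions made for the example.

```python
import sqlite3

def open_store(path=":memory:"):
    """Open (or create) a local database that remembers processed runs."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_runs ("
        "audit_id INTEGER NOT NULL, run_id INTEGER NOT NULL, "
        "PRIMARY KEY (audit_id, run_id))"
    )
    return conn

def mark_processed(conn, audit_id, run_id):
    """Record that a run has been fully downloaded."""
    conn.execute(
        "INSERT OR IGNORE INTO processed_runs VALUES (?, ?)", (audit_id, run_id)
    )
    conn.commit()

def next_unprocessed_run(conn, audit_id, available_run_ids):
    """Return the first run ID not yet recorded, or None if all are done.

    available_run_ids is assumed to come from the ObservePoint API,
    ordered oldest first; fetching that list is not shown here.
    """
    rows = conn.execute(
        "SELECT run_id FROM processed_runs WHERE audit_id = ?", (audit_id,)
    ).fetchall()
    processed = {row[0] for row in rows}
    for run_id in available_run_ids:
        if run_id not in processed:
            return run_id
    return None
```

Calling mark_processed only after a run has been downloaded completely means an interrupted download is retried on the next execution.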

Step 2: Download all the pages from the Audit Run

Make an authenticated POST request to this URL with an empty request payload:

In your code, start with page=0, and make multiple requests, incrementing the page number for each request, until you have downloaded all the web pages for this run (see the “Pagination” section above for more instructions).

Each response will look like the following. Note that this example Audit Run scanned 544 web pages, and we requested a page size of 100, so there are 6 total pages to request.

{
  "metadata": {
    "pagination": {
      "totalCount": 544,
      "totalPageCount": 6,
      "pageSize": 100,
      "currentPageSize": 100,
      "currentPageNumber": 0
    }
  },
  "pages": [
    {
      "pageId": "77ebb089815a3b8d82813ebaf6320730",
      "dataCollectionUuid": "77ebb089815a3b8d82813ebaf6320730",
      "pageUrl": "http://example.com/",
      "pageTitle": "Example Home Page",
      "pageLoadTime": 665,
      "pageStatusCode": 200,
      "initialPageStatusCode": 200,
      "finalPageStatusCode": 200,
      "redirectCount": 0,
      "size": 34363
    },
    {
      "pageId": "139cf2a71e4ecb1967b7a5b47770e66a",
      "dataCollectionUuid": "139cf2a71e4ecb1967b7a5b47770e66a",
      "pageUrl": "http://example.com/path",
      "pageTitle": "Example Web Page",
      "pageLoadTime": 1621,
      "pageStatusCode": 200,
      "initialPageStatusCode": 200,
      "finalPageStatusCode": 200,
      "redirectCount": 0,
      "size": 956436
    },
    ...
  ]
}

In the next step, you will use the pageId field from each of the web page records you fetched above.
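The pagination loop for this step might look like the sketch below. Because the request URL and authentication details are not reproduced here, the HTTP call is left to a hypothetical fetch_page callable that you supply; it is assumed to take a zero-based page number, make the authenticated POST request, and return the parsed JSON body in the shape shown above.

```python
import math

def expected_page_count(total_count, page_size):
    """Number of requests needed, e.g. 544 pages at 100 per request."""
    return math.ceil(total_count / page_size)

def fetch_all_pages(fetch_page):
    """Collect every web page record by walking the paginated responses.

    fetch_page is a hypothetical callable (not part of any ObservePoint
    library): page number in, parsed response dictionary out.
    """
    records = []
    page_number = 0  # the recipe starts at page=0
    while True:
        body = fetch_page(page_number)
        records.extend(body["pages"])
        total_pages = body["metadata"]["pagination"]["totalPageCount"]
        page_number += 1
        # Stop after the last page, or defensively if a page is empty.
        if page_number >= total_pages or not body["pages"]:
            break
    return records

def page_ids(records):
    """Extract the pageId values needed for the next step."""
    return [record["pageId"] for record in records]
```

For the example run above, expected_page_count(544, 100) gives 6, matching totalPageCount in the sample response.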

Step 3: For each web page, fetch its tags

From step 2, you have a list of page IDs (example: 139cf2a71e4ecb1967b7a5b47770e66a). The next step is to query the API for the network requests which ObservePoint captured on each page.

For each page, make an authenticated GET request to this URL:

Each response will look like this. Note that get Insights was enabled in the above call, so the response includes aggregated data about the tags on that page: the number of tag requests on the page, the number of unique tags on the page, and the number of broken tags on the page (defined as tags returned with statuses outside the 2xx or 3xx range).

{
  "pageTagInsights": {
    "noOfTagRequests": 18,
    "noOfUniqueTags": 11,
    "noOfBrokenTags": 1
  },
  "pageTags": [
    {
      "name": "Adobe Analytics",
      "category": "Analytics",
      "tagId": 1,
      "pageTagInstances": [
        {
          "tagInstanceId": "7ccffa7c-9ceb-45d0-82ae-ce1dc5387547",
          "account": "ncmecprod",
          "bytes": 1065,
          "loadTime": 258,
          "duplicates": 0,
          "multiples": 0,
          "statusCode": 200,
          "tagInstanceVariables": [
            {
              "name": "MID",
              "value": "7ccffa7c-9ceb-45d0-82ae-ce1dc5387547"
            },
            {
              "name": "v6",
              "value": "desktop"
            },
            ...
          ]
        }
      ],
      "tagRequestCount": 1,
      "tagDuplicateRequestCount": 0,
      "tagMultipleRequestCount": 0,
      "tagUniqueRequestCount": 1
    },
    {
      "name": "Crazy Egg",
      "category": "Testing & Personalization",
      "tagId": 144,
      "pageTagInstances": [
        {
          "tagInstanceId": "9684cca1-94e6-4ff4-b63f-8783b0afb682",
          "account": "None",
          "bytes": 94,
          "loadTime": 265,
          "duplicates": 2,
          "multiples": 0,
          "statusCode": 200
        },
        {
          "tagInstanceId": "c25bacc6-7cb8-4521-95af-cf7abc929c62",
          "account": "None",
          "bytes": 91,
          "loadTime": 268,
          "duplicates": 0,
          "multiples": 4,
          "statusCode": 200,
          "tagInstanceVariables": [
            ...
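If you want to apply the "broken tag" definition (a status outside the 2xx or 3xx range) to the individual tag instances yourself, a sketch along these lines may help. The function names are illustrative, and how ObservePoint computes the pageTagInsights values is not specified here, so treat any comparison between the two as a rough sanity check rather than an exact match.

```python
def is_broken(status_code):
    """True when the status falls outside the 2xx and 3xx ranges."""
    return not (200 <= status_code < 400)

def summarize_tags(response):
    """Count tag instances, distinct tags, and broken instances on one page.

    response is the parsed JSON body for a single page, in the shape of
    the example above. Missing keys are treated as empty lists.
    """
    tags = response.get("pageTags", [])
    instances = [
        instance
        for tag in tags
        for instance in tag.get("pageTagInstances", [])
    ]
    return {
        "instances": len(instances),
        "unique_tags": len({tag["tagId"] for tag in tags}),
        "broken": sum(1 for i in instances if is_broken(i["statusCode"])),
    }
```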

This API endpoint returns these fields in its payload:

| Field Name | Field Description |
|---|---|
| name | Name of the tag/technology identified |
| category | The category to which the technology belongs (e.g. Analytics, Tag Management) |
| tagId | Identifier for the tag signature within the ObservePoint Tag Database |
| pageTagInstances | An array of objects, each representing an instance of that tag/technology on the page |
| tagInstanceId | Unique identifier for that specific tag instance on that page in that run |
| account | Account or Reporting Suite of that tag; defined by the ObservePoint Tag Database |
| bytes | The size, in bytes, of the tag content |
| loadTime | The time, in milliseconds, that it took to complete the network request that resulted in the tag |
| duplicates | Total number of instances with the same payload made to the same technology on that page |
| multiples | Total number of instances that made the same call to the technology on that page (but not with the same payload) |
| statusCode | The HTTP status code of the network request that resulted in the tag instance |
| tagInstanceVariables | An array of objects, each representing a key/value pair collected by the tag/technology |
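To turn the nested response into flat records that are easier to store or report on, one possible approach is sketched below. The function name and the choice of output keys are assumptions for illustration; the input field names are the ones listed in the table above.

```python
def flatten_page_tags(page_id, response):
    """Return one flat dictionary per tag instance found on a page.

    page_id is the pageId from step 2; response is the parsed JSON
    body returned for that page in step 3.
    """
    rows = []
    for tag in response.get("pageTags", []):
        for instance in tag.get("pageTagInstances", []):
            # tagInstanceVariables is absent on some instances in the
            # sample response, so fall back to an empty list. If a name
            # repeats, the last value wins in this simple mapping.
            variables = {
                v["name"]: v["value"]
                for v in instance.get("tagInstanceVariables", [])
            }
            rows.append({
                "pageId": page_id,
                "tagId": tag["tagId"],
                "name": tag["name"],
                "category": tag["category"],
                "tagInstanceId": instance["tagInstanceId"],
                "account": instance["account"],
                "bytes": instance["bytes"],
                "loadTime": instance["loadTime"],
                "duplicates": instance["duplicates"],
                "multiples": instance["multiples"],
                "statusCode": instance["statusCode"],
                "variables": variables,
            })
    return rows
```

Carrying the page ID on every row keeps each record self-describing once results from many pages are combined.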

Conclusion

With all the web pages and tags/technologies downloaded for this Audit Run, you can store them in a database and report/visualize them as you like.
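As one illustration of the storage and reporting step, the sketch below writes tag instances to a SQLite table and counts how many distinct pages each tag appeared on. The table layout and function names are assumptions, and each input row is assumed to be a dictionary with pageId, name, category, and statusCode keys that your own code has built from the API responses.

```python
import sqlite3

def store_rows(conn, run_id, rows):
    """Insert tag-instance rows for one Audit Run into a local table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tag_instances ("
        "run_id INTEGER, page_id TEXT, tag_name TEXT, "
        "category TEXT, status_code INTEGER)"
    )
    conn.executemany(
        "INSERT INTO tag_instances VALUES (?, ?, ?, ?, ?)",
        [
            (run_id, r["pageId"], r["name"], r["category"], r["statusCode"])
            for r in rows
        ],
    )
    conn.commit()

def pages_per_tag(conn, run_id):
    """Return (tag name, number of distinct pages) pairs for one run."""
    return conn.execute(
        "SELECT tag_name, COUNT(DISTINCT page_id) FROM tag_instances "
        "WHERE run_id = ? GROUP BY tag_name ORDER BY tag_name",
        (run_id,),
    ).fetchall()
```

Any database or reporting tool would serve equally well; SQLite is used here only because it ships with Python and needs no setup.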
