API Recipe: Extract All Network Requests From All Web Pages

Overview

Given an Audit Run, you can extract all the network requests that each page generated, including each request’s URL, status code, size, MIME type, and response time.

Note: More information about each web page is available from the ObservePoint API, but this recipe only covers page network requests. Using this recipe as a starting point, you can expand your integration to include other information.

Implementation

Step 1: Get your Audit ID and Run ID

You’ll need an Audit ID and a Run ID to get started. There are several ways to get them, depending on your needs; the ObservePoint API is flexible, so use whichever approach works best for you.

Here are three ways to get your Audit ID and Run ID:

From a webhook: If you have a webhook configured, its payload includes the Audit ID and Run ID. See the Webhooks section above for setting up webhooks.
Manually (good for one-time testing): In the ObservePoint application, go to “Data Sources” and click the Audit you want. The Audit ID and Run ID appear in your browser’s address bar.
From the API: Query https://api.observepoint.com/v3/web-audits/ to get the list of Audits in your account, and query https://api.observepoint.com/v2/web-audits/auditId/runs (substituting your Audit ID for auditId) to get the list of recent Run IDs. A common use case is to query the ObservePoint API for any Run IDs you haven’t already ingested into your database.
- Important: this approach mixes v2 and v3 endpoints, so double-check the version segment in each URL.
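A minimal Python sketch of the API approach, using only the standard library. The function names are hypothetical, and two things are assumptions rather than facts from this recipe: that both list endpoints accept a plain GET, and that authentication uses an `Authorization: api_key <your key>` header (check the API's authentication documentation). The shape of the runs response isn't shown here, so `get_json` simply returns the parsed JSON.

```python
import json
import urllib.request
from typing import Any

V2_BASE = "https://api.observepoint.com/v2"
V3_BASE = "https://api.observepoint.com/v3"


def audits_url() -> str:
    # v3 endpoint that lists the Audits in your account
    return f"{V3_BASE}/web-audits/"


def runs_url(audit_id: int) -> str:
    # Note the version mix: the runs list for an Audit lives under v2
    return f"{V2_BASE}/web-audits/{audit_id}/runs"


def get_json(url: str, api_key: str) -> Any:
    # Assumed auth header format; verify it against the API docs
    req = urllib.request.Request(url, headers={"Authorization": f"api_key {api_key}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `get_json(runs_url(YOUR_AUDIT_ID), YOUR_API_KEY)` returns the recent runs for one Audit, where both arguments are your own values.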

Tip: A common scenario is to download and store Audit Run IDs in a database you control. Later, when your code runs, it can first query your database for the most recently processed Run ID, and then query the ObservePoint API to find any newer Run IDs that haven’t been downloaded yet.
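The comparison in this tip can be sketched as a small pure function. It assumes you have already pulled the numeric Run IDs out of the v2 runs response (its shape isn't shown in this recipe) and that Run IDs increase over time; both are assumptions, and `runs_to_ingest` is a hypothetical name.

```python
from typing import Iterable, List, Optional


def runs_to_ingest(api_run_ids: Iterable[int], last_processed: Optional[int]) -> List[int]:
    """Return Run IDs newer than the last one stored, oldest first.

    Assumes Run IDs increase over time. If that doesn't hold for your
    account, compare against the full set of stored IDs instead.
    """
    ids = sorted(set(api_run_ids))
    if last_processed is None:
        # Nothing stored yet: every run is new
        return ids
    return [run_id for run_id in ids if run_id > last_processed]
```

Processing the returned list oldest first means that, if your job fails partway through, the "most recently processed" ID in your database still marks a safe restart point.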

Step 2: Download all the pages from the Audit Run

Make an authenticated POST request with an empty request payload to the following URL, substituting your Audit ID, Run ID, page size (pageLimit), and page number (pageNumber):

https://api.observepoint.com/v3/web-audits/auditId/runs/runId/reports/page-summary/pages?size=pageLimit&page=pageNumber

In your code, start with page=0 and increment the page number on each request until you have downloaded all the web pages for this run (see the “Pagination” section above for more instructions).
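A minimal Python sketch of this loop using only the standard library. The function names are hypothetical, and two details are assumptions not confirmed by this recipe: the `Authorization: api_key <key>` header format, and sending `{}` as the "empty" POST payload. The loop stops based on `totalPageCount` in the response metadata, and `fetch_page` is passed in so you can add retries or substitute a test stub.

```python
import json
import urllib.request
from typing import Callable, Dict, List

V3_BASE = "https://api.observepoint.com/v3"


def page_summary_url(audit_id: int, run_id: int, page_number: int, page_size: int = 100) -> str:
    return (f"{V3_BASE}/web-audits/{audit_id}/runs/{run_id}"
            f"/reports/page-summary/pages?size={page_size}&page={page_number}")


def post_empty(url: str, api_key: str) -> Dict:
    # Assumptions: "{}" counts as an empty payload, and this auth header format
    req = urllib.request.Request(
        url,
        data=b"{}",
        method="POST",
        headers={"Authorization": f"api_key {api_key}", "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def fetch_all_pages(fetch_page: Callable[[int], Dict]) -> List[Dict]:
    """Call fetch_page(0), fetch_page(1), ... until totalPageCount is reached."""
    pages: List[Dict] = []
    page_number = 0
    while True:
        body = fetch_page(page_number)
        pages.extend(body["pages"])
        total_pages = body["metadata"]["pagination"]["totalPageCount"]
        page_number += 1
        if page_number >= total_pages:
            return pages
```

For example: `pages = fetch_all_pages(lambda n: post_empty(page_summary_url(AUDIT_ID, RUN_ID, n), API_KEY))`, where `AUDIT_ID`, `RUN_ID`, and `API_KEY` are your own values.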

Each response will look like the following. Note that this example Audit Run scanned 544 web pages, and we requested a page size of 100, so there are 6 pages of results to request (page=0 through page=5).
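The page count in this example is a ceiling division of `totalCount` by the page size. A small Python check of the arithmetic (the function name is hypothetical; in practice you can read `totalPageCount` straight from the metadata):

```python
def total_page_count(total_count: int, page_size: int) -> int:
    # Integer ceiling division: a partly filled final page still needs its own request
    return (total_count + page_size - 1) // page_size


pages_needed = total_page_count(544, 100)
print(pages_needed)                      # 6
print(list(range(pages_needed)))         # [0, 1, 2, 3, 4, 5]
print(544 - (pages_needed - 1) * 100)    # 44 web pages on the last request
```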

{
    "metadata": {
        "pagination": {
            "totalCount": 544,
            "totalPageCount": 6,
            "pageSize": 100,
            "currentPageSize": 100,
            "currentPageNumber": 0
        }
    },
    "pages": [
        {
            "pageId": "77ebb089815a3b8d82813ebaf6320730",
            "dataCollectionUuid": "77ebb089815a3b8d82813ebaf6320730",
            "pageUrl": "http://example.com/",
            "pageTitle": "Example Home Page",
            "pageLoadTime": 665,
            "pageStatusCode": 200,
            "initialPageStatusCode": 200,
            "finalPageStatusCode": 200,
            "redirectCount": 0,
            "size": 34363
        },
        {
            "pageId": "139cf2a71e4ecb1967b7a5b47770e66a",
            "dataCollectionUuid": "139cf2a71e4ecb1967b7a5b47770e66a",
            "pageUrl": "http://example.com/path",
            "pageTitle": "Example Web Page",
            "pageLoadTime": 1621,
            "pageStatusCode": 200,
            "initialPageStatusCode": 200,
            "finalPageStatusCode": 200,
            "redirectCount": 0,
            "size": 956436
        },
        ...
    ]
}

In the next step, you will use the pageId field from each of the web page records you fetched above.
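One way to carry the page records forward is to map each `pageId` to its `pageUrl`, so the network requests you fetch next can be labeled with the page they came from. This is a sketch with a hypothetical function name; it uses only the two fields shown in the example response.

```python
from typing import Dict, Iterable


def page_url_by_id(pages: Iterable[Dict]) -> Dict[str, str]:
    """Map each pageId to its pageUrl (insertion order follows the API order)."""
    return {page["pageId"]: page["pageUrl"] for page in pages}
```

Iterating over the returned dictionary yields the page IDs to use in Step 3.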

Step 3: For each web page, fetch its network requests

From Step 2, you have a list of page IDs (example: 139cf2a71e4ecb1967b7a5b47770e66a). The next step is to query the API for the network requests that ObservePoint captured on each page.

For each page, make an authenticated GET request to this URL:

https://api.observepoint.com/v3/web-audits/auditId/runs/runId/pages/pageId/request-log?size=100

This request is also paginated (see “Pagination” above), so whenever a web page made more than 100 network requests, you’ll need to iterate through all of its result pages.
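A minimal Python sketch of fetching one web page's full request log. The URL above shows only `size`; passing `page=` the same way as in Step 2 is an assumption, as is the `api_key` auth header format. The function names are hypothetical, and the `get` parameter lets you inject retries or a test stub.

```python
import json
import urllib.request
from typing import Callable, Dict, List

V3_BASE = "https://api.observepoint.com/v3"


def request_log_url(audit_id: int, run_id: int, page_id: str, page_number: int, size: int = 100) -> str:
    # Assumption: the "page" query parameter works as it does in Step 2
    return (f"{V3_BASE}/web-audits/{audit_id}/runs/{run_id}/pages/{page_id}"
            f"/request-log?size={size}&page={page_number}")


def get_json(url: str, api_key: str) -> Dict:
    # Assumed auth header format; verify it against the API docs
    req = urllib.request.Request(url, headers={"Authorization": f"api_key {api_key}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def fetch_request_log(audit_id: int, run_id: int, page_id: str, api_key: str,
                      get: Callable[[str, str], Dict] = get_json) -> List[Dict]:
    """Collect every network request for one web page across all result pages."""
    records: List[Dict] = []
    page_number = 0
    while True:
        body = get(request_log_url(audit_id, run_id, page_id, page_number), api_key)
        records.extend(body["requests"])
        page_number += 1
        if page_number >= body["metadata"]["pagination"]["totalPageCount"]:
            return records
```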

Each response will look like this. Note that the web page in this example made 4 network requests, and we requested size=100, so there is only one page of results to download.

{
    "metadata": {
        "pagination": {
            "totalCount": 4,
            "totalPageCount": 1,
            "pageSize": 100,
            "currentPageSize": 4,
            "currentPageNumber": 0
        }
    },
    "requests": [
        {
            "requestUrl": "http://demo.example.com/image.jpg",
            "statusCode": 200,
            "loadTime": 292,
            "geoLocation": "United States",
            "geoLocationInfo": {
                "countryCode": "US",
                "countryName": "United States"
            },
            "mimeType": "image/jpeg",
            "responseSizeBytes": 17581
        },
        {
            "requestUrl": "http://static.example.com/js/demo.js",
            "statusCode": 200,
            "loadTime": 355,
            "geoLocation": "United States",
            "geoLocationInfo": {
                "countryCode": "US",
                "countryName": "United States"
            },
            "mimeType": "text/html",
            "responseSizeBytes": 1483
        },
        ...
    ]
}

This API endpoint returns these fields in its payload:

Field Name	Field Description
requestUrl	The URL of the network request sent by the page
statusCode	The HTTP status code of the network request
loadTime	The time, in milliseconds, that it took to complete this network request
geoLocation	The country from which this request was served, based on its IP address and the MaxMind GeoIP2 database
geoLocationInfo	Details about the geolocation, including the country code
mimeType	The MIME type as reported by the server
responseSizeBytes	The size, in bytes, of the network response content
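If you load these records into typed code, a small dataclass can flatten the fields above, pulling `countryCode` out of `geoLocationInfo`. This is a sketch: the class and function names are hypothetical, and treating `geoLocationInfo`, `mimeType`, and `responseSizeBytes` as possibly missing is a defensive assumption, not documented API behavior.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class NetworkRequest:
    page_id: str
    request_url: str
    status_code: int
    load_time_ms: int
    country_code: Optional[str]
    mime_type: Optional[str]
    response_size_bytes: Optional[int]


def to_row(page_id: str, record: Dict) -> NetworkRequest:
    """Flatten one request-log record; optional fields become None if absent."""
    geo = record.get("geoLocationInfo") or {}
    return NetworkRequest(
        page_id=page_id,
        request_url=record["requestUrl"],
        status_code=record["statusCode"],
        load_time_ms=record["loadTime"],
        country_code=geo.get("countryCode"),
        mime_type=record.get("mimeType"),
        response_size_bytes=record.get("responseSizeBytes"),
    )
```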

Conclusion

With all the web pages and network requests downloaded for this Audit Run, you can store them in a database and report on or visualize them however you like.
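As one possible storage step, here is a sketch that writes request-log records into SQLite using Python's built-in `sqlite3` module. The table name, columns, and `save_requests` function are hypothetical choices; use whatever database and schema suit your reporting.

```python
import sqlite3
from typing import Dict, Iterable

# Hypothetical schema: one row per network request, tagged with its run and page
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS network_requests (
    run_id INTEGER,
    page_id TEXT,
    request_url TEXT,
    status_code INTEGER,
    load_time_ms INTEGER,
    mime_type TEXT,
    response_size_bytes INTEGER
)
"""


def save_requests(conn: sqlite3.Connection, run_id: int, page_id: str,
                  records: Iterable[Dict]) -> int:
    """Insert one web page's request-log records and return how many were written."""
    rows = [
        (run_id, page_id, r["requestUrl"], r["statusCode"], r["loadTime"],
         r.get("mimeType"), r.get("responseSizeBytes"))
        for r in records
    ]
    with conn:  # commits on success, rolls back if an insert fails
        conn.execute(CREATE_SQL)
        conn.executemany("INSERT INTO network_requests VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    return len(rows)
```

Once the table is populated, plain SQL covers most reporting, for example `SELECT mime_type, SUM(response_size_bytes) FROM network_requests GROUP BY mime_type` to see which content types dominate page weight.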
