
Introducing the Twitter Labs Filtered Stream API

In an earlier blog post, I introduced the Twitter Labs initiative and the Metrics API. In that post, I showed you how to surface metrics for Tweets such as the number of impressions, retweets and likes they had received.

We also looked at the underlying JSON the Metrics endpoint returns and saw how to consume it using C#.

In this blog, I introduce another API that belongs to the Labs Program – the Filtered Stream API.

What is the Filtered Stream API?

The Filtered Stream API allows you to filter the real-time stream of public Tweets.  You do this by applying Rules which consist of a combination of operators.  Two of the coolest things about this API are that you can create up to 10 Rules and you can also select the JSON payload format that you want.

Another nice feature is that the Rules you create are retained and bound to your account.  This means you don’t need to recreate your Rules if your connection drops.

At the time of writing, you can stream up to 500,000 Tweets per month – and you could even consume the full 500,000 in a single day if you wanted to.  There is one constraint, however: you can’t receive more than 50 Tweets per second (at that rate, the full monthly quota could in theory be exhausted in under three hours).

Getting Started with the Filtered Stream API Endpoint

Like the Metrics API, you need to activate this API via the Labs Dashboard. You also need an approved developer account and a registered Twitter developer app.

Armed with these, you can then consume the following endpoints:

  • Initiate a stream connection – GET /labs/1/tweets/stream/filter
  • Retrieve your rules – GET /labs/1/tweets/stream/filter/rules
  • Create a new rule – POST /labs/1/tweets/stream/filter/rules (create)
  • Delete a rule – POST /labs/1/tweets/stream/filter/rules (delete)

You can find out more about these endpoints here.

Authentication

Any code you create needs to authenticate your client with Bearer (or Application Only) Authentication before consuming this API. You can see an overview of the process flow that’s needed to generate the bearer token here:

(Image source: developer.twitter.com)

I’ve written a library in C# which encapsulates all of this for me. I simply send in my consumer key and secret then BOOM! – I automatically get the bearer token returned. All in a few lines of code.  You can find out more about application-only authentication here.
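If you want to roll this yourself, here’s a minimal sketch of that token exchange using HttpClient (the TwitterAuth class name is illustrative, not my library’s actual API):

// Minimal sketch of Twitter application-only (bearer token) authentication.
// Assumes a valid consumer key/secret from your registered Twitter developer app.
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

public static class TwitterAuth
{
    public static async Task<string> GetBearerTokenAsync(string consumerKey, string consumerSecret)
    {
        using var client = new HttpClient();

        // Credentials are sent as Basic auth: base64(consumerKey:consumerSecret)
        var credentials = Convert.ToBase64String(
            Encoding.UTF8.GetBytes($"{consumerKey}:{consumerSecret}"));
        client.DefaultRequestHeaders.Authorization =
            new AuthenticationHeaderValue("Basic", credentials);

        var body = new StringContent("grant_type=client_credentials",
            Encoding.UTF8, "application/x-www-form-urlencoded");

        var response = await client.PostAsync("https://api.twitter.com/oauth2/token", body);
        response.EnsureSuccessStatusCode();

        // Response JSON: { "token_type": "bearer", "access_token": "..." }
        using var doc = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
        return doc.RootElement.GetProperty("access_token").GetString();
    }
}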

Rules and Operators

To filter data from the Streaming API you need to create one or more Rules.  These are effectively your queries that filter the real-time data from the Streaming API.

It’s beyond the scope of this blog to go into the queries and operators you can leverage.  Let me just say that they are VERY comprehensive.

You get enterprise-quality query operators and conditions that let you surface the narrowest of signals that you might be interested in.

Here, you can see a simple Rule I have created to track Tweets in English that mention the hashtag #COVID2019:

data": [

{

"id": "1239103125254017026",

"value": "lang:en #COVID2019",

"tag": "english corona tweets"

}

],

"meta": {

"sent": "2020-03-24T15:02:03.132Z"

}
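Creating a Rule like the one above is a single POST to the rules endpoint. Here’s a hedged sketch, assuming the add/delete request body format from the Labs documentation (it reuses the bearer token from the earlier snippet):

// Sketch: create a Rule on the filtered stream.
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

public static class FilteredStreamRules
{
    public static async Task CreateRuleAsync(string bearerToken)
    {
        using var client = new HttpClient();
        client.DefaultRequestHeaders.Authorization =
            new AuthenticationHeaderValue("Bearer", bearerToken);

        // One Rule: English Tweets mentioning #COVID2019
        var json = "{\"add\":[{\"value\":\"lang:en #COVID2019\",\"tag\":\"english corona tweets\"}]}";
        var content = new StringContent(json, Encoding.UTF8, "application/json");

        var response = await client.PostAsync(
            "https://api.twitter.com/labs/1/tweets/stream/filter/rules", content);
        response.EnsureSuccessStatusCode();
    }
}

Deleting works through the same endpoint, with a delete body containing the Rule ids instead of an add array.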

Some of the other notable data points you can filter on include, but are not limited to:

  • location of the user when they tweeted
  • where the user lives
  • if the Tweet contains links, images, hashtags, and media, etc.
  • if the Tweet has mentioned a product, company, etc.

You can find out more about the operators you can apply to your Rules and Search Queries here.

Annotations

With the basics explained, I want to mention Annotations.  These technically aren’t exclusive to the Filtered Stream API, but they are well worth a mention as the Filtered Stream API has now been updated to include this extra data.

In a nutshell, Annotations are a set of data points that contain contextual information about the Tweet itself. Each Tweet is analysed by Twitter and then supplemented with this contextual information.  This makes it much easier to find signal in the noise when processing data at scale.

Entity Annotations

Entities comprise people, places, products, and organisations. They are delivered as part of the entities section of the payload and are programmatically assigned based on what is explicitly mentioned in the Tweet text.

You can see an example of some of these in the annotations section of the detailed payload later in this post.

Context Annotations

These are derived from the analysis of a Tweet’s text and will include a domain and entity pairing which can be used to discover Tweets on topics that may have been previously difficult to surface.  At present, Twitter has a list of 50+ domains to categorise Tweets.

JSON Payloads

With this API you get 3 different options in terms of the content of the JSON payload:

  • Default
  • Compact
  • Detailed
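You select the format when you initiate the stream connection. Here’s a minimal sketch of connecting and reading Tweets as they arrive – I’m assuming the format is chosen via a query string parameter, so check the Labs docs for the exact name:

// Sketch: connect to the filtered stream and read Tweets line by line.
using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading;
using System.Threading.Tasks;

public static class FilteredStreamClient
{
    public static async Task ConsumeAsync(string bearerToken)
    {
        // No timeout – this is a long-lived streaming connection.
        using var client = new HttpClient { Timeout = Timeout.InfiniteTimeSpan };
        client.DefaultRequestHeaders.Authorization =
            new AuthenticationHeaderValue("Bearer", bearerToken);

        // Read headers only, so the body stays open as a stream.
        using var response = await client.GetAsync(
            "https://api.twitter.com/labs/1/tweets/stream/filter?format=detailed",
            HttpCompletionOption.ResponseHeadersRead);
        response.EnsureSuccessStatusCode();

        using var reader = new StreamReader(await response.Content.ReadAsStreamAsync());
        while (!reader.EndOfStream)
        {
            var line = await reader.ReadLineAsync();
            if (string.IsNullOrWhiteSpace(line)) continue; // keep-alive heartbeat
            Console.WriteLine(line); // one matching Tweet (JSON) per line
        }
    }
}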

Here you can see an example of a Detailed payload:

{
  "data": {
    "id": "1189226081406083073",
    "created_at": "2019-10-29T17:02:47.000Z",
    "text": "Sharing Tweets in DMs is our love language. Today, for Android users, we’re making that easier. See more details: https://t.co/eu1upmY4yo",
    "author_id": "783214",
    "in_reply_to_user_id": "783214",
    "referenced_tweets": [
      {
        "type": "replied_to",
        "id": "1151997885455581185"
      }
    ],
    "entities": {
      "urls": [
        {
          "start": 113,
          "end": 137,
          "url": "https://t.co/eu1upmY4yo",
          "expanded_url": "http://developer.twitter.com",
          "display_url": "developer.twitter.com",
          "status": 200,
          "title": "Developers Tap into What’s Happening",
          "description": "Discover the power of Twitter APIs"
        }
      ],
      "annotations": [
        {
          "start": 55,
          "end": 61,
          "probability": 0.9596,
          "type": "Product",
          "normalized_text": "Android"
        }
      ]
    },
    "stats": {
      "retweet_count": 341,
      "reply_count": 372,
      "like_count": 2773,
      "quote_count": 70
    },
    "possibly_sensitive": false,
    "lang": "en",
    "source": "<a href=\"https://mobile.twitter.com\" rel=\"nofollow\">Twitter Web App</a>",

    "context_annotations": [
      {
        "domain": {
          "id": "45",
          "name": "Brand Vertical",
          "description": "Top level entities that describe a Brands industry"
        },
        "entity": {
          "id": 781974596165640193,
          "name": "Technology",
          "description": "This entity includes conversation about Information Technology"
        }
      },
      {
        "domain": {
          "id": "46",
          "name": "Brand Category",
          "description": "Categories within Brand Verticals that narrow down the scope of Brands"
        },
        "entity": {
          "id": 10026820777,
          "name": "Android",
          "description": "Mobile operating system based on the Linux kernel."
        }
      }
    ],
    "format": "detailed"
  },
  "matching_rules": [
    {
      "id": "1166916266197536768",
      "tag": "from-twitter"
    },
    {
      "id": "1166916266197536769",
      "tag": "is-verified"
    }
  ]
}

You get loads of insights in the above JSON. One of the coolest things is the probability scoring of the Annotation.  For example, “Android” has been identified as a Product in the above Tweet with 0.96 confidence:

annotations": [
        {
          "start": 55,
          "end": 61,
          "probability": 0.9596,
          "type": "Product",
          "normalized_text": "Android"
}

Another cool insight is the statistics you get, such as the number of Likes, Replies, Retweets and Quotes:

stats": {
      "retweet_count": 341,
      "reply_count": 372,
      "like_count": 2773,
      "quote_count": 70

You probably have others that interest you, but these are the ones that jump out at me.
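If you’re consuming this payload from C#, a few small POCOs will get you to those data points quickly. Here’s a minimal sketch using System.Text.Json (Json.NET would work just as well) – the property names come from the payload above, but the class names are mine:

// Sketch: map the interesting parts of the detailed payload to C# types.
using System.Text.Json;
using System.Text.Json.Serialization;

public class StreamEnvelope
{
    [JsonPropertyName("data")] public TweetData Data { get; set; }
}

public class TweetData
{
    [JsonPropertyName("id")] public string Id { get; set; }
    [JsonPropertyName("text")] public string Text { get; set; }
    [JsonPropertyName("stats")] public TweetStats Stats { get; set; }
}

public class TweetStats
{
    [JsonPropertyName("retweet_count")] public int RetweetCount { get; set; }
    [JsonPropertyName("reply_count")] public int ReplyCount { get; set; }
    [JsonPropertyName("like_count")] public int LikeCount { get; set; }
    [JsonPropertyName("quote_count")] public int QuoteCount { get; set; }
}

// Usage: var tweet = JsonSerializer.Deserialize<StreamEnvelope>(line);
//        Console.WriteLine($"Likes: {tweet.Data.Stats.LikeCount}");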

What Does all this Mean?

In the past, I’ve written Part of Speech (POS) Taggers and implemented Named Entity Recognition APIs to identify some of the insights we’ve just looked at. You now get a lot of this out of the box with the Filtered Stream API and the Rule querying syntax.

The Filtered Stream API gives you a leg up on development and saves you from having to do some of this.

If you’re into processing and analysing Twitter data, being able to surface this data lets you build some innovative solutions. Sprinkle a little AI over the top of this data and you can glean even more insights.

Use Cases

I have quite a few ideas about how the Filtered Stream API can be used and integrated (especially around integrating the Twitter Ads API with it).  Some of these include:

  • off-the-shelf component – create a component that encapsulates all the calls needed to consume and process the Filtered Stream API. This would save writing all the low-level authentication and mapping code that’s required, thereby saving developers time
  • real-time analytics – use the data for reporting purposes to help inform business decisions. Ideal for data analysts, researchers, marketing and ad-tech specialists
  • white label product – expose this data and functionality in a white label product that you can resell – “Twitter Analytics in a box”
  • SaaS solution – create a full-blown solution, combining insights from the Filtered Stream with other datasets and your custom IP as part of a bigger SaaS offering

These are just some ideas and you no doubt have your own.

Summary

In this blog post, I’ve introduced the Twitter Filtered Stream API.  We’ve looked at the functionality the Filtered Stream offers, how you can connect to it and explored some of the rich insights it lets you surface. I’ve also shared some ideas about how you might use this.

For the last few weekends, I’ve been working on an API that can connect to the Filtered Stream API.  This takes the form of a reusable .NET Core class library that can be used by multiple projects.

I’ve merged this API into a product I’ve been building called Social Opinion.  At the time of writing,  I’ve shared preview functionality of Social Opinion with the DevRel Team at Twitter (thanks for your support!).

One part of this preview functionality makes it easier to visualise insights such as Annotations and Entities, and there has been some good initial feedback:

“that’s good – it’s a clear way of visualizing annotations really”

I look forward to hearing more feedback and seeing what else comes out of testing.

Next weekend’s blog post will be centered around connecting to the Filtered Stream API and will introduce aspects of the Social Opinion API integration.

  • Got a question?
  • Want to know more?
  • Interested in this interface?

Drop me a message below or reach out on social.

