
New Social Opinion API Integration: Twitter Labs Sampled Stream

In some of my earlier posts I introduced the newer Twitter Labs APIs, as well as the Social Opinion API and the integrations I’d been building with Twitter’s new Labs endpoints. At the time of publishing that blog post, the Social Opinion API had coverage for 5 of the 8 Labs API endpoints, which included:

  • Filtered Stream
  • Metrics
  • Recent Search
  • Tweet Lookup
  • User Lookup

I’m happy to announce that a 6th has been integrated – the Sampled Stream API.

In this blog post, I’ll introduce the Sampled Stream and the Social Opinion API integration with it.

I’ll also show how you can use the Social Opinion API to consume the Sampled Stream with a few lines of code, and we’ll see how it lets you connect to the Twitter firehose in real time.

What is the Sampled Stream?

The Sampled Stream API lets you stream about 1% of all new public Tweets as they happen. In some respects it behaves like the Filtered Stream; the main difference is that the Sampled Stream doesn’t let you define rules to filter data like its Filtered Stream counterpart.

You’re allowed to have one client connected at any given time, and the idea is that you have this running 24×7, pulling data in real time straight from all public Tweets. As this endpoint is in preview mode, this constraint may be lifted in future.

What data does the Sampled Stream provide?

Here we can see the typical JSON payload the Sampled Stream will deliver:

{
  "data": {
    "id": "1189226081406083073",
    "created_at": "2019-10-29T17:02:47.000Z",
    "text": "Sharing Tweets in DMs is our love language. Today, for Android users, we’re making that easier. See more details: https://t.co/eu1upmY4yo",
    "author_id": "783214",
    "in_reply_to_user_id": "783214",
    "referenced_tweets": [
      {
        "type": "replied_to",
        "id": "1151997885455581185"
      }
    ],
    "entities": {
      "urls": [
        {
          "start": 113,
          "end": 137,
          "url": "https://t.co/eu1upmY4yo",
          "expanded_url": "http://developer.twitter.com",
          "display_url": "developer.twitter.com",
          "status": 200,
          "title": "Developers Tap into What’s Happening",
          "description": "Discover the power of Twitter APIs"
        }
      ],
      "annotations": [
        {
          "start": 55,
          "end": 61,
          "probability": 0.9596,
          "type": "Product",
          "normalized_text": "Android"
        }
      ]
    },
    "stats": {
      "retweet_count": 341,
      "reply_count": 372,
      "like_count": 2773,
      "quote_count": 70
    },
    "possibly_sensitive": false,
    "lang": "en",
    "source": "<a href=\"https://mobile.twitter.com\" rel=\"nofollow\">Twitter Web App</a>",
    "context_annotations": [
      {
        "domain": {
          "id": "45",
          "name": "Brand Vertical",
          "description": "Top level entities that describe a Brands industry"
        },
        "entity": {
          "id": 781974596165640193,
          "name": "Technology",
          "description": "This entity includes conversation about Information Technology"
        }
      },
      {
        "domain": {
          "id": "46",
          "name": "Brand Category",
          "description": "Categories within Brand Verticals that narrow down the scope of Brands"
        },
        "entity": {
          "id": 10026820777,
          "name": "Android",
          "description": "Mobile operating system based on the Linux kernel."
        }
      }
    ],
    "format": "detailed"
  }
}

You also have a few options to help refine the size of the JSON payload.  You can see these here:

(image source developer.twitter.com)
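
If you don’t need the full payload, you can ask for a lighter one by changing the tweet.format query string parameter on the endpoint URL. Based on the Labs documentation (an assumption on my part, not something the Social Opinion API enforces), the recognised values are compact, default and detailed:

// tweet.format controls the payload size (compact | default | detailed)
string detailedUrl = "https://api.twitter.com/labs/1/tweets/stream/sample?tweet.format=detailed";
string compactUrl = "https://api.twitter.com/labs/1/tweets/stream/sample?tweet.format=compact";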

How to connect to the Sampled Stream with the Social Opinion API

As we’re dealing with a real-time interface, this integration uses Events.  At a high level, the process is as follows:

  1. Setup Authorisation and OAuth
  2. Configure events
  3. Start the service
  4. Parse data from the Social Opinion API as events are raised

Under the hood, a few things happen to support this (a simplified sketch follows this list).

  • A constant network connection is maintained to the Twitter API by the Social Opinion API
  • When JSON data is fetched from the Twitter Sampled Stream endpoint, it is transformed and converted to one or many objects
  • Events are raised which contain an object model for you to work with.
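
For context, here’s a minimal sketch of the kind of streaming loop this implies, assuming a Labs Bearer token, Newtonsoft.Json, and an async method to host it. It’s purely illustrative and isn’t the library’s actual implementation (bearerToken and OnDataReceived are placeholders):

// Illustrative only - not the Social Opinion API's actual implementation.
// Requires System.Net.Http, System.Net.Http.Headers, System.IO and Newtonsoft.Json.
using (HttpClient client = new HttpClient())
{
    // bearerToken is a placeholder for your Labs Bearer token
    client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", bearerToken);

    var request = new HttpRequestMessage(HttpMethod.Get,
        "https://api.twitter.com/labs/1/tweets/stream/sample?tweet.format=detailed");

    // Stream the response rather than buffering it - the connection stays open
    using (var response = await client.SendAsync(request, HttpCompletionOption.ResponseHeadersRead))
    using (var stream = await response.Content.ReadAsStreamAsync())
    using (var reader = new StreamReader(stream))
    {
        while (!reader.EndOfStream)
        {
            string line = await reader.ReadLineAsync();
            if (string.IsNullOrWhiteSpace(line)) continue; // keep-alive newlines

            // Each line is a JSON payload - convert it and raise an event
            SampledStreamModel model = JsonConvert.DeserializeObject<SampledStreamModel>(line);
            OnDataReceived(model); // placeholder for raising DataReceivedEvent
        }
    }
}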

From a development experience perspective, this means you can simply install the NuGet package and get started with just a few lines of code. It also means you get the full IntelliSense experience as you code.

The Social Opinion API encapsulates all this for you though, so you don’t need to worry about it.

Authorisation

Like the other Labs integrations, the first thing you need to do is set up your OAuth token and keys:

string _ConsumerKey = ConfigurationManager.AppSettings.Get("ConsumerKey");
string _ConsumerSecret = ConfigurationManager.AppSettings.Get("ConsumerSecret");
string _AccessToken = ConfigurationManager.AppSettings.Get("AccessToken");
string _AccessTokenSecret = ConfigurationManager.AppSettings.Get("AccessTokenSecret");

OAuthInfo oAuthInfo = new OAuthInfo
{
    AccessSecret = _AccessTokenSecret,
    AccessToken = _AccessToken,
    ConsumerSecret = _ConsumerSecret,
    ConsumerKey = _ConsumerKey
};

These are assigned to an OAuthInfo object.

Wiring up Sampled Stream Service Events

Next, we create an instance of the SampledStreamService, passing in the OAuth object:

SampledStreamService streamService = new SampledStreamService(oAuthInfo);

Now that we have an instance of the service, we need to wire up an event called DataReceivedEvent:

streamService.DataReceivedEvent += StreamService_DataReceivedEvent;

We also need to add the code which handles this event:

private static void StreamService_DataReceivedEvent(object sender, EventArgs e)
{
    SampledStreamService.DataReceivedEventArgs eventArgs = e as SampledStreamService.DataReceivedEventArgs;

    SampledStreamModel model = eventArgs.StreamDataResponse;
}

In this event handler you can see we have an object called SampledStreamModel.  This is a class used to represent all the data that belongs to the Sampled Stream JSON payload, and it is returned as part of the eventArgs parameter.
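
From here you can work with the strongly typed model directly. For example (purely illustrative), the handler could log a few fields from each Tweet:

// Illustrative: pull a few fields off the strongly typed model
if (model != null && model.data != null)
{
    Console.WriteLine($"{model.data.created_at} | {model.data.author_id} | {model.data.lang}");
    Console.WriteLine(model.data.text);
}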

Sampled Stream Model

It’s worth detailing the properties you get access to within the SampledStreamModel. This is what we can see here:

public class SampledStreamModel
{
  public Data data { get; set; }
}

public class ReferencedTweet
{
  public string type { get; set; }
  public string id { get; set; }
}

public class Url
{
  public int start { get; set; }
  public int end { get; set; }
  public string url { get; set; }
  public string expanded_url { get; set; }
  public string display_url { get; set; }
  public int status { get; set; }
  public string title { get; set; }
  public string description { get; set; }
}

public class Annotation
{
  public int start { get; set; }
  public int end { get; set; }
  public double probability { get; set; }
  public string type { get; set; }
  public string normalized_text { get; set; }
}

public class Entities
{
  public List<Url> urls { get; set; }
  public List<Annotation> annotations { get; set; }
}

public class Stats
{
  public int retweet_count { get; set; }
  public int reply_count { get; set; }
  public int like_count { get; set; }
  public int quote_count { get; set; }
}

public class Domain
{
  public string id { get; set; }
  public string name { get; set; }
  public string description { get; set; }
}

public class Entity
{
  public object id { get; set; }
  public string name { get; set; }
  public string description { get; set; }
}

public class ContextAnnotation
{
  public Domain domain { get; set; }
  public Entity entity { get; set; }
}

public class Data
{
  public string id { get; set; }
  public DateTime created_at { get; set; }
  public string text { get; set; }
  public string author_id { get; set; }
  public string in_reply_to_user_id { get; set; }
  public List<ReferencedTweet> referenced_tweets { get; set; }
  public Entities entities { get; set; }
  public Stats stats { get; set; }
  public bool possibly_sensitive { get; set; }
  public string lang { get; set; }
  public string source { get; set; }
  public List<ContextAnnotation> context_annotations { get; set; }
  public string format { get; set; }
}

You can see from the above that you get access to a wealth of information through the classes that make up SampledStreamModel.

Starting the Sampled Stream Service

At this point, we have everything in place and can start the Social Opinion API SampledStreamService! You can do this by calling the StartStream method:

streamService.StartStream("https://api.twitter.com/labs/1/tweets/stream/sample?tweet.format=detailed", 100, 5);

Three parameters are needed (a complete end-to-end sketch follows the list):

  • The stream endpoint address
  • The number of tweets to fetch (100)
  • The number of times to retry in the event of failed network connections (5)
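
Putting it all together, a minimal Console Application using the pieces above might look like this. The structure is a sketch (for example, the Console.ReadLine() call assumes the stream runs in the background); the types and calls are the ones shown in this post:

// End-to-end sketch using the calls covered above
static void Main(string[] args)
{
    OAuthInfo oAuthInfo = new OAuthInfo
    {
        ConsumerKey = ConfigurationManager.AppSettings.Get("ConsumerKey"),
        ConsumerSecret = ConfigurationManager.AppSettings.Get("ConsumerSecret"),
        AccessToken = ConfigurationManager.AppSettings.Get("AccessToken"),
        AccessSecret = ConfigurationManager.AppSettings.Get("AccessTokenSecret")
    };

    SampledStreamService streamService = new SampledStreamService(oAuthInfo);
    streamService.DataReceivedEvent += StreamService_DataReceivedEvent;

    // Endpoint, number of tweets to fetch, retries on failed network connections
    streamService.StartStream("https://api.twitter.com/labs/1/tweets/stream/sample?tweet.format=detailed", 100, 5);

    Console.ReadLine(); // keep the console app alive while the stream is running
}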

A Closer Look with the Debugger

I’ve housed the Sampled Stream Service in a Console Application and placed a breakpoint in the event handler StreamService_DataReceivedEvent.

First, we can see the Tweet copy:

(For reference – the entire copy says: “RT @clairo: hey guys – wanted to share a Google Doc that has info on how to better educate yourself “)

Next, we can expand the data property and see all the properties the Social Opinion API has extracted:

We can expand the Context Annotations property:

Selecting a Context Annotation surfaces the following Domain and Entity insights:

I think we can agree that Google is both a brand and entity being discussed in the selected Tweet!

We can expand the Stats property and see some KPIs for this Tweet:

I could go on, but you get the idea!

Use Cases and Ideas

The use case for this is straightforward: if you want a raw, diverse, and unfiltered connection to public Tweets, this will do the job for you.

Having access to a 1% sample of public Tweets in real-time can help you quickly detect spikes or dips around topics and conversations. You can take this a step further by surfacing Context Annotations to determine which Entities or Domains are helping shape the public conversation.
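
As a rough illustration (this isn’t part of the library), you could keep a running tally of the entity names surfaced in context_annotations as Tweets arrive and watch for sudden jumps:

// Illustrative only: tally entity names from Context Annotations as Tweets arrive
static readonly Dictionary<string, int> entityCounts = new Dictionary<string, int>();

static void CountEntities(SampledStreamModel model)
{
    if (model?.data?.context_annotations == null) return;

    foreach (ContextAnnotation annotation in model.data.context_annotations)
    {
        string name = annotation.entity?.name;
        if (string.IsNullOrEmpty(name)) continue;

        entityCounts.TryGetValue(name, out int count);
        entityCounts[name] = count + 1; // a sudden jump here = a spike in conversation
    }
}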

It really depends on the insights you need to surface!

I prefer to be able to pre-filter the data before it’s laid to rest in Azure or some other datastore, so I would look towards the Filtered Stream or Recent Search APIs. That said, the Sampled Stream is a good way to introduce yourself to real-time connections and gives you access to a 1% sample of new public Tweets as they happen.

If performance is important to you, you can bypass the Social Opinion API Sampled Stream Service and use the underlying Sampled Stream Client. Doing this means the Twitter API JSON isn’t transformed into a strongly typed object.

This will still raise an event, but it will contain the raw JSON.  You could then ingest real-time data into Azure Table Storage for further enrichment.
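
For example, a simple way to land the raw JSON in Azure Table Storage might look like the sketch below. This uses the Azure.Data.Tables package as an assumption; the table name and key strategy are also assumptions, not anything the Social Opinion API prescribes:

// Illustrative sketch using Azure.Data.Tables; table name and keys are assumptions
TableClient tableClient = new TableClient(connectionString, "SampledStreamTweets");
tableClient.CreateIfNotExists();

TableEntity entity = new TableEntity(
    partitionKey: DateTime.UtcNow.ToString("yyyyMMddHH"), // bucket rows by hour
    rowKey: Guid.NewGuid().ToString())
{
    { "RawJson", rawJson } // rawJson is the JSON string received from the event
};

tableClient.AddEntity(entity);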

Summary

In this blog post I’ve introduced the Sampled Stream API.  We’ve looked at the data it returns and seen how you can easily consume the Sampled Stream API with the latest Social Opinion API integration.

You can get the latest cut of the Social Opinion API from NuGet here.

Next steps are to integrate the Hide Replies endpoint. When this is done, the Social Opinion API will support all Labs endpoints.
