Overview
Continuously streams tweets that match a set of track terms, aggregates the top hashtags and top mentions from the stream, and stores them in a RocksDB instance. The project includes a local executable that can run indefinitely, collecting aggregates and storing the results in a local RocksDB database. It also has a REPL mode for querying the results from the database.
Project Structure
TweetGateCore
contains classes for querying Twitter, aggregating, and storing results in the DB.
TweetGate
contains classes for the executable and the commands described in the Usage section.
High level logic
TwitterStream.cs\StartTwitterPump()
Sends tweets to a pipe using System.IO.Pipelines. ==>
TwitterStream.cs\ProcessTweetStream()
Pushes tweets to a reactive subject, TweetSubject. ==>
Query.cs\SimpleAggregate()
Returns a query that aggregates the data and exposes the aggregates as observables. ==>
RocksDBStore.cs\PersistObservableAsync()
Stores the aggregates in the DB.
Program.SaveAggregates.cs
Kicks off the above workflow.
ReactiveX is used for the publish-subscribe mechanism, and a minimal amount of Trill is used for windowed aggregations.
RocksDB Sharp is used for storing aggregate data.
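The flow above (pump, subject, windowed aggregate, store) can be illustrated with a small language-agnostic sketch. It is written in Python for brevity and is not the project's C# code: it does not use ReactiveX, Trill, or RocksDB. The names Subject, WindowedTopK, and persist are hypothetical, the 60-second tumbling window and k=2 are arbitrary choices, and a plain dict stands in for the key-value store.

```python
import re
from collections import Counter, defaultdict

HASHTAG = re.compile(r"#\w+")
MENTION = re.compile(r"@\w+")


class Subject:
    """Tiny stand-in for a reactive subject: fans each item out to subscribers."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def on_next(self, item):
        for callback in self._subscribers:
            callback(item)


class WindowedTopK:
    """Counts regex matches per tumbling window and reports the top k per window."""

    def __init__(self, pattern, window_seconds, k):
        self.pattern = pattern
        self.window_seconds = window_seconds
        self.k = k
        self._counts = defaultdict(Counter)  # window start -> token counts

    def on_tweet(self, tweet):
        timestamp, text = tweet
        # Align the timestamp to the start of its tumbling window.
        window_start = timestamp - (timestamp % self.window_seconds)
        tokens = (t.lower() for t in self.pattern.findall(text))
        self._counts[window_start].update(tokens)

    def results(self):
        # Map each window start to its k most frequent tokens.
        return {
            start: counter.most_common(self.k)
            for start, counter in sorted(self._counts.items())
        }


def persist(store, prefix, results):
    """Write aggregates into a dict that stands in for the key-value store."""
    for window_start, top in results.items():
        store[f"{prefix}:{window_start}"] = top


# Made-up sample data: (timestamp in seconds, tweet text).
tweets = [
    (0, "Loving #dotnet and #rx with @alice"),
    (10, "More #dotnet from @bob"),
    (65, "#trill windows, thanks @alice"),
]

subject = Subject()
hashtags = WindowedTopK(HASHTAG, window_seconds=60, k=2)
mentions = WindowedTopK(MENTION, window_seconds=60, k=2)
subject.subscribe(hashtags.on_tweet)
subject.subscribe(mentions.on_tweet)

for tweet in tweets:  # plays the role of the tweet pump
    subject.on_next(tweet)

store = {}
persist(store, "hashtags", hashtags.results())
persist(store, "mentions", mentions.results())
```

Both aggregators subscribe to the same subject, so one pass over the stream feeds every aggregate. That is the main reason for putting a publish-subscribe layer between the pump and the queries.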
Usage
Either install the .NET SDK and use “dotnet run”, or build a self-contained executable.
1. Save tweets to a local file.
saveTweets [twitterConfigJsonFile] [destinationFile] [durationMinutes]
Saves tweets to destinationFile for durationMinutes minutes.
2. Compute aggregates from a local file.
saveAggregates file [inputDataFile] [rocksDBPath]
Aggregates the tweets in inputDataFile and stores the aggregates in the DB. The use case is to first store tweets in a file using (1) and then compute aggregates over them. Mainly used for testing.
3. Compute aggregates for tweets directly from the Twitter streaming API.
saveAggregates direct [twitterConfigJsonFile] [rocksDBPath]
Streams tweets from the Twitter API, aggregates them, and stores the aggregates in the DB. The use case is storing aggregates for certain keywords.
4. View aggregates in the DB.
repl [rocksDBPath] [OutputDirectoryPath]
APIs for reading the content of the DB. Additional details are available in Program.Repl.cs. If OutputDirectoryPath is provided, results are stored in files in that directory. If it is not provided, results are printed to the console. The use case is to quickly view the DB content.
Example Twitter config. This page has more details on TrackTerms.
{
"TrackTerms": "comma,@separated,#hashTags,and,text",
"OAuthConsumerSecret": "<>",
"OAuthToken": "<>",
"OAuthTokenSecret": "<>",
"OAuthConsumerKey": "<>"
}
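As a rough illustration of how a config of this shape might be consumed, the sketch below parses the JSON, checks that the five keys shown above are present and non-empty, and splits TrackTerms on commas. It is written in Python for brevity and is not the project's C# code. The function name parse_twitter_config, the validation behaviour, and the credential values are assumptions made for the example.

```python
import json

# Keys taken from the example config above.
REQUIRED_KEYS = (
    "TrackTerms",
    "OAuthConsumerKey",
    "OAuthConsumerSecret",
    "OAuthToken",
    "OAuthTokenSecret",
)


def parse_twitter_config(raw_json):
    """Parse the config text and return (track_terms, credentials)."""
    config = json.loads(raw_json)
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError(f"missing or empty config keys: {', '.join(missing)}")
    # TrackTerms is one comma-separated string; split it into individual terms.
    terms = [t.strip() for t in config["TrackTerms"].split(",") if t.strip()]
    credentials = {key: config[key] for key in REQUIRED_KEYS if key != "TrackTerms"}
    return terms, credentials


# Placeholder credentials, made up for this example.
example = """{
  "TrackTerms": "comma,@separated,#hashTags,and,text",
  "OAuthConsumerSecret": "secret",
  "OAuthToken": "token",
  "OAuthTokenSecret": "token-secret",
  "OAuthConsumerKey": "key"
}"""

terms, credentials = parse_twitter_config(example)
print(terms)
```

Failing early on a missing key gives a clearer error than a rejected authentication request later in the run.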
Example aggregate output for a run of about 30 minutes is available here: Top Hashtags, Top Mentions, Top Retweets.