________ _______________ ____________
___ __/__ ____(_)_ /__ /______________ ___/_ /__________________ _______ ___
__ / __ | /| / /_ /_ __/ __/ _ \_ ___/____ \_ __/_ ___/ _ \ __ `/_ __ `__ \
_ / __ |/ |/ /_ / / /_ / /_ / __/ / ____/ // /_ _ / / __/ /_/ /_ / / / / /
/_/ ____/|__/ /_/ \__/ \__/ \___//_/ /____/ \__/ /_/ \___/\__,_/ /_/ /_/ /_/
###########################################
### twitterStream Data Ingest version 1.0
###########################################
To begin generating data:
1. Open twitter_kafka_direct.py and add the credentials for your Twitter developer account.
   * http://dev.twitter.com
2. Ensure all requirements are installed and that Python can access the modules (see requirements.txt).
   - If you need, for example, tweepy and have pip:
   >> pip install tweepy
   - If you don't have pip, download get_pip.py and run:
   >> python get_pip.py
   >> pip install tweepy
   - You can also track tweepy down and install it manually, but pip is the easier route.
3. Test by opening a terminal window, cd into the directory containing the script, and run:
   ### Replace the generic paths with the path in your configuration.
   >> python /path/to/twitter_kafka_direct.py
4. To deliver the stream to CSV:
   - Replace the stubs in twitterStream.py with the values for your tokens.
   >> python /path/to/twitterStream.py > twitterData.csv
5. To write data to Kafka:
   - Use twitter_kafka_direct.py. Replace the token stubs with your values and set your topic name (the default is 'topic').
   >> python /path/to/twitter_kafka_direct.py
   This will begin streaming data events into a Kafka producer.
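The two delivery modes above (CSV and Kafka) can be sketched as follows. This is a minimal illustration, not the actual contents of the scripts; the helper names (tweet_to_csv_row, tweet_to_kafka_message) and the chosen fields are assumptions for the example.

```python
import csv
import io
import json

def tweet_to_csv_row(tweet):
    """Flatten a tweet dict into one CSV line (illustrative field choice)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([
        tweet.get("id_str", ""),
        tweet.get("created_at", ""),
        tweet.get("text", "").replace("\n", " "),  # keep one tweet per line
    ])
    return buf.getvalue().strip()

def tweet_to_kafka_message(tweet):
    """Serialize a tweet dict to UTF-8 JSON bytes for a Kafka producer."""
    return json.dumps(tweet).encode("utf-8")
```

With a producer from, for example, the kafka-python package, each event would then be sent along the lines of producer.send('twitterstream', tweet_to_kafka_message(tweet)).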
###########################################
*** Note ***
This procedure assumes a topic named twitterstream already exists in Kafka to produce the data to.
If you need to create the topic, use the following command:
>> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic twitterstream
List the topics you have with:
>> bin/kafka-topics.sh --list --zookeeper localhost:2181
To check that data is in fact landing in Kafka:
>> bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic twitterstream --from-beginning
###########################################
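The same landing check can also be done from Python. This is a hedged sketch assuming the kafka-python package and a broker on localhost:9092; the decode_message helper and the consume_twitterstream function are illustrative, not part of the project's scripts.

```python
import json

def decode_message(raw_bytes):
    """Decode one Kafka message payload back into a tweet dict."""
    return json.loads(raw_bytes.decode("utf-8"))

def consume_twitterstream(bootstrap="localhost:9092"):
    """Print the text of each tweet landing on the twitterstream topic.

    Requires `pip install kafka-python` and a running broker.
    """
    from kafka import KafkaConsumer
    consumer = KafkaConsumer(
        "twitterstream",
        bootstrap_servers=bootstrap,
        auto_offset_reset="earliest",  # replay from the start, like --from-beginning
    )
    for record in consumer:
        print(decode_message(record.value).get("text", ""))
```

If tweets appear here as well as in the console consumer, the producer side is working end to end.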