Next steps: Challenge Yourself
"Build the bag-of-words model for the Airline Tweets Dataset."
Go to the next hint only if you can't figure out the previous hint.
Hint 1: Follow the pipeline: Tokenize -> Build your bag of words -> Find frequency
Hint 2: "humanreadablesequenceofcharacters" -> c("human", "readable", "sequence", "of", "characters")
Hint 3: Follow the pipeline: Tokenize -> Build your vocabulary -> Vectorizer -> Find frequency
Hint 4: text2vec package
Hint 5: main functions: itoken, create_vocabulary, vocab_vectorizer, create_dtm
Hint 6: Reference
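If you are still stuck after the hints, here is a minimal sketch of the pipeline from Hint 3, using the text2vec functions named in Hint 5. The three toy tweets and the variable names (`tweets`, `it`, `vocab`, `vectorizer`, `dtm`, `term_freq`) are made up for illustration. They are not taken from the Airline Tweets Dataset, so you would replace `tweets` with the text column of your own data.

```r
# Sketch only: toy data stands in for the Airline Tweets Dataset.
library(text2vec)

# Hypothetical example tweets (replace with the dataset's text column)
tweets <- c("the flight was late",
            "great flight great crew",
            "late again")

# 1. Tokenize: lowercase each tweet and split it into words
it <- itoken(tweets,
             preprocessor = tolower,
             tokenizer = word_tokenizer,
             progressbar = FALSE)

# 2. Build the vocabulary: one row per unique term
vocab <- create_vocabulary(it)

# 3. Vectorizer: maps tokens to columns of the document-term matrix
vectorizer <- vocab_vectorizer(vocab)

# 4. Document-term matrix: rows are tweets, columns are terms
dtm <- create_dtm(it, vectorizer)

# 5. Find frequency: total count of each term across all tweets
term_freq <- sort(Matrix::colSums(dtm), decreasing = TRUE)
print(term_freq)
```

The document-term matrix is the bag-of-words representation: word order is discarded and only the per-tweet counts remain. On real tweets you would probably want extra cleaning (for example, removing stop words or URLs) before building the vocabulary.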
The code will be uploaded to our repository at the end of the workshop (you will receive a notification on GitHub).
If you do not know how to "PULL" the latest version of our repository, please follow this link.