Next steps: Challenge Yourself

“Find the bag of words model for the Airline Tweets Dataset”.

Go to the next hint only if you can't figure out the previous hint.

Hint 1: Follow the pipeline: Tokenize -> Build your bag of words -> Find frequency

Hint 2: "human readable sequence of characters" -> c("human", "readable", "sequence", "of", "characters")

Hint 3: Follow the pipeline: Tokenize -> Build your vocabulary -> Vectorizer -> Find frequency

Hint 4: text2vec package

Hint 5: main functions: itoken, create_vocabulary, vocab_vectorizer, create_dtm

Hint 6: Reference
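If you are still stuck after all six hints, the pipeline from Hint 3 can be sketched with the functions named in Hint 5. This is only a minimal illustration, not the workshop's official solution: the two `tweets` below are made-up stand-ins for the Airline Tweets Dataset, and lowercasing the text and using `word_tokenizer` are assumptions about how you might want to preprocess it.

```r
library(text2vec)

# Made-up stand-in data; replace with the tweet text column of the real dataset.
tweets <- c("The flight was late",
            "Great flight and great crew")

# Tokenize: lowercase each tweet, then split it into words.
it <- itoken(tweets,
             preprocessor = tolower,
             tokenizer = word_tokenizer,
             progressbar = FALSE)

# Build the vocabulary: one row per unique term, with its counts.
vocab <- create_vocabulary(it)

# Vectorizer: maps tokens onto the vocabulary's columns.
vectorizer <- vocab_vectorizer(vocab)

# Document-term matrix: rows are tweets, columns are terms.
# This sparse matrix is the bag-of-words representation.
dtm <- create_dtm(it, vectorizer)

# Find frequency: total occurrences of each term across all tweets.
word_freq <- sort(Matrix::colSums(dtm), decreasing = TRUE)
print(word_freq)
```

On real data you would probably also want to drop stop words or rare terms before vectorizing, but that choice is left to you.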

The code will be uploaded at the end of the workshop to our repository. (You will receive a notification on GitHub).

If you do not know how to "PULL" the latest version of our repository, please follow this link.
