A basic overview of the RMT Corpus is given in Table 2.
The corpus contains nearly 80,000 tweets, comprising one million words, written by 2,300 users.
Note that a hyphenated word is counted as a single token (e.g. each instance of here-turi-kōkā counts as one word).
“” and “” references are excluded from the total word count, whereas numbers and hashtags are not.

Research also indicates that social media marketing and the web can play a significant role in supporting the revitalisation of minority languages.

You have harvested data from the Twitter livestream and searched back over the previous six days.
Write Python code that outputs sentiment values for three of your dataframes.
When you output the three values, arrange them in the order you guess will run from least positive to most positive sentiment.

All the bots we are building have some functionality in common.
For example, they have to authenticate to the Twitter API.
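
As a rough illustration of that shared step, the sketch below only gathers the four credentials typically needed for user-context access and fails early if any are missing. It sends no request. The environment variable names and the `load_credentials` helper are assumptions made for this example, not part of any library.

```python
import os
from typing import Dict, Mapping

# Hypothetical environment variable names; use whatever your own setup defines.
CREDENTIAL_VARS = (
    "TWITTER_API_KEY",
    "TWITTER_API_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the four credentials, or raise if any is missing or empty."""
    missing = [name for name in CREDENTIAL_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
    return {name: env[name] for name in CREDENTIAL_VARS}
```

Keeping the credentials in environment variables rather than in the script means every bot can reuse the same loader without secrets ending up in shared code.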

  • When we installed twarc2, the original version of twarc (referred to as “twarc1”) was also installed.
  • The word does not contain any double consonants, excluding ‘ng’ and ‘wh’, which are single consonants in Māori.
  • It’s also possible to request another user’s timeline via the id parameter.
  • Streaming lets you watch in real time for tweets that match certain criteria.
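
The consonant rule in the list above can be sketched as a small check. This is a minimal illustration of that single rule, assuming that any letter other than a plain or macronised vowel counts as a consonant. The function name `passes_consonant_rule` is hypothetical, and the sketch is not a full test of whether a word is Māori.

```python
import unicodedata

VOWELS = set("aeiouāēīōū")


def passes_consonant_rule(word: str) -> bool:
    """Return True if the word has no two consonants in a row.

    The digraphs 'ng' and 'wh' are treated as single consonants.
    """
    # Compose macronised vowels so that 'ā' is one character, then lower-case.
    normalised = unicodedata.normalize("NFC", word).lower()
    # Collapse each digraph to one upper-case placeholder consonant.
    collapsed = normalised.replace("ng", "N").replace("wh", "W")
    previous_was_consonant = False
    for char in collapsed:
        is_consonant = char.isalpha() and char not in VOWELS
        if is_consonant and previous_was_consonant:
            return False
        previous_was_consonant = is_consonant
    return True
```

For example, `passes_consonant_rule("whakarongo")` returns `True`, while `passes_consonant_rule("street")` returns `False` because of the cluster "str".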

When we first ran this, just a tiny fraction of the tweets remained.
More recently, however, we noticed that about 80% of them are rehydrated.
We suspect that Twitter has restored some of this content.
A more sophisticated text analysis would pass this through another filter to remove the one- and two-letter words and the URLs.
All of this was to get our tweets into a single string: TextBlob has its own data format, so we needed a string to pass to TextBlob.
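
The extra filter and the joining step might look like the sketch below. It uses only the standard library. The helper name `tweets_to_text`, the URL pattern, and the sample tweets are assumptions made for illustration, and word length is measured on whitespace-separated tokens, so attached punctuation counts towards the length.

```python
import re

# Assumed pattern: anything starting with http:// or https:// up to the next space.
URL_PATTERN = re.compile(r"https?://\S+")


def tweets_to_text(tweets, min_word_length=3):
    """Drop URLs and very short words, then join all tweets into one string."""
    kept = []
    for tweet in tweets:
        without_urls = URL_PATTERN.sub(" ", tweet)
        for word in without_urls.split():
            if len(word) >= min_word_length:
                kept.append(word)
    return " ".join(kept)


sample = [
    "I am so happy to be here https://example.com/x",
    "It is a bad day",
]
text = tweets_to_text(sample)
print(text)  # happy here bad day
```

The resulting string can then be handed to TextBlob, for example `TextBlob(text).sentiment.polarity`, which requires the third-party textblob package.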

Recall that API is an acronym for Application Programming Interface.
APIs allow the development of bots that generate tweets.
Examples include bots that report seismograph readings, post weather forecasts, or deliver content for a commercial brand.
In our case, each tweet is a single JSON object, and the JSONL file stores all of them, one per line.
JSONL is better suited to collecting Twitter data, since a plain JSON file is less well suited to storing many separate units of data.
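
To make the one-object-per-line idea concrete, here is a minimal sketch using Python's standard `json` module. The helper names and the `id` and `text` fields in the sample records are illustrative assumptions, and an in-memory buffer stands in for a real file.

```python
import io
import json


def write_jsonl(records, handle):
    """Write each record as one JSON object on its own line."""
    for record in records:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(handle):
    """Yield one parsed object per non-empty line."""
    for line in handle:
        line = line.strip()
        if line:
            yield json.loads(line)


# Made-up sample records standing in for collected tweets.
tweets = [{"id": "1", "text": "Kia ora"}, {"id": "2", "text": "Tēnā koe"}]

buffer = io.StringIO()  # stands in for a file opened with open(path, "w")
write_jsonl(tweets, buffer)
buffer.seek(0)
restored = list(read_jsonl(buffer))
```

Because every line is a complete object, new tweets can be appended as they arrive and read back one at a time without loading the whole file.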

In this section, we present an initial analysis of the RMT Corpus.
We intend to build on this analysis by creating visualisations for exploring the RMT Corpus in future work.
Significantly less than 1% of the 9.5 million tweets that were used as input for the third step had sufficient Māori text to be included in the corpus.
The overwhelming majority of discarded tweets were written in English, showing that the English language dominates the Twittersphere even among the cohort of reo Māori tweeters.
Qualitative research would be needed to understand why Māori-language users choose to tweet in Māori or English in a specific context.

One of the most striking observations is that a number of these users haven’t been as active in recent years.
For almost all of the ten most prolific tweeters, Māori-language activity spiked between 2013 and 2015, after which their volume of tweets steadily declined.
While the reasons for this aren’t clear, it may be that the individuals concerned are now using Twitter less often and, as such, posting fewer tweets.

