Brazilian tool monitors disinformation on networks and WhatsApp, a black box for fake news

Millions of Brazilian voters will participate in elections in more than 5,500 municipalities this Sunday, Nov. 15, to elect their future mayors and councilors. To decide on their candidates, they need to navigate a sea of false news, mainly on WhatsApp, which experts describe as a black box for misleading content. As part of its effort to cover the elections and debunk rumors, the fact-checking site Aos Fatos has started monitoring WhatsApp groups through its tool Radar.

Radar is an automated system that tracks websites and social networks in Brazil in real time in search of potentially misleading content. The tool was initially planned to launch for the municipal elections, but with the pandemic, the team brought the schedule forward to track rumors about the coronavirus. The platform, launched in beta in August, gradually expanded to include more channels, such as websites, Twitter and YouTube, and added WhatsApp in mid-October. The plan is to add Facebook and Instagram by the end of the year.

The Radar tool, by Aos Fatos. Image: Courtesy

To monitor WhatsApp, Aos Fatos hired data science company Twist Systems. With this partnership, they follow 270 political discussion groups, which are public – this means that these groups have made an access link available on the Internet for anyone to join. Currently, more than one million WhatsApp messages are collected and analyzed weekly by the tool.

The director of innovation for Aos Fatos and product leader for Radar, Carol Cavaleiro, explained to LatAm Journalism Review (LJR) that the contract with Twist Systems was important because the company had experience with monitoring on WhatsApp during other elections. "They already had a database of these groups. And every time a new group appears, they evaluate it," Cavaleiro said.

According to Radar's editor, journalist Bruno Fávero, the WhatsApp monitoring has attracted attention. "It is something that very few people do and nobody does like us. It is a super difficult platform to obtain data from, and one that was at the center of discussions about disinformation in the 2018 Brazilian elections," Fávero told LJR.

Another advantage of the project is that it maps rumors and misleading content across different platforms, which allows the team to understand and analyze how waves of disinformation arise, grow and spread through various networks.

"Radar arose from the need to have a broader view of the Brazilian disinformation system. It is very common [for people] to focus on one network. Many people study disinformation on Twitter, other people study it on YouTube. What we try to do with Radar is to cross all these platforms," Cavaleiro said.

Thus, when the team analyzes content on WhatsApp, it is also evaluating and collecting links that circulate there and can lead to sites with fake content, for example. "Every time we receive a story from a new website and it falls into this filter of the possibility of disinformation, we already look for a Facebook channel, a Twitter profile. We analyze how these accounts and different platforms integrate," Cavaleiro explained.

To handle all this work, Radar has ten professionals, almost all of whom are fully dedicated. In addition to Cavaleiro and Fávero, the team has three developers, two linguists, two reporters and a data scientist. "Radar is not just the monitor, which is the most public face of the project. It is a nucleus that produces data and reports about disinformation in the networks," Fávero said.

Radar was conceived by the executive director of Aos Fatos, Tai Nalon, and planning began in 2019. The platform won the Google Innovation Challenge, received support from the Google News Initiative and, with that, was able to get off the ground.

According to Cavaleiro, the initial funding for Radar is running out, and the platform now seeks to become self-sustaining. Private consulting for companies seeking risk analysis and data on public opinion is already underway and is a source of revenue.

At the same time, with the material generated by Radar, the team produces reports and a newsletter, which reaches around 600 subscribers. Currently, all material is open and free, because it is in the testing phase, but the plan is to establish a paywall for the articles and charge a subscription for the newsletter, which includes a weekly analysis and the full data collected by the tool. The target audiences are newsrooms, universities, think tanks, research centers, as well as private companies.

How does the monitor work?

The one-year planning period for Radar was crucial to get the monitor, which has a very complex operation, up and running. The tool's methodology is published, in a detailed and transparent way, on the website.

The first step in using the tool is to define the themes that will be monitored. Currently, in addition to the pandemic, Radar focuses on the municipal elections in Brazil as a whole and in its two largest cities, São Paulo and Rio de Janeiro. The Radar team then assembles automated searches on the free APIs (application programming interfaces) of each social network to collect data on these subjects.

The terms the team enters in these searches need to be broad, such as coronavirus, pandemic, flu and COVID-19, to capture as much information about the topic as possible. "The tool we built always depends on public APIs [of the social networks themselves]. So, no matter how broad the terms we collect, we never get the whole of Twitter, for example, because the public API limits the volume of data collected, the number of requests, etc.," Cavaleiro explained.

To monitor websites, the first step is similar, except that, instead of a public API, they work with Trendolizer. This tool, Cavaleiro said, builds a database from external links. Thus, Trendolizer adds to the database not only the address of the selected website, but also all the sites whose links appear on the page. In other words, the database of monitored sites grows progressively and automatically.
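The way a site list can grow from outbound links can be illustrated with a minimal Python sketch. It does not use Trendolizer or reflect how that tool works internally; the function name `expand_sites` and the in-memory list of links are assumptions made only for illustration.

```python
from urllib.parse import urlparse


def expand_sites(known_sites, outbound_links):
    """Return the known hosts plus every host found in the outbound links.

    Hypothetical helper: a real crawler would fetch pages and extract
    the links first; here they are passed in as plain strings.
    """
    sites = set(known_sites)
    for url in outbound_links:
        host = urlparse(url).netloc.lower()
        if host:  # relative links have no host and are skipped
            sites.add(host)
    return sites


seed = {"example.com"}
links = ["https://Other.example/story", "/relative", "https://example.com/a"]
monitored = expand_sites(seed, links)
```

Running the same step again on pages from the newly added hosts is what would make such a list grow over time.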

Here, the five years of fact-checking experience at Aos Fatos were crucial. The organization had an extensive base of websites whose content it had already verified, and this was used to feed the tool initially.

After this extensive data collection, Radar automatically organizes and cleans the data, eliminating duplicates and very short publications that lack enough information. Radar then passes this content through an automated scoring system, which helps to identify whether a publication has the potential to be misleading. The scoring system consists of a series of metrics designed specifically for each platform. There are about 40 criteria considered when analyzing Twitter publications, 23 on WhatsApp, 30 on YouTube, and 36 on websites.
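A cleaning step of this kind could look roughly like the Python sketch below. The article does not say how short is too short or how duplicates are detected, so the five-word minimum and the whitespace and case normalization are assumptions.

```python
import re

MIN_WORDS = 5  # assumed threshold; the article does not give one


def normalize(text):
    """Collapse whitespace and lowercase so near-identical copies compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def clean(posts, min_words=MIN_WORDS):
    """Drop very short posts and repeated posts, keeping the first copy."""
    seen = set()
    kept = []
    for post in posts:
        key = normalize(post)
        if len(key.split()) < min_words:
            continue  # too short to carry enough information
        if key in seen:
            continue  # duplicate of a post already kept
        seen.add(key)
        kept.append(post)
    return kept
```

Forwarded messages often differ only in spacing or capitalization, which is why the comparison is made on a normalized copy while the original text is kept.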

Carol Cavaleiro, director of innovation at Aos Fatos and product leader for Radar. Photo: Personal archive

On YouTube, for example, the description and title texts are considered, as well as characteristics of the publishing channel and comments made by other users – the videos themselves cannot all be transcribed and, therefore, do not enter the evaluation. In the case of sites, the classification considers, among other things, date and time of publication, author, textual structure of the title and the full text. Some factors that are evaluated for various channels are the excessive presence of capital letters, emojis or misspellings. "If the publication has special characters, for example, if it has Covid with zero and 1 [C0ViD], then it scores," Cavaleiro said.
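As an illustration only, the surface signals mentioned here (capital letters, emojis, digits swapped into words) can be measured with Python's standard library. The function name, the returned keys and the use of the Unicode "So" category as a rough stand-in for emojis are assumptions, not Radar's actual criteria.

```python
import re
import unicodedata

# Letters, then digits, then letters: matches "C0ViD" but not "COVID-19".
DIGIT_IN_WORD = re.compile(r"[A-Za-z]+\d+[A-Za-z]+")


def surface_signals(text):
    """Return simple stylistic measurements for one publication."""
    letters = [c for c in text if c.isalpha()]
    upper = sum(1 for c in letters if c.isupper())
    return {
        # share of letters that are uppercase
        "caps_ratio": upper / len(letters) if letters else 0.0,
        # rough emoji proxy: characters in Unicode category "So"
        "symbol_count": sum(1 for c in text if unicodedata.category(c) == "So"),
        # a digit embedded between letters, as in "C0ViD"
        "digit_in_word": bool(DIGIT_IN_WORD.search(text)),
    }


signals = surface_signals("URGENT C0ViD 🚨")
```

Each measurement on its own says little; in a system like the one described, such signals would only contribute points toward an overall score.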

Many of the criteria start from words commonly used in false news or writing styles, studied and mapped by linguists based on past fact-checks done by Aos Fatos. When analyzing disinformation campaigns, they look for linguistic patterns and translate this into rules applicable in the system.

"If you have an alarmist term, like 'attention,’ 'warning,’ or an exaggerated construction, like 'everyone in the world,’ it will score. And the system will make this combination of terms. If a tweet has only the word Covid, it has little chance of being disinformation. But if it has Covid, another alarmist term, another generalization and an insult, it has more chances,” Cavaleiro explained.

In addition, with the knowledge of the fact-checkers of Aos Fatos, the team adds expressions that are frequent in misleading content, such as "vachina," used to negatively describe the vaccine tested in Brazil in partnership with a Chinese company.

"Within our database, we apply these filters, like 'vachina,' or the sum of constructions, like 'vachina,’ plus a problematic hashtag, plus terms like 'urgent' and 'share.’ This is a very common construction, mainly in conspiracy theories. It's always very alarmist and has a call to action at the end," Cavaleiro said.

Fávero pointed out that these rules are not random, but the result of much research. Since there is no kind of tutorial or glossary of problematic terms, this needs to be created from scratch for each subject that Radar will monitor, and then be constantly updated. "It is dynamic, because the vocabulary and the way people talk changes over time, so there is a continuous process of improving the rules," he said.
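The combinations of terms that Cavaleiro describes can be sketched as a small rule table in Python. The example terms come from her quotes, but the grouping into four categories, the category names and the naive substring matching are assumptions; the real rules are written by linguists and are far more extensive.

```python
# Hypothetical rule table: category name -> terms that trigger it.
RULES = {
    "topic": {"covid", "vachina"},
    "alarmist": {"attention", "warning", "urgent"},
    "generalization": {"everyone in the world"},
    "call_to_action": {"share"},
}


def matched_categories(text):
    """Return the sorted names of the rule categories found in the text."""
    lowered = text.lower()
    return sorted(
        name
        for name, terms in RULES.items()
        if any(term in lowered for term in terms)
    )


plain = matched_categories("Covid cases rose today")
loaded = matched_categories("URGENT: vachina hurts everyone in the world, share!")
```

In this sketch the first message matches only the topic category, while the second matches all four, which mirrors the idea that a single keyword means little and a combination of signals means more.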

Scoring and limitations

At the end of the process, each piece of content gets a score from 1 to 10. Only publications with a score below 5, which are considered low quality, are displayed on Radar. The lower the score, the greater the chance that the content is disinformation or contains an error.
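The 1-to-10 scale and the cutoff of 5 come from the description above; how Radar turns its criteria into a number is not stated. The Python sketch below assumes, purely for illustration, that each triggered criterion subtracts a penalty from a starting score of 10.

```python
def quality_score(penalties):
    """Start from 10, subtract each penalty, and clamp the result to 1..10.

    The penalty model is an assumption made for this example.
    """
    return max(1, min(10, 10 - sum(penalties)))


def shown_on_monitor(scored_posts, cutoff=5):
    """Keep only posts whose score is below the cutoff (low quality)."""
    return [post for post, score in scored_posts if score < cutoff]


scored = [("post a", quality_score([2, 2, 2])), ("post b", quality_score([]))]
visible = shown_on_monitor(scored)
```

Here the first post loses six points and falls below the cutoff, so it would be displayed, while the second keeps its full score and would not.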

Fávero pointed out that Radar does not eliminate the need for human checking, because the automated rules only detect signs that a publication is potentially misleading. In other words, it cannot be said that everything that appears on the Radar is disinformation, as the tool is not infallible.

Cavaleiro recalled that the system cannot distinguish a joke or irony from serious content. "It is important to note that it tries to map the disinformation ecosystem, it sees the whole." If a publication appears on Radar but seems to be true, its linguistic construction, its words and the way it is written are probably common in misleading material. "It means that it has echoes in some content that is probably disinformation and, then, it is also part of that ecosystem. That is how the user gets lost," she said.

She explained that even sites known for publishing disinformation have now become "professional," mixing false content with stories republished from traditional media. These pages have adopted tricks, such as publishing a staff directory, to appear reliable. "There is no longer that website that you can glance at quickly and say: of course it is disinformation. This is the disinformation ecosystem that we are dealing with today," Cavaleiro said.

Bruno Fávero, editor of Radar Aos Fatos. Photo: Personal archive

As project editor, Fávero highlighted how the tool helps Aos Fatos produce reports and fact-checks. It is also a good thermometer for deciding whether a rumor should be checked or not: Aos Fatos only fact-checks a piece of disinformation if it is going viral, with considerable impact outside the bubble where it originated. Otherwise, the check would only end up helping to popularize that deceptive content.

Radar also serves to identify story ideas, because it automatically prepares a word cloud based on all low-quality material collected. Then, when an unexpected term appears, it may be indicative of a new growing rumor.
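One simple way to surface an unexpected term is to compare word counts between two periods. The Python sketch below is not how Radar builds its word cloud, which the article does not detail; the function names and the minimum count of three are assumptions.

```python
import re
from collections import Counter


def term_counts(posts):
    """Count how often each lowercase word appears across the posts."""
    counts = Counter()
    for post in posts:
        counts.update(re.findall(r"\w+", post.lower()))
    return counts


def emerging_terms(previous_posts, current_posts, min_count=3):
    """Return terms that are frequent now but were absent before."""
    before = term_counts(previous_posts)
    now = term_counts(current_posts)
    return sorted(
        term for term, n in now.items() if n >= min_count and before[term] == 0
    )


last_week = ["covid vaccine"]
this_week = ["vachina covid", "vachina", "vachina vaccine"]
new_terms = emerging_terms(last_week, this_week)
```

A term flagged this way is only a lead for a reporter to follow up, in the same spirit as the word cloud described above.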

Finally, the tool makes it easier to discover who is driving the spread of a rumor. "Rarely do reporters have time to investigate who were the biggest disseminators of that disinformation," Fávero said. "Sometimes it is very obvious, but sometimes it is not. And Radar helps a lot with this."

This story was originally written in Portuguese and was translated by Teresa Mioli.