
Data scientists in Brazil working on the country’s first robot-journalist to report on congressional bills

Bills making their way through Brazil’s Chamber of Deputies will be closely followed by a new kind of beat reporter: a news-producing robot, the first of its kind in the country.

The bot will automatically produce short, objective notes using information from Medidor de Poder (Power Meter), a database that gathers information on bills, politicians and candidates. Since 2016, Operação Serenata de Amor, the group behind the project, has used artificial intelligence to monitor spending by federal deputies. It aims to launch the news-producing robot in the second half of this year ahead of the October general elections.

According to Yaso Cordova, director of Operação Serenata de Amor and fellow of the Digital Kennedy School, the bot will produce informative, objective notes quickly and automatically.

"We want to do it very carefully to give relevance to what we want to write and not attribute any kind of value to the bills at hand," Cordova told the Knight Center. "We do not want to assign values that are not transparent to the voter. Opinion is for humans only. The idea is to let people draw their own conclusions, which is what we already do with Rosie," Cordova added, referring to Operação Serenata de Amor’s bot that reports suspicious spending in the Chamber of Deputies through Twitter.

The robot-journalist will write its short articles about bills passing through the Chamber of Deputies from the records in the Power Meter database. For that database, the Serenata de Amor team wants to gather information such as the wealth of politicians, the donations received by campaigns, bills already proposed, and expenses for work and district projects.
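The article does not say how the bot will turn database records into text. One common starting point for this kind of data-to-text writing is a fixed, neutral template filled in from a structured record. The sketch below illustrates that idea only; the `Bill` fields, the `write_note` function and the sample values are hypothetical and are not taken from the Power Meter database.

```python
from dataclasses import dataclass


@dataclass
class Bill:
    # Hypothetical fields; the real Power Meter schema is not described in the article.
    number: str
    author: str
    party: str
    summary: str
    status: str


def write_note(bill: Bill) -> str:
    """Fill a fixed, value-neutral template with fields from one bill record."""
    return (
        f"Bill {bill.number}, authored by Deputy {bill.author} ({bill.party}), "
        f"{bill.summary}. Current status: {bill.status}."
    )


# Made-up record used only to show the shape of the output.
example = Bill(
    number="PL 0000/2018",
    author="Jane Doe",
    party="XYZ",
    summary="proposes changes to public transport funding",
    status="awaiting committee review",
)
print(write_note(example))
```

Because the template contains no adjectives or judgments, the note stays descriptive, which matches the stated goal of leaving opinion to humans.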

In addition to providing data on representatives that have already been elected, the database, which is built with public data provided by the Supreme Electoral Tribunal (TSE), will also aggregate information on candidates for other positions. Vacancies for state and federal deputy, senator, governor and president will be up for a vote in October.

To collect and standardize all the necessary data, Serenata de Amor will count on partner organizations, such as the project Congresso em Números (Congress in Numbers). "Structuring all these data in the same way is often what holds up journalists' work," Cordova said. "We have a lot of available data, we just have to put it all together."

The idea is for this database to be a useful tool for other journalists to find stories and establish relationships between candidates or representatives who have already been elected. "It would be the journalists' job to navigate the database and find relationships. It's very similar to the idea of Panama Papers," Irio Musskopf, data scientist and creator of Operação Serenata de Amor, told the Knight Center.

Challenges in Portuguese

Perhaps the greatest challenge in the development of the robot-journalist is the lack of technology available in Portuguese. Much of the knowledge in this field is in English: bots that write articles are already used by outlets like The Washington Post, which has a tool called Heliograf.

Overcoming this difficulty is the job of data scientist Ana Schwendler, a specialist in natural language processing. She is also working on another pioneering project in Brazil, the fact-checking conversational robot Fátima, developed by the site Aos Fatos in partnership with Facebook.

"Natural language processing is a way to make machines understand what humans write," Schwendler explained to the Knight Center. "We need to collect more evaluative information concerning how the [textual] structure works in Portuguese, and how to make quality textual production within the rules of Portuguese, that’s something that does not exist, yet."

The expectation is that the robot will learn to produce better texts over time through the application of machine learning and, more specifically, deep learning. Schwendler explains that in deep learning, artificial neural networks inspired by the structure of the human brain use algorithms to acquire knowledge through experience. The robot will be trained on texts produced by humans.

"We have examples of texts that we want and it will learn from that," Schwendler explained. "Over time, the better it gets, it will learn what is good and can learn from user response. … We always want feedback from outsiders."

Institutionalization and other projects

Operação Serenata de Amor began in 2016 with support from a crowdfunding campaign. The name, taken from a candy brand, is inspired by the Toblerone Affair, in which a Swedish politician had to withdraw her candidacy for prime minister because she purchased chocolate with her government credit card. The group said it wants to “find corruption in small expenses, but in large volume.” The name also plays on the titles of anti-corruption operations carried out by the Brazilian Federal Police.

Its project Rosie, an artificial intelligence whose name was inspired by the robot in the cartoon The Jetsons, monitors reimbursements made to federal deputies through the Quota for Exercise of Parliamentary Activity. This fund pays for food, lodging and other expenses. More than 9,000 suspicious reimbursements have already been identified.
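The article does not describe how Rosie decides that a reimbursement is suspicious. As an illustration of the general idea, the sketch below flags any expense that is far above the typical amount in its category. The rule, the `flag_suspicious` function, the `factor` threshold and the sample data are all assumptions made for this example, not Rosie's actual method.

```python
from statistics import median


def flag_suspicious(reimbursements, factor=3.0):
    """Return reimbursements above `factor` times the median of their category.

    This is an assumed, simplified rule for illustration only.
    """
    amounts_by_category = {}
    for item in reimbursements:
        amounts_by_category.setdefault(item["category"], []).append(item["amount"])

    medians = {cat: median(vals) for cat, vals in amounts_by_category.items()}

    return [
        item
        for item in reimbursements
        if item["amount"] > factor * medians[item["category"]]
    ]


# Made-up data; deputies, categories and amounts are hypothetical.
sample = [
    {"deputy": "A", "category": "meal", "amount": 40.0},
    {"deputy": "B", "category": "meal", "amount": 55.0},
    {"deputy": "C", "category": "meal", "amount": 48.0},
    {"deputy": "D", "category": "meal", "amount": 900.0},
    {"deputy": "E", "category": "lodging", "amount": 300.0},
    {"deputy": "F", "category": "lodging", "amount": 320.0},
]

flagged = flag_suspicious(sample)
print([item["deputy"] for item in flagged])
```

A flag from a rule like this marks an expense for human review rather than proving wrongdoing.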

The group was previously connected to data company Data Science Brigade, but this year Operação Serenata de Amor will become an NGO under the umbrella of Open Knowledge Brasil. "For us, it will be very important to have an institution supporting us to have new partnerships," said Yaso Cordova.

In this election year, the Serenata team is working on three new projects, including the Power Meter and the new bot. Another is a new interface to make it easier to verify the suspicious refunds reported by Rosie. "In this way, we would fulfill our promise to facilitate citizen participation in public spending," said Irio Musskopf.

The third project under development seeks to address a problem that has received a lot of attention this election year: the proliferation of fake news on the internet. The group wants to develop a browser extension to identify false content online. The target audience consists of teenagers and seniors, who in general are less media-savvy, according to Musskopf.

Note from the editor: This story was originally published by the Knight Center’s blog Journalism in the Americas, the predecessor of LatAm Journalism Review.