How to create a data journalism team: practical tips for bringing programmers and journalists together

This article is part of the book, "Innovative Journalism in Latin America," published by the Knight Center for Journalism in the Americas, with the help of Open Society Foundations' Program on Independent Journalism.

By Fabiola Torres López, Founder and editor of OjoPúblico

In the era of big data, journalism can benefit greatly from using information technology to reinvent methods for searching, analysis, and news coverage.

Five years ago, reporters who looked into databases or sought help from hackers and web coders for news angles were seen as the “nerds” of the newsroom. The current atmosphere is different: the world is now aware of concrete stories of corruption and even global investigative projects like the Panama Papers, all thanks to media that increasingly understand the benefits of alliances between reporters and technologists. The awarding of the 2017 Pulitzer Prize to the investigation that uncovered the dark side of the offshore industry can be read as the greatest recognition of the rigor and impact of this new way of working.

In Latin and Central America, there is a large gap between technology and journalism, but it’s inspiring that eight journalism and six digital native sites have formed data journalism teams inspired by the experiences of The New York Times, The Guardian, ProPublica and The Los Angeles Times.

I am a co-founder of one of these sites: Ojo Público, a digital media site made up of six journalists and two developers specializing in data reporting. Our piece, Memoria Robada (Stolen Memory), received the Third Latin American Award for Investigative Journalism in 2016. It was a major regional investigation that used big data to provide evidence of the scale of trafficking of cultural goods in Latin America and its status as organized crime.

The data journalism units highlighted have distinct characteristics, in team size, profile, and work dynamics. But, in a short period of time, most have also made shocking revelations and have been recognized by the prestigious Data Journalism Awards, organized by the Global Editors Network (GEN), which identify the most innovative stories supported by data.

These teams are located in Argentina (1), Peru (2), Costa Rica (1), Colombia (2), Brazil (6), Mexico (1) and Chile (1). I have had the privilege of meeting various members and learning about their experiences, which allows me to describe some fundamental traits that may be useful to other journalists and techies who are looking to follow in their footsteps.

1. Size is relative

“One of the keys to the success of a data journalism team is the diversity of its members,” said Ricardo Brom, director of data intelligence from Argentine newspaper La Nación. An electrical engineer by training, Brom left his post as director of technology services of the newspaper in early 2011 to become the brain behind coding at La Nación Data, the data journalism department at this paper, which has become a reference point for this field in Latin America.

The team is made up of a journalist coordinator with database management skills; a reporter who is an expert in access-to-information law; two programmers; a data analyst; and a designer in charge of data visualizations. They are dedicated to medium- and long-term special projects, but also collaborate with reporters from different sections of the newsroom who require their assistance.

The characteristics of the La Nación Data team correspond to the data journalism unit of a large media outlet that has been given a significant role in the paper’s investigations. However, this does not mean that only large teams can produce impactful stories with databases. The experience of small digital media sites like Ojo Público and Convoca in Peru, both winners of the Data Journalism Awards, shows that small teams can also produce excellent work. This depends on the extent to which journalists integrate a developer, one with sufficient knowledge of programming languages for database management, into their newsrooms.

2. The developers

In the field of web development, there are formally trained professionals and self-taught programmers with diverse knowledge of computer programming. But a data journalism team needs developers with two types of profiles: the backend developer, who works on the server and manages the database, and the frontend developer, who is responsible for the visible part of the platform. If it’s only possible to count on one of them, start by finding and integrating the backend developer.

These are the job characteristics of both:

  • The backend developer is in charge of programming different components of the website and management of the database. They know programming languages like Java, PHP, Ruby and Python, as well as database management programs like MySQL, Postgres, SQL Server and MongoDB. The developer must also have knowledge of digital security to prevent attacks on the server.
  • The frontend developer builds the code of all visual components of a site. The developer must know three languages: HTML (HyperText Markup Language), which defines the structure of web pages; CSS (Cascading Style Sheets), which controls presentation and appearance (colors, typefaces, font sizes); and JavaScript, which provides interactivity (animations, buttons, windows on the page).

3. The new journalism mentality

The fundamental challenge for reporters on a data journalism team is not to learn to use the most sophisticated digital tools, but to think differently about the issues that should be investigated: to build new ways to ask for and seek information, and to create hypotheses and working methods.

In traditional journalism, the reporter focused on identifying a relevant case. Data journalism looks for patterns, trends and identifiable phenomena. The reporter's job is no longer a solitary activity. Instead, it involves a team that, with the help of technology, provides the right formulas and relevant sources to question databases quickly and find clues or evidence that were previously unimaginable.

4. Collaborative spirit and integrated space

All team members must have in-depth knowledge of history or research skills to contribute to the team. Each team member has a clear role, but that role does not limit their tasks: everyone can contribute ideas and solutions throughout the reporting process. These may include the best ways to collect or extract information, methods for testing and verifying, and alternatives for presenting information to readers and users in the clearest manner possible.

This philosophy requires the group to interact in the same space. It will be easier for reporters to learn to communicate in the language of programmers to handle big data. In turn, programmers will think like journalists when they write code to cross-reference or analyze a mountain of information in search of understanding.

Large media companies such as The Guardian and The New York Times have data teams within newsrooms to work closely with editors and reporters from other areas. In small digital media specializing in investigations with big data, such as Ojo Público, newsrooms are designed as data teams.

5. Basic work tools

No matter what type of information the members of the team will access, it will be essential to know how to use a spreadsheet. The basic toolbox also includes a program to clean and standardize data, like OpenRefine, and visualization tools that range from simple ones like Infogr.am and Tableau to more complex ones like D3.js, a JavaScript library for building a variety of interactive graphics.
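To make "cleaning and standardizing" concrete, here is a minimal Python sketch of the kind of task a tool like OpenRefine handles: grouping different spellings of the same name. It is loosely inspired by the idea of key-based clustering and is not OpenRefine's own code. The function names (`fingerprint`, `cluster`) and the sample entries are invented for illustration.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a normalized key so that near-identical spellings collide."""
    # Strip accents: decompose characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", value)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase and replace punctuation with spaces.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    cleaned = no_accents.lower().translate(table)
    # Sort and de-duplicate tokens so that word order does not matter.
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values):
    """Group raw values that share the same fingerprint."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return dict(groups)


# Hypothetical entries standing in for a messy spreadsheet column.
raw_names = [
    "Ministerio de Educación",
    "MINISTERIO DE EDUCACION",
    "Educación, Ministerio de",
    "Ministerio de Salud",
]

clusters = cluster(raw_names)
for key, variants in clusters.items():
    print(key, "->", variants)
```

A reporter would still review each group by hand before merging it, because two different entities can share a key.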

For projects involving the management of large databases, as OffshoreLeaks and the Panama Papers did, you will need more advanced programs like Neo4j (https://neo4j.com/product/), which identifies connections within large quantities of data and displays them as graphs of nodes and edges, making the relationships between different data points more intuitive to read. In this case it will be crucial to have input from web developers.
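The idea of nodes and edges can be sketched in a few lines of plain Python. This is only an illustration of the concept, not Neo4j or its query language; the entities, relationship labels and function names below are invented. Each record links two entities, and a breadth-first search finds the shortest chain of connections between them.

```python
from collections import defaultdict, deque

# Hypothetical records: (entity, relationship, entity).
edges = [
    ("Person A", "shareholder_of", "Company X"),
    ("Company X", "registered_by", "Agent Z"),
    ("Company Y", "registered_by", "Agent Z"),
    ("Person B", "director_of", "Company Y"),
]


def build_graph(edge_list):
    """Store each entity (node) with the set of entities it is linked to."""
    graph = defaultdict(set)
    for source, _relation, target in edge_list:
        # Treat links as undirected: a connection can be followed both ways.
        graph[source].add(target)
        graph[target].add(source)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search: return the shortest chain of nodes, or None."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in sorted(graph[node]):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


graph = build_graph(edges)
path = shortest_path(graph, "Person A", "Person B")
print(" - ".join(path))
```

In this invented example, the two people never appear in the same record, yet the search links them through a shared registration agent. A graph database performs this kind of traversal at a much larger scale.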

6. The visual component

Often, the findings or news originating in one or more cross-referenced databases need to be told in a language other than text to make them more understandable: static or interactive visualizations. Their main purpose is not aesthetic value, but clarity and comprehension.

As described above, a data journalism team has reporters who know simple tools to build visualizations. The developer or developers should be able to write code or reuse existing visuals to come up with innovative ways to tell their stories. Two fundamental books that should be in the group’s library are Facts are Sacred, by Simon Rogers, and The Functional Art, by Alberto Cairo.

7. Ongoing trainings

Staying up to date on new digital tools and on the most innovative ways people are using technology to enhance journalistic research is part of the team’s daily routine, as are training and ongoing exploration. Following data blogs, such as those from The Guardian, La Nación Data and ProPublica, is as necessary as participating in hackathons, online courses and local and international workshops.

8. Integration with the local community

There are several organizations in the world made by journalists and programmers that promote the best use of technology in journalism. Being part of them can foster exchanging knowledge and experiences, meeting new professionals and appreciating the talent and the latest global trends in data journalism. Three organizations stand out for their scope:

  • Hacks/Hackers, with chapters in more than 90 countries
  • Global Editors Network (GEN), which has 1,300 members and organizes the Data Journalism Awards
  • Investigative Reporters and Editors, which has promoted the NICAR (National Institute for Computer-Assisted Reporting) conference since 1994. It is an annual summit in the U.S. of experts in computer-assisted reporting

9. A transparent method

One of the traits of data journalism that readers most appreciate is transparency. After findings are published, the majority of data journalism groups explain how the research was carried out, what models or statistical criteria were applied and what limitations exist in the story. In addition, some groups release the databases used in their platforms on GitHub so they can be reused by other journalists or people interested in exploring the topic.

We must also remember that one of the great advantages of working with programmers is the automation of several processes, such as extracting public data from the State’s websites to then build databases that were previously only located in hundreds or thousands of documents in PDF or JPG format. After the team is finished using them, the philosophy is to release all information of public interest.
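As a rough illustration of this kind of automation, the following Python sketch turns an HTML table into CSV rows using only the standard library. The sample page, the `TableParser` class and the `html_table_to_csv` function are invented for this example. A real project would first download the page from the government site, and documents in PDF or image format would need other tools.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, if any
        self._cell = None  # text fragments of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical page content standing in for a public contracts listing.
SAMPLE_HTML = """
<table>
  <tr><th>Contractor</th><th>Amount</th></tr>
  <tr><td>Empresa Uno</td><td>1200</td></tr>
  <tr><td>Empresa Dos</td><td>850</td></tr>
</table>
"""


def html_table_to_csv(html: str) -> str:
    """Parse the table cells and return them as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(parser.rows)
    return buffer.getvalue()


csv_text = html_table_to_csv(SAMPLE_HTML)
print(csv_text)
```

Once the extraction is scripted, the same code can be rerun whenever the source is updated, and the resulting file can be published alongside the story.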

10. Original focus

The way that data journalism teams work allows the design of investigations on diverse themes with original and innovative focuses, starting from how we show something is true, to the narratives for presenting the findings.

For example, how do we explain the impact of private corporate policies on citizens’ health? With this platform, you are able to know if your doctor or preferred clinic has an outstanding malpractice complaint, and who is behind it. How do we explain the impact of trafficking of cultural heritage items in the region? “Memoria Robada” (Stolen Memory) tracks 2 million data points and combines research, a database of stolen objects around the world and available official documents. We are reaching a point where we might be able to unite rigorous methodologies (journalism and coding) to expand the boundaries of journalism.

Note from the editor: This story was originally published by the Knight Center’s blog Journalism in the Americas, the predecessor of LatAm Journalism Review.