Why is collecting and analysing data about public procurement so damned difficult? Data scientists explain some common problems

Red flags in banker boxes
Liz David-Barrett

Originally published on GI ACE.

Open data is often lauded as a magic pill for anti-corruption: reveal what’s going on, inform the public, and, presto, government will become more accountable. Oh, and big data just means bigger gains, right?

Not quite. We have written elsewhere about the institutional and political challenges that can stop transparency from translating into accountability. But even the very first stage, collecting the data, is much harder than it seems. Building indicators that can be used for analysis requires a whole series of validation steps.

In our 2016–17 ACE project, we focused on collecting data from development aid donors and lenders rather than from national governments. We assumed this would be a more reliable source, since such agencies face strong pressure to be transparent and also have the capacity to collect data. Many national governments came a little later to the open government agenda, and lower-income countries often lack the necessary data infrastructure.

However, even for aid data, our initial efforts to collect data from a range of agencies ran into problems. USAID does not collect data on the contracts it funds if the money is spent outside the United States by aid recipients. We were able to collect data from the World Bank, the Inter-American Development Bank, and EuropeAid, but even then, assembling a full dataset required accessing numerous sources and a long process of cleaning and checking the data.
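One basic check in that kind of cleaning process is measuring, year by year, how much of each key field is actually filled in. The sketch below is illustrative only: the field names (`buyer_id`, `supplier_id`, `contract_value`, `award_date`, `year`) and the toy records are hypothetical, and real datasets each have their own schema. It counts a field that disappears entirely from one year's export as 100% missing, which is how an inconsistent publication format tends to surface.

```python
from collections import defaultdict

# Hypothetical field names; real procurement datasets each use their own schema.
REQUIRED_FIELDS = ("buyer_id", "supplier_id", "contract_value", "award_date")


def is_missing(value):
    """Treat None and empty or whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and not value.strip())


def missing_share_by_year(records, fields=REQUIRED_FIELDS):
    """Return {year: {field: share of that year's records missing the field}}."""
    totals = defaultdict(int)
    missing = defaultdict(lambda: defaultdict(int))
    for rec in records:
        year = rec.get("year")
        totals[year] += 1
        for field in fields:
            # A field absent from the record altogether (e.g. after a change
            # in publication format) also counts as missing.
            if is_missing(rec.get(field)):
                missing[year][field] += 1
    return {
        year: {field: missing[year][field] / n for field in fields}
        for year, n in totals.items()
    }


# Toy records, invented for illustration only.
records = [
    {"year": 2015, "buyer_id": "B1", "supplier_id": "S1",
     "contract_value": 100.0, "award_date": "2015-03-01"},
    {"year": 2015, "buyer_id": "", "supplier_id": "S2",
     "contract_value": None, "award_date": "2015-05-10"},
    # In this toy example the 2016 export dropped the supplier field entirely.
    {"year": 2016, "buyer_id": "B1",
     "contract_value": 250.0, "award_date": "2016-01-20"},
]

report = missing_share_by_year(records)
print(report[2015]["buyer_id"])     # 0.5
print(report[2016]["supplier_id"])  # 1.0
```

A report like this does not fix anything by itself, but it shows quickly which years and fields can support analysis at all.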

Where national procurement data is concerned, we have often found that governments claim to be fully transparent and to publish everything, but when we come to collect the data, we encounter a range of problems: large amounts of missing data; inconsistency in how data is published from one year to the next; or failure to provide information essential for meaningful analysis, such as organisational IDs.

For example, if we have all the calls for tenders but cannot easily match them with contract awards, we cannot construct key red flags. If we lack identifying codes for suppliers and buyers, we cannot build indicators of supplier and buyer risk.
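To make the dependency concrete, here is a minimal sketch under loudly stated assumptions: calls and awards share a hypothetical `tender_id` key, and all other field names, the two procedure flags and the 10-day threshold are illustrative choices, not the authors' published indicator definitions. An award with no matching call might mean no call was published, or simply that the match failed, which is exactly why reliable linking matters. The buyer-level measure (the share of a buyer's spending won by its largest supplier) can only be computed when both buyer and supplier IDs are present.

```python
from collections import defaultdict
from datetime import date


def match_calls_to_awards(calls, awards):
    """Pair each award with its call for tenders, or None if no call matches.

    Assumes a shared, hypothetical "tender_id" key; many real datasets lack
    one, which is what makes matching hard in practice.
    """
    calls_by_id = {c["tender_id"]: c for c in calls if c.get("tender_id")}
    return [(a, calls_by_id.get(a.get("tender_id"))) for a in awards]


def procedure_flags(matched, min_decision_days=10):
    """Two illustrative flags per award; the threshold is hypothetical."""
    flags = []
    for award, call in matched:
        decision_days = None
        if call is not None:
            # Days between the bid submission deadline and the award decision.
            decision_days = (award["award_date"] - call["deadline"]).days
        flags.append({
            "tender_id": award.get("tender_id"),
            "no_matching_call": call is None,
            "short_decision_period": (decision_days is not None
                                      and decision_days < min_decision_days),
        })
    return flags


def top_supplier_share(awards):
    """Per buyer ID, the share of contract value won by its largest supplier.

    Awards lacking a buyer or supplier ID cannot be attributed and are
    skipped, so missing IDs silently shrink the analysis.
    """
    spend = defaultdict(lambda: defaultdict(float))
    for a in awards:
        if a.get("buyer_id") and a.get("supplier_id"):
            spend[a["buyer_id"]][a["supplier_id"]] += a["value"]
    shares = {}
    for buyer, by_supplier in spend.items():
        total = sum(by_supplier.values())
        if total > 0:
            shares[buyer] = max(by_supplier.values()) / total
    return shares


# Toy data, invented for illustration only.
calls = [{"tender_id": "T1", "deadline": date(2017, 3, 1)}]
awards = [
    {"tender_id": "T1", "award_date": date(2017, 3, 5),
     "buyer_id": "B1", "supplier_id": "S1", "value": 300.0},
    {"tender_id": "T2", "award_date": date(2017, 4, 1),
     "buyer_id": "B1", "supplier_id": "S2", "value": 100.0},
]

flags = procedure_flags(match_calls_to_awards(calls, awards))
print(flags[0]["short_decision_period"])  # True (4 days)
print(flags[1]["no_matching_call"])       # True
print(top_supplier_share(awards))         # {'B1': 0.75}
```

Whether a short (or an unusually long) decision period is suspicious depends on the legal and market context, so the direction and cut-off used here are assumptions to be calibrated against real data.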

In our new Red Flags Explainer, we draw on our experience of building and analysing datasets of government procurement over the past ten years to answer some Frequently Asked Questions about our work. Liz David-Barrett, Mihaly Fazekas, Agnes Czibik, Bence Toth, and Isabelle Adam explain some of the challenges and what can be done to fix or work around them.

Liz David-Barrett
Senior Lecturer in Politics, University of Sussex

Liz David-Barrett leads the Centre for the Study of Corruption’s (CSC) activities in research, teaching, and policy impact. Her research focuses on corruption risks at the interface between politics and business, in public procurement, lobbying and bribery, as well as on private-sector action to prevent corruption. She engages widely with anti-corruption practitioners in governments, the private sector, and NGOs; has written reports on the UK Bribery Act, lobbying and the revolving door, and local government corruption; and has given evidence to parliamentary select committees. David-Barrett previously worked in Croatia and Hungary as a journalist, reporting for The Economist, the Financial Times, the BBC World Service and Business Central Europe. David-Barrett has a DPhil in Politics from Oxford, an MA in Slavonic and East European Studies from the University of London, and a BA in Philosophy, Politics and Economics from Oxford.


Mihály Fazekas is an assistant professor at the Central European University’s School of Public Policy, with a focus on using Big Data methods to understand the quality of government globally. Fazekas is also the scientific director of the Government Transparency Institute, where he promotes the implementation of new instruments for measuring corruption and quality of government using ‘Big Data’. His research and policy interests revolve around corruption, favouritism, private-sector collusion, and government spending efficiency. He regularly consults for the European Commission, Council of Europe, EBRD, OECD, World Bank, and a range of national governments and NGOs across the globe. Fazekas received his PhD from the University of Cambridge and studied public policy at the Hertie School of Governance (Berlin), and economics and teaching at the Corvinus University of Budapest.
