
Data Governance

Data Governance is the collection of practices and processes that establish and enforce norms and rules for the management of information assets within an organization or state. This entails principles for the collection, storage, use and distribution of data [11 as cited in 5]. The overall goal of Data Governance is to improve data quality, and thus to profit from data-driven innovation while protecting against the associated risks [7]. In short, Data Governance aims to maximize the use of data assets [12].




Generally

With the emergence and development of information technology, the amount of data that is collected, stored and used has increased. Nowadays, big data is no longer exceptional but rather the norm, and it therefore needs to be managed [5]. This is ensured by a Data Governance program. Such a program can focus on one or more aspects of Data Governance, such as standards, data quality, security, architecture, data warehousing and business intelligence, management support or data access [8, 9]. The main objective of Data Governance, however, is to standardize data definitions [8]. This holds for all Data Governance programs regardless of their focus. The advantage of this standardization is that the data become consistent and trustworthy.

Determining who is accountable for realizing the value of data assets is crucial for the development and execution of a program [9]. Depending on the focus and domains of Data Governance, the roles needed can vary, ranging from so-called data stewards to data security officers [3, 9, 16]. Accordingly, Khatri & Brown [9 p. 149] define Data Governance as “who holds the decision rights and is held accountable for an organization’s decision-making about its data assets”. Furthermore, the definition of the term ‘Data Governance’ depends on the level under investigation in the research literature: the macro level or the micro level.

Macro Level

The macro level of Data Governance refers to the broadest level of governance. It concerns the governance of data across borders, such as international, national or sectoral borders [3]. Most commonly, the macro level is understood as the management of international data flows, and it is therefore also called international Data Governance [2]. Thus, Data Governance on the macro level consists of transnational standards and agreements regarding the collection, storage, processing and dissemination of data, i.e. its management and use.

In particular, the digital sovereignty of individuals should be at the center of considerations. This refers to the rights of individuals to their own data, data privacy rules, civil rights and the respective moral duties of organizations, governments and other stakeholders. Transnational organizations that rely on a data-driven business model, in particular, can greatly increase people's trust by working together with governments and other stakeholders to improve data management and use.

Data Governance programs, however, do not only restrict the misuse of personal data; they can also encourage or oblige companies to share their data with each other in order to increase their innovative capacities [7].

Micro Level

On the micro level, Data Governance concerns data management within organizations, e.g. companies [2]. Data management and data architectures are the subject of a data management plan within a company. In general, Data Governance “is the use of authority combined with policy to ensure the proper management of information assets” [4 p. 11]. By adopting a Data Governance program, companies can gain the trust of their customers and enhance their services through improved data organization and protection.

Disciplinary Views

Information Science

“Whereas information governance focuses squarely on the trade-off between value and risk, Data Governance tends to look more to accountability issues, structural responsibilities, and decision-making capacities, in response to external regulatory pressures as well as organizational goals.” [3 p. 1421] “Taking information assets (or data) as ‘facts having value or potential value that are documented’ [9 p. 148], Khatri and Brown define Data Governance as ‘who holds the decision rights and is held accountable for an organization’s decision making about its data assets’ [9 p. 149]. Data Governance, therefore, entails initial identification of who is responsible and accountable for data assets, and the structural roles and responsibilities or loci of accountability of those who realize value from them”. [3 p. 1422-1423]

Political Science

In the field of political science, there are various definitions of governance, but all of them refer to a category of transnational policy-making that encompasses the need for international norms. In times of limited statehood and existing power asymmetries, it is of central importance for democracies to ensure that, for example, (transnational) companies respect and protect the sovereignty of citizens, stateless persons and refugees as well as the sovereignty of states. Data Governance is thus presented as a concept of cooperation between several stakeholders to resolve this dilemma with regard to data usage. Nevertheless, governance researchers critically discuss whether the participation of several stakeholders is politically effective [6].

Relationship to Internet Governance

Data Governance is a subcategory of internet governance and, in contrast to the broad concept of internet governance, refers specifically to standards and agreements regarding the collection, storage, processing and dissemination of data, i.e. its management and use. A working group established after the UN-initiated World Summit on the Information Society (WSIS) proposed the following definition of internet governance as part of its June 2005 report: “Internet governance is the development and application by governments, the private sector and civil society, in their respective roles, of shared principles, norms, rules, decision-making procedures, and programmes that shape the evolution and use of the Internet” [15].

Data management and Data Governance are often confused. Even though the two terms are connected [1], there is a fine distinction: Data Governance decides on the rules and regulations that must be followed and controls the management of data, whereas data management is the execution and implementation of these decisions [13, 14].

Risks and Values of Data Work

Following the work of Foster et al. [3], Data Governance programs must be mindful of the potential perils that arise when working with data. A trade-off must be negotiated between the value of the data work (“the turning of data into action” [3 p. 1414]) and the risks encountered. Above all, the privacy of individuals should be protected; privacy concerns are among the biggest problems of analyzing data. Other concerns refer to the (big) data analyses themselves: there are doubts about the reliability, integrity and validity of the analyses. Furthermore, organizing such an enormous amount of data takes considerable effort, so it can be difficult to convince investors that the benefits of (big) data analysis outweigh the risks. During data collection, the quality of the collected data can suffer, which can lead to lost findings and measurement errors. Algorithmic bias is also possible. Finally, issues of data ownership, governance and data exchange can arise.

Yet the merits of data work are as diverse as the risks. Through data work, data streams are organized and unified into data models. The analyses can lead to new insights, developments or opportunities. Automated, new or improved services can be established by analyzing behaviors and detecting patterns and communities, which in turn makes predictions and recommendations possible. Based on these newly gained insights, an actor can make informed decisions that would otherwise not be possible. Moreover, the decision-making process itself can be automated thanks to data work and artificial intelligence. All of these results have economic value.

The goal of a Data Governance program is to decide on a trade-off between these risks and values. Generally speaking, improving data quality improves processes and services, which in the end leads to higher profits.

Frameworks and Best Practices

There is no universal Data Governance program that is equally suitable for every actor or form of organization. Each actor must work out an individual program that suits their needs and evolves over time. However, the literature provides several frameworks, principles and practices that can serve as guidance when developing a Data Governance strategy. A few are listed below:

Khatri & Brown [9] developed a framework intended to aid the development of a Data Governance program. Based on their definition of Data Governance (who is responsible for the decisions?), they propose dividing Data Governance into five domains: data principles, data quality, metadata, data access and data lifecycle. Each domain can have different decision-makers who are held accountable.
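The idea that each of the five domains has its own accountable decision-maker can be illustrated with a small accountability register. The following is only a minimal sketch: the domain names are taken from the paragraph above, while the class `Assignment`, the function `unassigned_domains` and all role names are hypothetical and not part of the framework in [9].

```python
from dataclasses import dataclass

# The five decision domains named in the text (after Khatri & Brown [9]).
DOMAINS = (
    "data principles",
    "data quality",
    "metadata",
    "data access",
    "data lifecycle",
)


@dataclass(frozen=True)
class Assignment:
    """Links one decision domain to the role held accountable for it."""
    domain: str
    accountable: str  # hypothetical role name, chosen for illustration


def unassigned_domains(assignments):
    """Return the domains for which nobody has been named accountable yet."""
    covered = {a.domain for a in assignments}
    return [d for d in DOMAINS if d not in covered]


# Hypothetical example: three of the five domains have an owner so far.
assignments = [
    Assignment("data principles", "data governance council"),
    Assignment("data quality", "data steward"),
    Assignment("data access", "data security officer"),
]

print(unassigned_domains(assignments))  # ['metadata', 'data lifecycle']
```

A register of this kind makes gaps in accountability visible early, before the program is deployed.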

Thomas [8] suggests deciding on one of six focus areas of Data Governance. Depending on the focus, there are different problems to tackle. Furthermore, she stresses that the starting point for a program is not designing the program itself but developing a value statement, a detailed roadmap and a (funding) plan for it. This is important because these are of interest to the stakeholders. Only after that does the deployment of the program start, so that the data can be governed. Once the Data Governance program is in operation, it must be monitored and measured, and reports must be compiled. In her framework, Thomas [8] lists ten components that a Data Governance program should have: six components about rules and rules of engagement, three components about people and organizational bodies, and one component about processes.

All in all, this leads to the following recommended procedure for developing a Data Governance program [8, 16]. The starting point is to think about the goals that are to be pursued by developing and complying with a Data Governance program. The key question is: what are the benefits of, and the reasons for, developing the program? Knowing the answers to these questions avoids additional work. To assess the merits of the project, it is also important to know the current state of affairs: what is going well, and what needs to be improved or developed further? The findings should be summarized in a value statement and used for a draft roadmap. Both are useful and important to share with the stakeholders in order to convince them of the project. Once they approve, a (budget) plan and the design of the program are the next steps in the Data Governance life cycle. When all the planning is done, the program can be put into practice. As long as the program is running, it should be monitored. Recorded information such as measured success or intermediate results is especially useful when adjustments must be made or a new program must be developed.

However, it would be inefficient to start a Data Governance program from a blank page. If possible, processes from other actors’ successful programs should be adapted. Of course, this is only possible if the processes match the previously determined goals [16]. Nonetheless, the experiences of other organizations should not be underestimated as a means of avoiding mistakes [8, 17].

Moreover, Thompson, Ravindran & Nicosia [17] add further principles for Data Governance programs relating to people, standards and compliance. Concerning people, it is important to have effective leadership that is responsible for adherence to the program and serves as the contact point for the people involved (e.g. data stewards and business leaders). Unambiguous communication is a key skill for this role. The standards themselves also have to be unambiguous; they have to be communicated to the users of the program, and adherence to them must be checked regularly. This is in line with Thompson, Ravindran & Nicosia’s [17] last broad principle, compliance: they state that, in order to achieve compliance, the Data Governance program should provide means of monitoring and routine checks to ensure correct behavior.

Furthermore, according to the report of the British Academy and the Royal Society [1], the main principle to keep in mind when evaluating or developing a new governance program is that the systems should promote human flourishing. This means that it should not be forgotten that humans are at the center of the system and that the data are used for the benefit of human beings. Connected to that principle, the rights and interests of individuals and collectives must be protected. Trade-offs that are made must be transparent, accountable and inclusive. Existing (democratic) governance should be improved and refined. Finally, good practices should build on the insight gained from past successes and failures.

Tallon [11] adds to these findings three categories of practices: Structural, operational and relational practices. They should be followed when an organization executes a Data Governance program.

Data Governance Mishaps

The goal of Data Governance is to avoid mishaps and to find solutions if they occur. By heeding the best practices stated above, mistakes such as those in the following examples can be averted:

Losing track of the complex, unrecorded relationships between different systems and data (in other words, a lack of data awareness) is problematic. DeStefano, Tao, & Gai [22] name the company Acme Global Corp. as an example. The company had to change the IDs of its employees in the human resources system. However, many other systems in the company relied on that system: they accessed its data, reused the data in their own processing, forwarded the data to other systems, or used the system as an access point. Other internal procedures made the situation even more complicated. Changing the IDs had a huge impact on the data flow, and identifying the data flows and the systems involved required a huge amount of work.
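If the relationships between systems are recorded, the impact of such a change can be estimated before it is made. The following minimal sketch assumes that the data flows are documented as a directed graph and uses a breadth-first search to list every system that directly or indirectly consumes data from the changed system. The system names and the function `affected_systems` are hypothetical and are not taken from [22].

```python
from collections import deque

# Hypothetical data-flow record: each key feeds data to the systems in its list.
DOWNSTREAM = {
    "hr_system": ["payroll", "access_control", "directory"],
    "directory": ["email", "intranet"],
    "payroll": ["finance_reporting"],
    "access_control": [],
    "email": [],
    "intranet": [],
    "finance_reporting": [],
}


def affected_systems(graph, changed):
    """Breadth-first search over the recorded data flows.

    Returns every system that directly or indirectly receives data
    from `changed`, in the order in which it is reached.
    """
    seen = {changed}  # also guards against cycles leading back to the source
    queue = deque([changed])
    reached = []
    while queue:
        current = queue.popleft()
        for consumer in graph.get(current, []):
            if consumer not in seen:
                seen.add(consumer)
                reached.append(consumer)
                queue.append(consumer)
    return reached


print(affected_systems(DOWNSTREAM, "hr_system"))
```

The sketch only works if the data flows have been recorded in the first place, which is exactly the kind of data awareness the example shows to be missing.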

Brunelli [23] furthermore lists the company Apple as an example of bad data management. Its iPhones and iPads contained unencrypted files that stored the users’ location even if the users had disabled that function [25]. Of course, Apple is not the only company that has had to defend itself against accusations of privacy violations: one of the most well-known scandals concerned the privacy violations of Facebook and Cambridge Analytica in 2018 [24].

Current Topics

Artificial Intelligence

There is a strong relationship between Data Governance and Artificial Intelligence (AI): AI needs Data Governance to produce the desired results. AI is based on machine learning, which in turn is based on data. It will always produce a result and find patterns of some sort. However, the usefulness of those results depends strongly on other factors, such as the input data. Following the popular saying “garbage in, garbage out”, the quality of the data must be ensured. This is one of the purposes of Data Governance, especially since it already treats data as an asset. It is crucial that the program follows principles that do not allow discrimination of any kind [18].
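The “garbage in, garbage out” principle can be made concrete with a simple quality check that runs before data is handed to a learning algorithm. The following is only a minimal sketch under stated assumptions: the function `quality_report`, the field names and the example records are hypothetical, and real programs would typically check far more dimensions of quality than missing values and exact duplicates.

```python
def quality_report(records, required_fields):
    """Count records with missing required values and exact duplicate records."""
    missing = 0
    duplicates = 0
    seen = set()
    for record in records:
        # A required field counts as missing if it is absent, None or empty.
        if any(record.get(field) in (None, "") for field in required_fields):
            missing += 1
        # Two records are exact duplicates if all their key/value pairs match.
        key = tuple(sorted(record.items()))
        if key in seen:
            duplicates += 1
        seen.add(key)
    return {"total": len(records), "missing": missing, "duplicates": duplicates}


# Hypothetical input data with one incomplete and one duplicated record.
records = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 1, "age": 34},
]

print(quality_report(records, required_fields=["id", "age"]))
# {'total': 3, 'missing': 1, 'duplicates': 1}
```

A Data Governance program could define thresholds on such counts and decide who is accountable when they are exceeded.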


Data Reuse and Sharing

Data reuse and exchange between companies can promote innovation and lead to greater prosperity [21]. Yet organizations on both the macro and the micro level fear competitive disadvantages, and data sharing is complicated by the lack of an appropriate infrastructure [7]. Legislative means to counteract these problems are still being discussed. Furthermore, the project “Data Governance” of the Alexander von Humboldt Institute for Internet and Society strives to find solutions to overcome the obstacles to data exchange [21]. The project also developed four ideal types of Data Governance: Single Source, Data Clearinghouse, Data Pool and the Decentralized Model. As the name implies, the Single Source type refers to a single data holder distributing the data to all data users; the data is thus stored in one place only. The Data Clearinghouse involves several data holders and an intermediary that passes the data from one data holder to the specific data user. The Data Pool also includes several data holders and an intermediary. Here, however, the intermediary serves as a pool of data to which all data holders contribute and from which they benefit through data reuse. The information from that data pool is then passed on to the respective data users. Lastly, the Decentralized Model combines some traits of the other models. Data users access data via a data holder. A data user can at the same time be a data holder that receives information from another data holder and provides other users with information.


Cross-Borders Data Governance

With the emergence of AI, the need for Data Governance programs across borders becomes more urgent in order to ensure data security and to guide data sharing. At the same time, open commerce should not suffer from restrictions imposed out of fear of personal data exchange. Yet most countries, except for Japan, have been hesitant to work on a program due to data privacy concerns [19, 20] and differing legal frameworks. Cross-Borders Data Governance is about finding solutions to the obstacles to international data exchange in order to make it possible. A special focus lies on developing countries and the BRICS.


Data Protection and Personal Information

It is apparent that data protection and the security of personal information are one of the main concerns of Data Governance, if not the main concern. They have to be considered in almost every part, focus and domain of Data Governance. The big question is how to maximize the outcome of data work and Data Governance without violating privacy rights. Research can also focus on vulnerable groups (e.g. children) and on rules and regulations in specific countries.

Workshops at IGF2019

Tuesday, Nov 26

Crossborder Data: connecting SMEs in the global supply chain

Solutions for law enforcement to access data across borders


Wednesday, Nov 27

Equitable Data Governance that empowers the public

Children's privacy and data protection in digital contexts

Public Interest Data: Where Are We? To Do What?

A universal data protection framework? How to make it work?

Data-Driven Democracy: Ensuring Values in the Internet Age

Value and Regulation of Personal Data in the BRICS

Data Governance by AI: Putting Human Rights at Risk?


Thursday, Nov 28

Making global Data Governance work for developing countries

Different Parties' Role in PI Protection: AP's Practices

Human-centred Design and Open Data: how to improve AI

Splinternet: What Happens if "Network Sovereignty" Prevails

Beyond Ethics Councils: How to really do AI governance

Enhancing Partnership on Big data for SDGs

Human-centric Digital Identities

Data Governance for Smarter City Mobility

AI Readiness for the SDGs

Rule of Law as a key concept in the digital ecosystem

Unpacking Digital Trade Impacts: Calling all Stakeholders


Friday, Nov 29

Assessing the role of algorithms in electoral processes

A tutorial on public policy essentials of Data Governance


External Links

[1] Data management and use: Governance in the 21st century. A joint report by the British Academy and the Royal Society. https://royalsociety.org/-/media/policy/projects/data-governance/data-management-governance.pdf, June 2017.

[2] Data Governance, Wikipedia, https://en.wikipedia.org/wiki/Data_governance

[3] Foster, J., McLeod, J., Nolin, J., & Greifeneder, E. (2018). Data work in context: Value, risks, and governance. Journal of the Association for Information Science and Technology, 24(2), 51. https://doi.org/10.1002/asi.24105 , pp. 1421-1423.

[4] Ladley, J. (2012) Data Governance. How to Design, Deploy, and Sustain an Effective Data Governance Program.

[5] Cheong, L. K., & Chang, V. (2007). The Need for Data Governance: A Case Study. 18th Australasian Conference on Information Systems, 5-7 Dec 2007, Toowoomba.

[6] Newig J. (2011) Partizipation und neue Formen der Governance. In: Groß M. (eds) Handbuch Umweltsoziologie. VS Verlag für Sozialwissenschaften.

[7] von Grafenstein, M., Wernick, A., & Olk, C. (2019). Data Governance: Enhancing Innovation and Protecting Against Its Risks. Intereconomics, 54, 228-232. doi: 10.1007/s10272-019-0829-9

[8] Thomas, G. (2006). The DGI Data Governance Framework. http://www.datagovernance.com/wp-content/uploads/2014/11/dgi_framework.pdf, 29.10.2019

[9] Khatri, V.K., & Brown, C.V. (2010). Designing Data Governance. Communications of the ACM, 53(1), 148–152. doi: 10.1145/1629175.1629210

[10] Cohen, R. (2006). BI Strategy: What's in a Name? Data Governance Roles, Responsibilities and Results Factors. DM Review.

[11] Tallon, P. P. (2013). Corporate governance of big data: Perspectives on value, risk, and cost. Computer, 46(6), 32-38. doi: 10.1109/MC.2013.155

[12] Brüning, A., Gluchowski, P., & Kaiser, A. (2017). Data Governance– Einordnung, Konzepte und aktuelle Herausforderungen. Chemnitz Economic Papers, 015. https://www.econstor.eu/bitstream/10419/170675/1/CEP015.pdf, 31.10.2019

[13] Al-Ruithe, M., Benkhelifa, E., & Hameed, K. (2018). A systematic literature review of data governance and cloud data governance. doi: 10.1007/s00779-017-1104-3

[14] Knight, M. (2017). Data Management vs. Data Governance: Improving Organizational Data Strategy. https://www.dataversity.net/data-management-vs-data-governance-improving-organizational-data-strategy/, 31.10.2019

[15] Tunis Agenda for the Information Society. World summit on the information society. http://www.itu.int/net/wsis/docs2/tunis/off/6rev1.html

[16] Data Governance Definition, Challenges & Best Practices. Bi-Survey.com: https://bi-survey.com/data-governance

[17] Thompson, N., Ravindran, R., & Nicosia, S. (2015). Government data does not mean data governance: Lessons learned from a public sector application audit. Government information quarterly, 32(3), 316-322. doi: 10.1016/j.giq.2015.05.001

[18] Gasser, U., & Almeida, V. A. F. (2017). A layered model for AI governance. IEEE Internet Computing, 21(6), 58-62. doi: 10.1109/MIC.2017.4180835

[19] Koshino, Yuka (2019). Resolved: Japan Could Lead Global Efforts on Data Governance. Debating Japan, 2(6). Retrieved from: https://www.csis.org/analysis/resolved-japan-could-lead-global-efforts-data-governance

[20] Finucan, Logan (2019). Japan, the unlikely hero of global data governance. Retrieved from: https://venturebeat.com/2019/04/07/japan-the-unlikely-hero-of-global-data-governance/

[21] Alexander von Humboldt Institut für Internet und Gesellschaft (2018). Data Governance. Retrieved from: https://www.hiig.de/project/data-governance/

[22] DeStefano, R. J., Tao, L., & Gai, K. (2016). Improving data governance in large organizations through ontology and linked data. In 2016 IEEE 3rd International Conference on Cyber Security and Cloud Computing (CSCloud) (pp. 279-284). IEEE. doi: 10.1109/CSCloud.2016.47

[23] Brunelli, M. (2012). The top five data management mishaps of 2011. Retrieved from: https://searchdatamanagement.techtarget.com/news/2240113813/The-top-five-data-management-mishaps-of-2011

[24] Facebook–Cambridge Analytica data scandal, Wikipedia, https://en.wikipedia.org/wiki/Facebook%E2%80%93Cambridge_Analytica_data_scandal

[25] Helft, M. (2011). Jobs Says Apple Made Mistakes With iPhone Data. Retrieved from: https://www.nytimes.com/2011/04/28/technology/28apple.html