Determinants of Data Quality in an Electric Utility:
A Delphi Study
I. Mulasastra1, K. Srimoung2
1Department of Computer Engineering, Faculty of Engineering, Kasetsart University, Jatujak, Bangkok, Thailand 10900, int@ku.ac.th
2Provincial Electricity Authority of Thailand, Bangkok, Thailand 10900, kallaya.s@pea.co.th
Abstract
This study investigates the determinants of data quality in a geographical information system (GIS) by conducting a Delphi study in an electric utility in Thailand. A literature review was first conducted to identify candidate determinants of GIS data quality. The Delphi technique was then used to gather convergent opinions concerning data quality determinants from engineers and technicians who are experts in GIS data production. The results revealed three data quality determinants that are new in an electric utility context: supporting equipment, geographical factors, and human resource management. The other seven factors are shared with previous studies in typical information system contexts: top management awareness, system quality, system usefulness, data quality control, training, data source quality, and system support. These findings suggest that this electric utility should develop a data quality management strategy and plans to ensure high-quality data across the organization.
Keywords- data quality management, GIS data, Delphi method, power distribution system, electricity utility
1. Introduction
Empirical studies of the determinants of data quality in typical information systems (record-based data) show that both organizational and technological factors influence the quality of data. These factors include top management commitment/support, data quality control, data quality improvement, data quality policies, incentives, information system support, useful systems, and education and training [1-5]. However, few studies have empirically examined the determinants of data quality for engineering assets, whose data are mostly stored in geographic information systems (GISs).
Thus, this study aims to find the determinants of GIS data quality in a utility industry. Applying the Delphi technique, we conducted research at a major electric utility in charge of a power distribution network covering large areas of Thailand. This utility employs GISs to store geographical and distribution network maps as well as engineering asset data (e.g., transformers, electric breakers, and poles).
The rest of this paper is organized as follows: We first discuss the issues of GIS data quality in engineering asset management and then describe the critical factors influencing GIS data quality from the literature review. Next, the research methodology is presented. The findings are then discussed and summarized. Finally, the paper concludes with theoretical and managerial implications and directions for further research.
2. Background
2.1. Use of GISs in Utility Industries
Utility companies need to manage engineering assets (e.g., facilities and equipment) “to optimize the lifecycle value of the physical assets by minimizing the long-term cost of owning, operating, and maintaining, and replacing the asset, while ensuring the required level of reliable and uninterrupted delivery of service” [3]. For supporting engineering asset management (EAM), GISs have been widely used in utility industries. “A GIS integrates hardware, software, and data for capturing, managing, analyzing, and displaying all forms of geographically referenced information” [6].
In electric utilities, electric networks can be modeled by using a GIS in order to manage and map the location of equipment in the network. There are two types of data in a GIS for power distribution networks. One is map data, which is spatial data depicting roads and buildings, and distribution lines installed with electric equipment. The other is attribute data, which is non-spatial data describing physical characteristics of equipment, such as attributes of transformers and circuit breakers.
2.2. Impact of GIS Data Quality on Utility Industries
With high-quality GIS data, a utility organization can produce useful information for decision-making, maintain the health of its assets, reduce risks to the organization and the system, optimize operations, and improve service and cost efficiency [12]. However, low-quality data impacts performance at all levels of the organization and substantially increases many organizations’ operating costs [13-15].
Quality data are critical for utility industries worldwide. For example, according to the U.S. Federal Highway Administration (FHWA), missing or inaccurate information about the location of underground utilities is a leading cause of highway construction delays [16]. In addition, such information problems cost the U.S. economy at least $50 billion annually and have been associated with 1,906 injuries and 421 deaths over the past 20 years [16]. In Brazil, electric utilities are required by regulation to supply precise geographic information about the location of cables, transformers, and customer metering points, with the aim of making the country the world leader in reliable digital models of network infrastructure [17].
2.3. Determinants of Data Quality
Several factors relating to people, technology, and management could impact data quality during system development, system implementation, and system operations [1]. Case studies and data-quality expert recommendations suggest several data quality management practices that influence data quality, such as top management commitment/support, data quality control, data quality improvement, data quality policies, incentives, and education and training [1-5].
In the EAM context, Lin et al. [3] propose a data quality framework that incorporates these key factors: system integration, training, management support, employee relations, and organizational culture. This model was validated through a case study of two large Australian engineering firms. A quantitative study by Mulasastra and Krairit [1] found that data quality control, information system support, and system usefulness significantly influence data quality in hospital information system operations in the Thai public healthcare context.
A case study by Orwattana [18] investigated factors affecting the quality of GIS data in the same organization as our study. Through in-depth interviews with management personnel from two provincial branch offices and two regional branch offices, he found five main factors affecting GIS data quality: 1) insufficient personnel for GIS data entry; 2) insufficient software licenses to meet the need for system access; 3) inadequate training in GIS data entry; 4) inefficient GIS hardware devices, causing delayed data processing; and 5) inefficient workflow between departments, causing a huge backlog of work. However, the results were not standardized for testing with GIS data collectors in various branch offices.
3. Methods
There is a lack of empirical research on data quality determinants in the GIS context. This study aimed to find the determinants related to GIS data collectors working in a utility industry. We approached this problem by applying a Delphi survey technique to aggregate the opinions of experts in GIS data collection.
3.1. Research Design
We conducted the study at a single electric utility in Thailand. We chose this utility because it is Thailand’s largest electric utility, operating a power distribution network that covers almost all areas of the country. The company’s headquarters are located in Bangkok, with more than 200 branch offices located in most provinces and cities. All offices use a GIS to support their main operations. A single case study allows careful examination and a deep understanding of the subject [20].
There are three main methods that can be used to collect data from experts: group discussion, expert interview, and survey [19]. This study employed a Delphi survey because the same questionnaire can be filled out several times by several experts. In the subsequent rounds, the respondents can clarify or change their views after receiving feedback on the earlier answers [19]. By using the Delphi technique, this study could obtain a consensus about data quality determinants from the GIS data collection experts.
Both authors of this paper previously worked on the same GIS data cleansing project as part of an industry-university collaboration; hence, we drew on this long-established relationship to request permission from the company to conduct this research. Other utility companies in Thailand either have different operations or operate in areas with a greater population density.
3.2. Data Collection
The panel consisted of 26 engineers and technicians with backgrounds and experience related to GIS data quality issues [21]. They were all GIS data observers and data entry operators working in different branch offices with different geographical conditions, which typically play a major role in collecting data on engineering assets. The number of panelists in this study is appropriate; according to Delbecq, Van de Ven, and Gustafson (1975), as cited in [21], “ten to fifteen subjects could be sufficient if the background of the Delphi subjects is homogeneous.” The details of each method are explained as follows:
o Delphi processes
All rounds were administered to the panelists in one place and were completed in three days during a training session at a headquarters office. The objectives of the study were explained to all panelists, who were encouraged to express their opinions freely based on their true perceptions of the study issues.
Round 1: We began with the traditional Delphi process by distributing hard copies of an open-ended questionnaire to the panelists, together with a list of data quality determinants identified in the literature review. The panelists were asked to add more statements concerning the factors that they thought might impact GIS data quality. We then analyzed the results by categorizing the data quality determinants to form a well-structured questionnaire for the next survey round.
Round 2: We asked the panelists to rate a set of statements relating to data quality factors on a 7-point scale (1 = lowest impact to 7 = highest impact). The panelists were also allowed to add more statements relating to factors influencing GIS data quality. The results were used to establish preliminary priorities among the statements (Ludwig, 1994, as cited in [21]).
Round 3: We adjusted the questionnaire by adding the new factors identified in the second round. In this round, we asked the panelists to rate the set of statements again, and they were allowed to change their judgments. In addition, we presented summary statistics calculated from the round 2 results (mean, mode, minimum, and maximum) as supporting information for the panelists’ decisions in this round. According to Ludwig (1994), as cited in [21], the use of the mode is also suitable when reporting data in the Delphi process.
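The per-statement feedback described for Round 3 can be illustrated with a minimal Python sketch. The function name, the tie rule for the mode, and the example ratings are assumptions made for illustration; they are not taken from the study's data or tooling.

```python
from collections import Counter
from statistics import mean


def summarize_statement(ratings):
    """Return mean, mode, minimum, and maximum of one statement's 7-point ratings."""
    counts = Counter(ratings)
    top = max(counts.values())
    # Assumption: when several ratings tie for most frequent, report the smallest.
    mode = min(r for r, c in counts.items() if c == top)
    return {
        "mean": mean(ratings),
        "mode": mode,
        "min": min(ratings),
        "max": max(ratings),
    }


# Hypothetical round-2 ratings from eight panelists for a single statement.
example_ratings = [7, 6, 7, 5, 4, 7, 6, 3]
summary = summarize_statement(example_ratings)
print(summary)
```

A table of such summaries, one row per statement, is the kind of feedback a panelist could consult before re-rating in the next round.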
3.3. Data Analysis
o Consensus
In Delphi studies, consensus is measured in various ways. Some studies have used subjective criteria or descriptive statistics to indicate consensus and to quantify its degree [21, 22]. This study used the F-test to evaluate the consensus of responses by comparing the variation of responses between consecutive rounds. If there was no significant change in group consensus from round n to round n+1, the process was stopped after round n+1.
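As a rough illustration of an F-test on response variation, the sketch below computes the variance ratio between two rounds for one statement and compares it with a critical value. It assumes a one-sided test whose alternative hypothesis is that the later round's variance is larger; the direction of the test, the function names, the example ratings, and the critical value (about 3.79 for 7 and 7 degrees of freedom at the 0.05 level, as read from a standard F table) are assumptions, not details reported by the study.

```python
from statistics import variance


def variance_ratio(earlier_round, later_round):
    """Return (F, df_numerator, df_denominator) with F = var(later) / var(earlier).

    Each round needs at least two ratings, and the earlier round must not
    have zero variance, otherwise the ratio is undefined.
    """
    f_stat = variance(later_round) / variance(earlier_round)
    return f_stat, len(later_round) - 1, len(earlier_round) - 1


def spread_increased(earlier_round, later_round, f_critical):
    """True if the later round's variance is significantly larger (one-sided)."""
    f_stat, _, _ = variance_ratio(earlier_round, later_round)
    return f_stat > f_critical


# Hypothetical ratings of one statement by the same eight panelists.
round2 = [7, 5, 3, 6, 4, 7, 2, 6]
round3 = [6, 5, 5, 6, 5, 6, 4, 6]

f_stat, df_num, df_den = variance_ratio(round2, round3)
# Assumed critical value for F(7, 7) at the 0.05 level; look up the value
# that matches the actual degrees of freedom before using this in practice.
print(f_stat, df_num, df_den, spread_increased(round2, round3, 3.79))
```

In this example the later round is less dispersed than the earlier one, so the ratio is well below the critical value and the statement would count as not having lost consensus.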
o Degree of Consensus
We also adapted the concept of Kapoor’s majority agreement, as cited in [7], to quantify the level of consensus rather than to provide a cut-off rate. In our study, we defined the ‘Average Percentage of High Agreement’ (APHA) as the total number of responses indicating ‘high agreement’ with statements, divided by the total number of opinions expressed. In our case, ‘high agreement’ includes the responses rated from 4 to 7 on a 7-point rating scale (1 = lowest to 7 = highest). Unlike Kapoor’s majority agreement, the APHA does not include the number of disagreements.
\[ \mathrm{APHA} = \frac{\sum_{i} \text{High Agreement}_i}{\sum_{i} \text{Total opinions expressed}_i} \times 100\% \]
In the APHA formula, ‘High Agreement_i’ is the number of responses indicating high agreement on statement i, ‘Total opinions expressed_i’ is the total number of opinions on statement i, and both sums run over all statements. For comparison purposes, we also calculate a percentage of high agreement (PHA) for each statement i as follows:
\[ \mathrm{PHA}_i = \frac{\text{High Agreement}_i}{\text{Total opinions expressed}_i} \times 100\% \]
After each round, we calculated the APHA and PHA in order to present the degree of consensus. By using these figures, we can compare the relative importance of data quality determinants indicated by the GIS data collectors.
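The two measures can be sketched in Python as follows, assuming ratings are stored as one list per statement. The function names, the statement labels, and the example ratings are hypothetical; only the threshold (ratings of 4 to 7 count as high agreement) and the pooled definition of the APHA come from the text.

```python
def pha(ratings, threshold=4):
    """Percentage of one statement's ratings at or above the threshold."""
    high = sum(1 for r in ratings if r >= threshold)
    return 100.0 * high / len(ratings)


def apha(ratings_by_statement, threshold=4):
    """Pooled percentage of high-agreement ratings across all statements."""
    high = sum(
        1
        for ratings in ratings_by_statement.values()
        for r in ratings
        if r >= threshold
    )
    total = sum(len(ratings) for ratings in ratings_by_statement.values())
    return 100.0 * high / total


# Hypothetical ratings; statements may receive different numbers of opinions.
example = {
    "S1": [7, 6, 3, 5],
    "S2": [2, 4, 7],
    "S3": [1, 3, 6, 6, 7],
}

overall = apha(example)
per_statement = {name: pha(ratings) for name, ratings in example.items()}
# Statements whose PHA exceeds the APHA are the relatively stronger ones.
above_average = [name for name, value in per_statement.items() if value > overall]
print(overall, per_statement, above_average)
```

Because the APHA pools all responses before dividing, it is not the simple mean of the PHA values whenever statements receive different numbers of opinions.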
4. Results
Round 1: We received some feedback from the open-ended questionnaire. We analyzed the results and formed a set of well-structured questionnaires consisting of 34 statements concerning data quality determinants for the next survey round.
Round 2: The APHA in the second round was calculated as 602 high-agreement responses (ratings above 3) divided by 747 opinions, giving an APHA of 81%. Sixteen of the 34 statements had PHA values above the APHA. Three statements were added by the panelists.
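As a check on the reported figure, the round 2 APHA can be worked out from the two counts given above. The comparison with the maximum possible number of ratings (26 panelists times 34 statements) is our own arithmetic, not a figure stated in the study.

```latex
\[
\mathrm{APHA}_{\text{round 2}} = \frac{602}{747} \times 100\% \approx 80.6\% \approx 81\%
\]
\[
26 \times 34 = 884 > 747
\]
```

The second line suggests that not every panelist rated every statement, which is why the denominator is the number of opinions actually expressed rather than the number of panelists times the number of statements.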
Round 3: By conducting one-sided F-tests on the responses to each statement in rounds 2 and 3, we found that, for most statements, the variation of group responses in the third round was equal to or less than the variation in the second round (at the 0.05 significance level). There were significant changes in responses from round 2 to round 3 only for the three statements that were newly added in round 3 (in the system quality and human resource management categories).
Hence, we terminated these Delphi survey processes after round 3. Three iterations are sufficient to collect the needed information and to reach a consensus in most cases [21]. The APHA for the third round is 86%, greater than the second round APHA. Twenty out of 37 statements were rated above the APHA. The PHA values of all statements are above 66%, reflecting quite a high degree of agreement (greater than 50%) that all the determinants are critical for data quality among panelists. We conceptually categorized the statements into ten factors and calculated the mode values of each factor by counting the frequency of statement ratings in each category. The results are shown in Table 2.
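The per-factor mode described above, obtained by counting rating frequencies across all statements in a category, can be sketched in Python as below. The grouping of statements into factors, the statement labels, the ratings, and the tie rule are all hypothetical and serve only to show the counting step.

```python
from collections import Counter


def factor_mode(ratings_by_statement, statements_in_factor):
    """Most frequent rating across all statements assigned to one factor."""
    counts = Counter()
    for statement in statements_in_factor:
        counts.update(ratings_by_statement[statement])
    top = max(counts.values())
    # Assumption: when several ratings tie for most frequent, report the highest.
    return max(r for r, c in counts.items() if c == top)


# Hypothetical ratings per statement and a hypothetical statement-to-factor map.
ratings = {
    "S1": [7, 6, 7],
    "S2": [5, 7, 6],
    "S3": [4, 4, 5],
}
factors = {
    "training": ["S1", "S2"],
    "system support": ["S3"],
}

modes = {name: factor_mode(ratings, members) for name, members in factors.items()}
print(modes)
```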
Table 2. Determinants of GIS Data Quality
5. Discussion
As shown in Table 2, almost all categories of data quality determinants contain critical issues rated with very high agreement among the panelists, with PHA values greater than the APHA (86%). Supporting equipment and top management awareness of the importance of GIS data quality are the only categories in which every statement has a PHA below the APHA. However, this does not mean that supporting equipment or top management awareness is unimportant. In fact, the mode value of top management awareness is 7, the highest value. In addition, the geographical factor concerning difficulty in accessing data of electronic devices is considered very significant for GIS data quality; hence, it is essential for the organization to provide effective and adequate tools to support data collection.
In addition, this study revealed new data quality determinants that have not been found in studies conducted in other information system contexts. They are geographical factors, supporting equipment, and human resource management (with specific GIS issues).
Geographical factors refer to the weather and the geographic locations of equipment in power distribution systems, which affect data collection and GPS device performance. In addition, equipment installation locations, such as on high poles, make data observation difficult. As a result, GIS data could be incomplete, outdated, and inaccurate.
In a GIS context, supporting equipment is important for data observers who need to survey engineering assets in power distribution lines; such necessary equipment includes GPS devices and vehicles for surveying. In addition, the availability and effectiveness of computer facilities are important for supporting GIS data entry. The results conform to another study in human performance research which states that environmental supports influence performance improvement and that employees need the necessary tools to perform their jobs effectively [8]. In particular, the GIS environment is more complex than typical business information systems; sources of GIS data are usually disparate and difficult to obtain [11].
Managing human resources is imperative in the utility industry, since using a GIS for both observing real-world data and entering data is comparatively more complicated than using other types of information systems. In an electric utility, technicians and engineers usually work in the field or at construction sites; they are less likely to accept new, highly complex technology [9]. Hence, assigning jobs according to employee abilities is essential, as is allocating adequate personnel to each task. Creating dedicated GIS data entry positions is another option that can increase the effectiveness of GIS data entry. In addition, the relocation of GIS personnel should be carefully managed; new staff need sufficient training and support in using a GIS.
The other factors (top management awareness, system quality, system usefulness, data quality control, data source quality, training, and system support) have been indicated in previous studies as being important for maintaining high quality of data in other contexts [1-5]. However, some issues in each category of data quality determinants are different from those in the previous studies.
System quality was found to be very important in this GIS context as well, especially the effectiveness of computer servers and computer networks. In general, processing geospatial data requires high-performance computational infrastructure. System accessibility (software licenses and available usage time) was highly regarded as an essential data quality determinant. If a GIS is not available to users when required, data will be entered neither on time nor completely.
The quality of the GIS data source (data after implementation) was found to be essential for subsequent use of a GIS and for maintaining data quality once the system is in operation. Data consistency among GISs and other related systems (e.g., SAP and the transformer system) is critical for GIS data quality as well.
System usefulness, the perception of users toward a GIS, was stated as an important factor, in line with the finding of Mulasastra and Krairit [1] that system usefulness significantly impacts data quality in a healthcare context. If users perceive that a system is beneficial, in that it helps them work more efficiently, they are more willing to collect or enter data appropriately. The last factor is data quality control: the majority of the panelists agreed that regular improvement, quality checks, and feedback on data quality can influence data quality, in accordance with the study by Mulasastra and Krairit [1].
6. Conclusion
Despite the importance of data quality, there has been little empirical research on the factors influencing GIS data quality, particularly in the electric utility context. This study used the Delphi technique to aggregate expert opinions in a Thai public utility. Three iterations of Delphi surveys were conducted. Consensus was assessed by the decrease in the variation of responses from round 2 to round 3.
The analysis results reveal that technological, management, and geographical factors are essential for GIS data quality. The technological factors concern system quality, system usefulness, and data source quality. The management factors relate to effective and adequate training, knowledge and skills of GIS data collectors, relocation of GIS data collectors, system support, and data quality improvement. The geographical factor relates to difficulty in accessing data of electronic devices.
New GIS data quality determinants were discovered in this study: geographical factors, supporting equipment, and human resource management (with specific GIS issues). The other determinants, shared with the results of previous studies, are top management awareness, system quality, system usefulness, data quality control, data source quality, training, and system support.
This finding suggests that this electric utility should develop a data quality management strategy and plans to ensure high-quality data across the organization. This suggestion conforms to Thailand’s digital government policy which encourages all Thai government agencies to implement data governance. For this utility, attention should be paid to all ten determinants revealed by this study.
We derived all the factors from opinions gathered from experts in GIS data collection in a single utility company. Hence, quantitative research should be conducted to develop a valid and reliable measurement instrument for these factors. The factors should then be quantitatively tested across all branch offices of this utility to determine the association between the determinants and their effects on GIS data quality.
References
[6] ESRI, “GIS for transportation infrastructure management,” USA, 2011. [Online]. Available: http://www.esri.com/library/brochures/pdfs/transportation-infrastructure.pdf
[7] H. R. Cottam, M. Roe, and J. Challacombe, "Outsourcing of Trucking Activities by Relief Organisations," Journal of Humanitarian Assistance, pp. 1-26, 2004.
[8] C. Binder, "The Six Boxes™: A descendent of Gilbert's behavior engineering model," Performance Improvement, vol. 37, pp. 48-52, 1998.
[9] J. Chin and S.-C. Lin, "A behavioral model of managerial perspectives regarding technology acceptance in building energy management system," Sustainability, vol. 8, pp. 1-13, 2016.
[10] R. Y. Wang and D. M. Strong, "Beyond accuracy: what data quality means to data consumers," Journal of Management Information Systems, vol. 12, pp. 5-33, 1996.
[12] P. E. Rosa DelaCruz, "Data quality management for electric utilities: essential data smart principles for managing your critical assets," DNV GL, Norway, 2016. [Online]. Available: https://www.dnvgl.com/Images/Cascade-Data-quality-management-for-electric-utilities-Whitepaper_tcm8-69544.pdf
[17] G. Zeiss, "GIS or data science?," in Between the Poles, 2018.