I. INTRODUCTION
ChatGPT, an artificial intelligence tool developed by OpenAI, was launched on November 30, 2022. Ever since its release, the use of ChatGPT has drawn much attention from both academia and industry. Companies are trying to use it to create position papers; students are trying to use it to solve problems, generate code, and write term papers; and professors are trying to use it for assessment purposes, along with many other applications. The students who participated in this study were from a graduate (MBA) course in Supply Chain and Operations Management. There were 13 students, none of whom had any prior exposure to ChatGPT. One of the assignments in the class is to write a 10-page term paper on any topic related to Supply Chain and Operations Management. Students selected their own topics, which are listed in Appendix A. This semester (Spring 2023), the professor decided to require students to use ChatGPT in writing their term papers.
Typically, ChatGPT generates one to two pages of original material for a novice user (one who is not well trained in using ChatGPT). This was the case for the students in this class. The students were then asked to use the ChatGPT-generated material as a guide to writing a 10-page paper, adding new material, references, and citations. Students were also asked to complete a brief survey about their experience in using ChatGPT. We employ a hybrid approach that studies the effectiveness of ChatGPT from both qualitative and quantitative perspectives. Specifically, we use a survey to investigate how ChatGPT can help students improve their coursework. We also use text mining algorithms to quantify two key measures, information density and readability, that reflect the quality of students’ outputs.
The remainder of the paper is organized as follows: Section II presents a brief survey of the literature on the use of ChatGPT; Section III describes the framework of this study and how the data were collected and analyzed; Section IV presents the data analysis and findings; and, finally, Section V offers concluding remarks and suggestions for further work.
II. LITERATURE SURVEY
Shortly after ChatGPT was released, researchers started reviewing its impact on pedagogical practices in educational institutions [1]. That paper was a theoretical and analytical study that discussed both the possible positive and negative impacts of ChatGPT. It theorized that these impacts depended on the institution’s response to the innovative technology. The paper discussed potential benefits such as ChatGPT becoming a learning enabler and enhancer, among other things. The main potential negative impacts included a failure to develop critical thinking and problem-solving skills.
Within a short time of ChatGPT becoming available, educators, researchers, and practitioners focused on two tracks: the first concerns the capabilities of ChatGPT and how they can be used, and the second examines the accuracy of the material it generates. One of the first surveys presented a comprehensive review of its underlying technology, applications, and challenges [2]. The paper also discusses how ChatGPT might evolve to realize general-purpose AI-generated content (AIGC), which would be a significant milestone for the development of artificial general intelligence (AGI).
Another paper [3] reviewed the literature published within three months of the release of ChatGPT in November 2022, with a focus on educational applications. The paper reported that ChatGPT’s performance varied across subject domains, ranging from outstanding (e.g., economics) through satisfactory (e.g., programming) to unsatisfactory (e.g., mathematics).
Another line of research focused on the future of learning, teaching, and assessment in higher education in the context of ChatGPT [4]. The literature review in that paper was among the first peer-reviewed academic journal articles to explore ChatGPT.
One paper compared the accuracy of human-graded results with that of ChatGPT-graded responses [5] and found that the human-generated results were superior. Another concern is that students may use ChatGPT to complete take-home assignments and exams without genuinely acquiring knowledge [6]. That study reported ChatGPT’s high degree of inaccuracy in answering a diverse set of questions related to topics in an undergraduate computer science course.
Related to this is the effort to develop an AI education policy for higher education [7]. This paper proposed a framework consisting of three dimensions: Pedagogical (teaching and learning), Governance (privacy, security), and Operational (needed infrastructure). Another paper analyzed over 300,000 tweets and more than 150 scientific papers to investigate how ChatGPT is perceived [8]. The authors reported that ChatGPT was generally viewed with positive sentiment on social media. In recent scientific papers, ChatGPT was viewed not only as a great opportunity across various fields but also as a threat concerning ethics.
Another concern has been whether ChatGPT can replace the role of a teacher in the classroom [9]. The study concluded that ChatGPT could not replace the role of a teacher entirely and recommended integrating this tool into learning, which would require teachers to develop competency in its use.
The ethical use of ChatGPT has been another concern. This concern was addressed in a thorough analysis of the responsible and ethical usage of ChatGPT in education [10]. The study found that the use of ChatGPT in education requires transparency, respect for privacy, fairness, and nondiscrimination.
Another study explored the potential benefits and limitations of ChatGPT in promoting teaching and learning [11]. Benefits include promoting personalized and interactive learning and generating prompts for formative assessment activities that provide ongoing feedback. The paper also highlights some inherent limitations, such as generating wrong information, biases in training data that may augment existing biases, and privacy issues. The study offers recommendations on how these evolving generative AI tools could be used safely and constructively to improve education and support students’ learning.
Another area of research has been the role of ChatGPT in educational transformation, response quality, ethics, etc. [12]. This paper reported that there was enthusiasm with some caution regarding its use in educational settings. Finally, they investigated user experiences through ten educational scenarios that revealed various issues, such as cheating, the truthfulness of ChatGPT, privacy, and manipulation.
ChatGPT is increasingly being used in the peer-review process for research papers submitted for publication [13]. The authors anticipate that reviewers are going to generate peer-review reports using these tools. Currently, no guidelines exist on how to use these systems for this purpose.
Another study examined the use of ChatGPT for fixing programming bugs, along with its limitations [14]. The paper recommended using ChatGPT in conjunction with other debugging tools to identify and fix bugs more effectively.
ChatGPT is being used in diverse domains such as healthcare and legal education. A review of the use of ChatGPT in the healthcare industry [15] reported that ChatGPT had achieved only moderate success and was unreliable for actual clinical deployment. The authors recommended specialized natural language processing (NLP) models trained on biomedical datasets as the right direction to pursue for critical clinical applications.
One study reported how well ChatGPT generates answers on four real exams at the University of Minnesota Law School [16]. ChatGPT performed on average at the level of a C+ student, achieving a low but passing grade in all four courses. The paper also provided advice on how ChatGPT could assist with legal writing.
ChatGPT has also been studied from the perspective of students and instructors of communication, business writing, and composition courses [17]. This study recommends that instructors refrain from posing theory-based questions and instead give students detailed case-based and scenario-based assessment tasks that call for personalized answers. Another study explored the use of ChatGPT in teaching and learning English for Specific Purposes and found that it can be an effective and time-saving tool for the preparation and implementation of teaching units and the evaluation of students’ written assignments [18].
One researcher applied ChatGPT to the most challenging part of science learning and reported success in automating assessment development, grading, learning guidance, and recommendation of learning materials [19]. Another study applied ChatGPT to science education [20]. It concluded that educators should model responsible use of ChatGPT, prioritize critical thinking, and be clear about expectations, and that ChatGPT is likely to be a useful tool for educators designing science units, rubrics, and quizzes.
Another study examined how well ChatGPT performed when tasked with answering common questions in a popular software testing curriculum [21]. The authors found that, given its current capabilities, ChatGPT was able to respond to 77.5% of the questions examined and that, of these questions, it provided correct or partially correct answers in 55.6% of cases and correct or partially correct explanations of answers in 53.0% of cases.
ChatGPT can significantly assist with finance research. There are clear advantages for idea generation and data identification [22]. However, it is weaker in literature synthesis and in developing appropriate testing frameworks. Importantly, the authors demonstrated that the researcher’s domain expertise and private data are key factors in determining the quality of the output.
ChatGPT can enhance e-commerce via chat, as well as other sectors such as education, entertainment, finance, health, news, and productivity [23]. The paper also discusses how this tool can be used to create more personalized content for users and to make customer service more efficient and effective for businesses.
ChatGPT has been applied to the hospitality and tourism industry [24]. This paper discusses the benefits, challenges, and threats of ChatGPT. In particular, it investigates how users search for information and make decisions, and how businesses produce, create, and deliver customized services and experiences.
Yet another application of ChatGPT has been in business education and research, with a particular focus on management science, operations management, and data analytics [25]. Professors can use it to design courses, create syllabi and content, and help with grading. Students can use it to have complex concepts explained, create and debug code, and create sample exam questions. The authors found that writing code, debugging, and grading are the greatest strengths of ChatGPT. Its limitations include that it often makes errors and that catching them requires deeper knowledge of the domain.
As one can see, the published work in this area is in its early stages. More applications and data are needed to really understand the use of ChatGPT, not only in the classroom but also in the field. As we come to understand the capabilities of this and similar tools, a more useful picture will emerge.
III. FRAMEWORK OF THE STUDY
In discussions about students’ potential use of ChatGPT (and similar tools) to cheat on classroom assignments, it is generally accepted that such use cannot be stopped. The question then arises of how we can use these tools constructively to enhance student learning outcomes. This is what led to the study reported in this paper. The students who participated in this study were from a graduate course in Supply Chain and Operations Management. There were 13 students, none of whom had any prior exposure to ChatGPT. One of the assignments in the class is to write a 10-page term paper on any topic related to Supply Chain and Operations Management. This semester (Spring 2023), the students were required to use ChatGPT in writing their term papers. Typically, ChatGPT generates one to two pages of original material for a novice user who is not well trained in using it. This was the case for the students in this class. The students were then asked to use the ChatGPT-generated material as a guide to write a 10-page paper, adding new material, references, and citations. The students were also asked to share their experience in using ChatGPT by answering the following four questions:
- 1. Was the material generated by ChatGPT helpful in writing your paper? How? Please give examples.
- 2. In what ways was it not helpful?
- 3. What were the challenges in using ChatGPT?
- 4. Would you have been better off not using ChatGPT?
The main objective of this study was to examine the feasibility of a constructive use of ChatGPT in the classroom. Constructive use was defined as students demonstrably taking the basic material generated by ChatGPT and building on it to write a full paper. The analysis presented in this paper is primarily qualitative, supplemented by a text-mining analysis. A brief roadmap for conducting qualitative research has been discussed in the literature [26]. We apply some of the suggestions made in that paper, including (1) getting a sense of the whole, (2) extracting the facts, and (3) identifying key topics or major story lines. The research questions that we address are as follows:
- Q1. Did the quality of the ChatGPT output reflect the fact that the students were novice users of ChatGPT?
- Q2. How many key points were identified by ChatGPT that students were able to use and expand in the full paper?
- Q3. How many key points were missed by ChatGPT and identified later by the students?
- Q4. In what ways did students find ChatGPT helpful?
- Q5. In what ways did students find it unhelpful?
- Q6. What were the challenges faced by students in using it?
- Q7. Would the students have been better off without using ChatGPT? (students’ perception)
The first three questions deal with the quality of the ChatGPT output. We used three different measures of quality: the number of pages generated, the number of main points captured by ChatGPT, and the number of main points missed by ChatGPT and added later by the student. The next four questions deal with the survey results. In addition, we conduct text analysis to measure the quality of ChatGPT-generated and student-generated content. Specifically, we focus on readability and information density. The automated readability index, noted for its flexibility across domains, was used as a proxy for the readability of submissions in terms of sentence and word difficulty. To measure information density, we use named entity recognition, which extracts essential information such as locations, organization names, dates, values, and person names. The information density is then calculated as the ratio of the number of identified named entities to the length of the submission. In the next section, we present an analysis of the data to answer the above questions.
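As an illustration of the readability measure named above, the automated readability index can be sketched in a few lines of Python. The tokenization rules (sentences end at `.`, `!`, or `?`; words are runs of letters, digits, and apostrophes) and the function name are assumptions made for this sketch and are not taken from the study's own implementation.

```python
import re


def automated_readability_index(text: str) -> float:
    """ARI = 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43."""
    # Assumption: a sentence ends at '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Assumption: a word is a run of letters, digits, or apostrophes.
    words = re.findall(r"[A-Za-z0-9']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    # Count only letters and digits as characters (apostrophes are dropped).
    characters = sum(len(w.replace("'", "")) for w in words)
    return 4.71 * characters / len(words) + 0.5 * len(words) / len(sentences) - 21.43


if __name__ == "__main__":
    sample = "Supply chains move goods. Outsourcing shifts work to partners."
    print(f"ARI = {automated_readability_index(sample):.2f}")
```

Longer words and longer sentences both raise the score, which is why the index serves as a proxy for sentence and word difficulty.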
IV. DATA ANALYSIS
Appendix A contains the list of topics. Out of 13 topics, seven were related to supply chain (SC), two were related to both outsourcing and supply chain (OS/SC), one was related to outsourcing (OS), and three were related to quality management (QM). To make sure that we did not miss anything, we manually analyzed both the ChatGPT output and the full paper submitted by each student. Since the topics varied, each submission had its own unique main points. Table I summarizes the data derived from the submissions. The first column is the number of pages a student generated after entering keywords in ChatGPT. The second column is the number of key points made in the ChatGPT output. The third column is the number of additional key points that the student added to the full paper. The last column is subjective: it reflects the professor’s assessment of how effectively the student made use of the ChatGPT material in writing the full paper.
#Pages (ChatGPT) | #Points (ChatGPT) | # Points Added By Student | Effectiveness of ChatGPT Output (Paper Code) |
---|---|---|---|
1 | 6 | 6 | Medium (SC1) |
1 | 4 | 5 | Medium (SC2) |
1 | 6 | 8 | Medium (SC3) |
1 | 5 | 4 | Medium (QM1) |
1.5 | 6 | 0 | High (SC4) |
1.5 | 2 | 2 | Low (OS/SC1) |
2 | 6 | 5 | Medium (QM3) |
2 | 4 | 0 | High (SC5) |
2 | 5 | 1 | High (SC6) |
2 | 6 | 7 | Medium (OS1) |
2 | 4 | 5 | Medium (OS/SC2) |
2.5 | 4 | 2 | Medium (QM2) |
3.5 | 2 | 4 | Low (SC7) |
SC = Supply Chain | OS = Outsourcing | QM = Quality Management |
A. NOVICE USER EFFECT
As mentioned earlier, none of the students had used ChatGPT before; whatever approach they took to obtain ChatGPT output was self-taught. This is clear from Fig. 1, which shows that 11 out of 13 students generated two pages of output or fewer. One student got two and a half pages, and one got three and a half pages (Table I). We could not observe the effect of having trained ChatGPT users, as that was outside the scope of this study. An advanced user would be expected to obtain more high-quality output.
B. EFFECTIVENESS OF ChatGPT OUTPUT
We measured the effectiveness of ChatGPT output in a number of ways. The first was to examine whether there was any relationship between the number of pages of ChatGPT output and the number of main points it covered. As one can see in Fig. 2, there was no positive relationship; in fact, the trend seems the opposite of what might be expected, with longer outputs tending to cover fewer main points. This was a surprising result and probably reflects the behavior of novice users. The second was to ask how many missing points were identified and added by the student.
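The relationship between output length and main points can be checked directly from the first two columns of Table I. The sketch below computes a Pearson correlation coefficient; the choice of Pearson's r is an assumption made for illustration, since the paper itself relies on a visual reading of Fig. 2, and with only 13 observations the value should be read as descriptive rather than as a formal test.

```python
from math import sqrt

# First two columns of Table I, in row order.
pages = [1, 1, 1, 1, 1.5, 1.5, 2, 2, 2, 2, 2, 2.5, 3.5]
points = [6, 4, 6, 5, 6, 2, 6, 4, 5, 6, 4, 4, 2]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


r = pearson_r(pages, points)
print(f"Pearson r between pages and main points: {r:.2f}")  # about -0.49
```

The negative sign agrees with the observation that longer ChatGPT outputs did not contain more main points.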
One important factor that we considered in this exercise was how many additional main points a student identified through their own research. Figure 3 shows these data. The blue bars represent the main points in the ChatGPT output, and the orange bars represent the additional main points added by the student. Except for two students, all students had to add anywhere from 1 to 8 points (Table I), the average being around 4. This demonstrates that the ChatGPT output obtained by these novice users was incomplete and left room for improvement.
A further exercise in establishing the effectiveness of ChatGPT was to determine how effectively each student used the ChatGPT points, how many points the student had to discover independently, and how well the student assimilated them into the full paper. For the purpose of this research, we assumed that the students did not differ significantly from one another in their capability to do this assimilation. Each paper was judged as having one of three designations: high impact, medium impact, or low impact. Since each topic was different, this determination was important. The designations are shown in the last column of Table I and also in Fig. 4. Three papers were considered to have high impact, eight medium, and two low. This once again suggests that ChatGPT exerts high impact only if the user is skilled.
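The counts discussed in this subsection can be tallied directly from Table I. The following sketch transcribes the table rows and summarizes the added points and the effectiveness ratings; the variable names are illustrative only.

```python
from collections import Counter
from statistics import mean

# Rows of Table I: (paper code, points in ChatGPT output,
#                   points added by student, effectiveness rating).
table_i = [
    ("SC1", 6, 6, "Medium"), ("SC2", 4, 5, "Medium"), ("SC3", 6, 8, "Medium"),
    ("QM1", 5, 4, "Medium"), ("SC4", 6, 0, "High"), ("OS/SC1", 2, 2, "Low"),
    ("QM3", 6, 5, "Medium"), ("SC5", 4, 0, "High"), ("SC6", 5, 1, "High"),
    ("OS1", 6, 7, "Medium"), ("OS/SC2", 4, 5, "Medium"), ("QM2", 4, 2, "Medium"),
    ("SC7", 2, 4, "Low"),
]

added = [row[2] for row in table_i]
added_nonzero = [a for a in added if a > 0]
ratings = Counter(row[3] for row in table_i)

print("Students who added no points:", added.count(0))                # 2
print("Range of added points:", min(added_nonzero), "to", max(added_nonzero))
print("Mean added points (all students):", round(mean(added), 2))     # 3.77
print("Mean added points (those who added):", round(mean(added_nonzero), 2))  # 4.45
print("Effectiveness ratings:", dict(ratings))
```

Keeping the table as a list of tuples makes it easy to recompute these summaries if a rating or count is revised.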
C. THE USEFULNESS OF ChatGPT, DEFICIENCIES, CHALLENGES, ETC.
Factor | Comments |
---|---|
Usefulness | All 13 students found ChatGPT extremely useful in getting started; good starting points (10); helpful in further literature search (3); some new ideas that would have been missed (2). |
Deficiencies | Repetitive (10), sources not identified (5), false references and citations (4), outdated material (2), not helpful in analyzing the concept (1), no depth (basic material) (1) |
Challenges | Learning challenges (2), availability (13) |
For Better or For Worse | Only one person said he would have been better off not using ChatGPT; all others (12) said life would have been much harder without it, commented that it was a good idea to use it, and recommended its continued use for such assignments. |
To further understand how students use ChatGPT to develop essays, we conducted text analysis to compare the ChatGPT output with the students’ output. Specifically, we examined the readability and information density of the output. Readability is measured by the Gunning fog index [27], which estimates the years of formal education a person needs to understand the text. We chose the Gunning fog index because it has been widely used in evaluating students’ submissions in teaching [28]. We then used NLP algorithms to extract named entities such as people, company names, locations, dates, and quantities from the text. The information density was then calculated as the percentage of the text made up of extracted named entities. A higher value of information density indicates that more concepts and information are embedded in the text. If students can effectively use ChatGPT output as an outline to generate a term paper, we would expect them to add more detailed supply chain-related concepts through their own search for knowledge. For example, given the topic of logistics, students can explore subtopics such as transportation, transshipment, network flow, and shortest path. We used the Python packages spaCy and NLTK to carry out the analysis. The results of the text analysis are presented in Table II. The readability score of student-generated output is significantly higher than that of ChatGPT-generated output, meaning that the student-generated text requires more years of education to read and indicating students’ effort in revising the ChatGPT output. On the Gunning fog scale, an average readability score of 14 corresponds to college sophomore level, while an average score of 16 corresponds to college senior level. Moreover, the information density of student-generated output is higher than that of ChatGPT-generated output. This further validates the revisions made by students: they have made efforts that significantly enrich the original draft by ChatGPT.
To summarize, compared with ChatGPT-generated content, student-generated content is more difficult to read. Nevertheless, student-generated submissions add significant essential information to the ChatGPT outputs, indicating students’ effort in improving the quality of their term papers even though they were novice users of ChatGPT.
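The two text measures described above can be sketched as follows. This is a simplified illustration, not the study's code: the syllable counter is a vowel-group heuristic (the full Gunning fog procedure also excludes proper nouns, compound words, and common suffixes when identifying complex words), the denominator of the information density is assumed to be the token count, and the spaCy model name `en_core_web_sm` is an assumption.

```python
import re


def count_syllables(word: str) -> int:
    """Heuristic syllable count: groups of consecutive vowels (assumption)."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    # Treat a trailing 'e' as silent, except in '-le' endings such as 'table'.
    if lowered.endswith("e") and not lowered.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def gunning_fog(text: str) -> float:
    """Fog index = 0.4 * (words / sentences + 100 * complex_words / words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    # Simplification: a complex word is any word with three or more syllables.
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences) + 100 * len(complex_words) / len(words))


def information_density(entity_count: int, token_count: int) -> float:
    """Named entities as a percentage of text length (length unit is assumed)."""
    if token_count <= 0:
        raise ValueError("token_count must be positive")
    return 100.0 * entity_count / token_count


def spacy_information_density(text: str, model: str = "en_core_web_sm") -> float:
    """Information density using spaCy's named entity recognizer."""
    import spacy  # imported lazily so the functions above run without spaCy

    doc = spacy.load(model)(text)
    # doc.ents holds the recognized entity spans; len(doc) is the token count.
    return information_density(len(doc.ents), len(doc))
```

Calling `spacy_information_density` requires spaCy and the chosen model to be installed; the other functions depend only on the standard library.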
Measure | ChatGPT generated | Student generated | Difference |
---|---|---|---|
Readability | 14.21 | 16.32 | 2.11 |
Information density (%) | 1.72 | 2.98 | 1.26 |
p < 0.1.
p < 0.05.
p < 0.01.
V. CONCLUSIONS AND SUGGESTED FURTHER WORK
This paper has presented the results of an experiment on the constructive use of ChatGPT in a classroom. Students in a graduate class were required to use ChatGPT to generate an outline of a paper on a topic related to operations and supply chain management. The students were not at all familiar with ChatGPT and had not used it before. They were given minimal instructions on how to generate such material. They were then to use the ChatGPT-generated material to write a full 10-page paper on the topic, and they were required to submit both in a single submission. A manual analysis was done to extract the data from these submissions. The analysis determined that ChatGPT was a very useful tool for writing a term paper in a classroom setting. It provided a very constructive starting point for the paper. However, students had to be aware of the limitations of ChatGPT: it was not comprehensive, in that it did not capture all the relevant points, and some of its citations were found to be erroneous. Overall, the students had a very positive experience and recommended that the use of ChatGPT be encouraged in the classroom.
The paper also identified a major factor that can hamper effective use of this tool: students’ skill in using ChatGPT. Because ChatGPT starts repeating material after a while, the user has to be well trained to generate new and useful material. The level of training required in a classroom needs to be investigated further. Topics that can be pursued in future research include:
- 1) A cross-disciplinary study can be conducted to explore the heterogeneous impact of ChatGPT on classroom learning. Researchers can extend the framework of this study by collecting data from diverse majors. In particular, the effectiveness of ChatGPT can be compared between STEM and non-STEM majors.
- 2) Sentiment and semantic analysis can be conducted to attain a deeper understanding of the role of ChatGPT in helping students learn. While this study looks at text mining measures such as information density and readability, researchers can also explore the ChatGPT outputs from other perspectives, such as tone and the semantic similarity between students’ outputs and ChatGPT outputs.
- 3) Information system models such as the Technology Acceptance Model can be used to explain the possible mechanisms by which ChatGPT may be effective in classroom teaching.