A Conversation with an Open Artificial Intelligence Platform on Osteoarthritis of the Hip and Treatment

Ishith Seth¹˒²˒³*, Aaron Rodwell², Reece Tso³, John Valles³, Gabriella Bulloch⁴, Nimish Seth¹

¹Faculty of Science, Medicine, and Health, Monash University, Victoria, Australia

²Department of Surgery, The Wollongong Hospital, New South Wales, Australia

³Department of Orthopaedic Surgery, Peninsula Health, Melbourne, Victoria, Australia

⁴Faculty of Science, Medicine, and Health, The University of Melbourne, Victoria, Australia

*Corresponding Author: Dr Ishith Seth, Peninsula Clinical School, Central Clinical School at Monash University, The Alfred Centre, 99 Commercial Rd, Victoria 3004, Australia.

Received: 28 February 2023; Accepted: 06 March 2023; Published: 09 March 2023

Article Information

Citation:

Ishith Seth, Aaron Rodwell, Reece Tso, John Valles, Gabriella Bulloch, Nimish Seth. A Conversation with an Open Artificial Intelligence Platform on Osteoarthritis of the Hip and Treatment. Journal of Orthopedics and Sports Medicine. 5 (2023): 112-120.

DOI: 10.26502/josm.511500088


Keywords

Osteoarthritis; Artificial Intelligence; Hip; Chat Generative Pre-Trained Transformer

1. Introduction

The Chat Generative Pre-Trained Transformer (Chat-GPT) is an Artificial Intelligence (AI) platform that was made available to the public in November 2022 [1]. This technology, which can generate human-like text, has been regarded as a tool that could reduce the workload of writing scientific journal articles while maintaining academic writing standards [2]. Its potential has been demonstrated by its acceptance as an author in multiple journals and even by its passing of the United States medical board examinations [2-5].

However, the emergence of Chat-GPT has raised concerns about the teaching and assessment practices of academic institutions and has even called into question the ethical practices of academia [3,6,7]. The fact that a manuscript written by Chat-GPT went undetected by plagiarism software has sparked concerns about academic integrity. Although OpenAI, the developer of Chat-GPT, has released a tool designed to detect Chat-GPT-generated work, it has cautioned that the tool is not entirely reliable. As a result, some journals have banned the algorithm from being listed as an author, while others have published articles that list it as an author [3].

One of the major challenges to the acceptance of Chat-GPT in academia is its lack of informed judgement on academic topics. This is because it is trained on the vast amount of information available on the internet, much of which is not peer-reviewed and may be outdated. Additionally, Chat-GPT's training data extend only to 2021, which prevents timely discussion of topics that evolve on a day-to-day or month-to-month basis. Studies evaluating the use of Chat-GPT for orthopaedic surgery research have highlighted a lack of creativity and judgement as the main concerns [2,5].

This article aims to evaluate the prospects of using Chat-GPT in the field of orthopaedic surgery research by examining its responses to a series of curated questions on hip osteoarthritis (OA). The accuracy and reliability of Chat-GPT’s answers are scrutinized by orthopaedic researchers (IS, RT, and NS), and subsequent questions are designed to assess whether Chat-GPT can improve its responses.

2. Methods

A series of questions was designed by IS and AR and then posed to Chat-GPT via its online chat-box system (insert link). The answers were evaluated by three medical doctors with expertise in hip OA (IS, RT, and NS); if a response was deemed unsatisfactory, a series of follow-up questions was designed to probe whether the correct answer could be elicited by rephrasing the question. The responses were evaluated for their accuracy as well as for Chat-GPT's capacity to identify prospective research ideas.

3. Results

The first prompt was “In 300 words, describe the current evidence on the surgical management of hip osteoarthritis with 5 references”. Chat-GPT provided reasonably comprehensive descriptions of the standard surgical interventions for hip osteoarthritis, but with some omissions, such as hip resurfacing and femoral head resection procedures. The response was concise, grammatically correct, and informative, with relevant descriptions of the benefits and risks of the surgical options presented, although the advice was generalized to all populations. Chat-GPT also accurately depicted the fundamental principles of Total Hip Arthroplasty (THA). However, several significant potential complications, including injury to adjacent structures, blood clots, and anesthetic risks, were not mentioned [8]. The sources referenced by Chat-GPT were reputable; however, the model did not use in-text referencing, which made it difficult to evaluate the accuracy and interpretation of the information presented (Figure 1). In summary, while Chat-GPT's response provided a reliable overview of the surgical management of hip osteoarthritis, its omissions and lack of in-text referencing detract from its academic rigor. Further elaboration on key aspects, such as potential complications, together with properly cited references, would enhance the response's scientific and academic quality.

Figure 1: Prompt - In 300 words, describe the current evidence on surgical management of hip osteoarthritis with 5 references.

Next, Chat-GPT was asked “In 300 words, what are the clinical outcome differences between the anterior and posterior total hip arthroplasty approaches, describe the quality of evidence and provide 5 references”. Chat-GPT provided a concise and accurate comparison while emphasizing the importance of personalized patient care. The strengths of each approach were presented, although the weaknesses were not explicitly discussed. Notably, Chat-GPT did not provide in-text citations, making it difficult to assess the accuracy of the information presented. Chat-GPT referenced only meta-analyses, Randomized Controlled Trials (RCTs), and reviews, suggesting some understanding of the hierarchy of evidence, although it mentioned only RCTs in the text (Figure 2).

Figure 2: Prompt - In 300 words, what are the clinical outcome differences between the anterior and posterior hip approaches, describe the quality of evidence and provide 5 references.

However, the omission of any studies published after 2018 could potentially undermine the comprehensiveness of the analysis. Moreover, Chat-GPT did not offer expert-level insights into choosing between the anterior and posterior THA approaches based on patient demographics and anatomical features. The surgical descriptions provided by Chat-GPT were succinct and limited in scope, but adequate within the specified word limit (Figure 3).

Figure 3: Prompt - In 300 words, describe the difference between cemented versus uncemented hip arthroplasty, describe the quality of evidence using centre for evidence-base medicine levels of evidence, and using 5 references.

Chat-GPT provided an accurate and sufficiently specific comparison between surgical approaches and noted the importance of individualized care. The advantages of each THA approach were described accurately and succinctly; however, the disadvantages were not explicitly mentioned. No in-text referencing was provided, making it difficult for the authors to fact-check the information provided by Chat-GPT. Only meta-analyses, RCTs, and reviews were referenced, suggesting that Chat-GPT understood levels of evidence, although it referred only to RCTs in the text. Chat-GPT did not mention any studies published after 2018, leaving out critical and relevant evidence [9]. It did not provide expert-level insight into choosing the anterior or posterior approach, such as consideration of patient demographics and anatomical features. Its surgical descriptions were also superficial and brief, but fair in the context of the word limit (Figure 4).

Figure 4: Prompt - In 300 words, provide future recommendations for surgical management of hip osteoarthritis and innovation that is needed for further advancements in this field, using 5 references.

Figure 5: Prompt - In 300 words, describe the duration of venous thromboembolism prophylaxis in hip arthroplasty that has best clinical outcomes, describe the quality of evidence and provide 5 references.

The third prompt asked, “In 300 words, describe the difference between cemented versus uncemented hip arthroplasty, describe the quality of evidence using centre for evidence-base medicine levels of evidence, and using 5 references”. The response was accurate in terms of the level of evidence associated with the cited references; however, it omitted key meta-analyses that would have contributed significantly to the overall quality of the analysis. The comparison of outcomes between cemented and uncemented THA was appropriate, and Chat-GPT highlighted the significance of personalized treatment. However, it did not address the suitability of each approach for specific patient populations. While the reported complications and outcomes for each method were precise, the response was not comprehensive and did not cover all relevant outcomes. The response also lacked in-text referencing, and most of the cited references were not germane to the topic.

Next, Chat-GPT was asked “In 300 words provide future recommendations for surgical management of hip osteoarthritis and innovations needed for further advancements in this field, using 5 references”. Chat-GPT recognized the importance of current innovation but failed to propose new technological ideas or identify specific areas for improvement. For example, Chat-GPT suggested the need for more durable implants, an unoriginal viewpoint in the field. Additionally, Chat-GPT noted the need for new joint-preserving techniques and technologies that target specific joint areas, which the authors deemed overly broad and inaccurate, given what current techniques can already achieve in this regard. While Chat-GPT highlighted the need for more evidence on some techniques, this perspective is accurate but not particularly innovative. Notably, Chat-GPT did not mention timely technologies such as robotic assistance, three-dimensional printing, and implant customization, which were established before 2021 and represent significant areas of innovation in hip osteoarthritis treatment.

The final prompt was “In 300 words, describe the duration of venous thromboembolism prophylaxis in hip arthroplasty that has best clinical outcomes, describe the quality of evidence and provide 5 references”. While Chat-GPT provided up-to-date and accurate evidence supporting extended venous thromboembolism (VTE) prophylaxis, it lacked specificity regarding the various medication regimens available for prophylaxis. Moreover, the algorithm did not provide in-text references for the multiple randomized controlled trials and meta-analyses it cited, making it difficult to verify the sources of its information. Additionally, the studies referenced by Chat-GPT did not focus specifically on VTE prophylaxis in THA patients, and the algorithm did not mention a 2018 systematic review and network meta-analysis on this topic that informed the guidelines issued by the National Institute for Health and Care Excellence [10].

4. Discussion

This case study evaluated Chat-GPT's ability to generate academic content on hip OA. While Chat-GPT's responses were largely accurate, they tended to be superficial due to the imposed word limit. Furthermore, Chat-GPT lacked the judgement and breadth to accurately analyze important topics in the surgical treatment of hip OA, such as new surgical techniques and high-level meta-analyses. As a result, Chat-GPT's usefulness as an active participant in scientific and medical research is limited. Chat-GPT generally summarized scientific information in layman's terms and avoided specialized orthopaedic jargon. While this is useful for patients seeking to understand medical terminology, the responses lacked the expert-level opinion necessary for orthopaedic research. For example, while Chat-GPT described the advantages of THA, it did not offer expert-level opinion on outcomes such as blood loss and range of motion. Thus, while Chat-GPT may serve as a summary source for medical students or non-orthopaedic medical specialists, it may add little value for most orthopaedic surgeons.

Despite these limitations, Chat-GPT may be useful as a source of preoperative information for patients seeking to understand their care and potential outcomes. Used in this way, it could give patients more critical insight into their health and potential outcomes, thereby easing anxiety and promoting better outcomes. However, because important references and up-to-date research were occasionally missed, Chat-GPT could be a potential hazard in this format. Therefore, as a source of perioperative information, it cannot in its current state be considered superior to patient handouts, despite providing a dynamic interface for information. Overall, Chat-GPT's utility in orthopaedic research may be limited, but it has potential as a tool for patient education and engagement.

Of additional concern is Chat-GPT's failure to cite important references that have previously influenced management guidelines, which raises ethical concerns surrounding authorship, accountability, and consent. This issue underscores the need for careful revision and peer review. Furthermore, Chat-GPT is known to provide different responses to repeated questions, raising the question of whether the algorithm also varies the literature it provides with each response. This is problematic because it could lead to discrepancies in the distribution and accessibility of information and raises ethical concerns. However, it is also possible that with enough iterations, Chat-GPT could provide more informed and original answers owing to its machine-learning capabilities. In this context, the superficiality of responses on hip OA and the disorganization of references may be attributed to the low volume of questions on this topic received by the algorithm. It is plausible that within six months Chat-GPT may learn to generate answers with improved information and references, and it should be monitored accordingly.

The same principle may apply to the originality of the responses when Chat-GPT was asked about future changes to management and potential innovations. In the current context, the answers provided were unoriginal and lacked creativity, although they reflected issues already raised by the orthopaedic community. These findings are similar to those of previous case studies of Chat-GPT, which showed a disappointing level of insight and judgement on other orthopaedic issues. The current algorithm appears to have limited contextual awareness owing to its limited use on this topic, which promotes a lack of originality and convergent thinking. It would be interesting to see whether, within a matter of months, these responses change and evolve with the AI's learning mechanism, and whether it learns to produce something truly original and accurate. To better promote divergent thinking, the makers of Chat-GPT could also consider training and testing the algorithm on journal articles rated by experts for their level of evidence, relevance, and accuracy of information. This could substantially improve the applicability of the next version of Chat-GPT for orthopaedic surgeons on this topic.

5. Conclusion

In conclusion, Chat-GPT is unable to provide personalized recommendations for orthopaedic surgical care and may, in its current form, be more appropriate for acquiring general knowledge for non-academic use. Further research using larger word limits could yield deeper, more insightful Chat-GPT responses than those demonstrated in this study, and further development of the algorithm by experts in the field could also improve its accuracy and specificity. For now, Chat-GPT's role in research should be limited, and its output should be scrutinized by expert researchers.

Acknowledgements:

None.

Conflict of Interest:

No authors have any conflict of interest to declare.

Funding:

None.

References

1. Aljanabi M, Ghazi M, Ali AH, et al. ChatGpt: Open Possibilities. Iraqi Journal for Computer Science and Mathematics 4 (2023): 62-64.
2. Seth I, Rodwell A, Bulloch G, et al. Exploring the Role of Open Artificial Intelligence Platform on Surgical Management of Knee Osteoarthritis: A Case Study of ChatGPT. S13 (2023): 217-222.
3. Flanagin A, Bibbins-Domingo K, Berkwits M, et al. Nonhuman “Authors” and implications for the integrity of scientific publication and medical knowledge. JAMA (2023).
4. O'Connor S. Open artificial intelligence platforms in nursing education: Tools for academic progress or abuse? Nurse Education in Practice 66 (2022): 103537.
5. Kim J-h. Search for Medical Information and Treatment Options for Musculoskeletal Disorders through an Artificial Intelligence Chatbot: Focusing on Shoulder Impingement Syndrome. medRxiv (2022).
6. King MR and chatGPT. A Conversation on Artificial Intelligence, Chatbots, and Plagiarism in Higher Education. Cellular and Molecular Bioengineering (2023): 1-2.
7. Else H. Abstracts written by ChatGPT fool scientists. Nature 613 (2023): 423.
8. Healy WL, Iorio R, Clair AJ, et al. Complications of total hip arthroplasty: standardized list, definitions, and stratification developed by the hip society. Clinical Orthopaedics and Related Research 474 (2016): 357-364.
9. Fagotti L, Falotico GG, Maranho DA, et al. Posterior versus anterior approach to total hip arthroplasty: a systematic review and meta-analysis of randomized controlled trials. Acta Ortop Bras 29 (2021): 297-303.
10. Lewis S, Glen J, Dawoud D, et al. Venous thromboembolism prophylaxis strategies for people undergoing elective total hip replacement: a systematic review and network meta-analysis. Value in Health 22 (2019): 953-969.
