By now, most people have heard about ChatGPT. According to a UBS study, it recently set the record for the fastest-growing consumer application in history. There is no question ChatGPT is taking the world by storm, and the applications of such a powerful tool are endless. I had already heard of patient advocates using GPT-4, the large language model that powers ChatGPT, to summarize health records, get help in dealing with doctors, and give patients insights into their care with a high degree of certainty and empathy. After all, GPT-4 has already passed the bar exam and the US medical licensing exams, and a recent study showed that chatbot responses were preferred over physician responses and rated significantly higher for both quality and empathy.
My Testing Process
With all of this hype, I decided to test ChatGPT's ability to help a patient understand a frozen embryo transfer (FET) treatment plan. To do this, I used a real treatment plan that a Reddit user uploaded to the r/IVF community a few years ago and ensured that any information that could compromise the user or their identity was removed. It was important to use a real document that a patient would receive directly from their clinic to make this as realistic as possible.
I adopted the persona of a 34-year-old woman with unexplained infertility undergoing IVF for the first time. Although my personal experience gives me a better understanding of many aspects of the IVF process and of how to find resources, I wanted to see how well ChatGPT could help a novice understand their situation and provide additional insights, and to identify shortcomings of the current version of ChatGPT (March 2023 version).
To start, I created a new ChatGPT account and a new conversation, ensuring I was starting fresh. The first prompt I gave ChatGPT went as follows:
In under 1 second, ChatGPT gave me a list of the medications in the treatment plan along with a brief description of each and why they are used.
In the adopted persona of a first-time IVF patient, I wanted to know if this was a standard FET plan, as a way to start comparing what I was going through with what other patients experience. So I asked ChatGPT if these medications were standard.
Once again, ChatGPT responded in less than a second. This answer was less detailed, and started to show some deficiencies as a tool for patients.
I decided to change my approach slightly and see what ChatGPT could do to help me better communicate with my fertility doctor and act as a patient advocacy tool. I asked it to give me a list of questions I should ask my doctor to help me better understand the treatment plan.
Here, ChatGPT really shone, providing a list of questions that not only addressed the current FET plan but also began to prepare me for next steps. Many fertility patients know it is hard to balance looking one day and one month ahead when you are in the thick of treatment.
Finally, even though ChatGPT had already told me I should ask my doctor about the success rate for my specific case, I wanted to see what success rate it would estimate for me.
ChatGPT gave the kind of measured response the model tends to give users: a very generic range with a number of caveats.
I decided to wrap up our conversation by asking ChatGPT for some advice and resources to help me prepare for and understand the FET process.
ChatGPT once again responded with empathy and gave me some good tips, focusing on my own well-being by telling me to stay informed and practice self-care. It even gave me a list of resources from reliable websites on the FET process.
However, this is where ChatGPT failed outright for the first time.
When I clicked on the URLs, each one gave me a 404 error. Looking at the URLs, I saw that every one ended with frozen-embryo-transfer-fet. While the websites it chose (RESOLVE, FertilityIQ, and ASRM) are excellent organizations that provide valuable resources to patients, ChatGPT clearly just attached the keywords I gave it to the website addresses and made up a webpage. This is a demonstration of a phenomenon known as hallucination, in which a model generates plausible-sounding but fabricated information. You can read more about it here.
ChatGPT is Pretty Amazing...
As I used ChatGPT, I was amazed by a few things right off the bat.
First off, the answers were faster and more personalized than those from any search engine I have used. You can see why Microsoft wants to use GPT-4 in the Bing search engine and other Microsoft products. It is hard to imagine that FET plans are widely available on the internet or that ChatGPT was trained on a large dataset with a lot of information on this topic. But that did not matter, and ChatGPT gave me answers with varying degrees of detail almost instantaneously.
Second, the empathy that the tool provided was pretty striking. There were times when you could imagine a nurse or trained professional who deals with fertility patients providing the exact same response. The ability of an algorithm to appear to relate to a profoundly human experience is a pretty amazing concept. It is also slightly unsettling.
Finally, ChatGPT's ability to take a large amount of complex information and quickly digest and summarize it is an amazing capability and probably the most beneficial component of the tool available today. There are plug-ins users can download to digest PDFs and other documents (such as unstructured health records) and quickly summarize them or find essential information. I know people are already doing this today.
...But ChatGPT Has Limitations for Fertility Patients
However, there are some major limitations that patients need to consider when using ChatGPT or any public-facing large language model.
First, ChatGPT's tendency to hallucinate and just make up information is a major liability that cannot be discounted.
There is no reason to think ChatGPT would not make up facts and responses to other questions if it so easily made up resources when prompted. It is really important for patients to double-check any outputs from ChatGPT against trusted, scientifically validated resources.
I know this is very hard to do for most patients, which can limit the utility of this as a tool.
When I double-checked all of the descriptions and uses of the medications ChatGPT listed, none appeared incorrect. However, some of the claims were not verifiable or settled. Some of the steps have disputed efficacy according to studies I found online. I decided not to go into detail about this, but instead to flag it as a caution and point out that patients once again need to double-check the accuracy of outputs from ChatGPT. Perhaps a clinician in the fertility community would want to collaborate on future writing to check the clinical efficacy of ChatGPT as a patient advocacy tool in fertility treatment? If so, email me at firstname.lastname@example.org.
Second, and maybe most importantly, the data that you currently input into ChatGPT leaves your control the second you submit it and, frankly, we don't know where it could end up.
Reproductive health information is among the most sensitive information there is, and it is currently being weaponized in states across the US. Most people do not understand the limitations of laws like HIPAA, and neither ChatGPT nor many of the other tools you can use right now are subject to these protections.
I used dummy data for this test for a reason. We know this will change as healthcare companies integrate GPT-4 and other large language models into their clinical systems, which must comply with HIPAA protections. But for now, it is the Wild West when it comes to data protections.
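Until those protections exist, one partial precaution is to strip obvious identifiers from a document before pasting it into a public chatbot. The Python sketch below is only an illustration under stated assumptions: the `redact` function and its three patterns are hypothetical, and they catch only simply formatted email addresses, US-style phone numbers, and numeric dates. Names, addresses, record numbers, and clinic details still need to be removed by hand.

```python
import re

# Hypothetical, deliberately simple patterns. They are NOT a complete
# de-identification method and will miss many real-world formats.
PATTERNS = [
    ("[EMAIL]", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("[PHONE]", re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")),
    ("[DATE]", re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")),
]


def redact(text: str) -> str:
    """Replace each pattern match in text with its placeholder label."""
    for placeholder, pattern in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


if __name__ == "__main__":
    # Made-up sample line, not taken from any real treatment plan.
    sample = "Contact jane.doe@example.org or 555-123-4567 before 3/14/2023."
    print(redact(sample))
```

Treat the output as a first pass and read it through before sharing, since anything the patterns do not recognize is left untouched.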
The Future of ChatGPT and Fertility Advocacy
In the end, ChatGPT and the GPT-4 large language model are fascinating tools that hold immense promise in helping fertility patients. In the coming years, these models will be used to help reduce the administrative burdens clinicians face, streamline paperwork, provide personalized educational resources to patients, and even help develop personalized protocols that can increase successful pregnancy rates for patients.
I know I will use it to help outline, draft, and edit copy (I even used it to help draft this article), and there are times I would consider using it as a tool to help with my own health. I believe it has immediate utility for fertility patients, and a few of the use cases I described above really can help people. I would absolutely use a closed version of GPT-4 to help augment the capabilities of my own fertility application, Grain Fertility, which I am developing, and I will most likely do so at some point in the future.
However, it cannot be overstated that presenting incorrect information about a patient’s health and failing to sufficiently protect that information are severe limitations that inhibit the utility of the tool. If ChatGPT gives a patient wrong information and they decide to take it to their doctor, convinced that the AI is correct, it could have a detrimental impact on that relationship. If a person’s fertility information were inadvertently exposed on the internet because data they uploaded into ChatGPT was leaked, the harm could be irreparable.
I look forward to the day when AI is a trusted tool that helps reduce the cognitive burdens fertility patients face today, and I believe this will happen very soon. Until then, we should all take an optimistic, but cautious, approach to using it as a sole resource.