Are Fake Papers Prior Art?

TL;DR

Because cited non-patent literature (NPL) may be increasingly AI-generated, patent practitioners should examine NPL more closely and develop enablement-based strategies for challenging questionable references.

Overview

A recent Harvard Kennedy School study¹ on AI-generated scientific papers infiltrating Google Scholar raises important considerations for patent attorneys and IP counsel regarding prior art rejections under § 102 and § 103.

There is a long and often entertaining history of fictional science being cited as prior art in patent rejections and invalidations, including an allegation that the iPad was anticipated by a disclosure in the film “2001: A Space Odyssey”² and various patent rejections based on Star Wars, Iron Man, and Donald Duck.³  While published fictional works may be used as prior art, they still represent human conception of ideas.  In contrast, AI-generated text presents the novel challenge of a printed publication that may not be based on the conception of an idea.

I have been interested in the ethics of fake scientific papers since my days working as an engineer,⁴ which is why I find Harvard’s research paper fascinating.  The researchers identified 139 questionable papers likely produced using GPT, 62% of which failed to declare any AI use.  These papers have been published in both non-indexed journals and established publications, including some indexed journals and conference proceedings.

The researchers discovered these papers by searching Google Scholar for telltale phrases like “as of my last knowledge update” and “I don’t have access to real-time data,” which you probably recognize as common responses from ChatGPT.  The researchers then analyzed the papers using multiple coding methods to classify them by subject area and determine whether they showed signs of fraudulent or underreported AI use.
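The researchers' detection approach can be illustrated with a minimal sketch.  This is not the study's actual methodology (they queried Google Scholar directly and then hand-coded the results); it simply shows how a short, illustrative list of telltale phrases can flag suspect text:

```python
# Minimal sketch: flag text containing telltale GPT boilerplate phrases.
# The phrase list below is illustrative, not the study's full query set.

TELLTALE_PHRASES = [
    "as of my last knowledge update",
    "i don't have access to real-time data",
]

def flag_suspect_text(text: str) -> list[str]:
    """Return any telltale phrases found in the text (case-insensitive)."""
    lowered = text.lower()
    return [phrase for phrase in TELLTALE_PHRASES if phrase in lowered]

sample = "As of my last knowledge update, ionospheric TEC models remain limited."
print(flag_suspect_text(sample))  # ['as of my last knowledge update']
```

A real screening workflow would of course need far more than string matching, since such phrases are easily edited out; the study notes that its method only catches the least careful fabrications.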

Patent Attorney Implications

As a patent attorney, I am concerned about the implications for prior art verification.  MPEP § 2120 and § 2128 focus on “public accessibility,” providing that most NPL publications available to the public may qualify as a “printed publication” that may support a rejection under § 102 or § 103.  However, the MPEP does not appear to provide a basis for overcoming a cited reference based on the validity or source of the reference, which creates significant vulnerabilities with AI-generated content:

  1. Public Accessibility Without Quality Control:  The study reveals that Google Scholar “lacks the transparency and adherence to standards that usually characterize citation databases” and “the inclusion criteria are based on primarily technical standards.”  This means AI-generated papers easily meet the MPEP’s “public accessibility” standard despite potential fabrication.
  2. No Peer-Review Requirement:  The study found that “most questionable papers we found were in non-indexed journals or were working papers,” yet these would still qualify as prior art under MPEP guidelines despite lacking quality controls.
  3. Impractical or Impossible Retraction: The researchers discovered that “most of the identified papers exist in multiple copies and have already spread to several archives, repositories, and social media,” and determined “[i]t would be difficult, or impossible, to remove them from the scientific record.” This creates a permanent prior art problem that current MPEP provisions do not adequately address.
  4. Evidence Hacking Vulnerability:  The researchers coin the term “evidence hacking” to refer to the “strategic and coordinated malicious manipulation of society’s evidence base.”  The questionable papers were concentrated in policy-relevant subjects susceptible to influence operations, such as computing (23%), environment (19.5%), and health (14.5%).  The MPEP’s focus on accessibility rather than reliability fails to protect against this type of evidence hacking.

Recommendations

Given these MPEP limitations, patent professionals should:

  1. Implement improved NPL verification beyond MPEP requirements when evaluating scientific literature cited as prior art.  For example, while the first step should probably still be verifying the prior art date against your earliest priority date, the second step may include determining whether the NPL appears in a published issue of a peer-reviewed publication.⁵
  2. Develop strategies for challenging examiner rejections based on potentially AI-generated prior art, focusing on enablement issues.  For example, does the reference clearly and unambiguously disclose the claimed invention?  Is the prior art enabled under § 112(a) principles, i.e., would a person of ordinary skill in the art be able to make and use the invention based on the reference?  (Enablement is not required for cited prior art, but it can provide a useful framework for analyzing whether the reference provides the teaching alleged by the Office Action.)
  3. Advocate for updated MPEP guidance that addresses the unique challenges of AI-generated content.  Examiners might have their hands full at the moment, but it may be worth mentioning these NPL concerns to SPEs and USPTO policy-makers.
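The date check in the first recommendation is the one step that reduces cleanly to a mechanical comparison.  A simplified sketch, using hypothetical dates and ignoring § 102(b) grace-period exceptions and on-sale/public-use bars, might look like:

```python
from datetime import date

def may_qualify_as_prior_art(reference_date: date, earliest_priority_date: date) -> bool:
    """Simplified post-AIA screen: a printed publication can generally serve
    as § 102(a)(1) prior art only if it was publicly accessible before the
    effective filing date.  (Ignores § 102(b) exceptions; dates must still be
    verified against the reference itself, not just database metadata.)"""
    return reference_date < earliest_priority_date

# Hypothetical dates for illustration only
npl_publication_date = date(2023, 6, 1)
earliest_priority = date(2023, 1, 15)
print(may_qualify_as_prior_art(npl_publication_date, earliest_priority))  # False
```

The harder, non-mechanical part is the second step: confirming the publication date and venue are genuine, which is precisely where AI-fabricated papers with plausible-looking metadata can slip through.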

Has your team encountered AI-generated papers during prosecution?  How are you addressing this challenge given the MPEP’s limitations?


[1] https://misinforeview.hks.harvard.edu/article/gpt-fabricated-scientific-papers-on-google-scholar-key-features-spread-and-implications-for-preempting-evidence-manipulation/

[2] https://www.npr.org/sections/thetwo-way/2011/08/24/139925696/samsung-objects-to-ipad-patent-saying-stanley-kubrick-came-up-with-it-first

[3] https://belda.blog/2022/02/27/when-the-cited-prior-art-is-not-run-of-the-mill/

[4] Back in 2009, long before the current era of ChatGPT-assisted homework, I wrote a simple deterministic program (i.e., non-probabilistic, non-generative-AI) to produce an auto-generated academic paper.  The exercise was mostly to show that a simple text program can retrieve online data, analyze that data, and generate a professional-looking (if rudimentary) academic paper analyzing current ionospheric activity, which was the focus of my undergraduate research.

I was inspired by and gave credit to the legendary fake paper generator SciGen (https://pdos.csail.mit.edu/archive/scigen/) and the epistemological Sokal Affair (https://en.wikipedia.org/wiki/Sokal_affair).  Unlike SciGen, which “generates random Computer Science research papers” (whose aim is “to maximize amusement, rather than coherence”), the text of my paper was pre-written.  The only deterministically generative aspects of my paper were the download and ionospheric numerical analysis of RINEX and US-TEC data in Octave, automatic generation of resulting plots, and compilation of the plots and pre-written content in LaTeX.  (As an aside, if you have ever been asked to write a formula or full homework assignment in non-user-friendly LaTeX, then you might have also been motivated to write your own semi-automated program.)

[5] A recent Patently-O article examined a Federal Circuit decision that seems to increase the standard required for determining that a later-filed patent is supported by an earlier-filed priority parent application for prior art purposes; however, this may be limited to pre-AIA patents: https://patentlyo.com/patent/2025/03/federal-redefines-requirements.html.