Protecting your website from AI content scraping.

14 March 2023 | SEO

Don't let ChatGPT harm your content and SEO - find out how to protect your website from scraping and unauthorised usage with these simple steps.

In today's digital landscape, website owners must be proactive in protecting their content from scraping and unauthorised usage. ChatGPT, a large language model developed by OpenAI, draws on website content collected by automated crawlers and can reproduce it, potentially causing harm to your Search Engine Optimisation (SEO) and brand reputation.

However, with a few simple steps, website owners can effectively block ChatGPT from using their content. 

Using the robots.txt File

The robots.txt file is a standard file used to communicate with search engine crawlers, such as Googlebot, about which pages on your website should not be crawled. 

To block ChatGPT completely from scraping your website, simply add the following code to your robots.txt file: 

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: CCBot
Disallow: /

To block ChatGPT partially, add the following code:

User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/

User-agent: ChatGPT-User
Allow: /directory-1/
Disallow: /directory-2/

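The first set of rules asks the listed AI crawlers to stay away from every page on your website, while the second set blocks only /directory-2/ and leaves /directory-1/ (and any path not listed) open to them. GPTBot and ChatGPT-User are OpenAI's user agents; CCBot is the Common Crawl crawler, whose data sets are widely used to train AI models. Keep in mind that robots.txt is a request that well-behaved crawlers honour voluntarily, so it works best alongside the other measures described below.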
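Before publishing the rules, it can help to check how a standards-following crawler would read them. The sketch below uses Python's standard-library urllib.robotparser to evaluate the two rule sets shown above. The helper name is_allowed and the example.com URLs are illustrative assumptions, and the result only describes crawlers that choose to obey robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The "block completely" rules from this article.
FULL_BLOCK = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: CCBot
Disallow: /
"""

# The "block partially" rules from this article.
PARTIAL_BLOCK = """\
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/

User-agent: ChatGPT-User
Allow: /directory-1/
Disallow: /directory-2/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper: it parses the text locally and makes no
    network request.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Calling is_allowed(FULL_BLOCK, "GPTBot", "https://example.com/blog/") reports the page as blocked, while the same call with "Googlebot" reports it as allowed, because the rules name only the AI crawlers. With PARTIAL_BLOCK, only URLs under /directory-2/ are reported as blocked.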

How to block on WordPress Websites

If you lack development resources or are unfamiliar with robots.txt files but run a WordPress website, there is an alternative: edit the file through a WordPress plugin.

Prerequisite: Install the Yoast Plugin

  1. Click on “Yoast SEO” in the menu
  2. Click on “Tools”
  3. Click on “File Editor”
  4. Click on the “Create robots.txt file” button
  5. Copy/paste the above lines into the file
  6. Click on “Save changes to robots.txt”
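If the site already has a robots.txt file with its own rules, the new lines should be added to it, not pasted over it. The following is a minimal sketch of that merge in Python. The function name add_block_rules and the choice to skip any crawler that is already named in the file are assumptions made for illustration; this is not how the Yoast plugin works.

```python
AI_USER_AGENTS = ("GPTBot", "ChatGPT-User", "CCBot")


def add_block_rules(existing: str, agents=AI_USER_AGENTS) -> str:
    """Append a 'Disallow: /' stanza for each agent not yet named.

    Hypothetical helper: an agent that already has a User-agent line
    is left untouched, whatever its current rules say.
    """
    # Collect the user agents the file already mentions (case-insensitive).
    named = {
        line.split(":", 1)[1].strip().lower()
        for line in existing.splitlines()
        if line.strip().lower().startswith("user-agent:")
    }
    stanzas = [
        f"User-agent: {agent}\nDisallow: /"
        for agent in agents
        if agent.lower() not in named
    ]
    if not stanzas:
        return existing
    base = existing.rstrip("\n")
    parts = ([base] if base else []) + stanzas
    # Stanzas are separated by a blank line, as in the examples above.
    return "\n\n".join(parts) + "\n"
```

The returned text can then be pasted into the plugin's file editor. Running the function on its own output changes nothing, so it is safe to apply more than once.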

Implement CAPTCHA 

Another effective method to prevent ChatGPT from scraping your website is to use CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart). 

This is a tool that requires users to complete a task that is difficult for a computer to perform, such as solving a simple puzzle or entering text from an image. 

By implementing CAPTCHA on your website, you can make it much harder for automated crawlers such as those behind ChatGPT to access your content and scrape it.

Implement IP Range Blocks  

Another common method for blocking generative AI crawlers is to block the IP ranges that their operators publish.

Note: You can block these IP ranges through .htaccess, but the ranges can change, so the .htaccess file has to be updated whenever the crawler's operator publishes a new list.

These are the current GPTBot IP ranges as of 08-09-2023:

  • 20.15.240.64/28
  • 20.15.240.80/28
  • 20.15.240.96/28
  • 20.15.240.176/28
  • 20.15.241.0/28
  • 20.15.242.128/28
  • 20.15.242.144/28
  • 20.15.242.192/28
  • 40.83.2.64/28
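To see how a range check works in practice, here is a minimal sketch using Python's standard-library ipaddress module. It tests whether a visitor's IPv4 address falls inside any of the ranges listed above. The function name is_gptbot_ip is an illustrative assumption, and the list is only as current as the date given above.

```python
import ipaddress

# GPTBot ranges copied from the list in this article; they can change.
GPTBOT_RANGES = [
    "20.15.240.64/28",
    "20.15.240.80/28",
    "20.15.240.96/28",
    "20.15.240.176/28",
    "20.15.241.0/28",
    "20.15.242.128/28",
    "20.15.242.144/28",
    "20.15.242.192/28",
    "40.83.2.64/28",
]

# Parse each CIDR string once so lookups stay cheap.
NETWORKS = [ipaddress.ip_network(cidr) for cidr in GPTBOT_RANGES]


def is_gptbot_ip(client_ip: str) -> bool:
    """Return True if client_ip lies inside one of the listed ranges."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in NETWORKS)
```

Each /28 range covers 16 addresses, so 20.15.240.64/28 spans 20.15.240.64 to 20.15.240.79. A server-side handler could call is_gptbot_ip with the request's remote address and refuse the request when it returns True.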

Follow: https://openai.com/gptbot-ranges.txt to stay up to date. 
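Because the ranges change, regenerating the .htaccess rules from the published list is less error-prone than editing them by hand. The sketch below turns lines of CIDR ranges into a deny block. It assumes Apache 2.4 with the Require directives of mod_authz_core, and that the published file holds one range per line. The function name htaccess_deny_block is hypothetical, and downloading the file is left out.

```python
import ipaddress


def htaccess_deny_block(cidr_lines):
    """Build an Apache 2.4 block that denies the given CIDR ranges.

    Assumptions: one range per input line; blank lines and lines
    starting with '#' are skipped; an invalid range raises ValueError.
    """
    ranges = []
    for raw in cidr_lines:
        text = raw.strip()
        if not text or text.startswith("#"):
            continue
        # ip_network validates the range and normalises its spelling.
        ranges.append(str(ipaddress.ip_network(text)))

    rules = ["<RequireAll>", "    Require all granted"]
    rules += [f"    Require not ip {cidr}" for cidr in ranges]
    rules.append("</RequireAll>")
    return "\n".join(rules) + "\n"
```

Passing the downloaded file's lines to htaccess_deny_block gives text that can replace the previous block in .htaccess. Whether Require directives are permitted there depends on the server's AllowOverride configuration.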

How to Opt Out of ChatGPT Scraped Data

  1. Head to the OpenAI Data Opt-Out Request form.
  2. Type in the email address associated with your account.
  3. Enter the Organisation ID.
  4. Type in your Organisation Name found in your ChatGPT settings.
  5. Solve the Captcha, and the data opt-out form will be submitted to OpenAI. 

Note: Check your email to see if a copy of the form has been emailed to your user account as a verification process. For more information on OpenAI’s ChatGPT service, ensure you read through its privacy policy and the ChatGPT terms of use.

Conclusion 

By using these simple methods, website owners can effectively block ChatGPT from using their content, preserving their SEO and brand reputation. Implementing these measures will also help protect your website from other types of content scraping, ensuring that your hard-earned content remains protected. 

Last Update: 03 October 2023
