How I Built a Back-Link Checker Using ChatGPT and Google Colab


I recently started a new website, and to rank in the search engines I hired an SEO expert. He suggested several activities, one of which was building lots of backlinks. We agreed on a budget and a timeframe in which he delivers 200 links a month. These links are a mix of social bookmarks, profile pages, and guest articles. After he delivers the links, I have to verify the work and give feedback. The process of checking a URL for the backlink is as follows:

  • Open a URL from the list of URLs in the sheet
  • Search for “mydomain.com” in the page source, or click the visible link pointing to my domain
  • In the sheet, mark it as LINK FOUND or LINK NOT FOUND
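The manual check above maps almost one-to-one onto a few lines of Python. Here is a minimal sketch of that single-URL check (the function names and the `mydomain.com` placeholder are mine, assuming `requests` and `beautifulsoup4` are installed):

```python
import requests
from bs4 import BeautifulSoup

def html_has_backlink(html, domain="mydomain.com"):
    # True if any <a href> in the page source contains our domain
    soup = BeautifulSoup(html, "html.parser")
    return any(domain in (a.get("href") or "") for a in soup.find_all("a"))

def has_backlink(url, domain="mydomain.com"):
    # Fetch the page, then run the same check on its source
    response = requests.get(url, timeout=10)
    return html_has_backlink(response.text, domain)
```

For each row in the sheet, `has_backlink(url)` returning True corresponds to LINK FOUND.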

If only it were this easy.

The link-building project will go on for several months, and he will be building several hundred links, so I need a tool to do the checking for me. I came up with the idea of asking ChatGPT to write the code in Python. Here is my prompt:

Prompt


Give me a Python code that scrapes all given URLs and looks for any link that contains "mydomain.com" and prints out the URL and the link it found

ChatGPT reply:

import requests
from bs4 import BeautifulSoup

# list of URLs to scrape
urls = ['https://example.com', 'https://example.net', 'https://example.org']

# loop through each URL
for url in urls:
    # send a request to the URL
    response = requests.get(url)
    # parse the HTML content with BeautifulSoup
    soup = BeautifulSoup(response.content, 'html.parser')
    # find all links in the page
    links = soup.find_all('a')
    # loop through each link
    for link in links:
        # check if the link contains "mydomain.com"
        if 'mydomain.com' in link.get('href', ''):
            # print the URL and the link found
            print(f"{url} - {link.get('href')}")

Choosing Google Colab as a Web-Based Python Interface


Now I have to run this code in a web interface. The interface should have an input box and an output box.

I found Google Colab to be the best option for doing this quickly and easily. But I needed the code to read the list of URLs from an input widget box, so I went back to ChatGPT, described my environment, and iterated with many improvisations. There was plenty of trial and error.

Here are some of the prompts:

Improvisation Prompt 1:

Also add the following Display domains that are duplicate
Display unique list of domains in which the string was not found
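Prompt 1 asks for duplicate detection by domain. A sketch of how that can be done with `urllib.parse` (the helper name is mine):

```python
from urllib.parse import urlparse

def split_domains(urls):
    # Partition the URL list's domains into first-seen and repeated ones
    unique, duplicates = [], []
    for url in urls:
        domain = urlparse(url).netloc
        if domain not in unique:
            unique.append(domain)
        elif domain not in duplicates:
            duplicates.append(domain)
    return unique, duplicates
```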

Improvisation Prompt 2:

I got this error ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:992)
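The usual quick fix for this error, and the one that ends up in the final code, is to skip certificate verification entirely. That is acceptable for a link audit, but not for anything sensitive. A sketch:

```python
import requests
import urllib3

# verify=False skips certificate checks; silence the InsecureRequestWarning it triggers
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

session = requests.Session()
session.verify = False  # applies to every request made through this session

# usage: response = session.get("https://some-site-with-a-broken-cert.tld")
```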

Improvisation Prompt 3:

Check for Redirection, if the URL redirects, print "E:REDIRECTED" and skip iteration
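One subtlety here: `requests` follows redirects by default, so the final response object is never itself a redirect; the hops it followed are recorded in `response.history`. A small helper illustrating the check (the helper name is mine):

```python
import requests

def redirect_label(response):
    # requests follows redirects by default, so the redirect hops
    # live in response.history; non-empty means the URL bounced
    return "E:REDIRECTED" if response.history else "OK"

# usage (network required):
#   response = requests.get(url, timeout=10)
#   print(redirect_label(response), response.url)
```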

Improvisation Prompt 4:

I got a mod_security error in request.get, how can I fix it
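mod_security rules commonly block the default `python-requests` User-Agent. The fix ChatGPT suggested, which survives in the final code, is to send a browser-like User-Agent header:

```python
import requests

# a common desktop-browser UA string; mod_security usually lets this through
headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/58.0.3029.110 Safari/537.3"
    )
}

def fetch(url):
    # send the spoofed header with every request
    return requests.get(url, headers=headers, timeout=10)
```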

Improvisation Prompt 5:

Add a try catch block around request and beautiful soup

Improvisation Prompt 6:

If there are no Links found, print "E:ZERO LINKS" and skip iteration

Improvisation Prompt 7:

The list of URLs will come from a google collab input box can you make the change

And there were many more prompts before I reached the final result. But since I am a Python coder myself, I was able to exit the back-and-forth with ChatGPT and change the code my way.

ERROR/STATUS CODES


The error codes are explained below.

Errors found in the URL given in the sheet

  • UNRESOLVED – The URL in the sheet is malformed
  • DUPLICATE DOMAIN – There are multiple URLs from the same domain
  • REDIRECTED – The URL redirected to another URL; if this happens, ask the SEO analyst to post the final URL in the sheet

Statuses for links found in the source code of the URL

  • FOUND – Our domain backlink was found
  • NOT FOUND – Our domain backlink was not found
  • BAD LINK – The link checked had no valid domain in its href (e.g. a relative or malformed link)
  • ZERO LINKS – No links were found in the source code

I prefix each error code with ‘E:’ so the errors are easy to identify in the sheet during the conditional-formatting step.

So here is the final code:

The Code


This goes in the first code cell of Google Colab

from IPython.display import display
import ipywidgets as widgets

# text area widget for the list of URLs
url_box = widgets.Textarea(
    placeholder='Enter URLs here',
    description='URLs:',
    layout=widgets.Layout(width='70%')
)

# display the text box widget
display(url_box)

This goes in the second code cell of Google Colab


import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse

# suppress warnings from disabled SSL certificate verification
requests.packages.urllib3.disable_warnings()

# get the input URLs as a list
urls = url_box.value.split()

# lists to store unique and duplicate domains
unique_domains = []
duplicate_domains = []

# browser-like User-Agent to avoid mod_security blocks
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}

# first pass: collect unique and duplicate domains
for url in urls:
    domain = urlparse(url).netloc
    if domain not in unique_domains:
        unique_domains.append(domain)
    elif domain not in duplicate_domains:
        duplicate_domains.append(domain)

print("Duplicate domains:", len(duplicate_domains))
print(duplicate_domains)
print()

# second pass: check each URL for the backlink
for url in urls:
    inputstring = ""
    domain = urlparse(url).netloc
    if not domain:
        print('E:UNRESOLVED', ',', domain)
        continue
    if domain in duplicate_domains:
        print("E:DUPLICATE DOMAIN")
        continue
    # send a request to the URL
    try:
        response = requests.get(url, headers=headers, verify=False)
    except Exception as e:
        print('REQ:', str(e))
        continue
    # check if the response is a redirect
    if response.is_redirect:
        print("E:REDIRECTED", ',', domain)
        continue
    # parse the HTML content with BeautifulSoup
    try:
        soup = BeautifulSoup(response.content, 'html.parser')
    except Exception as e:
        print('BS:', str(e))
        continue
    # find all links in the page
    links = soup.find_all('a')
    if len(links) == 0:
        print('E:ZERO LINKS', ',', domain)
        continue
    # loop through each link
    for link in links:
        # get the domain name from the link's href
        link_domain = urlparse(link.get('href', '')).netloc
        if link_domain:
            # check if the domain name contains "mydomain.com"
            if 'mydomain.com' in link_domain:
                inputstring = "FOUND"
                break
            else:
                inputstring = "E:NOT FOUND"
        else:
            inputstring = "E:BAD LINK"
    print(inputstring, ',', domain)

See the cell setup in the image. Press Play in the first cell; you will get a URL input box. Paste your URLs into it.

Input Box:

https://sketchfab.tld/mydomain
https://30seconds.tld/mydomain/
https://speakerdeck.tld/mydomainus
https://www.ted.tld/profiles/<some page>/about
https://dzone.tld/users/mydomainindia.html
https://www.reddit.tld/user/mydomainusa
https://medium.tld/@mydomainusa/about

https://www.intensedebate.tld/people/mydomainusa
https://www.growkudos.tld/profile/<some page>
https://www.universe.tld/users/<some page>
https://www.dostally.tld/post/<some page>
https://www.socialbookmarkzone.info/<some page>
https://app.raindrop.io/my/-1/item/<somepage>/web
https://www.tamaiaz.tld/posts/<somepage>
https://www.socialbookmarkzone.info/<some page>/
https://gab.tld/mydomain/posts/<some page>

Now press Play in the second cell and watch the output panel.

Output:

Duplicate domains: 5
['www.socialbookmarkzone.tld', 'www.reddit.tld', 'www.instapaper.tld', 'www.wibki.tld', 'diigo.tld']
FOUND , sketchfab.tld
E:BAD LINK , 30seconds.tld
FOUND , speakerdeck.tld
E:BAD LINK , www.ted.tld
FOUND , dzone.tld
E:DUPLICATE DOMAIN
FOUND , medium.tld
FOUND , www.pinterest.tld
FOUND , www.intensedebate.tld
FOUND , www.growkudos.tld
E:ZERO LINKS , www.universe.tld
FOUND , www.dostally.tld
E:DUPLICATE DOMAIN
E:ZERO LINKS , app.raindrop.io
FOUND , www.tamaiaz.tld
E:DUPLICATE DOMAIN
E:NOT FOUND , gab.tld

INPUT BOX CODE [GOOGLE COLAB]



GOOGLE COLAB CODE CELL SETUP



PASTE THE OUTPUT INTO YOUR SEO TRACKER SHEET on the same lines as the URLs & APPLY SPLIT TEXT TO COLUMNS



STEPS TO APPLY CONDITIONAL FORMATTING



FINAL OUTPUT



Based on the above output, the SEO analyst can rework the links or drop these sites completely.

If you like the code, leave a comment. I am available on Upwork for prompt engineering and AI art jobs. I use ChatGPT, Midjourney, Python, and many more tools for my client work.

My Upwork profile is https://www.upwork.com/freelancers/~018645334d3b757e4d


Recommended: 7 Effective Prompting Tricks for ChatGPT



https://www.sickgaming.net/blog/2023/04/...gle-colab/


