
Domain Lead Generator #3 – The Better Automated Process

This is the third post in a series. If you haven’t read the previous posts, here they are:

Domain Lead Generator – Intro
Domain Lead Generator #1 – The Manual Process
Domain Lead Generator #2 – The Automated Process

In the previous post, I talked about the program introduced by the company and its flaws. Here, I’ll take you through how I built my own, better version of the software in Python. If you haven’t started using Python yet, you should read this.

The whole source code is available on GitHub.

What I wanted my program to do:

1. Take a file named “keywords” as input.
This is exactly the same as the program made by the company. Here’s the Python code to read the file and store the keywords in a list:

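keywordlist = []
# Read one keyword per line, dropping the trailing newline and blank lines
with open('keywords', 'r') as f:
    for line in f:
        keyword = line.strip()
        if keyword:
            keywordlist.append(keyword)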

2. Run a Google search and pull URLs for each of the keywords.
Google doesn’t allow bots to run search queries, so I had to use an external module, GoogleScraper.py, to accomplish this.

Click here to read more about GoogleScraper.py

GoogleScraper.py has a function, scrape, that returns the URLs on a search page. Here’s the function geturls, which returns the domain names for a specific keyword:

def geturls(keyword, results_per_page, pages):
    result = []
    temp = scrape(keyword, results_per_page, pages, 0)
    for url in temp:
        # Extract only the domain name from the URL. Keep three labels when the
        # second-to-last one is short (e.g. "com" in example.com.au), else two.
        hostname = url.hostname.split(".")
        hostname = ".".join(hostname[-3:] if len(hostname[-2]) < 4 else hostname[-2:])
        result.append(hostname)
    return result
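
To see the domain-extraction heuristic in isolation, here is a minimal sketch. It assumes the URLs are parsed with the standard library's urllib.parse (which is what url.hostname suggests); extract_domain is a hypothetical helper name, not part of GoogleScraper.py.

```python
from urllib.parse import urlparse

def extract_domain(url):
    """Return the domain part of a URL using the same heuristic as geturls.

    Keep the last three labels when the second-to-last label is shorter
    than 4 characters (e.g. "com" in "example.com.au"), otherwise keep
    the last two.
    """
    hostname = urlparse(url).hostname
    if not hostname:
        return None  # no hostname to work with (e.g. a relative URL)
    labels = hostname.split(".")
    if len(labels) < 2:
        return hostname
    keep = 3 if len(labels[-2]) < 4 else 2
    return ".".join(labels[-keep:])

print(extract_domain("http://www.example.com/page"))  # example.com
print(extract_domain("http://shop.example.com.au/"))  # example.com.au
print(extract_domain("http://www.abc.com/"))          # www.abc.com
```

The last call shows the limit of a length-based rule: a short second-level name such as "abc" is mistaken for part of a two-level TLD, so the "www" prefix is kept.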

3. Do WHOIS searches for each of the domain names.

I used FreeWHOIS.US to do WHOIS searches. Here is the function that creates the URL to pull WHOIS info from. It takes the domain name as an input.

def whois_urlcreator(domain):
    #base_url="http://www.whoisfly.com/"
    base_url="http://www.freewhois.us/index.php?query="
    fullurl=base_url+domain+"&submit=Whois"
    return fullurl

Next, I use the URL generated to obtain the WHOIS info.

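import urllib.request

def getwhoisinfo(whoisurl):
    # Return the page body as text, or "" if the request or decoding fails
    try:
        with urllib.request.urlopen(whoisurl) as f:
            result = f.read().decode('utf-8')
    except (OSError, UnicodeDecodeError):
        result = ""
    return result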

After that, I extract only the e-mail IDs from the WHOIS info obtained.

import re

def getwhoisemail(whoisinfo):
    # The dot before the TLD is escaped so it matches a literal "."
    r = re.compile(r"[-a-zA-Z0-9._]+@[-a-zA-Z0-9_]+\.[a-zA-Z0-9_.]+")
    results = r.findall(whoisinfo)
    return list(set(results))
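
As a quick check of the e-mail extraction, here is a small sketch run against made-up WHOIS text. The sample text and the EMAIL_RE name are assumptions for illustration; the pattern is the one from getwhoisemail with the dot before the TLD escaped.

```python
import re

# Escaping the dot before the TLD stops it from matching any character.
EMAIL_RE = re.compile(r"[-a-zA-Z0-9._]+@[-a-zA-Z0-9_]+\.[a-zA-Z0-9_.]+")

# Made-up WHOIS output, for illustration only
sample_whois = """
Registrant Email: owner@example.com
Admin Email: owner@example.com
Tech Email: tech-team@example.co.uk
"""

# set() removes the duplicate address; sorted() makes the order predictable
emails = sorted(set(EMAIL_RE.findall(sample_whois)))
print(emails)  # ['owner@example.com', 'tech-team@example.co.uk']
```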

4. Store domain names, statuses and e-mail IDs in an Excel file.
For each domain name, I create an object with the following definition:

class excelitem(object):
    def __init__(self,domain,status,emails):
        self.domain = domain
        self.status = status #status=0 if no emails found, 1 if any e-mails found
        self.emails = emails
    def showstatus(self):
        print (self.status)

I use the functions listed above to create a list of objects from the keywords file.
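
A rough sketch of how that list might be assembled is below. The actual glue code lives in the GitHub project and is not shown in this post, so build_items, the ExcelItem stand-in and the skipping of repeated domains are all assumptions. The lookups are passed in as functions so the sketch runs without touching Google or a WHOIS site.

```python
class ExcelItem:
    """Hypothetical stand-in for the post's excelitem class."""
    def __init__(self, domain, status, emails):
        self.domain = domain
        self.status = status  # 0 if no e-mails found, 1 otherwise
        self.emails = emails

def build_items(keywords, get_domains, get_emails):
    """Build one ExcelItem per unique domain found for the keywords.

    get_domains(keyword) -> list of domain names
    get_emails(domain)   -> list of e-mail addresses
    """
    items = []
    seen = set()
    for keyword in keywords:
        for domain in get_domains(keyword):
            if domain in seen:
                continue  # assumption: look each domain up only once
            seen.add(domain)
            emails = get_emails(domain)
            items.append(ExcelItem(domain, 1 if emails else 0, emails))
    return items

# Stub lookups with made-up data, standing in for geturls and the WHOIS helpers
fake_results = {"shoes": ["example.com", "example.org"], "boots": ["example.com"]}
fake_emails = {"example.com": ["owner@example.com"]}

items = build_items(
    ["shoes", "boots"],
    lambda kw: fake_results.get(kw, []),
    lambda d: fake_emails.get(d, []),
)
for item in items:
    print(item.domain, item.status, item.emails)
```

In the real program, the two lambdas would be replaced by calls to geturls and to the WHOIS functions chained together.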

After this comes the main improvement over the company’s software: I write all these objects to an Excel file. When I view this file, I can see exactly which domains I’ve been able to get WHOIS e-mails for, and later I can fill in the rest by doing manual WHOIS searches.
Here’s a sample of what the Excel file looks like:

Excel File generated by Domain Lead Generator
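
This post does not show which library writes the Excel file. As one standard-library-only alternative, the sketch below writes the same three columns to a CSV, which Excel opens directly. The write_rows helper, the column names and the leads.csv file name are assumptions for illustration, not the project's actual code.

```python
import csv
import io

def write_rows(rows, fileobj):
    """Write (domain, status, emails) rows as CSV with a header line."""
    writer = csv.writer(fileobj, lineterminator="\n")
    writer.writerow(["Domain", "Status", "Emails"])
    for domain, status, emails in rows:
        # Join multiple addresses into a single cell
        writer.writerow([domain, status, "; ".join(emails)])

# Made-up rows: status 1 means at least one e-mail was found
rows = [
    ("example.com", 1, ["owner@example.com", "admin@example.com"]),
    ("example.org", 0, []),
]

buffer = io.StringIO()  # in-memory file, so the sketch writes nothing to disk
write_rows(rows, buffer)
print(buffer.getvalue())

# To write a real file instead:
# with open("leads.csv", "w", newline="") as f:
#     write_rows(rows, f)
```

Rows with status 0 end up with an empty e-mail cell, which makes them easy to filter and fill in by hand later.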


Shortcomings:

1. WHOIS info for GoDaddy domains, Special TLDs
GoDaddy stopped providing complete WHOIS info to third-party sites and now requires that you search on its own site and enter a captcha for every search. Similarly, certain TLDs such as .com.au require a captcha too. For these, I’ll have to update the Excel file manually.

The complete project is available on GitHub


4 thoughts on “Domain Lead Generator #3 – The Better Automated Process”

  1. Surya Kencono says:

    Nice program. I’d like to try your software. How can I put your source code into operation. I use Windows. Can you help me?
