Python Script for Getting Data You Need From Domain Name Lists

Tips for Using the Domains Index’s Lists

Today I want to show you an easy way to get exactly the data you need for a customized request. It is super easy!
For example, suppose you need a list of all domain names that contain the word “domains”. Using the script below and any of our domain lists you want to analyze, you get a set of files with per-TLD statistics and the matching names. Just save the script, then enter your query and the name of a domain list you recently bought from us (for example, the whole ccTLD list CSV file) at the top of the script:

```python
# the word/phrase or combination we are looking for in domain names
query = "domains"
# specify the list name
domain_list = "cctld.csv"

from collections import defaultdict

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

query_full = []     # (name, tld) tuples for names fully equal to the query
query_entries = []  # (name, tld) tuples for names containing the query
tlds = []           # all TLDs found in domain_list, in order of appearance
counter_full = defaultdict(int)
counter_entries = defaultdict(int)

with open(domain_list, encoding="utf-8") as f:
    for x in f:
        x = x.strip()
        # only names with exactly one dot ("name.tld") are analysed
        if x.count(".") != 1:
            continue
        dname, tld = x.split(".", 1)
        # looking for the domain names fully equal to the query and marking the TLDs
        # with such names in the dictionary, creating a list of tuples (name, tld)
        if query == dname:
            counter_full[tld] = 1
            query_full.append((dname, tld))
        # looking for the domain names with the query string and counting them per TLD
        # in the dictionary, creating a list of tuples (name, tld)
        if query in dname:
            counter_entries[tld] += 1
            query_entries.append((dname, tld))
        # creating a total list of TLDs found in domain_list
        if tld not in tlds:
            tlds.append(tld)

# adding zeros to the dictionary values for TLDs without domains fully equal to the query
for tld in tlds:
    if tld not in counter_full:
        counter_full[tld] = 0

# creating a one-row dataframe from the dictionary
count_query_entries = pd.DataFrame(counter_full, index=[query])

# reporting and plotting the names containing the query per TLD
# (only TLDs with at least one match are plotted)
if counter_entries:
    tld_with_max = max(counter_entries, key=counter_entries.get)
    print("{} maximum names with - {} - are in - {} - TLD".format(
        counter_entries[tld_with_max], query, tld_with_max))
    X = np.arange(len(counter_entries))
    plt.bar(X, list(counter_entries.values()), align="center", width=0.5)
    plt.xticks(X, list(counter_entries.keys()))
    plt.ylim(0, max(counter_entries.values()) + 1)
    plt.show()

# adding zeros to the dictionary values for TLDs without the query in their names
for tld in tlds:
    if tld not in counter_entries:
        counter_entries[tld] = 0

# appending a second row built from counter_entries
count_query_entries = pd.concat([
    count_query_entries,
    pd.DataFrame(counter_entries, index=['has "{}" in name'.format(query)]),
])

# saving query usage statistics in domain names to a tab-separated file
fname = "usage_of_{}_statistics.csv".format(query)
count_query_entries.to_csv(fname, sep="\t", encoding="utf-8")

# saving the list of TLDs without a registered domain name equal to the query
tld_with_0 = [tld for tld in tlds if counter_full[tld] == 0]
fname = "list_of_tlds_without_{}.csv".format(query)
with open(fname, "w", encoding="utf-8") as save_f:
    for item in tld_with_0:
        save_f.write("%s\n" % item)

# saving the list of domain names consisting only of the query string
fname = "list_of_tlds_with_full_{}.csv".format(query)
with open(fname, "w", encoding="utf-8") as save_f:
    for name, tld in query_full:
        save_f.write("{}.{}\n".format(name, tld))

# saving the list of domain names that include the query string
fname = "list_of_tlds_with_partially_{}.csv".format(query)
with open(fname, "w", encoding="utf-8") as save_f:
    for name, tld in query_entries:
        save_f.write("{}.{}\n".format(name, tld))
```
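You may want to try several keywords on the same list, or to check the matching rules on a few names first. In that case the core of the script can be wrapped in a small function. This is a minimal sketch and not part of the original script: the name match_domains is our own. Like the script, it assumes one domain name per line and skips names with more than one dot (for example, shop.co.uk).

```python
def match_domains(lines, query):
    """Split 'name.tld' lines and collect names that equal or contain query.

    Returns (full, partial, counts): full and partial are lists of
    (name, tld) tuples, and counts maps every TLD seen to the number of
    names in it that contain the query (0 if there are none).
    """
    full, partial = [], []
    counts = {}
    for line in lines:
        line = line.strip()
        # keep only second-level names such as "example.de" (exactly one dot)
        if line.count(".") != 1:
            continue
        name, tld = line.split(".", 1)
        counts.setdefault(tld, 0)  # register the TLD even when nothing matches
        if name == query:
            full.append((name, tld))
        if query in name:
            partial.append((name, tld))
            counts[tld] += 1
    return full, partial, counts


sample = ["domains.de", "mydomains.fr", "example.de", "shop.co.uk", "domains.fr"]
full, partial, counts = match_domains(sample, "domains")
print(full)     # [('domains', 'de'), ('domains', 'fr')]
print(partial)  # [('domains', 'de'), ('mydomains', 'fr'), ('domains', 'fr')]
print(counts)   # {'de': 1, 'fr': 2}
```

The function accepts any iterable of lines, so you can also pass it an open file object for your domain list and call it once per keyword.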

The script saves its results in four CSV files (the statistics file is tab-separated):

1. List of domain names consisting of exactly the query keyword (sample: list_of_tlds_with_full_domains):

2. List of domain names containing the query keyword (sample: list_of_tlds_with_partially_domains):

3. List of zones (TLDs) where the domain list contains no name consisting of the whole keyword, so such a name may still be available for registration (sample: list_of_tlds_without_domains):

4. Statistics for all TLDs on names that consist of or contain the query keyword (sample: usage_of_domains_statistics):
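The statistics file uses tab separators even though it has a .csv extension, so you need sep="\t" when you load it back, for example in pandas. The sketch below uses top_tlds, a hypothetical helper that is not part of the script. It reads the file for a given query and returns the TLDs with the most names containing the keyword. It assumes the file name pattern and row labels the script writes (usage_of_<query>_statistics.csv, with rows <query> and has "<query>" in name).

```python
import pandas as pd


def top_tlds(query, n=5):
    """Return the n TLDs with the most names containing query.

    Reads the tab-separated statistics file written by the script, whose
    rows are the query itself and 'has "<query>" in name' and whose
    columns are the TLDs.
    """
    stats = pd.read_csv(
        "usage_of_{}_statistics.csv".format(query), sep="\t", index_col=0
    )
    # the second row holds, per TLD, the number of names containing the query
    row = stats.loc['has "{}" in name'.format(query)]
    return row.sort_values(ascending=False).head(n)
```

For example, top_tlds("domains", n=3) returns a pandas Series with the three largest counts, indexed by TLD.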

So you can easily get useful information about registered (live) names that matches your request.

Enjoy, and send us feedback on how useful you find it and how you use it in your everyday tasks.

Thanks and have a good one!