site to scrape / harvest job sites, search engines and emails

Project Description:


this web site is for my personal use only, for now. it searches job sites by job title to find company names, runs search engine searches on the company names to find their urls, then walks through each company web site looking for email addresses where applicants can send a resume, and sends my resume to them. it can also be configured as 2 sites: one gathers the email addresses and passes them to the other, which sends the emails.
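
as an illustration of the last step of the walk (finding email addresses on a company page), here is a minimal sketch. the function name extract_emails and the pattern are assumptions, not part of the attached specifications, and the pattern is deliberately simple: it catches common addresses, not every form the email standards allow.

```python
import re
from html import unescape

# Simplified pattern (an assumption): common addresses only, not full RFC 5322.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_emails(page_html: str) -> list[str]:
    """Return the distinct email addresses in a page, in order of first appearance."""
    text = unescape(page_html)  # turn entities such as &#64; back into "@"
    seen = set()
    found = []
    for match in EMAIL_RE.finditer(text):
        address = match.group(0).lower()
        if address not in seen:
            seen.add(address)
            found.append(address)
    return found


if __name__ == "__main__":
    sample = '<a href="mailto:jobs@example.com">jobs@example.com</a> or HR&#64;example.org.'
    print(extract_emails(sample))
```

addresses that are hidden behind scripts or images would need separate handling, which this sketch does not attempt.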

the detailed specifications, a list of 31 job sites, and the files of keywords that are checked during the searches are attached. each job site lists the company names in its search results, so every page fetched yields 20 complete jobs instead of 1. the system has 6 options (i to vi), described below.
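
the keyword check could look roughly like the sketch below. it assumes a keyword file holds one keyword or phrase per line and that matching is case-insensitive on whole words; the attached files may define this differently, and the names load_keywords and find_keys are hypothetical.

```python
import re


def load_keywords(lines):
    """Normalize keyword lines: strip whitespace, lowercase, skip blank lines."""
    return [line.strip().lower() for line in lines if line.strip()]


def find_keys(text, keywords):
    """Return the keywords found in text as whole words or phrases, ignoring case."""
    found = []
    for key in keywords:
        # (?<!\w) and (?!\w) keep "career" from matching inside "careers".
        pattern = r"(?<!\w)" + re.escape(key) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(key)
    return found


if __name__ == "__main__":
    keys = load_keywords(["resume\n", "\n", "HR\n", "career\n"])
    print(find_keys("Send your resume to our HR department. Careers page.", keys))
```

the returned list is what would fill the "keys found" columns for job text, search result text and email text.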


please bid on three things: the cost of the system with the first job site, the cost per additional job site, and the cost of the system with all 31 job sites. milestones for the system with one job site:

1. 10% screens and technical design
a. mockup of all screens & fields
b. annotated diagram of flow through screens
c. programming languages and systems used
d. database software & structure, application tables & fields.
e. programs/subroutines: name, language, purpose, input, output, variables
2. 30% options i and ii
3. 30% options iii and v
4. 30% options iv and vi

i. create data

there are 2 background jobs that create company names, urls and email addresses. the continual job runs forever, repeating the searches each time it finishes them. if a system failure interrupts it, it is restarted automatically, unless i had stopped it myself before the failure. the single job goes through the selected job sites and job titles only once and is never started automatically.
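
the restart rule can be stated as a small decision function. this is a sketch under the assumption that the system stores the last event of each job somewhere that survives a crash; the names LastEvent and should_restart are hypothetical.

```python
from enum import Enum


class LastEvent(Enum):
    """Last recorded event for a job (assumed to be stored persistently)."""
    NEVER_RUN = "never_run"
    STARTED = "started"                # still running when the system went down
    STOPPED_BY_USER = "stopped_by_user"
    FINISHED = "finished"              # only the single job ever finishes


def should_restart(kind: str, last_event: LastEvent) -> bool:
    """Decide at system startup whether a job is restarted automatically.

    Only the continual job restarts, and only if it was running when the
    failure happened, i.e. the user was not the last one to stop it.
    """
    return kind == "continual" and last_event is LastEvent.STARTED


if __name__ == "__main__":
    print(should_restart("continual", LastEvent.STARTED))          # interrupted by failure
    print(should_restart("continual", LastEvent.STOPPED_BY_USER))  # stopped on purpose
    print(should_restart("single", LastEvent.STARTED))             # never automatic
```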

select which job sites, job titles and page numbers to include, and which data to generate: company names (job search), urls (search engine search), emails (company site search) and send (send my resume to the email address).
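
one way to picture this selection is as a small settings object that drives one pass over the job sites. the sketch below is an assumption about structure only: RunSelection, run_once and the four stage functions passed in are hypothetical names, the real fetching is site-specific and left out, and it assumes a later stage runs only when the earlier ones were also selected.

```python
from dataclasses import dataclass, field


@dataclass
class RunSelection:
    job_sites: list
    job_titles: list
    pages: range
    # Which data to generate: "company", "url", "email", "send".
    stages: set = field(default_factory=lambda: {"company", "url", "email", "send"})


def run_once(selection, search_jobs, find_url, find_emails, send_resume):
    """Go through the selected sites, titles and pages once and return the rows built."""
    rows = []
    for site in selection.job_sites:
        for title in selection.job_titles:
            for page in selection.pages:
                for company in search_jobs(site, title, page):
                    row = {"job_site": site, "job_title": title, "company_name": company}
                    rows.append(row)
                    if "url" not in selection.stages:
                        continue
                    row["url"] = find_url(company)
                    if "email" not in selection.stages or not row["url"]:
                        continue
                    row["emails"] = find_emails(row["url"])
                    if "send" in selection.stages:
                        for address in row["emails"]:
                            send_resume(address)
    return rows
```

passing the stage functions in as arguments keeps the loop testable with stand-ins before any real job site is wired up.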

you can start, stop and monitor each job. you can also redefine the single job: this resets the data it has created to empty and lets you select the job sites and job titles again. the job sites and job titles are displayed, and the ones being accessed now are highlighted. rectangles fill up with the company names, urls and email addresses gathered by the continual job (if chosen) since its monitor was opened, or by the single job since it was defined. emails that have been sent are shown in blue. a spreadsheet lists the following 19 columns and can be edited or downloaded as described in ii.1-3:

1. line number
2. job site domain
3. job site run # (number of times page 1 of job title 1 has been accessed on this job site)
4. job title
job search result:
5. company name
6. job link: the url of the page that describes only this 1 job.
7. job text
8. job keys found in the job text.
9. date / period company name was created
search engine result:
10. url
11. result text
12. search result keys found in the result text
13. date / period url created
company site:
14. email address
15. email text
16. email keys found in the email text
17. email page link
18. date / period email created
send email:
19. date / period sent
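
the 19 columns could be held as one row per company name, filled in stage by stage. this is only a sketch: the identifiers in COLUMNS are hypothetical short names for the columns listed above, and empty strings stand for data not created yet.

```python
# Hypothetical identifiers for the 19 spreadsheet columns, in listed order.
COLUMNS = [
    "line_number", "job_site_domain", "job_site_run", "job_title",
    # job search result
    "company_name", "job_link", "job_text", "job_keys", "company_created",
    # search engine result
    "url", "result_text", "result_keys", "url_created",
    # company site
    "email_address", "email_text", "email_keys", "email_page_link", "email_created",
    # send email
    "sent",
]


def new_row(**values):
    """Build a row with every column present; columns not given stay empty."""
    unknown = set(values) - set(COLUMNS)
    if unknown:
        raise KeyError(f"unknown columns: {sorted(unknown)}")
    row = dict.fromkeys(COLUMNS, "")
    row.update(values)
    return row


def stage_reached(row):
    """Return the furthest stage with data: company, url, email or sent."""
    if row["sent"]:
        return "sent"
    if row["email_address"]:
        return "email"
    if row["url"]:
        return "url"
    return "company"
```

a monitor screen could use stage_reached to decide, for example, which emails to show in blue.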

ii. detailed report: the data can come from 3 sources:

a. historical data created by the continual job during a selected from/to date/period.
b. single job data after the job has finished.
c. single job data while it is still being created, with the display updating in real time.

1. the spreadsheet is displayed.
2. the user may edit it:
a. remove a column
b. require a column: any row with this column blank is removed.
c. sort by a column
3. it can be downloaded:
a. tab-delimited text file of rows
b. printable document
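
the edit and download steps in 2 and 3 map onto a few small list operations. the sketch below assumes each spreadsheet row is a dictionary keyed by column name; the function names are hypothetical, and the printable document is not covered.

```python
import csv
import io


def remove_column(rows, column):
    """2a: drop one column from every row."""
    return [{k: v for k, v in row.items() if k != column} for row in rows]


def require_column(rows, column):
    """2b: keep only the rows where this column is not blank."""
    return [row for row in rows if str(row.get(column, "")).strip()]


def sort_by_column(rows, column):
    """2c: sort rows by one column, comparing values as text."""
    return sorted(rows, key=lambda row: str(row.get(column, "")))


def to_tsv(rows):
    """3a: render the rows as tab-delimited text with a header line."""
    if not rows:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=list(rows[0].keys()), delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

using the csv module rather than joining strings by hand means that a job text containing a tab or a line break is quoted instead of breaking the file.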

iii. timeline report: a spreadsheet with job sites across and, going down, the date plus time period within a given from/to time interval. each cell lists the number of jobs checked and the percentage of company names that were new.
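
the cells of such a report could be computed as in the sketch below. several things here are assumptions rather than part of the specification: each job is a (checked_at, job_site, company_name) tuple, a company name counts as new the first time it is seen on any job site, the percentage is new names divided by jobs checked, and the time period is one hour.

```python
from collections import defaultdict
from datetime import datetime


def timeline_report(jobs, start, end, period_format="%Y-%m-%d %H:00"):
    """Return {(period, job_site): (jobs_checked, percent_new)} for start <= time < end."""
    seen = set()
    counts = defaultdict(lambda: [0, 0])  # (period, site) -> [checked, new]
    for checked_at, site, company in sorted(jobs):
        name = company.lower()
        is_new = name not in seen
        seen.add(name)  # names seen before `start` still count as already known
        if start <= checked_at < end:
            cell = counts[(checked_at.strftime(period_format), site)]
            cell[0] += 1
            cell[1] += is_new
    return {
        key: (checked, round(100 * new / checked, 1))
        for key, (checked, new) in counts.items()
    }


if __name__ == "__main__":
    jobs = [
        (datetime(2024, 1, 1, 9, 5), "a.example", "Acme"),
        (datetime(2024, 1, 1, 9, 30), "a.example", "Acme"),
        (datetime(2024, 1, 1, 10, 10), "a.example", "Gamma"),
    ]
    print(timeline_report(jobs, datetime(2024, 1, 1), datetime(2024, 1, 2)))
```

grouping by job title instead of by time period would give the cells of a site-by-title table in the same way.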

iv. productivity report: a spreadsheet with job titles across and job sites down. for a given from/to time interval, each cell lists the number of jobs checked and the percentage of company names that were new.

v. queues: display the number of company names, urls and email addresses seen; the number of each not yet processed (by running a search engine search, walking through the company web site, or sending the email, respectively); and the percentage not yet processed.
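
the arithmetic behind this screen is small. a sketch, assuming the system already keeps a seen count and a processed count for each of the three queues (the function names and the queue labels are hypothetical):

```python
def queue_stats(seen, processed):
    """Return seen, pending and percent pending for one queue."""
    if processed > seen:
        raise ValueError("processed cannot exceed seen")
    pending = seen - processed
    percent_pending = round(100 * pending / seen, 1) if seen else 0.0
    return {"seen": seen, "pending": pending, "percent_pending": percent_pending}


def queue_report(counts):
    """counts maps a queue name to a (seen, processed) pair."""
    return {name: queue_stats(seen, processed) for name, (seen, processed) in counts.items()}


if __name__ == "__main__":
    report = queue_report({
        "company names": (200, 150),   # processed = search engine search done
        "urls": (150, 150),            # processed = company site walked
        "email addresses": (0, 0),     # processed = resume sent
    })
    for name, stats in report.items():
        print(name, stats)
```

an empty queue is reported as 0 percent pending here so the screen never divides by zero.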

vi. system parameters: files and numbers that control the runs.

documentation with each milestone: programming languages/systems used, programs/subroutines: purpose, input/output.

ownership and confidentiality agreement

Price Type: Negotiable
