Scrape information from web pages -- 2

$30-250 USD

Cancelled
Posted: over 7 years ago

Paid on delivery
I need this project completed as soon as possible. It requires a programmer with well-developed web scraping skills. If interested, please send me:
(i) A bid;
(ii) An estimate of how long this will take you; and
(iii) A very brief explanation of how you will execute this task.

These are the instructions in detail:

1. The comma-delimited text file “[login to view URL]” is a list of 12977 names with 4 columns: ROWID, NOMBRES, APELLIDO_PATERNO, and APELLIDO_MATERNO.
2. For each row in [login to view URL], go to [login to view URL] and enter the NOMBRES, APELLIDO_PATERNO, and APELLIDO_MATERNO in the search engine. Then click on “buscar”.
3. Click on the person that EXACTLY matches the information entered in the step above (see [login to view URL] for more information on this).
4. Click on the PROCESOS ELECTORALES tab (URL finishes in “IdTab=1”). Check whether the politician was a mayoral candidate (i.e., either “ALCALDE DISTRITAL” or “ALCALDE PROVINCIAL”) in the election “ELECCIONES REGIONALES Y MUNICIPALES 2014”. You will see these in the sub-table (see [login to view URL]). If yes, go to step 5. If not, move on to the next name.
5. Click on the “HOJA DE VIDA” that corresponds to the 2014 election “ELECCIONES REGIONALES Y MUNICIPALES 2014”. This link is embedded in the PROCESOS ELECTORALES sub-table. The link in the uppermost part of the web page saying “ver hoja de vida” is NOT the one we want.
6. Scrape all the data found in the HOJA DE VIDA. The freelancer will need to make sure that his/her code extracts *all* the information available. The freelancer will also figure out the best way to report the scraped data. I suggest a rectangular format (or several tables) where each row corresponds to a politician and each column to an item of the HOJA DE VIDA. The key is that I will need to be able to link each piece of information to a ROWID in [login to view URL] and to the politician id that can be found in the URL of PROCESOS ELECTORALES (IdPolitico).
7. Save the PROCESOS ELECTORALES tab (URL finishes in “IdTab=1”) as HTML with the name “IdTab1_IdPolitico#.html”, where # is the politician’s id number. Do the same for the HISTORIAL PARTIDARIO tab (URL finishes in “IdTab=0”). Save that web page as HTML with the name “IdTab0_IdPolitico#.html”.
8. Record all your steps in “[login to view URL]”. The idea is to save all the URLs from which information was downloaded and the corresponding file names. See the attached example for details.
9. I am attaching an example ([login to view URL]), the name list, and further clarifications. Please take a detailed look at each of these. Also, use the example logfile I provide as a template for yours.

The deliverables for this project are:
a) All downloaded files.
b) Dataset(s) with the scraped information of the HOJAS DE VIDA (XLSX).
c) A complete logfile (XLSX).
d) The code you used to download the information.

Thanks,
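The bookkeeping parts of the steps above (reading the name list, exact name matching, the mayoral-candidate check, file naming, and log rows) can be sketched offline in Python. This is only an illustration under stated assumptions, not the client's or any bidder's actual code: it assumes that IdPolitico and IdTab appear as ordinary query parameters in the tab URL, that "exact" matching may ignore letter case and extra spaces, and that the sub-table rows have already been parsed into dicts with the hypothetical keys `election` and `office`. The site's real HTML and URLs are not shown in the posting, so no fetching or parsing is attempted here.

```python
import csv
import io
from urllib.parse import urlparse, parse_qs

MAYORAL_OFFICES = {"ALCALDE DISTRITAL", "ALCALDE PROVINCIAL"}
TARGET_ELECTION = "ELECCIONES REGIONALES Y MUNICIPALES 2014"
NAME_FIELDS = ("NOMBRES", "APELLIDO_PATERNO", "APELLIDO_MATERNO")


def normalize(text):
    """Uppercase and collapse whitespace (assumption: case/spacing are not significant)."""
    return " ".join(text.upper().split())


def read_names(csv_text):
    """Parse the comma-delimited name list into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def is_exact_match(row, candidate):
    """True only if all three name fields agree after normalization."""
    return all(normalize(row[f]) == normalize(candidate[f]) for f in NAME_FIELDS)


def was_mayoral_candidate_2014(processes):
    """processes: parsed sub-table rows with hypothetical keys 'election' and 'office'."""
    return any(
        normalize(p["election"]) == TARGET_ELECTION
        and normalize(p["office"]) in MAYORAL_OFFICES
        for p in processes
    )


def politician_id(url):
    """Extract IdPolitico, assuming it is a query parameter of the tab URL."""
    return parse_qs(urlparse(url).query)["IdPolitico"][0]


def tab_filename(url):
    """Build 'IdTab<k>_IdPolitico<id>.html' from the URL's query string."""
    query = parse_qs(urlparse(url).query)
    return f"IdTab{query['IdTab'][0]}_IdPolitico{query['IdPolitico'][0]}.html"


def log_entry(rowid, url):
    """One logfile row linking ROWID, IdPolitico, source URL, and saved file name."""
    return {
        "ROWID": rowid,
        "IdPolitico": politician_id(url),
        "url": url,
        "file": tab_filename(url),
    }
```

A list of `log_entry` dicts could then be written to XLSX with whatever spreadsheet library the freelancer prefers. Keeping ROWID and IdPolitico in every row is what lets each scraped item be linked back to the name list.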
Project ID: 12073602

About the project

5 proposals
Remote project
Active: 7 years ago

5 freelancers are bidding an average of $74 USD for this project
Hi there, I have read the project description. I will write a scraper script/software to do the job and will provide both the data and the script. Let me know and we can discuss details. Thanks.
$100 USD in 1 day
5.0 (118 reviews)
6.2
Text me if you are OK with my bid
$111 USD in 2 days
0.0 (0 reviews)
0.0

About the client

Durham, United States
5.0
3
Payment method verified
Member since Aug 6, 2016

Client verification

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)