
Improve webpage scraping solution -- 2

$30-250 USD

Closed
Posted: over 3 years ago


Paid on delivery
I have a Java program that scrapes information from a website. The architecture of the solution involves:
1) using Java Selenium to send requests to the webpage via Chrome WebDriver to trigger authentication and authenticated requests;
2) routing the requests from Chrome (headless) through Java BrowserMobProxy to capture three HTTP headers (Authorization, X-CSRF-TOKEN, and Cookie) and one query string; and
3) using these 4 elements in HTTPS requests sent from Java directly to the webpage (i.e. without Selenium, Chrome, or BrowserMobProxy involved) to retrieve the desired information.

This program does the basic job of extracting the information but has a few problems:
1) It depends on an external non-Java component: Chrome WebDriver.
2) It depends on Java Selenium and Java BrowserMobProxy, two dependencies that I would like to remove.
3) It is not optimized (too many refreshes and overly long sleep periods) relative to the limit at which the webpage (behind Cloudflare) starts responding with 429 errors. As a result, retrieving the information takes much longer than needed.

Deliverables
You will get the current program's Java code and you will need to solve the problems above. To do so, you will need to:
A. Find out how to authenticate and refresh the 3 headers and the query string without depending on Selenium, Chrome WebDriver, and BrowserMobProxy. As most of this data is likely generated in JavaScript, you will need knowledge of JavaScript and of how to execute JavaScript from within Java or convert the JavaScript code to Java (the preferable solution).
B. Identify the limit at which the webpage (behind Cloudflare) starts responding with 429 errors, and tune the refresh frequency of the headers and the sleep periods to that limit.

You will need to demonstrate the benefits of your changes by extracting the information currently extracted by the program and measuring how long it takes.

Note: you will need to create your own login/password on the webpage. No additional requirements exist to register.
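
For illustration only, step 3 of the architecture and the 429 handling from deliverable B could be sketched with the JDK's built-in `java.net.http.HttpClient` (Java 11+). This is not the poster's code: the class and method names (`DirectFetchSketch`, `buildRequest`, `delayAfter429`, `fetchWithBackoff`), the 500 ms base delay, and the 60 s cap are all assumptions, and the sketch assumes the four captured values are already available as strings. It honors a numeric `Retry-After` header if the server sends one and otherwise falls back to exponential backoff.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.Optional;

// Hypothetical sketch: direct HTTPS requests with captured headers plus 429 backoff.
final class DirectFetchSketch {

    // Builds a GET request carrying the three captured headers and the query string.
    static HttpRequest buildRequest(String baseUrl, String queryString,
                                    String authorization, String csrfToken, String cookie) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "?" + queryString))
                .header("Authorization", authorization)
                .header("X-CSRF-TOKEN", csrfToken)
                .header("Cookie", cookie)
                .timeout(Duration.ofSeconds(30))
                .GET()
                .build();
    }

    // Wait time after a 429: a numeric Retry-After (seconds) wins; otherwise
    // base * 2^attempt, capped at max. attempt is zero-based.
    static Duration delayAfter429(Optional<String> retryAfter, int attempt,
                                  Duration base, Duration max) {
        if (retryAfter.isPresent()) {
            try {
                long seconds = Long.parseLong(retryAfter.get().trim());
                return Duration.ofSeconds(Math.max(0, seconds));
            } catch (NumberFormatException ignored) {
                // Retry-After may also be an HTTP date; this sketch then uses backoff.
            }
        }
        long factor = 1L << Math.max(0, Math.min(attempt, 20));
        Duration delay = base.multipliedBy(factor);
        return delay.compareTo(max) > 0 ? max : delay;
    }

    // Sends the request, sleeping and retrying while the server answers 429.
    // Returns the last response, which may still be a 429 after maxAttempts tries.
    static HttpResponse<String> fetchWithBackoff(HttpClient client, HttpRequest request,
                                                 int maxAttempts)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        HttpResponse<String> response = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            response = client.send(request, HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() != 429) {
                return response;
            }
            Duration wait = delayAfter429(response.headers().firstValue("Retry-After"),
                    attempt, Duration.ofMillis(500), Duration.ofSeconds(60));
            Thread.sleep(wait.toMillis());
        }
        return response;
    }
}
```

Logging the request rate at which the first 429 appears, together with the waits that `delayAfter429` produces, would give a starting point for tuning the sleep periods; the actual limit has to be measured against the real site.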
Project ID: 26802689

About the project

5 proposals
Remote project
Active: 4 years ago

5 freelancers are bidding an average of $177 USD for this project
Hi, please see my previous project done in Java. I'm an expert in Java and have good knowledge of scraping methods using Selenium. I'm also fully familiar with JavaScript (client side and Node.js server side), so I can use this knowledge to convert the JavaScript you mentioned to Java and remove the third-party tools your program currently uses. As a starting point, may I have the website URL and the data the program currently scrapes? Regards, Ramin.
$250 USD in 10 days
4.9 (84 reviews)
5.5
I specialize in API scripts. I have worked on financial data, eCommerce sites, B2B sites, and real estate data. I have 150+ 5/5-star reviews on another platform for API-related work and am very good with captcha work.
$190 USD in 6 days
0.0 (0 reviews)
0.0
Good day. I'm interested in your project. I have about 3 years of scraping and 6 years of Python programming experience. A big plus of using Python is that everything will be automated and that I can write the program quickly. I use all the modern scraping libraries, such as BeautifulSoup, Selenium, Requests, and Scrapy. I can deliver the results in whatever form is convenient for you: JSON, CSV, TXT, or an SQL database. I have also worked on large projects and scraped large sites such as Amazon, Alibaba, and YouTube, so I know how to work with large amounts of data. If I suit you as a specialist, we can discuss the project in more detail.
$140 USD in 7 days
0.0 (0 reviews)
0.0

About the client

Băilești, Romania
5.0
1
Member since Mar 8, 2020

Client verification

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)