Building a scalable web scraper for a large number of different websites

Closed · Posted 3 years ago · Paid on delivery

The goal of the project is to build a scalable web scraper that initially scrapes data from more than a dozen different websites. Later on, it should be possible to scale the scraper up to a few thousand websites.

The websites are known and should be added to the scraper iteratively. Each website has a different structure, which is why the development and maintenance costs per site need to stay as small as possible. The aim is to scrape the websites weekly at first. Later on, the scraping interval should be shortened to daily or even less. The scraped data needs to be stored in a useful and efficient way in a database in the cloud. Furthermore, the scraping must be tolerant of changes in the designs of the websites, and it must avoid being blocked.
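One way to keep the per-site cost small and survive minor redesigns is to describe each website declaratively and give every field a list of extraction patterns that are tried in order. The following is only a minimal sketch under assumptions: the names SiteConfig, FieldRule and extract, the example site, and its HTML snippets are all hypothetical, and regular expressions stand in for whatever HTML parser the project would actually use.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FieldRule:
    """One field to extract, with patterns tried in order (hypothetical structure)."""
    name: str
    patterns: list[str]  # regular expressions with exactly one capture group


@dataclass
class SiteConfig:
    """Declarative description of one website; adding a site means adding one of these."""
    site: str
    fields: list[FieldRule] = field(default_factory=list)


def extract(config: SiteConfig, html: str) -> dict[str, Optional[str]]:
    """Return the first match per field; None marks a field no pattern could find."""
    record: dict[str, Optional[str]] = {}
    for rule in config.fields:
        record[rule.name] = None
        for pattern in rule.patterns:
            match = re.search(pattern, html, re.IGNORECASE | re.DOTALL)
            if match:
                record[rule.name] = match.group(1).strip()
                break
    return record


# Hypothetical site: the second pattern of each field is a fallback
# that still matches after an assumed redesign of the page.
EXAMPLE_SHOP = SiteConfig(
    site="example-shop",
    fields=[
        FieldRule("title", [r'<h1 class="product-title">(.*?)</h1>',
                            r"<h1[^>]*>(.*?)</h1>"]),
        FieldRule("price", [r'<span class="price">(.*?)</span>',
                            r'itemprop="price"[^>]*content="([^"]+)"']),
    ],
)

old_layout = '<h1 class="product-title">Blue Mug</h1><span class="price">4.99</span>'
new_layout = '<h1 id="t"> Blue Mug </h1><meta itemprop="price" content="4.99">'

print(extract(EXAMPLE_SHOP, old_layout))
print(extract(EXAMPLE_SHOP, new_layout))
```

With this layout, supporting a new website would mean writing one more SiteConfig rather than new scraping code, and a field that comes back as None is a cheap signal that a site has changed.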

Currently, a simple scraper in Python exists which can scrape a few websites by using the Selenium library. However, this existing scraper does not have to be kept at all costs.

The following tasks are part of your engagement for the project:

o Developing a modular and scalable software architecture for the web scraping project (preferably with Python)

o Containerizing the program in Docker

o Deploying and managing the containers in the cloud, probably with AWS and Kafka

o Implementing different measures to prevent blacklisting and being blocked

o Setting up a SQL database, probably PostgreSQL with AWS
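As an illustration of the blocking-prevention task in the list above, the sketch below spaces out requests per domain with a randomized delay and rotates the User-Agent header. It is an assumption-laden example, not a complete solution: the class name PoliteScheduler, the default delays, and the user agent strings are hypothetical, and measures such as proxies or retry handling are left out.

```python
import itertools
import random
import time
from urllib.parse import urlparse


class PoliteScheduler:
    """Enforces a minimum, jittered delay between requests to the same domain."""

    def __init__(self, min_delay=5.0, jitter=2.0, user_agents=None,
                 clock=time.monotonic, rng=None):
        self.min_delay = min_delay      # assumed base delay in seconds
        self.jitter = jitter            # extra random delay, 0..jitter seconds
        self._clock = clock             # injectable so the logic can be tested
        self._rng = rng or random.Random()
        self._next_allowed = {}         # domain -> earliest time of the next request
        self._agents = itertools.cycle(user_agents or ["example-scraper/0.1"])

    def wait_time(self, url):
        """Seconds to wait before `url` may be fetched; also reserves that slot."""
        domain = urlparse(url).netloc
        now = self._clock()
        start = max(now, self._next_allowed.get(domain, now))
        delay = self.min_delay + self._rng.uniform(0, self.jitter)
        self._next_allowed[domain] = start + delay
        return start - now

    def next_headers(self):
        """Request headers with the next User-Agent in the rotation."""
        return {"User-Agent": next(self._agents)}


scheduler = PoliteScheduler(user_agents=["ua-1", "ua-2"])
for url in ["https://a.example/1", "https://a.example/2", "https://b.example/1"]:
    print(url, round(scheduler.wait_time(url), 2), scheduler.next_headers())
```

A worker would call time.sleep(scheduler.wait_time(url)) before each request. Keeping the delay state per domain means many sites can be scraped in parallel while each individual site still sees slow, irregular traffic.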

The following tasks might be part of a further engagement:

o Implementing the web scrapers for a large number of different websites

o Maintaining and monitoring the scrapers for the websites

o Adding a web crawler to find additional websites

o Parsing the stored data and processing them into a more useful format
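For the monitoring task in the list above, one simple signal is the share of records per field that actually contain a value: a sudden drop usually means a website changed its layout. The sketch below assumes that scraped records are dictionaries mapping field names to values, and the 0.8 threshold is an arbitrary illustrative choice; both function names are hypothetical.

```python
from collections import defaultdict


def fill_rates(records):
    """Share of non-empty values per field across a batch of scraped records."""
    counts = defaultdict(int)
    for record in records:
        for name, value in record.items():
            # Adding 0 still registers the field, so empty fields show up with rate 0.
            counts[name] += 0 if value in (None, "") else 1
    total = len(records)
    if total == 0:
        return {}
    return {name: filled / total for name, filled in counts.items()}


def degraded_fields(records, threshold=0.8):
    """Fields whose fill rate is below `threshold` (an assumed, tunable value)."""
    return sorted(name for name, rate in fill_rates(records).items() if rate < threshold)


# Hypothetical batch from one website: the price is missing in half of the records.
batch = [
    {"title": "A", "price": "1"},
    {"title": "B", "price": None},
    {"title": "C", "price": ""},
    {"title": "D", "price": "2"},
]
print(fill_rates(batch))
print(degraded_fields(batch))
```

Running such a check after every scraping run, per website, would point maintenance effort at the scrapers that actually broke.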

Your qualifications:

o Web Scraping (Importance: 9/10)

o Python (Importance: 7/10)

o Docker (Importance: 8/10)

o AWS (Importance: 5/10)

o Kafka or other Pipelining/Queuing Tools (Importance: 8/10)

o Cloud Databases (Importance: 6/10)

o PostgreSQL (Importance: 10/10)

You are expected to work closely together with our developer in Germany. The tasks above need to be coordinated and done in cooperation with him. Therefore, a willingness to work between 10 AM and 10 PM Central European Time is required.

We wish to get to know you first by working together in a limited project scope. If you are a fit for our team, we are willing to intensify our cooperation with you and hire you for future projects.

Web Scraping Python Docker Amazon Web Services PostgreSQL

Project ID: #28930972

About the project

8 proposals · Remote project · Active 3 years ago

8 freelancers are bidding an average of €10/hour for this project

TheScorpion93

We are using Python for scraping. Please contact me and send me the link to the site so I can make a FREE SAMPLE. Hi there, I’ve read ...

€8 EUR / hour
(100 reviews)
6.7
yanakhokhlova199

Hello there. I am very interested in your project. *** As a web scraping and Python expert ***, I can handle this and am confident of winning. I have rich experience in scraping app development with Python, seleni ...

€10 EUR / hour
(7 reviews)
4.9
amineghennou3

Hello, this is Amine from Malaysia, a full stack web developer with 5 years of working experience in this field. I am fully comfortable working with Python, web scraping, AWS, PostgreSQL. I will ...

€10 EUR / hour
(4 reviews)
3.8
webxtor

Hello. An experienced web extractor doing projects mainly in PHP, but Python might also be an option. Thanks for considering. Eugene

€15 EUR / hour
(6 reviews)
4.0
stepinnsolution

Hi Sir, nice to meet you. I am an expert in Python with web scraping at a high level. I agree with your time zone and am confident in the level of skills you wrote above. Please come to chat and show me the details.

€12 EUR / hour
(1 review)
4.1
sokolovicstefan3

Hi there. Here is an expert web scraping and automation developer who is very familiar with Python/Selenium. After checking your job description and skill set, I found this job suits me well. I can work in the tim ...

€12 EUR / hour
(4 reviews)
3.4
joseji

This project really caught my eye. I have the required qualifications to do this work. I will be working with Python using the Scrapy framework. There are really JavaScript-heavy websites nowadays, which really makes it diff ...

€8 EUR / hour
(15 reviews)
3.1
Krishnamuthyam9

I have strong experience in the below; please give me a chance to work on this project. Qualifications: o Web Scraping (Importance: 9/10) o Python (Importance: 7/10) o Docker (Importance: 8/10) o AWS (Importance: 5/10) o Kafka ...

€6 EUR / hour
(0 reviews)
0.0