Building a scalable web scraper for a large number of different websites
€6-12 EUR / hour
The goal of the project is to build a scalable web scraper which should scrape data from more than a dozen different websites at first. Later on, it should be possible to scale the scraper up to a few thousand websites.
The websites are known and will be added to the scraper iteratively. Each website has a different structure, which is why the development and maintenance costs per site need to stay as low as possible. The aim is to scrape the websites weekly at first. Later on, the scraping interval should be shortened to daily or even less. The scraped data needs to be stored in a useful and efficient way in a database in the cloud. Furthermore, the scraping must be tolerant of changes in the designs of the websites, and it must avoid being blocked.
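One common measure against being blocked is to space out requests per domain with a randomized delay. The following is only a minimal sketch of that idea, not part of the existing scraper: the class name `DomainThrottle`, its parameters, and the `a.test` domain are hypothetical, and real deployments usually combine throttling with other measures.

```python
import random
from typing import Dict, Optional


class DomainThrottle:
    """Tracks the earliest time each domain may be requested again."""

    def __init__(self, min_delay: float, jitter: float,
                 rng: Optional[random.Random] = None):
        self.min_delay = min_delay      # fixed pause between requests, in seconds
        self.jitter = jitter            # upper bound of the extra random pause
        self.rng = rng or random.Random()
        self._next_allowed: Dict[str, float] = {}

    def wait_time(self, domain: str, now: float) -> float:
        """Return how many seconds to sleep before requesting `domain` at `now`."""
        return max(0.0, self._next_allowed.get(domain, 0.0) - now)

    def record_request(self, domain: str, now: float) -> None:
        """Note that a request was sent at `now` and schedule the next slot."""
        delay = self.min_delay + self.rng.uniform(0.0, self.jitter)
        self._next_allowed[domain] = now + delay
```

A caller would sleep for `wait_time(domain, time.monotonic())` seconds, send the request, and then call `record_request(domain, time.monotonic())`. Passing the clock value in explicitly keeps the class easy to test.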
Currently, a simple scraper in Python exists which can scrape a few websites by using the Selenium library. However, this does not need to be continued at all costs.
The following tasks are part of your engagement for the project:
o Developing a modular and scalable software architecture for the web scraping project (preferably with Python)
o Containerizing the program in Docker
o Deploying and managing the containers in the cloud, probably with AWS and Kafka
o Implementing different measures to prevent blacklisting and being blocked
o Setting up a SQL database, probably PostgreSQL with AWS
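As a rough illustration of the modular architecture named in the task list, one possible shape is a registry of small per-site parser functions behind a shared interface, so that adding a website means adding one function. This is only a sketch under assumptions: the names `Record`, `SCRAPERS`, `register`, and `scrape`, the record schema, and the `example.com` domain are all hypothetical, and fetching, queuing, and database storage are left out.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict
from urllib.parse import urlparse


@dataclass(frozen=True)
class Record:
    """One scraped item in a site-independent shape (hypothetical schema)."""
    source: str
    title: str


# Maps a domain to the function that parses pages from that domain.
SCRAPERS: Dict[str, Callable[[str], Record]] = {}


def register(domain: str):
    """Decorator that adds a per-site parser to the registry."""
    def wrap(func: Callable[[str], Record]) -> Callable[[str], Record]:
        SCRAPERS[domain] = func
        return func
    return wrap


@register("example.com")  # hypothetical site
def parse_example(html: str) -> Record:
    """Site-specific logic lives here; this one only reads the <title> tag."""
    match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    title = match.group(1).strip() if match else ""
    return Record(source="example.com", title=title)


def scrape(url: str, html: str) -> Record:
    """Dispatch already-fetched HTML to the parser registered for the URL's domain."""
    domain = urlparse(url).netloc.lower()
    if domain.startswith("www."):
        domain = domain[4:]
    try:
        parser = SCRAPERS[domain]
    except KeyError:
        raise LookupError(f"no scraper registered for {domain}") from None
    return parser(html)
```

Because every parser returns the same `Record` type, the downstream steps (for example a queue consumer and a PostgreSQL writer) would not need to know which site a record came from.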
The following tasks might be part of a further engagement:
o Implementing the web scrapers for a large number of different websites
o Maintaining and monitoring the scrapers for the websites
o Adding a web crawler to find additional websites
o Parsing the stored data and processing it into a more useful format
Your qualifications:
o Web Scraping (Importance: 9/10)
o Python (Importance: 7/10)
o Docker (Importance: 8/10)
o AWS (Importance: 5/10)
o Kafka or other Pipelining/Queuing Tools (Importance: 8/10)
o Cloud Databases (Importance: 6/10)
o PostgreSQL (Importance: 10/10)
You are expected to work closely together with our developer in Germany. The tasks above need to be coordinated and done in cooperation with him. Therefore, a willingness to work between 10 AM and 10 PM Central European Time is required.
We wish to get to know you first by working together in a limited project scope. If you are a fit for our team, we are willing to intensify our cooperation with you and hire you for future projects.
Project ID: #28930972
About the project
8 freelancers are bidding an average of €10/hour for this project
We are using Python in scraping. Please contact me and send me the link to the site so I could make a FREE SAMPLE. Hi there, I’ve read ...
Hello there. I am very interested in your project. *** As web scraping and Python expert ***. I can handle this and am confident of winning. So I have rich experience in scraping app development with Python, seleni...
Hello, this is Amine from Malaysia, a full stack web developer who has 5 years of working experience in this field. I am fully comfortable working with Python, web scraping, AWS, PostgreSQL. I will ...
Hello. An experienced web extractor doing projects mainly in PHP but Python might also be an option. Thanks for considering Eugene
Hi Sir, nice to meet you. I am an expert in Python with web scraping at a high level. I agree with your time zone confidential level of skills you wrote above. Please come to chat and show me the details.
Hi, there. Here is an expert web scraping and automation developer who is very familiar with python/Selenium. After checking your job description and skill set, I found this job suits me as well. I can work in the tim...
I have strong experience in the below, please give me a chance to work on this project. Qualifications: o Web Scraping (Importance: 9/10) o Python (Importance: 7/10) o Docker (Importance: 8/10) o AWS (Importance: 5/10) o Kafka ...