Developing a file scraper to scrape firmware files from various vendors' websites

€12-18 EUR / hour

Completed
Posted: over 4 years ago

A Python-based CLI script that downloads every product's firmware (including all versions) from the web pages of a predefined list of vendors and stores the information (metadata) in SQLite [login to view URL] mandatory metadata fields include (Manufacturer, Model, Version, Type, Name, Release Date (if available), Download link), e.g. (Cisco, Video Surveillance 6030 IP Camera, 2.7.0, IP Camera, [login to view URL], 21/08/2015, "link").

There is also a non-mandatory binary field that indicates whether the device is discontinued, set only when the vendor states this on its website. The firmware files themselves will be stored in the file system and referenced by index ID in SQLite.

The argument to the script should be either a comma-separated list of vendor names or the location of a text file containing the vendor names. The server where the script will run has no GUI components, so the script must run the browser in headless mode.

Solution Scope

1. A script will be written per vendor. This is required because each vendor website has its own implementation of the firmware download page. However, effort will be made to identify and implement reusable components, if any.
2. The script will only download new firmware that has been added by the vendor. The first execution will therefore download all available firmware, and subsequent runs will download only the files added since. This will be achieved by analysing the data available in SQLite and skipping the files that have already been downloaded and processed.
3. Each vendor that is provided will be analysed manually to identify the following, which is required to develop the script:
   a. URL of the firmware download page
   b. Credential requirements (simple signups, specific signups, no signups)
   c. Any CAPTCHA on the page
   d. Any honeypot traps
4. If credentials are required to download the firmware and only a simple signup is needed, the signup will be done manually as part of the manual analysis, using a Gmail account dedicated to this work.
5. The script will try to imitate human-like behaviour (to a limit) while scraping the web page and will use Tor, so that any scraper/crawler detection logic implemented on the vendor site can be bypassed. This will be achieved by adding random delays and random view times, and by avoiding the honeypot traps identified during manual analysis.

Solution Brief

A Python, Selenium and SQLite based solution will be developed with the following features/components:

1. File Management Module: Responsible for storing and managing the downloaded files and metadata. Firmware and installer files will be stored on the filesystem in a structured folder hierarchy. Metadata for the files will be stored in SQLite and will refer to the stored files through their paths on the file system and the file index/name.
2. Vendor Scrapers: A Python Selenium based scraper will be written for each vendor, responsible for downloading the files and grabbing the metadata from the vendor's site. It will use the file management module to store the files and write the metadata to SQLite.
3. Configuration File: All configuration for the framework (including vendor-specific settings such as credentials, URL, etc.) will be stored in a JSON file that can easily be modified manually.
4. Execution Script: The configuration file can define the polling interval for each vendor scraper. When the execution script is run, it will schedule each vendor script individually according to the polling interval defined in the config.

Deliverable:

1) Python source code, including comments in the code explaining each function and its details. We should be able to give any required input as an argument and execute it as a one-line command in the Linux terminal.
2) Dependencies
3) Manual to install, configure and use the scraper
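For illustration only, the metadata store, the "skip what is already downloaded" rule and the vendor argument described above could be sketched roughly as follows. This is a minimal sketch using only the Python standard library, not the client's or any bidder's implementation. The table name `firmware`, the column names, the helper names (`open_db`, `record_firmware`, `parse_vendors`) and the choice of the download link as the "already downloaded" key are all assumptions; the posting only names the metadata fields. The vendor file is assumed to hold one name per line or comma-separated names.

```python
import sqlite3
from pathlib import Path

# Hypothetical schema: one row per firmware file, using the fields named in
# the posting. The UNIQUE download_link is an assumed de-duplication key.
SCHEMA = """
CREATE TABLE IF NOT EXISTS firmware (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    manufacturer  TEXT NOT NULL,
    model         TEXT NOT NULL,
    version       TEXT NOT NULL,
    type          TEXT NOT NULL,
    name          TEXT NOT NULL,
    release_date  TEXT,             -- optional, stored as given by the vendor
    download_link TEXT NOT NULL UNIQUE,
    discontinued  INTEGER,          -- optional flag: 1, 0 or NULL if unknown
    file_path     TEXT              -- where the file lives on the filesystem
)
"""


def open_db(path=":memory:"):
    """Open (or create) the SQLite database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def is_known(conn, download_link):
    """Return True if a firmware with this download link is already stored."""
    row = conn.execute(
        "SELECT 1 FROM firmware WHERE download_link = ?", (download_link,)
    ).fetchone()
    return row is not None


def record_firmware(conn, meta, file_path=None):
    """Store one metadata record.

    Returns the new row id (the index ID that references the file), or None
    if the firmware was already recorded and should be skipped.
    """
    if is_known(conn, meta["download_link"]):
        return None
    cur = conn.execute(
        "INSERT INTO firmware (manufacturer, model, version, type, name,"
        " release_date, download_link, discontinued, file_path)"
        " VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (
            meta["manufacturer"], meta["model"], meta["version"],
            meta["type"], meta["name"], meta.get("release_date"),
            meta["download_link"], meta.get("discontinued"), file_path,
        ),
    )
    conn.commit()
    return cur.lastrowid


def parse_vendors(arg):
    """Accept a comma-separated vendor list or the path of a vendor file."""
    path = Path(arg)
    if path.is_file():
        # Assumed file format: names separated by newlines and/or commas.
        raw = path.read_text(encoding="utf-8").replace(",", "\n").splitlines()
    else:
        raw = arg.split(",")
    return [name.strip() for name in raw if name.strip()]


if __name__ == "__main__":
    conn = open_db()
    sample = {  # file name and link are hypothetical placeholders
        "manufacturer": "Cisco",
        "model": "Video Surveillance 6030 IP Camera",
        "version": "2.7.0",
        "type": "IP Camera",
        "name": "example-firmware.bin",
        "release_date": "21/08/2015",
        "download_link": "https://example.com/firmware/example-firmware.bin",
    }
    print(record_firmware(conn, sample))  # first run: new row id
    print(record_firmware(conn, sample))  # second run: None, already stored
    print(parse_vendors("Cisco, D-Link"))
```

A vendor scraper would call `is_known` before starting a download and `record_firmware` after the file has been saved, so that repeated runs fetch only newly published firmware. Keying on the download link is only one option; a key of manufacturer, model and version would also fit the description.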
Project ID: 22076792

About the project

16 proposals
Remote project
Active: 4 years ago

Awarded to:
Hello, I am Mr. Martin F, an experienced German developer. I have been developing web scrapers for years, which is why I can say that I would be the perfect guy for this project. I want to build several scrapers for you, which you can run on your server or even on my deployment server cluster (if you want). I like that you have development understanding when you speak about headless mode etc. I think we could get your big project done together; just message me for further communication. Best regards, Martin!
€18 EUR in 40 days
4.9 (32 reviews)
5.7
16 freelancers are bidding an average of €17 EUR/hour for this project

I can offer you my high-quality web scraping services. I have extensive experience in the web scraping area. Please contact me if interested.
€17 EUR in 40 days
5.0 (23 reviews)
7.9

I have 12+ years of experience in Python programming. I reviewed your detailed project description. I'm an expert in web scraping and have completed tens of web scraping tasks here on Freelancer. How many vendors do you have, and what is your estimate of the time required for each one? Thanks
€15 EUR in 30 days
4.8 (223 reviews)
8.2

Hi, I have gone through your requirement to scrape lots of websites. I am an EXPERT in building scraping tools/scripts. Hence, I can SURELY work on your project. I have 4 YEARS of EXPERIENCE in developing PHP-PYTHON (Scrapy, Selenium) based web scrapers as well as WINDOWS BASED web scraping software, through which I have crawled many sites such as Craigslist, Amazon, Yelp and many others. I have also worked on complex sites to bypass CAPTCHA with the use of PROXY IP bouncing techniques. Let's work together :) Have a great day! I am glad to see your WORK HISTORY and the positive reviews from other freelancers. I am really excited to work with you and would love to have a long-term business association for any of your data-related needs.
€16 EUR in 3 days
4.9 (97 reviews)
5.8

Hello, We have a team of full-stack developers (Python, Django, Machine learning, Anaconda) with 6+ years of experience for your existing website projects. We also have a team of JavaScript MEAN/MERN stack developers with 8+ years of experience for your project. We are experts in Python website and app development, Python programming, Python integration frameworks, design, etc. We are familiar with working with Git and cloud-based services, and have extensive experience in Python3, Flask, SQLite, HTML5, CSS3, Bootstrap, JavaScript & jQuery development. Python development services:
• Web development and test automation
• Google App Engine Cloud Platform development
• Zope and Plone development
• Django development
• Pyramid framework development
Recently we started work on a Django framework based energy management system project and also migrated it from an old Python version to the latest one. Kindly initiate the chat so we can discuss further. Thanks! Emizentech
€18 EUR in 40 days
5.0 (60 reviews)
5.8

Hello sir! As a Python expert, I am glad to place a bid on your project. As you can see in my profile, I am fully experienced and have lots of skills in automation scripts. I want to discuss more via chat. Regards.
€12 EUR in 40 days
4.9 (15 reviews)
5.5

Hi there, this is Pandelis, I'm a Python developer and web scraping specialist. I just finished a similar project that required scraping metadata into an SQL DB, downloading documents for each record, storing them in an organised manner and scraping the documents for additional info. Deliverables were JSON files that included info both from the web records and from within the PDFs/DOCs. I can focus solely on this project and deliver with no problems. I have a few questions and propositions to make; contact me in chat to discuss further. Thanks, Pandelis
€17 EUR in 40 days
4.9 (18 reviews)
5.2

Hello, I have experience in web scraping with Python. I have read your project's requirements and I'm capable of delivering the Python scripts with the config files, installation instructions and dependencies. I can use Selenium, Scrapy, BeautifulSoup and Requests to make the best web scrapers! I hope to work with you!
€15 EUR in 40 days
5.0 (5 reviews)
4.1

Hello, my name is Fares. I can get it done perfectly, as you want. Let's chat when you are online. Thanks
€15 EUR in 40 days
4.5 (18 reviews)
3.6

Hello! I am very interested in your posted project. As I read your description carefully, I was excited by the feeling that I would be able to satisfy your requirements for this job. We can negotiate on price/budget. If you award me your project, you will get a good result. Looking forward to working with you. Thanks, regards
€15 EUR in 40 days
5.0 (4 reviews)
2.5

Hello. I am a software developer with strong abilities in scraping web applications using automated browser solutions like Selenium in Python. I also have strong knowledge of the Linux ecosystem in any distribution (Ubuntu, OpenSUSE, ...). We can continue the conversation in chat to define the web pages to scrape and any other requirements. Thanks.
€12 EUR in 30 days
5.0 (2 reviews)
2.5

Hi there, I have worked on creating custom web scraping solutions in the past to suit varying webpage requirements, specifically with Python (multithreaded). I can provide a link to a GitHub repository showing an example of this. Liam
€17 EUR in 20 days
0.0 (0 reviews)
0.0

About the client

Brussels, Belgium
3.6
3
Payment method verified
Member since Sept. 20, 2019
