BeautifulSoup Projects
...a CSV: company_name, address, suburb, state, postcode, telephone, email and website. The data sits across multiple pages, so the script will need to step through pagination (or any “next” links) until the list is exhausted. Language is flexible—PHP 5.4 (legacy server) or modern Python—so choose whichever lets you work fastest and cleanest. If you prefer Python, I’m happy with requests + BeautifulSoup or Scrapy; in PHP, cURL with DOMDocument or Simple HTML DOM is fine. Keep external dependencies minimal and open-source. Deliverables • A fully commented source file (.php or .py) that I can run from the command line. • A sample CSV showing at least a handful of correctly scraped rows. • A brief README outlining setup, any libr...
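A minimal sketch of the pagination loop this brief describes, using requests + BeautifulSoup; the start URL, the `div.listing` card selector, and the per-field class names are placeholders that would have to match the real directory's markup.

```python
# Sketch only: follows "next" links until the listing is exhausted and writes one CSV.
import csv
import requests
from bs4 import BeautifulSoup

FIELDS = ["company_name", "address", "suburb", "state", "postcode",
          "telephone", "email", "website"]

def scrape(start_url, out_path="companies.csv"):
    session = requests.Session()
    url = start_url
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        while url:
            soup = BeautifulSoup(session.get(url, timeout=30).text, "html.parser")
            for card in soup.select("div.listing"):            # placeholder card selector
                row = {f: card.select_one(f".{f}") for f in FIELDS}
                writer.writerow({f: el.get_text(strip=True) if el else ""
                                 for f, el in row.items()})
            nxt = soup.select_one('a[rel="next"]')              # follow pagination
            url = nxt["href"] if nxt else None

if __name__ == "__main__":
    scrape("https://example.com/directory?page=1")
```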
I need a cost-effective developer who can turn small scraping and browser-automation requests into simple, working code at around $2 an hour: logging into a site, collecting fields, exporting them to CSV, or automating a short click-through sequence. Write "work" as the first word in your bid. You can use whichever stack you prefer: • Python with Requests, BeautifulSoup, Selenium, or Playwright • VB.NET with HttpClient, WebBrowser, or Selenium If you're reliable, comfortable working at this rate, and ready to start right away, I'd love to discuss the first task and get you onboard.
I have access to an online business directory containing roughly 30,000 company profiles. I need every publicly visible field on each profile—think phone numbers, email addresses, physical addresses, descriptions, website links and any other details the page exposes—captured and delivered in a single Google Sheets file. Please build or run an automated scraper (Python + BeautifulSoup, Scrapy, Selenium, or a comparable stack) that can: • Crawl every profile, including deeper pages reached via pagination or “load more” buttons. • Respect the site’s structure and timing so we stay under any rate-limit or anti-bot radar. • Deduplicate records and keep data clean (no broken lines, hidden HTML tags, or merged cells). • Push the fin...
...brief demo run or sample dataset that proves the script successfully captures the fields above from at least one social-media platform. Nice-to-have (but not strictly required): retry logic, proxy support, and lightweight scheduling so I can trigger the job daily. I’m looking to move quickly, so please outline your proposed approach, relevant experience with tools like Selenium, Playwright, BeautifulSoup, Scrapy, or similar, and any past examples of social-media scraping work. If you're an experienced developer, you can use your best approach to automate the process. Here's the full cycle to automate:
...fail. My goal is to restore the script to peak performance — faster execution, higher data accuracy, and a verification process that succeeds every single time. To achieve this, the code needs performance profiling, refactoring, and improvements in how it handles Amazon’s constantly changing front end. If you’re comfortable optimizing Python performance using tools like Selenium, Requests, BeautifulSoup, or similar, you’ll be a perfect fit for this project. All the libraries currently used in the project and the bot that needs optimization are shown in the attached image within the project details. Deliverables: Optimized and well-documented Python code Performance report showing measurable improvements in speed and accuracy Reliable, fast, and robus...
...capture every available record, then merge the data so that any entries sharing the same email address are treated as duplicates and removed. Columns required in the final file: Name, Address, Email, Phone Number—nothing more and nothing less. The finished spreadsheet should open smoothly in Excel with consistent formatting and no blank rows or stray characters. A quick scrape with Python (BeautifulSoup, Scrapy, or similar), followed by a solid dedupe pass in Pandas or your preferred toolset, is fine by me as long as the end result is accurate. Deliverable • One Excel (.xlsx) file containing the fully deduplicated list of contacts. I’ll spot-check the sheet against the live sites, so completeness and the email-based deduplication rule are the key ac...
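The email-based dedupe pass could be as small as the pandas sketch below, assuming the raw scrape already carries the four required column headers; the filenames are illustrative.

```python
# Sketch: normalise emails, drop blanks, remove duplicates by email, export to .xlsx.
import pandas as pd

df = pd.read_csv("scraped_contacts.csv")                       # raw scrape output (assumed name)
df = df[["Name", "Address", "Email", "Phone Number"]]          # exactly these columns, nothing more
df["Email"] = df["Email"].str.strip().str.lower()              # normalise before comparing
df = df.dropna(subset=["Email"]).drop_duplicates(subset="Email", keep="first")
df.to_excel("contacts_deduplicated.xlsx", index=False)         # opens cleanly in Excel
```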
...Extract every relevant column—drug name, pack size, price, date stamp, and any other cost fields shown on the page or in its underlying JSON. • Output to a single .xlsx workbook, one sheet per run (timestamped) or one rolling sheet with an updated “last refreshed” column—whichever keeps the file lightweight and easy to read. Automation • Provide a runnable Python script (requests + pandas/BeautifulSoup/Scrapy—your choice) with clear comments. • Include a short README explaining setup and how to schedule a monthly job with Windows Task Scheduler or a Linux cron. Acceptance • First delivery: script, README, and a sample Excel file generated from the latest data. • I’ll test it on my machine; once it runs end-t...
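A hedged sketch of the "one timestamped sheet per run" option with pandas and openpyxl; the workbook name and demo columns are assumptions, and the comment at the end shows one way a monthly cron job could trigger it.

```python
# Sketch: append one timestamped sheet to the workbook on each run.
from datetime import datetime
from pathlib import Path
import pandas as pd

def save_run(df: pd.DataFrame, path: str = "drug_prices.xlsx") -> None:
    sheet = datetime.now().strftime("run_%Y_%m_%d")
    if Path(path).exists():
        # append to the existing workbook, replacing a same-named sheet on re-run
        with pd.ExcelWriter(path, mode="a", engine="openpyxl",
                            if_sheet_exists="replace") as xl:
            df.to_excel(xl, sheet_name=sheet, index=False)
    else:
        df.to_excel(path, sheet_name=sheet, index=False)

if __name__ == "__main__":
    demo = pd.DataFrame([{"drug_name": "Example", "pack_size": "30", "price": 9.99,
                          "date_stamp": datetime.now().date().isoformat()}])
    save_run(demo)

# Example monthly cron entry (06:00 on the 1st of each month):
# 0 6 1 * * /usr/bin/python3 /path/to/price_scrape.py
```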
**Project Overview:** I need a custom automation tool that can qui...potential CAPTCHAs and rate limiting - User-friendly interface for non-technical users **Technical Specifications:** - Must work around anti-bot measures - Should use residential proxies or similar undetectable methods - Fast response times even during peak traffic - Support for JavaScript-heavy websites - Data export capabilities (CSV, JSON, or database) **Preferred Technology Stack:** - Python (BeautifulSoup, Selenium, Playwright, Scrapy) - Residential proxy integration - Headless browser automation - Any other tools you recommend for this purpose **Deliverables:** 1. Complete source code 2. Installation/setup documentation 3. User guide 4. Technical support for 2 weeks post-completion **Timeline:** as soon a...
...that I can reuse or extend the code later without hassle. Here’s what I expect the scraper to capture for every item it encounters: product name, full description, all available images, and the specifications shown on the page. Accuracy is critical; the data must mirror exactly what the site displays at the time of scraping. I’m comfortable with Python-based solutions that rely on requests, BeautifulSoup, Selenium, Scrapy, or an equivalent stack, provided the final script is well-commented and easy to schedule. Feel free to suggest a different language or library if it offers a clear advantage for this task. Deliverables • A fully functional script (and any helper modules) that logs in or navigates as needed and reliably extracts the fields above • St...
I need the candidate details currently displayed on individual profile pages of a recruitment agency site copied into a single Excel file. The four fields I’m after are: • Name • Email • Phone • Location Simply list each candidate on its own row with those columns as headers. If you prefer scripting, feel free to use Python, BeautifulSoup, Scrapy, or any other web-scraping tool, but accurate manual entry is perfectly fine too—as long as the final spreadsheet is clean and complete. I’ll share the site URL after we start. Let me know if you see any access limits or captchas so we can work around them. When you reply, please outline how you plan to gather the data and roughly how many profiles you expect to capture in this initial pass.
I need every publicly visible email address listed on (newline-separated, exactly as requested). Quality I will quickly spot-check 10–15 random addresses; any bounce rate above 10 % or more than five obvious errors will require a revision. Timeline Let me know how soon you can finish. A concise status update after the first 30 % of pages scraped is appreciated so we can correct course early if needed. Tools & Method Whether you use Python, BeautifulSoup, Scrapy, Selenium, or manual collection is up to you—as long as the final list is accurate and complete. If the site uses dynamic loading, be prepared to handle it. If you’ve scraped WLW or similar B2B portals before, mention it; past success there will weigh heavily in my decision.
...Typical fixes include: • stripping HTML tags and inline CSS • normalising quotation marks, accents, and other UTF-8 characters • removing stop-words that slipped into meta sections • de-duplicating repeated paragraphs and boilerplate text • producing a final UTF-8 CSV (or XLSX) with one article per row and clean body text in a single column Feel free to automate with Python (pandas, BeautifulSoup, regex) or OpenRefine if that speeds things up; a well-documented Excel or Google Sheets workflow is also fine as long as the same quality bar is met. Before final hand-off, I’ll run a small spot check on ten random articles: each must be free of HTML, free of duplicate content, and save correctly as UTF-8 without garbled characters. Pass th...
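One possible automated cleaning pass in Python covering the tag stripping, quote and accent normalisation, paragraph de-duplication, and UTF-8 CSV output listed above; it assumes the raw CSV holds one article of raw HTML per row in its first column, the filenames are placeholders, and the stop-word step is omitted.

```python
# Sketch of the cleaning pipeline described in the brief.
import csv
import unicodedata
from bs4 import BeautifulSoup

def clean(raw_html: str) -> str:
    # strip tags and inline CSS, keeping block boundaries as newlines
    text = BeautifulSoup(raw_html, "html.parser").get_text("\n", strip=True)
    text = unicodedata.normalize("NFC", text)              # consistent accents / UTF-8
    for bad, good in (("“", '"'), ("”", '"'), ("’", "'"), ("‘", "'")):
        text = text.replace(bad, good)                     # normalise quotation marks
    seen, parts = set(), []
    for para in text.split("\n"):                          # drop repeated paragraphs / boilerplate
        if para and para not in seen:
            seen.add(para)
            parts.append(para)
    return " ".join(parts)

with open("articles_raw.csv", encoding="utf-8") as src, \
     open("articles_clean.csv", "w", newline="", encoding="utf-8") as dst:
    writer = csv.writer(dst)
    writer.writerow(["body_text"])                         # one article per row, single column
    for row in csv.reader(src):
        writer.writerow([clean(row[0])])
```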
...core details—name, job title when available, email address, plus phone or city if publicly listed—directly from those sites. • Deliver everything in a neatly formatted Excel workbook with separate columns for each field and zero duplicates. I have no interest in social media, e-commerce, or directory pulls—just information that is publicly visible on business websites. A Python-based stack (BeautifulSoup, Scrapy, Selenium, or a headless browser) is ideal, but use whatever stack keeps the process efficient and transparent. Data accuracy is paramount; I’ll spot-check every batch. Please respect GDPR and other privacy guidelines—only scrape content that’s openly published and allowed for B2B outreach. When you respond, briefly outline:...
...Bypassing Cloudflare is also essential. This is a core technical requirement for the project. Because we’re dealing with a court portal, you’ll have to contend with paging, possible session time-outs, and rate limits. Please engineer the solution to cope gracefully with those hurdles while staying within the site’s terms of use. Python is my preferred stack and I’m comfortable with Requests, BeautifulSoup, Selenium, or Playwright—choose whatever combination you feel is most robust, but deliver the final script with clear comments so I can run it again later without your help. Deliverables • Well-commented scraper script • Complete dataset (CSV or JSON) containing all party information • Brief read-me covering setup, dependencie...
...supplier 1. Company name 2. Website URL (if any) 3. Full street address 4. Phone and mobile numbers 5. Contact email 6. Key people or decision-makers 7. Link to catalog PDFs (when available) 8. Credentials or certifications noted on the source page 9. Public rating or review score (if provided by the source) Technical expectations • Written in Python 3.x using requests/BeautifulSoup, Selenium, or the Google Places API—whichever combination you believe will reach quota limits fastest while remaining stable. • Clean, well-commented code that I can run from the command line with a single configuration file for cities and categories. • Output as CSV or TSV with UTF-8 encoding; one row per supplier, duplicate suppression built in. •...
...newest records, and push them into a structured store I can query or export. The data points I rely on are: • Tender title and ID • Submission deadlines • Full project details / requirements • Any published proposal-format instructions Please build the pipeline so that it first looks for an available API; if none exists, fall back to robust web-scraping (Python, Scrapy/Playwright/BeautifulSoup—your call). I expect sensible de-duplication across sources, clear logging, and a straightforward way to add or remove URLs without touching core code. JSON or PostgreSQL is fine for storage as long as I can plug BI tools into it later. Deliverables 1. Source-agnostic crawler with hourly scheduler 2. Normalised database (or flat JSON dumps) containing th...
...case-number form in Zoho Creator; your job is to build the scraper or API bridge, connect it with Deluge or a REST hook, and prove that it reliably pulls those three data points for each of the three court tiers. Deliverables and acceptance criteria • Source code (Python preferred, but open to alternatives) with clear comments • Any helper libraries or headless browser scripts (Selenium, BeautifulSoup, Playwright, etc.) • A Deluge or webhook snippet that writes the returned JSON into my Creator fields • Step-by-step setup notes so I can redeploy if the server moves • Demo on at least one matter from a District Court, one from a High Court and one from the Supreme Court showing correct data in Zoho Creator I’m keen to see your prior e...
...The programs still launch and reach their respective pages without crashing, yet both come back empty—no records, no errors. I need someone to step in, pinpoint why the scrapers no longer capture the data, and update the code so each script reliably returns the expected text output again. You’ll be working with existing Python code that was originally built around libraries such as Requests, BeautifulSoup, and a bit of Selenium. The fixes may involve adjusting selectors, handling new page-load behaviour, or adding smarter waits, but the goal is simple: make the two scrapers functional, cleanly returning the data just as they did before. Deliverables • Updated working .py file for each scraper • Any new helper modules or entries • Brief note or RE...
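The "smarter waits" fix mentioned above often looks like the sketch below: wait for the records container to actually render before parsing, instead of reading the page source immediately. The URL and CSS selector are placeholders, not the real sites.

```python
# Sketch: explicit wait for dynamic content before handing the HTML to BeautifulSoup.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

driver = webdriver.Chrome()
driver.get("https://example.com/records")

# Block until at least one record row is present (up to 20 s), then parse.
WebDriverWait(driver, 20).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "div.record-row"))
)
soup = BeautifulSoup(driver.page_source, "html.parser")
rows = [r.get_text(" ", strip=True) for r in soup.select("div.record-row")]
driver.quit()
print("\n".join(rows))
```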
I need a web scraping tool to extract job postings from various job posting websites. The scraper should collect the following information: - Job title and description - Company details - Application links - Location - Job posted date Ideal Skills and Experience: - Proficiency in web scraping tools (e.g., Python, BeautifulSoup, Scrapy) - Experience with handling CAPTCHA and anti-scraping mechanisms - Familiarity with job posting websites - Ability to deliver data in a structured format (e.g., CSV, JSON)
...single batch of data that needs to be harvested from a public website, then exposed through a lightweight API so I can pull it directly into my own system. Once the scrape is complete, there is no need for ongoing monitoring or refreshes—this is strictly a one-time job. Here’s what I need from you: • Write and run the scraper (Python, Node, or another language you’re comfortable with—Scrapy, BeautifulSoup, Puppeteer, Playwright, or Selenium are all fine as long as they finish the task reliably). • Bypassing Cloudflare is a must. • Clean and normalize the results so every field is machine-readable. • Stand up a small REST endpoint that returns the data in JSON when queried. A simple Flask/FastAPI, Express, or comparable microser...
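A minimal sketch of the "lightweight API" half of the job (not the scrape itself): the one-time scrape writes data.json and a small FastAPI service serves it as JSON. The filename, module name, and route are assumptions.

```python
# api.py (assumed name) -- serves the already-scraped dataset.
import json
from fastapi import FastAPI

app = FastAPI()

with open("data.json", encoding="utf-8") as fh:
    RECORDS = json.load(fh)          # output of the one-time scrape

@app.get("/records")
def all_records():
    return RECORDS                   # FastAPI serialises the list/dict to JSON

# Run with: uvicorn api:app --port 8000
```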
I have a list of 294 Tourism Boards in the exhibitor directory (link below). I need every entry’s email, telephone number, website URL and—whereve...with complete, accurate data drawn exclusively from the directory above; no other sources are required. Deliverables • 294 fully-populated rows in the Google Sheet, matching the template’s column order (Email | Telephone | Website URL | Contact Person). • A brief note on any listings where a field is genuinely unavailable, so I know nothing was missed. If you automate the task with Python (BeautifulSoup, Selenium, Scrapy etc.) feel free to share the script for transparency—otherwise a manual scrape is fine as long as the information is precise. Let me know your estimated turnaround time and any cla...
...supply the niches, keywords and any profile filters; you will return a clean CSV that includes at minimum: • email address • profile or page URL • person / page name • platform tag (LinkedIn, Facebook or Instagram) Duplicates, bounces and role-based emails (info@, support@, etc.) should be removed before delivery. Please work with whichever stack you know best—Python, Selenium, BeautifulSoup, Apify, Phantombuster or similar—as long as the process respects rate limits and the platforms’ terms of service. If you have an existing workflow or proxy solution that keeps blocks and captchas to a minimum, let me know. When you apply, focus on your experience scraping these three platforms and any success metrics you can share (volume pe...
...company IDs and need a small, fast Python script that loops through them, opens each page, and captures three fields—ID, company name, and email—only when a real company is present. Pages with an empty company name should be ignored, and any URL that times out or throws an error can simply be skipped so the run continues uninterrupted. The script must: • Use standard libraries (requests, BeautifulSoup, or similar) so it stays lightweight and easy to run. • Write the results to a .txt file, one company per line in the format: id, company name, email. • Validate that the company name is not blank before saving the record. When you hand it over I’ll test it against my full list; once everything parses correctly within the agreed budget range (...
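A compact sketch following that spec: iterate the IDs, skip timeouts and HTTP errors so the run continues, keep a row only when the company name is non-blank, and append it to a .txt file. The URL pattern and CSS selectors are placeholders.

```python
# Sketch: lightweight ID loop with error skipping and name validation.
import requests
from bs4 import BeautifulSoup

ids = [line.strip() for line in open("ids.txt", encoding="utf-8") if line.strip()]

with open("companies.txt", "w", encoding="utf-8") as out:
    for cid in ids:
        try:
            resp = requests.get(f"https://example.com/company/{cid}", timeout=10)
            resp.raise_for_status()
        except requests.RequestException:
            continue                                  # skip timeouts / HTTP errors
        soup = BeautifulSoup(resp.text, "html.parser")
        name = soup.select_one(".company-name")      # placeholder selectors
        email = soup.select_one(".company-email")
        name_text = name.get_text(strip=True) if name else ""
        if not name_text:                             # ignore pages without a real company
            continue
        email_text = email.get_text(strip=True) if email else ""
        out.write(f"{cid}, {name_text}, {email_text}\n")
```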
...and descriptions • Certification dates, certification types, and certifying bodies for each listed company Export the scraped data into a clean Excel sheet, then build a simple dashboard inside the same workbook that lets me sort, filter, and view basic insights—counts by certification type, highlights of recently certified products, and any gaps you notice. A lightweight Python script (BeautifulSoup, Selenium, or similar) or a small Power Query setup is fine; I just need the working file plus clear instructions so I can refresh the data myself later. Keep it straightforward: one script, one Excel file, and a functional dashboard....
I need a robust scraper built to harvest job-post data from a specific website and write each record directly into my database. The scraper must reliably capture three fields on every listing: the full job description, the associated company information, and the application deadline. Please code the solution in a language and stack you’re comfortable with—Python + Scrapy, BeautifulSoup, or even Selenium are all fine so long as the run-time is headless and can be scheduled. Pagination, dynamic content, and duplicate prevention all have to be handled gracefully, and the script should pause or rotate headers/IPs if the target site begins throttling requests. Deliverables • A well-documented script or small module that performs the scrape and inserts the data as...
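For the duplicate-prevention part, one simple pattern is a unique key on the job URL so re-runs are idempotent; the SQLite sketch below is illustrative and would be swapped for whatever database the project actually uses.

```python
# Sketch: idempotent inserts keyed on the job URL.
import sqlite3

conn = sqlite3.connect("jobs.db")
conn.execute("""CREATE TABLE IF NOT EXISTS jobs (
                    url TEXT PRIMARY KEY,
                    description TEXT,
                    company TEXT,
                    deadline TEXT)""")

def upsert(record: dict) -> None:
    # INSERT OR IGNORE skips rows whose URL is already stored
    conn.execute(
        "INSERT OR IGNORE INTO jobs VALUES (:url, :description, :company, :deadline)",
        record)
    conn.commit()

upsert({"url": "https://example.com/job/1",
        "description": "Full job description ...",
        "company": "Acme GmbH",
        "deadline": "2024-12-31"})
conn.close()
```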
I need two separate, fully-functional notification and alert agents to support our company-scouting workflow. Process 1 – Company Deck & Website Monitor • Continuously scan company-provided pitch decks (PDF, PPT) and public website pages. • Extract key market signals, milestones, and product updates using Python with libraries such as BeautifulSoup, pdfplumber, or LangChain document loaders. • Trigger real-time alerts—preferably via Slack and email—whenever new language around funding rounds, customer wins, or major roadmap changes is detected. Process 2 – Market-Trend Sentinel • Scrape broader internet sources (news sites, industry blogs, analyst portals) for trend keywords we define. • Aggregate findings, identify se...
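A rough sketch of the extraction step in Process 1, assuming pdfplumber for deck text and a Slack incoming webhook for the alert; the keyword list and webhook URL are placeholders.

```python
# Sketch: pull text from a pitch deck and flag signal phrases.
import pdfplumber
import requests

KEYWORDS = ["funding round", "series a", "customer win", "roadmap"]      # illustrative
SLACK_WEBHOOK = "https://hooks.slack.com/services/XXX/YYY/ZZZ"           # placeholder

def scan_deck(path: str) -> list[str]:
    with pdfplumber.open(path) as pdf:
        text = "\n".join((page.extract_text() or "") for page in pdf.pages).lower()
    hits = [kw for kw in KEYWORDS if kw in text]
    if hits:
        requests.post(SLACK_WEBHOOK, json={"text": f"{path}: signals found -> {hits}"})
    return hits
```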
...of preference indicators—e.g., product lines mentioned, initiatives highlighted, pain points, technology keywords, sustainability statements. 4. Summarisation and scoring so I can see, at a glance, which topics will resonate in outreach. 5. Output: structured JSON/CSV plus a short natural-language brief for each lead. Technical expectations • Python with libraries such as LangChain, BeautifulSoup/Scrapy, GPT-4 (or comparable LLM), and a lightweight vector database (Pinecone, FAISS, etc.) for context retrieval. • Modular code so new sources can be added easily. • Respect robots.txt and include a basic rate limit. • README with set-up steps, environment variables, and example commands. What success looks like - I feed in 50 domains and receive a file...
...several hundred pages of a single database-style website gathered, cleaned, and delivered to me in one well-structured Excel workbook. The site uses a consistent layout—each page lists record-type entries with roughly a dozen text fields that include titles, short descriptions, and a few categorical tags. Here’s what will make this project a success for me: • A repeatable script (Python, BeautifulSoup/Scrapy or a similar tool) that automatically navigates pagination, respects robots.txt rules, and handles any lazy-loaded content. • An Excel file where each row corresponds to one record and each column maps cleanly to the on-page text fields, with no HTML artifacts or extra whitespace. • Basic documentation so I can rerun the scraper if the site content u...
...engagement must be completed in **15 calendar days**. I want a hands-on developer who will deliver production-ready, well-documented code and then teach me how to run, troubleshoot, and maintain the system. --- ### Scope of Work / Responsibilities The developer will: 1. **Write modular scraping & enrichment scripts** (Python): * Provide scripts for parsing static pages (Scrapy or requests/BeautifulSoup) and for dynamic pages where needed (Playwright or Puppeteer). * Provide an **Impressum (imprint) extraction** script to fetch missing contact details from company sites. * Include URL queueing, rate limiting, retry logic, and per-domain politeness. 2. **Proxy setup & management** * Recommend Germany-targeted proxy providers (residential/ISP/mobile) and ...
...page, or a clear grid layout on a single sheet—whichever keeps the workbook tidy. Images must be embedded, not provided as hyperlinks. • Repeatability: Running the script a second time shouldn’t corrupt or duplicate the existing workbook; it can either overwrite or create a fresh file—let’s discuss the cleanest approach. • Tech stack: I’m comfortable with Python (Selenium, Playwright, or BeautifulSoup + requests) or a Node.js solution using Puppeteer. Use whichever you consider most stable, but the final code must be well-commented and runnable on Windows 10. • Deliverables: – Fully-functional script with clear setup instructions – A sample Excel file generated from two or three URLs I’ll provide for tes...
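For the "images must be embedded, not hyperlinked" requirement, openpyxl can anchor a downloaded file directly into a cell (Pillow required); the paths and cell anchor below are illustrative only.

```python
# Sketch: embed a previously downloaded product image into the workbook.
from openpyxl import Workbook
from openpyxl.drawing.image import Image as XLImage

wb = Workbook()
ws = wb.active
ws["A1"] = "Product"
ws["A2"] = "Example product"

img = XLImage("images/example_product.jpg")   # file saved earlier by the scraper
img.width, img.height = 160, 160              # keep the grid tidy
ws.add_image(img, "B2")                       # embedded in the sheet, not a hyperlink

wb.save("products_with_images.xlsx")
```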
I already have a working Python scraping script built on SeleniumBase and BeautifulSoup, but two things are slowing me down: • CAPTCHA is triggering sometimes. I want a reliable, low-latency way to bypass or solve it automatically so the script can flow straight to the target data without manual intervention or long waits. • The current code grew organically and now feels bloated. I’d like it reorganised into a leaner, modular structure that shaves seconds off each scrape and is easier to maintain. The refactor should keep the same inputs and outputs I use today; only the internals and runtime need to change. If you bring your own anti-CAPTCHA techniques or third-party solving service, that’s fine as long as the turnaround is near-instant and the integratio...
I need a fresh, well-structured dataset of German gastronomy venues—specifically restaurants, cafés and bars—complete with the name of the restaurant, the owner’s name, a working e-mail address, postal code and town. The fastest way to achieve scale here is web scraping, so please rely on your favourite Python stack (Scrapy, BeautifulSoup, Selenium or a comparable tool) rather than manual look-ups. Scope • Nationwide coverage: all federal states in Germany are relevant. • One line per venue, no duplicates. • Contact person must be the owner or manager; other roles are not required. • Data sources must be publicly available pages (official websites, online menus, Imprint pages, Google Maps, etc.) to stay GDPR-compliant. Deliverab...
...points: • Source method: web scraping only—no API keys or sensor feeds are involved. • Process: fully automatic capture on a scheduled run or trigger I can adjust. • Output: clean, deduplicated product records ready for further processing. • Hand-off: commented code, a quick setup guide, and a brief README explaining dependencies and how to add new target URLs down the line. Python with BeautifulSoup, Scrapy, or similar libraries is fine, but I’m open to alternatives if they achieve the same reliability and speed. Please make sure the solution is lightweight, easy to maintain, and keeps website etiquette in mind (respect robots.txt, reasonable request rates). If you already have a working prototype you can adapt, let me know—otherwise, outline ...
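A small sketch of the website-etiquette point: consult robots.txt via urllib.robotparser and pace requests with a fixed delay. The domain and user-agent string are placeholders.

```python
# Sketch: polite fetching that honours robots.txt and a request delay.
import time
import urllib.robotparser
import requests

rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

UA = "product-monitor-bot/1.0"   # illustrative user agent

def polite_get(url: str, delay: float = 2.0):
    if not rp.can_fetch(UA, url):          # honour robots.txt
        return None
    time.sleep(delay)                      # keep a reasonable request rate
    return requests.get(url, headers={"User-Agent": UA}, timeout=30)
```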
...via REST API. Ability to push both updates and new products directly. Option to log every change (added/updated/skipped) in an output Excel or CSV. Technical Preferences: Language: Python (BeautifulSoup / Scrapy / Requests preferred). Excel/CSV data handling with Pandas. WooCommerce REST API for upload and update. Clean, modular code with comments and documentation. Optionally: a simple GUI (to upload Excel and start the process). Deliverables: Fully working scraper and automation script/tool. Documentation (setup + usage guide). Testing on my live WooCommerce website. Skills Required: Python (BeautifulSoup / Scrapy / Requests) WooCommerce REST API / WordPress integration Excel/CSV handling Web scraping & automation Data cleaning & metadata formatting Example Workf...
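A hedged illustration of the WooCommerce push with plain requests against the /wp-json/wc/v3/products endpoint; the store URL, consumer key/secret, and payload fields are placeholders the real script would fill from the scraped and cleaned data.

```python
# Sketch: create a product via the WooCommerce REST API.
import requests

STORE = "https://example-shop.com"
AUTH = ("ck_xxx", "cs_xxx")            # WooCommerce consumer key / secret (placeholders)

def push_product(payload: dict) -> dict:
    # Creates a new product; updates would PUT to /products/<id> instead.
    resp = requests.post(f"{STORE}/wp-json/wc/v3/products",
                         auth=AUTH, json=payload, timeout=30)
    resp.raise_for_status()
    return resp.json()

push_product({
    "name": "Example product",
    "regular_price": "19.99",
    "description": "Scraped description ...",
    "sku": "EX-001",
})
```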
...necessary. Please make the login fully automatic, maintain the session while visiting every URL, and finish by writing a clean, UTF-8 CSV that mirrors the input order. A brief README explaining how to set my credentials and run the script is all I need beyond the .py file and a short sample output. Feel free to rely on common libraries such as requests (or Selenium if form tokens demand it), BeautifulSoup, pandas, etc., as long as setup remains straightforward in a fresh virtual environment....
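A minimal sketch of that login-then-crawl flow: one requests.Session posts the credentials once, keeps the cookie across every URL, and writes rows in the same order as the input list. The form field names, target selector, and filenames are assumptions; in practice the credentials would come from environment variables as the README would explain.

```python
# Sketch: automatic login, session reuse, UTF-8 CSV mirroring the input order.
import csv
import requests
from bs4 import BeautifulSoup

LOGIN_URL = "https://example.com/login"
CREDENTIALS = {"username": "me@example.com", "password": "secret"}   # placeholders

session = requests.Session()
session.post(LOGIN_URL, data=CREDENTIALS, timeout=30)                # login once

urls = [u.strip() for u in open("urls.txt", encoding="utf-8") if u.strip()]

with open("output.csv", "w", newline="", encoding="utf-8") as fh:
    writer = csv.writer(fh)
    writer.writerow(["url", "value"])
    for url in urls:                                                 # mirrors input order
        soup = BeautifulSoup(session.get(url, timeout=30).text, "html.parser")
        field = soup.select_one(".target-field")                     # placeholder selector
        writer.writerow([url, field.get_text(strip=True) if field else ""])
```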
I need a clean, automated scrape of business addresses from roughly 11-50 websites. For every site, capture ...Company / Listing Name • Street and number • ZIP / Postal code • City • Country • Contact person Put the results in a single Excel workbook where every website’s full list sits in one dedicated row—so the sheet ends up with 11-50 rows total, each containing all addresses found on that particular site. Only web scraping is required; no manual copy-paste. Feel free to use Python (BeautifulSoup, Scrapy, Selenium) or another reliable tool, as long as the final file opens flawlessly in Excel and data is accurately mapped to the columns above. Deliverable: the finished .xlsx file plus your runnable script or method so I can repr...
...fields will be clarified). • Handle pagination or infinite scroll if present. • Respect basic rate-limiting, user-agent rotation, and error handling so it runs unattended without tripping the site’s defenses. • Export the collected data to either CSV or JSON (please make the format easy to switch via a command-line flag or simple config value). Use well-known libraries such as requests, BeautifulSoup, Selenium, or Scrapy—whatever you feel is most efficient for this site’s structure. Keep the code clean, well-commented, and isolated in functions so I can tweak selectors or add new fields later. Deliverables: 1. Fully working .py file(s) 2. A requirements file with pinned versions 3. A short README showing setup, run command, and example output file 4....
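The CSV/JSON switch could be a single argparse flag, as in this sketch; the demo record stands in for whatever the scraper actually collects.

```python
# Sketch: export format selectable from the command line.
import argparse
import csv
import json

def export(records, fmt: str, path: str) -> None:
    if fmt == "json":
        with open(path, "w", encoding="utf-8") as fh:
            json.dump(records, fh, ensure_ascii=False, indent=2)
    else:  # csv
        with open(path, "w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=records[0].keys())
            writer.writeheader()
            writer.writerows(records)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--format", choices=["csv", "json"], default="csv")
    parser.add_argument("--out", default="output.csv")
    args = parser.parse_args()
    export([{"field": "value"}], args.format, args.out)   # demo record
```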
...junior-level Python developer who feels at home with Tkinter to build a fresh desktop application for me. The core of the job is designing a clean, responsive GUI and wiring it up with solid, well-structured Python code. While the first priority is the desktop app itself, I do want the codebase kept flexible enough to bolt on some basic web-scraping features later, so familiarity with requests, BeautifulSoup or a similar library will be useful. You’ll work on a month-to-month arrangement, pushing updates to a shared Git repository and checking in with me regularly so we can refine the interface and workflow together. Clear comments, sensible file structure, and readable variable names are must-haves—this project is intended to be maintainable and easy to extend. The...
...ongoing automation required. Scope • Access the URL I provide and extract all relevant on-page text (headings, paragraphs, table cells, and lists). • Ignore images, PDFs, or embedded media; text only. • Consolidate the cleaned content into a well-structured worksheet with clear column headers (e.g., Page URL, Section, Extracted Text). Technical Notes A lightweight approach using Python, BeautifulSoup, Scrapy, or a similar tool is fine as long as the final XLSX is clean, deduplicated, and readable. No API work is expected. Acceptance I’ll consider the task complete once I can open the Excel file, see every page/section represented without missing text, and cross-check a random sample against the live site. If you can start right away and finish quickly...
...is straightforward: collect the product name and its current price for well over five hundred unique items and drop the results into a single sheet (Excel or Google Sheets—whichever you prefer, as long as the file is ready to download). A working URL for each item would be helpful for future checks, but the absolute must-haves are the two fields stated above. I’m fine with you using Python, BeautifulSoup, Scrapy, Selenium, or any other tool that speeds up and automates the process; I only care that the final sheet is complete and every entry is error-free. Please double-check currency symbols, decimal placement, and any characters that may break a comma-separated format. Deliverables • Spreadsheet with columns: Product Name | Price | (optional) URL • A...
...those details captured once and delivered in a tidy Excel workbook. The page has no official API, so the information will have to be gathered directly from the HTML. Here’s what I expect: • All attendee-related fields that appear publicly (name, company, title, email, etc.) placed in separate, clearly labeled columns of one .xlsx file. • A repeatable, well-commented script—Python with BeautifulSoup, Scrapy, or similar is perfect—that I can run again should the list change. • No scheduling or automation beyond the initial scrape; this is strictly a one-time job. Before we start, I’ll share the URL and point out the exact section that contains the attendee list. Let me know if you anticipate any blockers such as CAPTCHAs or dynamic...
...hours to cope with several different product page templates from the same marketplace. What I need done: • Modify the scraping logic so it detects and adapts to varying layouts. • Update all CSS/XPath selectors where the structure changes. • Add a few small extras—grab fields like SKU, image gallery URLs, and store everything in the existing CSV output. The current stack is requests and BeautifulSoup; please keep it lightweight and stick to that unless a Selenium fallback is absolutely required. Keep my existing function structure intact so I can merge updates quickly. I’ll share the repo and sample URLs as soon as we start. Deliver an updated script, a short README explaining the new sections, and a quick screen capture (or log) showing a succes...
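One way to "detect and adapt to varying layouts" while staying on requests + BeautifulSoup is to try known selector sets in order and use the first one that matches; the selector values below are placeholders for the marketplace's real templates, not the existing project's code.

```python
# Sketch: layout detection by trying selector sets per known template.
from bs4 import BeautifulSoup

# Each entry maps output fields to the selectors used by one page template.
TEMPLATES = [
    {"name": "h1.product-title", "price": "span.price", "sku": "span.sku"},
    {"name": "div.title h2",     "price": "div.cost",   "sku": "li.item-sku"},
]

def parse_product(html):
    soup = BeautifulSoup(html, "html.parser")
    for selectors in TEMPLATES:                 # first template whose name selector matches wins
        if soup.select_one(selectors["name"]) is None:
            continue
        record = {}
        for field, sel in selectors.items():
            el = soup.select_one(sel)
            record[field] = el.get_text(strip=True) if el else ""
        return record
    return None                                 # unknown layout: log and skip upstream
```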
...commentary. 2. Provide the final dataset in CSV or Google Sheet with at least: URL, platform, publish date, brand, product name, headline, and the full text or caption. 3. Organise everything so I can filter quickly by brand, model, and launch date. 4. Timestamp and keep source links intact for verification. I’m happy to clarify naming conventions or preferred scraping tools (Python + BeautifulSoup, Scrapy, or social media APIs). Once a small sample is approved for accuracy and purity of source, we can move on to a full historical sweep. Camera example in the attached .png
...backend. If images are downloaded, place them in a tidy folder with predictable filenames and reference those names in the data file. Graceful error handling, basic logging, and the ability to resume from the last successful page are must-haves. If the site uses dynamic loading, feel free to leverage Selenium, Playwright, or similar headless browser tools; otherwise a straightforward requests/BeautifulSoup approach is perfect. I’ll test the script against a small category first, then run a full crawl. Once the output matches what’s visible on the site, the job is complete....
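A rough sketch of the resume-from-last-successful-page requirement: a tiny JSON checkpoint records the last page that finished, so a re-run picks up where the previous one stopped. The URL pattern and filenames are placeholders.

```python
# Sketch: checkpointed pagination so interrupted runs can resume.
import json
import os
import requests
from bs4 import BeautifulSoup

CHECKPOINT = "checkpoint.json"

def load_last_page() -> int:
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT, encoding="utf-8") as fh:
            return json.load(fh)["last_page"]
    return 0

def save_last_page(page: int) -> None:
    with open(CHECKPOINT, "w", encoding="utf-8") as fh:
        json.dump({"last_page": page}, fh)

def crawl(max_pages: int = 500) -> None:
    for page in range(load_last_page() + 1, max_pages + 1):
        resp = requests.get(f"https://example.com/catalog?page={page}", timeout=30)
        if resp.status_code != 200:
            break                                    # stop gracefully; resume on next run
        soup = BeautifulSoup(resp.text, "html.parser")
        # ... extract and store records / download images here ...
        save_last_page(page)                         # only after the page succeeded

crawl()
```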
...dynamic content, respect rate limits, and save the harvested information in a clean, structured format (CSV or JSON works). I will provide the list of target profiles, hashtags, or pages once the contract starts; the script should be flexible enough for me to adjust those inputs later without touching the core logic. Please build on Python 3 and feel free to use libraries such as Requests, BeautifulSoup, Selenium, or Scrapy—whatever achieves stable results while keeping external dependencies lightweight. Deliverables (all required): • Fully-commented Python script (.py) ready to run from the command line • A short README explaining setup, required libraries, and runtime instructions • Sample output file generated from a test scrape to show the for...
...processing and automation. Built classes/objects to model real-world entities. Handled errors with try-except for stable execution. Applied list/set comprehensions for efficient data transformation. Solved real-world problems with automated scripts, enhancing problem-solving skills. Movie Data Web Scraping Automated the collection of movie titles, genres, and ratings from web sources using BeautifulSoup. Extracted structured data from web pages for analysis. Created a clean dataset for further visualization and insights. Reduced manual data collection effort through automation. Tableau Visualization Challenge Built over 30 interactive dashboards using Tableau to showcase real-world data insights. Utilized Tableau features such as calculated fields, parameters, filters, sets, and m...
...like bedrooms or street width). Specify target websites (start with ; expand to 2-3 others if feasible). Design system architecture: Scraping scripts, database schema, export formats, analytics structure. Deliverable: Design document (PDF/Word) with architecture and dashboard wireframes. Web Scraping System Development (Phase 2 - 2-3 Weeks): Build scrapers using Python (Scrapy, BeautifulSoup, or Selenium for dynamic sites). AI tools and agents are also suitable and preferred. Implement filtering based on criteria (e.g., property type, Riyadh districts, price ranges, listing age). Schedule scraping: Frequent updates (e.g., every 4-24 hours) to capture changes. Handle pagination, anti-scraping measures, Arabic text encoding, and proxies to avoid blocks. Deliverable: Working...
...company. My focus is on harvesting telephone numbers, and I also want the company name, address, and any publicly listed email address captured at the same time. Please pull the data into a clean CSV or Excel sheet with separate columns for: • Company Name • Address • Email Address • Telephone Number Accuracy is vital—only live, valid numbers and addresses. A quick-turn Python script (BeautifulSoup, Scrapy, or similar) is fine as long as it respects each site’s public terms. I just need the working code and the finished spreadsheet; nothing more complex than that. The relevant company types are listed below: Vehicle valeting Wheel repairs Tyre repairs Air conditioning Windscreen repairs Smart repairs Motor repairs Paint repair Bumpe...