In progress

Scrapy debug

I have a Scrapy project that collects data from this website:

[login to view URL]

The website exposes an API that serves the data:

[login to view URL]

The API response contains a “since_id” field in [data][pageInfo], which gives me the URL of the next page:

[login to view URL]
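
The pagination step described above can be sketched as a small helper. This is only an illustration under assumptions: the API URL is hidden in the post, so `https://example.com/api/feed` is a placeholder, and the query parameter is assumed to be named `since_id` like the field in [data][pageInfo].

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def next_page_url(current_url: str, payload: dict) -> Optional[str]:
    """Build the next page URL from the since_id in payload["data"]["pageInfo"].

    Returns None when the response carries no since_id, which is treated
    here as "no further page" (an assumption, not confirmed by the post).
    """
    page_info = (payload.get("data") or {}).get("pageInfo") or {}
    since_id = page_info.get("since_id")
    if not since_id:
        return None

    parts = urlsplit(current_url)
    query = dict(parse_qsl(parts.query))
    # Assumed parameter name: the same as the field name in pageInfo.
    query["since_id"] = str(since_id)
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder URL; the real endpoint is not visible in the post.
print(next_page_url(
    "https://example.com/api/feed?type=all",
    {"data": {"pageInfo": {"since_id": 4603826461738206}}},
))
```

In a Scrapy callback, the returned URL would typically be handed to `scrapy.Request` or `response.follow`.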

The problem is that for certain since_id values the server returns no data and the status code is 0. For example:

[login to view URL]

The URL above gives 4603826461738206 as the since_id of the next page, but I couldn’t get any data from:

[login to view URL]

I hit this issue after collecting about 80K records. When I restarted the spider from the beginning, it stopped again at the same since_id.

I’m fairly sure there are more pages after that one, but I don’t know how to debug it. The ultimate goal is to collect all posts from the website, and there should be ~208K of them.
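
One possible first step in debugging a since_id that consistently returns nothing is to track how often each page fails, retry a few times, and keep a list of the ids that never succeed so they can be inspected by hand. The sketch below is not taken from the spider in the post. The class name `PageDiagnostics` and the retry limit are hypothetical, and deciding whether a response "has data" is left to the caller because the response layout is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class PageDiagnostics:
    """Bookkeeping for pages that come back empty (hypothetical helper)."""

    max_attempts: int = 3
    attempts: dict = field(default_factory=dict)
    failed: list = field(default_factory=list)

    def record(self, since_id: str, has_data: bool) -> str:
        """Return 'ok', 'retry', or 'give_up' for one response."""
        if has_data:
            self.attempts.pop(since_id, None)
            return "ok"
        count = self.attempts.get(since_id, 0) + 1
        self.attempts[since_id] = count
        if count < self.max_attempts:
            return "retry"
        # Keep the id so the failing page can be examined manually later.
        self.failed.append(since_id)
        return "give_up"


diag = PageDiagnostics(max_attempts=2)
print(diag.record("4603826461738206", has_data=False))  # first failure
print(diag.record("4603826461738206", has_data=False))  # limit reached
print(diag.failed)
```

If this were wired into a Scrapy spider, a repeated request for the same URL would need `dont_filter=True`, since Scrapy's duplicate filter otherwise drops requests it has already seen.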

Here are my current Scrapy scripts:

[login to view URL]

I'd like a quote for two goals:

1. Solve my current problem and make sure the spider can collect at least 85% of the 208K posts (must)

2. I currently save the scraped data to my AWS S3 bucket. Is there a way to check against the data already in the bucket when I re-run the spider, and save only new data to S3? (optional)
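
For the optional second goal, one possible approach is to list what is already in the bucket before a run and skip posts whose ids are present. The sketch below assumes a storage layout the post does not describe: each post saved as its own object under a key like `posts/<post_id>.json`, and each scraped post being a dict with an `id` field. The client argument is expected to behave like a boto3 S3 client; only its `list_objects_v2` paginator is used.

```python
def existing_post_ids(s3_client, bucket: str, prefix: str = "posts/") -> set:
    """Collect post ids from object keys shaped like '<prefix><id>.json'.

    The key layout is an assumption made for this sketch.
    """
    ids = set()
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page holds no objects.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith(".json"):
                ids.add(key[len(prefix):-len(".json")])
    return ids


def only_new(posts: list, seen_ids: set) -> list:
    """Keep posts whose 'id' (assumed field name) is not stored yet."""
    return [post for post in posts if str(post["id"]) not in seen_ids]
```

With boto3 installed, the client would be created with `boto3.client("s3")`. If the spider instead writes one large file per run, a separate index of saved ids would be needed, since the ids could not be read from the object keys.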

Skills: Python, Web Scraping, Scrapy, Amazon S3


About the employer:
(0 reviews) McLean, United States

Project ID: #29933837

Awarded to:


Hello! I DON'T WANT ANY MONEY FROM YOU. I am a Python scripting and web scraping expert ...

Bid in USD, in 1 day
(0 reviews)

2 freelancers are bidding an average of $10 for this project


Hi there! Note: I am offering 25% off all my services, so grab this special limited-time discount. Let’s get to the project. I understand that you are looking for a developer with rich knowledge of Python ...

Bid in USD, in 1 day
(29 reviews)