I have a scrapy project that collects data from this website:
[login to view URL]
The website has an API that feeds the data:
[login to view URL]
The response contains a “since_id” in [data][pageInfo], which gives me the next page URL:
[login to view URL]
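For reference, the pagination step described above could be sketched roughly as follows. This is only an illustration, not the code from my spider: the example URL and the `containerid` parameter are placeholders, and the only thing taken from the real API is that the next `since_id` sits under `data` → `pageInfo`.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def next_page_url(current_url: str, payload: dict) -> Optional[str]:
    """Build the next page URL from the since_id found in payload['data']['pageInfo'].

    Returns None when the response carries no since_id, which is treated
    here as "no further page" (an assumption about the API).
    """
    page_info = (payload.get("data") or {}).get("pageInfo") or {}
    since_id = page_info.get("since_id")
    if not since_id:
        return None

    # Keep the existing query parameters and set or overwrite since_id.
    parts = urlsplit(current_url)
    query = dict(parse_qsl(parts.query))
    query["since_id"] = str(since_id)
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical URL and parameter name, for illustration only.
example = next_page_url(
    "https://example.com/api/feed?containerid=abc",
    {"data": {"pageInfo": {"since_id": 4603826461738206}}},
)
```

In a Scrapy callback the returned URL would be passed to `scrapy.Request(url, callback=self.parse)` when it is not `None`.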
The problem I’m having now is that for certain since_ids the server returns no data and the status code is 0. For example:
[login to view URL]
The URL above gives me the next page's since_id, 4603826461738206, but I can’t get any data from:
[login to view URL]
I hit this issue after collecting ~80K items. When I restarted the spider from the beginning, it stopped again at the same since_id.
I’m pretty sure there are more pages after that one, but I don’t know how to debug it. The ultimate goal is to get all posts from the website; there should be ~208K posts.
Here are my current Scrapy scripts:
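One way to start debugging would be to log a small summary of every response, so that the failing since_id can be compared against the pages that worked. The sketch below is an assumption-heavy illustration: it supposes that the "status code" is a field in the JSON body, and the field names `ok` and `cards` are hypothetical defaults that would need to match the real API.

```python
import json


def diagnose_page(body: str, status_key: str = "ok", items_key: str = "cards") -> dict:
    """Summarise one API response body for logging.

    status_key and items_key are hypothetical field names; adjust them
    to whatever the real API returns.
    """
    summary = {"verdict": "not_json", "status": None, "items": 0, "next_since_id": None}
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return summary  # e.g. an HTML error or block page instead of JSON
    if not isinstance(payload, dict):
        return summary

    data = payload.get("data") or {}
    items = data.get(items_key) or []
    summary["status"] = payload.get(status_key)
    summary["items"] = len(items)
    summary["next_since_id"] = (data.get("pageInfo") or {}).get("since_id")

    if summary["status"] == 0:
        summary["verdict"] = "failed"  # the server reported an error for this since_id
    elif items:
        summary["verdict"] = "ok"
    else:
        summary["verdict"] = "empty"   # valid JSON but nothing to scrape
    return summary
```

Logging this for each page, together with the requested URL, would show whether the server rejects that one since_id every time or only after a certain number of requests.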
[login to view URL]
I'd like to get a quote on two goals:
1. Solving my current problem and making sure my spider can get at least 85% of the 208K posts (must)
2. I currently save the scraped data to my AWS S3 bucket. Is there a way to check against the data already in the bucket when I re-run the spider, and save only new data to S3? (optional)
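To make the second goal concrete, here is a rough sketch of what I have in mind. It assumes a layout that my bucket may not actually have: one object per post, stored under a key like `posts/<post_id>.json`, with each scraped post carrying an `id` field. The bucket name, prefix, and field name are all placeholders.

```python
from typing import Iterable, Iterator, Set


def key_to_post_id(key: str, prefix: str = "posts/") -> str:
    """Turn an S3 key such as 'posts/123.json' into the post ID '123'."""
    name = key[len(prefix):] if key.startswith(prefix) else key
    return name.rsplit(".", 1)[0]


def existing_post_ids(bucket: str, prefix: str = "posts/") -> Set[str]:
    """List the post IDs already stored in the bucket (assumed key layout)."""
    import boto3  # imported lazily so the filtering logic runs without AWS access

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    ids: Set[str] = set()
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            ids.add(key_to_post_id(obj["Key"], prefix))
    return ids


def only_new(posts: Iterable[dict], seen: Set[str]) -> Iterator[dict]:
    """Yield posts whose ID is not in `seen`, recording each new ID as it goes."""
    for post in posts:
        post_id = str(post["id"])
        if post_id not in seen:
            seen.add(post_id)
            yield post
```

The set of IDs would be loaded once when the spider opens, and `only_new` (or the same check inside an item pipeline) would drop anything already saved. With ~208K posts the set should fit in memory comfortably.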
Hello!
-----------------------------------------------------------------
I DON'T WANT ANY MONEY FROM YOU
-----------------------------------------------------------------
I am a Python scripting and web scraping expert and I have 5 years of [login to view URL]
I am very interested in your project.
I read your description carefully and was excited, because I feel I can satisfy your requirements in this [login to view URL], you will get a good result.
Looking forward to working with you. Thanks and regards.
$10 USD in 1 day
0.0 (0 reviews)
2 freelancers are bidding an average of $10 USD for this project
Hi there!
Note: I am offering 25% off all my services, so grab this special limited-time discount.
Let’s get to the project. I understand that you are looking for a developer with deep knowledge of Python development, and that you need a Scrapy problem fixed.
According to your post description, you need the job done very quickly with an affordable budget.
Your requirements are fully clear to me and I am applying only because I meet all the requirements that you are looking for.
I have been working as a full stack Python developer for more than 10 years and I know all the ins and outs of Python development and customization.
The project will be 100% correct and complete; that is my first guarantee. Moreover, I will be available for any future edits and customizations. Clients from ten years ago still seek me out because of my excellent service.
I value communication. I like to stay in touch with my clients 24/7 so that they can reach me whenever they need me.
I am looking for the clear, final requirements document for your project so that I can start working immediately. If you would like to discuss any suggestions or information, please contact me anytime. I look forward to hearing from you.
Thanks
Zohaib jamil