I have advertised for this role before and, to some degree, we found a solution, but I have now realized I need an individual with specific experience in this area. By experience I mean at least 4-5 years of machine learning or data science work: someone who can quickly identify how to put these issues right and work with other developers to perfect a structural framework.
Using a scraping framework on Django and Python
Here is the issue/problem defined.
There are various methodologies for this, but to explain the process:
Imagine there are many court websites on which lawyers post their upcoming court dates and courts.
We scrape (web crawl) 5 of these court websites to make a master table of future court appearances by the lawyers.
This creates 5 data tables on various dates.
We then want to harmonize the data to create a single master table in which we see all of the dates the lawyers are in court.
In the master table are the various features:
* The Cases they are working on
* The Court Dates
* The Courts (the Venues, with GPS coordinates)
To combine the 5 data tables into one, we need to load the scraped data into a master table that holds unique data about the lawyers.
The lawyers are unique by DNA.
The court cases only occur on one day, but since we have five records of each we need a simple algorithm to know they are the same thing.
The courts are unique because they have a GPS coordinate, and we can use this to locate the court cases easily.
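As a rough illustration of the uniqueness rules above, here is a minimal sketch, not our existing code. It assumes each scraped record is a dict with hypothetical fields `case_number`, `date`, `lat` and `lon`. It also assumes that two venues within 150 metres of each other are the same court, and that a case can be identified by its normalized case number, its date and its court. All names and the distance threshold are illustrative assumptions.

```python
import math
import re
from datetime import date


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two GPS points."""
    radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


def resolve_court(lat, lon, known_courts, max_m=150.0):
    """Return the id of the nearest known court within max_m metres, else None."""
    best_id, best_d = None, max_m
    for court_id, (clat, clon) in known_courts.items():
        d = haversine_m(lat, lon, clat, clon)
        if d <= best_d:
            best_id, best_d = court_id, d
    return best_id


def normalize_case_number(raw):
    """Uppercase and drop everything except letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", raw.upper())


def case_key(case_number, hearing_date, court_id):
    """Assumed identity of a court case: (case number, date, court)."""
    return (normalize_case_number(case_number), hearing_date.isoformat(), court_id)


def dedupe(records, known_courts):
    """Collapse records from several sites into one entry per case key."""
    unique = {}
    for rec in records:
        court_id = resolve_court(rec["lat"], rec["lon"], known_courts)
        key = case_key(rec["case_number"], rec["date"], court_id)
        unique.setdefault(key, rec)  # keep the first record seen for each key
    return unique
```

Records whose coordinates match no known court all get a court id of `None`, so in practice they would probably need manual review or a fuzzier match rather than being merged blindly.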
Once we have got our unique data tables, we want to go back to all the websites the next week and add the information about new court cases that have been put up online, without affecting the existing unique cases too much,
SO WE don't get weekly duplicates by date!
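One way to picture the weekly update is the sketch below. It assumes the master table is a dict keyed by a unique case key, for example a hypothetical (case number, date, court id) tuple, and that each scraped record is a dict. The function name and the `first_seen` / `last_seen` fields are illustrative only.

```python
from datetime import date


def weekly_merge(master, scraped, today):
    """Add newly scraped cases to master without overwriting existing ones.

    master and scraped map a unique case key to a record dict.
    Returns the number of new cases added.
    """
    added = 0
    for key, rec in scraped.items():
        if key in master:
            # Already known: only note that it was seen again this week.
            master[key]["last_seen"] = today
        else:
            master[key] = {**rec, "first_seen": today, "last_seen": today}
            added += 1
    return added
```

In a Django project the same idea could probably be expressed with a unique constraint on the key fields plus `get_or_create`, so that re-scraping the same case never inserts a second row.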
MAYBE NOT QUITE SO EASY
We have about 50,000 court cases on record in the future and about 10 times that in the past, so it's a mid-sized data set.
A) Be familiar with, and have extensive experience in, Scrapy, Beautiful Soup, etc., to look at what we have, though I think we got that right.
B) Have experience with ML (prioritization, possibly NLP) in sorting out this problem.
You will need demonstrable experience in solving this kind of problem and be 100 percent confident in advising on and improving existing work in progress.
Then you can find and add more websites that feature court information so our bounty hunters can find more people who skip bail, or whatever analogy you want to use for the actual process.
Your Skills can take a shining product into the stratosphere fast and we would love to hear from you. Also, you will write clean lovely code and other people will check it and think so too.
APPLY: send a CV or tell us how you would approach the question.
A Data Scientist with significant Python experience could probably solve this fast.