The task is to write a script in R or Python that matches two or more sets of third-party text entries of corporate names. Each set has its own format and spellings.
The final deliverable is the script, which must be general enough to handle various inputs, that is, lists of corporate names with differing formats and spellings.
You are encouraged to use any available packages & tools to implement the task as efficiently as possible.
Training datasets are:
Tab-separated [login to view URL]
[login to view URL]
To give an example, the script should be able to identify
Janssen Research & Development, LLC
and
Janssen Biotech, Inc.
as a match.
Please introduce an additional column with a unique text-based identifier. For the example above it could be “Janssen”.
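The Janssen example above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the expected solution: the `LEGAL_SUFFIXES` list, the helper names `normalize`, `identifier`, and `group_names`, and the rule of using the first remaining word as the identifier are all hypothetical choices made for this sketch.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately short list of legal-form suffixes (assumption);
# real data would need a much longer, region-aware list.
LEGAL_SUFFIXES = {"llc", "inc", "ltd", "corp", "co", "gmbh", "ag", "sa", "plc"}


def normalize(name):
    """Lowercase the name, drop punctuation, and remove legal-form suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return [t for t in tokens if t not in LEGAL_SUFFIXES]


def identifier(name):
    """Return the first remaining token, capitalized, as a text identifier.

    Assumption: the leading word identifies the corporate family.
    """
    tokens = normalize(name)
    return tokens[0].capitalize() if tokens else ""


def group_names(names):
    """Map each identifier to the list of original names that share it."""
    groups = defaultdict(list)
    for name in names:
        groups[identifier(name)].append(name)
    return dict(groups)


names = ["Janssen Research & Development, LLC", "Janssen Biotech, Inc."]
groups = group_names(names)
```

Here both entries end up under the identifier "Janssen". The first-word rule is a simplification: it would wrongly merge unrelated companies that share a leading word and would miss spelling variants, so a general script would likely combine this kind of normalization with fuzzy string comparison.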
In your bid please include the information on:
Expected delivery date
How many hours are you available to work per day
How many hours do you estimate the task will take you
Language of choice (R or Python)
Price bid - the project will be assigned with a fixed budget.
Please do not start the work until the project has been officially assigned to you.
Thank you!
Expected date: 6/12. Hours per day: 6-8. Estimated time: 12 hours. Language: Python. Price: 20 euros. I will use a variety of string manipulations and subset-matching methods to create a generalised matching method.
Relevant Skills and Experience
I have programmed extensively in Python and did my Google Summer of Code with the Python Software Foundation. I do a lot of text processing in my natural language and AI research.
Proposed Milestones
€24 EUR - Final delivered script
€26 EUR in 4 days
5.0 (1 review)
4 freelancers are bidding an average of €28 EUR for this project
Let's start. We have a team of professionals and provide high-quality, accurate work. Would you like to discuss your current requirements further?
Relevant Skills and Experience
Data mining
Proposed Milestones
€19 EUR - Let's start. We have a team of professionals and provide high-quality, accurate work.