- Reading Data from Different Sources.
- Data Collection :
1. Retrieval scenarios :
-- Downloading ready-made data files
-- Manipulating URLs to access pages
-- Dealing with forms
-- Retrieving data from webpages enriched by a JS framework
-- Retrieving data from APIs (JSON and XML) and semi-structured files (PDF, Word, TXT)
2. Extraction strategies :
CSS selectors, XPath, regular expressions and API queries.
3. Data storage :
-- Sending data to databases such as SQL Server, Oracle, etc.
-- Saving data in native formats
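
The three collection steps above (retrieval, extraction, storage) can be sketched end to end with the Python standard library. This is only an illustration under stated assumptions: the `example.com` URL, the `page` query parameter, the HTML fragment and the `items` table are all hypothetical, and an in-memory SQLite database stands in for a server such as SQL Server or Oracle.

```python
import re
import sqlite3
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_page_url(base_url, page):
    """Return base_url with a ?page=N query string (hypothetical parameter)."""
    scheme, netloc, path, _query, fragment = urlsplit(base_url)
    return urlunsplit((scheme, netloc, path, urlencode({"page": page}), fragment))


# Hypothetical HTML fragment standing in for a downloaded page.
SAMPLE_HTML = """
<ul>
  <li class="item">Widget <span class="price">9.99</span></li>
  <li class="item">Gadget <span class="price">24.50</span></li>
</ul>
"""

# Regular expression capturing the item name and its price.
ITEM_RE = re.compile(r'<li class="item">\s*(\w+)\s*<span class="price">([\d.]+)</span>')


def extract_items(html):
    """Extract (name, price) pairs from the HTML text."""
    return [(name, float(price)) for name, price in ITEM_RE.findall(html)]


def store_items(conn, items):
    """Insert the extracted pairs into a table using a parameterised query."""
    conn.execute("CREATE TABLE IF NOT EXISTS items (name TEXT, price REAL)")
    conn.executemany("INSERT INTO items (name, price) VALUES (?, ?)", items)
    conn.commit()


url = build_page_url("https://example.com/catalog", 2)
items = extract_items(SAMPLE_HTML)

conn = sqlite3.connect(":memory:")  # assumption: SQLite replaces a real server
store_items(conn, items)
rows = conn.execute("SELECT name, price FROM items ORDER BY price").fetchall()
```

Regular expressions are enough for a fragment this regular; for real pages a CSS selector or XPath engine is usually the more robust choice.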
- Data Quality :
Applying an ETL process that detects and corrects corrupt or inaccurate
records in a record set, table or database, so that the data meets
the objectives and expectations of the customer.
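
A minimal sketch of such a cleaning step, assuming hypothetical records with a `name` and an `age` field and an illustrative validity rule (age between 0 and 120). Real rules depend on the customer's requirements.

```python
def clean_records(records):
    """Split records into normalised clean rows and rejected raw rows."""
    clean, rejected, seen = [], [], set()
    for rec in records:
        # Normalise: trim whitespace and unify capitalisation.
        name = (rec.get("name") or "").strip().title()
        raw_age = str(rec.get("age", "")).strip()

        # Detect corrupt values: empty name or non-numeric age.
        if not name or not raw_age.isdecimal():
            rejected.append(rec)
            continue

        age = int(raw_age)
        # Detect inaccurate values: age outside the assumed plausible range.
        if not 0 <= age <= 120:
            rejected.append(rec)
            continue

        # Drop duplicates that only differ before normalisation.
        key = (name, age)
        if key in seen:
            continue
        seen.add(key)
        clean.append({"name": name, "age": age})
    return clean, rejected


raw = [
    {"name": "  alice ", "age": "34"},
    {"name": "Alice", "age": 34},       # duplicate after normalisation
    {"name": "bob", "age": "-5"},       # not a valid age
    {"name": "", "age": "40"},          # missing name
    {"name": "carol", "age": "forty"},  # non-numeric age
    {"name": "dave", "age": "250"},     # out of range
]
clean, rejected = clean_records(raw)
```

Keeping the rejected rows, rather than silently discarding them, makes it possible to report back which source records need correction.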
- Machine Learning :
1. Supervised learning algorithms
-- k-nearest neighbors (Classification)
-- Naive Bayes (Classification)
-- Decision trees (Classification)
-- Classification rule learners (Classification)
-- Linear regression (Numeric prediction)
-- Regression trees (Numeric prediction)
-- Model trees (Numeric prediction)
-- Neural networks (Dual use)
-- Support vector machines (Dual use)
2. Unsupervised learning algorithms
-- Association rules (Pattern detection)
-- k-means clustering (Clustering)
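
To make two of the listed algorithms concrete, here is a small pure-Python sketch of k-nearest neighbors (supervised) and k-means (unsupervised). The toy points, labels and starting centroids are invented for illustration, and a library implementation would normally be preferred in practice.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify query by majority vote among the k closest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def kmeans(points, centroids, iterations=10):
    """Run a fixed number of Lloyd iterations from the given centroids."""
    for _ in range(iterations):
        # Assignment step: attach each point to its closest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Toy labelled data: two well-separated groups.
train = [
    ((1.0, 1.0), "a"), ((1.5, 2.0), "a"), ((2.0, 1.0), "a"),
    ((8.0, 8.0), "b"), ((9.0, 8.5), "b"), ((8.5, 9.5), "b"),
]

label_low = knn_predict(train, (1.2, 1.4))
label_high = knn_predict(train, (8.7, 8.8))

# k-means ignores the labels and only sees the coordinates.
points = [p for p, _ in train]
centroids, clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

The difference between the two families shows in the inputs: `knn_predict` needs the labels, while `kmeans` recovers the two groups from coordinates alone.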
- Data Visualization :
Creating dashboards according to the kind of data (Comparison, Distribution,
Composition and Relationship).
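
One possible way to encode the four kinds of data is a lookup that suggests a chart type for each. The chart choices below are common conventions picked as assumptions for this sketch, not a fixed rule, and `suggest_chart` is a hypothetical helper name.

```python
# Assumed default chart per kind of data; adjust to the dashboard's needs.
CHART_BY_KIND = {
    "comparison": "bar chart",
    "distribution": "histogram",
    "composition": "stacked bar chart",
    "relationship": "scatter plot",
}


def suggest_chart(kind):
    """Return the assumed default chart type for a kind of data."""
    key = kind.strip().lower()
    if key not in CHART_BY_KIND:
        raise ValueError(f"unknown kind of data: {kind!r}")
    return CHART_BY_KIND[key]


chart = suggest_chart("Distribution")
```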
- Network analysis :
Creating static and interactive network graphs.
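
As a sketch of the static case, the following builds an undirected graph from an edge list, computes node degrees, and exports the graph as Graphviz DOT text that a renderer can turn into an image. The node names and the `network` graph name are made up for the example. Interactive graphs would need a dedicated library and are not covered here.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def degrees(graph):
    """Return the number of neighbours of each node."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


def to_dot(edges, name="network"):
    """Serialise the edge list as an undirected Graphviz DOT graph."""
    lines = [f"graph {name} {{"]
    lines += [f'  "{a}" -- "{b}";' for a, b in edges]
    lines.append("}")
    return "\n".join(lines)


# Hypothetical edge list.
edges = [("ana", "ben"), ("ana", "cy"), ("ben", "cy"), ("cy", "dee")]
graph = build_graph(edges)
deg = degrees(graph)
dot = to_dot(edges)
```

The resulting `dot` string can be saved to a file and rendered with a Graphviz tool to obtain a static picture of the network.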