Data Mining and Machine Learning

Data Mining and Machine Learning

Why Data Mining

  1. Credit ratings/targeted marketing: Given a database of 100,000 names, which persons are the least likely to default on their credit cards?
    • Identify likely responders to sales promotions
  2. Fraud detection
    • Which types of transactions are likely to be fraudulent, given the demographics and transactional history of a particular customer?
  3. Customer relationship management:
    Which of my customers are likely to be the most loyal, and which are most likely to leave for a competitor?

Data Mining helps extract such information

 Data Mining Applications

  1. Banking: loan/credit card approval
    • predict good customers based on old customers
  2. Customer relationship management:
    • Identify those who are likely to leave for a competitor.
  3. Targeted marketing:
    • Identify likely responders to promotions
  4. Fraud detection: telecommunications, financial transactions
    • from an online stream of events identify fraudulent events
  5. Manufacturing and production:
    • Automatically adjust knobs when process parameter changes
  6. Medicine: disease outcome, the effectiveness of treatments
    • Analyze patient disease history: find a relationship between diseases
      Molecular/Pharmaceutical: identify new drugs
  7. Scientific data analysis:
    • Identify new galaxies by searching for subclusters
  8. Web site/store design and promotion:
    • Find affinity of visitors to pages and modify the layout

Data Mining- Classification

Given old data about customers and payments, predict new applicants’ loan eligibility.

Data Mining – Association Rules

Given set T of groups of items

Example: a set of item sets purchased

Goal: find all rules on item sets of the form a-->b such that: Purchase of product A --> Service B

Example: Milk --> bread

Prevalent does not equal Interesting
  • Analysts already know about prevalent rules
  • Interesting rules are those that deviate from prior expectation
  • Mining’s payoff is in finding surprising phenomena
What makes a rule surprising?
  1. Does not match the prior expectation
    The correlation between milk and cereal remains roughly constant over time
  2. Cannot be trivially derived from simpler rules
    Milk 10%, cereal 10%
    Milk and cereal 10% … surprising
    Eggs 10%
    Milk, cereal, and eggs 0.1% … surprising!
    Expected 1%

What Is Being Returned The Most?

A little bit of everything needs further investigation.

Is There a Connection?

Everything! Notice the large quantities of Wood Finish materials.

What Are Some of These Products?

100% and more than 70 times

More than 60% and more than 65 times  

More than 50% and more than 100 times

Decomposition Tree, showing returns multiple times, why?

Data Mining – Clustering

  • Unsupervised learning when old data with class labels is not available e.g. when introducing a new product.
  • Group/cluster existing customers based on time series of payment history such that similar customers in the same cluster.


  1. Customer segmentation e.g. for targeted marketing
    • Group/cluster existing customers based on time series of payment history such that similar customers in the same cluster.
    • Identify micro-markets and develop policies for each
  2. Collaborative filtering
    • Group based on common items purchased

Application Example

Data Mining - Forecasting

Accurate Forecasting Roadmap (Expected 6 Months)

Data Mining Cycle

CRISP-DM: Cross Industry Standard Process for Data Mining


Connect with us

Head Quarter : 30 street 87 Thakanat Maadi, Cairo , Egypt.

Dubai Office : Sheikh Zayed Road, 3rd Interchange, Al Quoz Industrail Area 1, Al Shafar Investment Building,Office:208,Dubai UAE, P.O.Box:75244

  • dummy+971 56 392 9251 / 01098891700

  • dummy .


Enter your email and we'll send you more information