MIDTERM-ADVANCED DBM C3 L2
  • Jamaica Rose Gilo

  • 97 questions • 11/3/2024

    Question list

  • 1

    fundamental technique in data mining used to classify data into different categories based on a set of predefined rules

    Rule-Based Classification

  • 2

    Rules used in rule-based classification are often derived from what, making the process both intuitive and interpretable?

    Data itself

  • 3

    supervised learning task where a model learns to map input data to a specific category or label

    Classification

  • 4

    The goal is to predict the class of unseen instances based on what was learned from the training data.

    Classification

  • 5

    takes the form: IF condition THEN class

    Rules

  • 6

    can be applied sequentially to classify data

    Rule Set

  • 7

    Rules are often ordered by a specific priority or confidence level.

    Rule Set

  • 8

    How does rule-based classification work?

    Rule Generation, Rule Evaluation, Rule Matching, Conflict Resolution

  • 9

    Rules are generated from the training data based on the relationships between different features and the target class.

    Rule Generation

  • 10

    Techniques such as decision trees (e.g., ID3, C4.5), association rule mining (e.g., Apriori), or direct heuristics (e.g., covering algorithms) can be used to extract rules

    Rule Generation

  • 11

    Rules that correctly classify more instances with high accuracy are generally preferred.

    Rule Evaluation

  • 12

    Each rule is evaluated based on its accuracy or coverage.

    Rule Evaluation
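
Coverage and accuracy of a single rule can be sketched as follows. This is a minimal illustration, not a definitive implementation: the `evaluate_rule` helper, the field names, and the toy transactions are all hypothetical and do not come from the cards. Coverage is taken here as the fraction of instances the rule matches, and accuracy as the fraction of matched instances that the rule labels correctly.

```python
def evaluate_rule(condition, predicted_class, rows, target):
    """Return (coverage, accuracy) of one IF-THEN rule on labeled rows."""
    covered = [row for row in rows if condition(row)]
    coverage = len(covered) / len(rows)
    if not covered:
        return coverage, 0.0
    correct = sum(row[target] == predicted_class for row in covered)
    return coverage, correct / len(covered)

# Toy labeled data, invented for illustration only.
rows = [
    {"amount": 1500, "label": "fraud"},
    {"amount": 2000, "label": "fraud"},
    {"amount": 1200, "label": "legit"},
    {"amount": 50, "label": "legit"},
    {"amount": 30, "label": "legit"},
]

# Rule: IF amount > 1000 THEN fraud
coverage, accuracy = evaluate_rule(lambda r: r["amount"] > 1000, "fraud", rows, "label")
print(coverage, round(accuracy, 2))  # 0.6 0.67
```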

  • 13

    When classifying a new instance, the rule-based system evaluates which rules match the instance’s attributes.

    Rule Matching

  • 14

    In some cases, multiple rules might apply to a single instance, leading to conflicts.

    Conflict Resolution

  • 15

    Conflict Resolution Strategies

    Rule Priority, Voting

  • 16

    Several rules contribute votes to different classes, and the majority vote determines the class.

    Voting

  • 17

    More specific rules or rules with higher confidence may take precedence

    Rule Priority
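
Rule matching and the two conflict resolution strategies can be sketched like this. The rules, their confidence values, and the function names are assumptions made for illustration. Priority is interpreted here as "the matching rule with the highest confidence wins", which is only one of several possible orderings.

```python
from collections import Counter

# Each rule is (condition, predicted_class, confidence); all values are hypothetical.
RULES = [
    (lambda x: x["amount"] > 1000 and x["foreign"], "fraud", 0.9),
    (lambda x: x["amount"] > 1000, "review", 0.7),
    (lambda x: x["foreign"], "review", 0.6),
    (lambda x: True, "legit", 0.5),  # default rule, always matches
]

def matching_rules(instance, rules=RULES):
    """Rule matching: return every rule whose condition holds for the instance."""
    return [rule for rule in rules if rule[0](instance)]

def classify_by_priority(instance, rules=RULES):
    """Rule priority: the matching rule with the highest confidence wins."""
    best = max(matching_rules(instance, rules), key=lambda rule: rule[2])
    return best[1]

def classify_by_voting(instance, rules=RULES):
    """Voting: each matching rule casts one vote; the most common class wins."""
    votes = Counter(rule[1] for rule in matching_rules(instance, rules))
    return votes.most_common(1)[0][0]

tx = {"amount": 2500, "foreign": True}
print(classify_by_priority(tx))  # fraud
print(classify_by_voting(tx))    # review
```

The same instance gets different labels under the two strategies, which is why the choice of conflict resolution strategy matters.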

  • 18

    These methods generate classification rules directly from the data without creating an intermediate model.

    Direct Methods

  • 19

    an efficient algorithm for generating classification rules

    RIPPER

  • 20

    RIPPER stands for?

    Repeated Incremental Pruning to Produce Error Reduction

  • 21

    This is a simple algorithm that generates rules based on a single attribute at a time.

    OneR

  • 22

    OneR stands for?

    One Rule
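
A rough sketch of the OneR idea, assuming categorical attributes and a small in-memory dataset: for each attribute, build one rule per value that predicts the most frequent class, then keep the attribute whose rules make the fewest training errors. The `one_r` function and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

def one_r(rows, target):
    """Return (best_attribute, value -> class mapping, training errors)."""
    best = None
    attributes = [name for name in rows[0] if name != target]
    for attr in attributes:
        # Count class frequencies for each value of this attribute.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attr]][row[target]] += 1
        # One rule per value: predict the most frequent class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(row[target] != rule[row[attr]] for row in rows)
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best

# Toy data, invented for illustration only.
data = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
]
attr, rule, errors = one_r(data, "play")
print(attr, rule, errors)  # outlook {'sunny': 'yes', 'rainy': 'no'} 1
```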

  • 23

    These methods derive rules from another classification model.

    Indirect Methods

  • 24

    It’s possible, though more complex, to extract rules from this by analyzing the learned weights.

    Neural Networks

  • 25

    A model that can be converted into a set of classification rules

    Decision Trees
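
As an illustration of this indirect method, each root-to-leaf path of a decision tree can be read off as one IF-THEN rule. The nested-dictionary tree format and the `tree_to_rules` helper below are assumptions chosen for brevity, not the format of any particular library.

```python
def tree_to_rules(node, conditions=()):
    """Walk a tree; each root-to-leaf path becomes one IF-THEN rule."""
    if not isinstance(node, dict):  # leaf: node is a class label
        lhs = " AND ".join(conditions) or "TRUE"
        return [f"IF {lhs} THEN {node}"]
    rules = []
    for value, child in node["branches"].items():
        test = f"{node['attribute']} = {value}"
        rules.extend(tree_to_rules(child, conditions + (test,)))
    return rules

# Hypothetical hand-written tree.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {"attribute": "windy", "branches": {"yes": "stay in", "no": "play"}},
        "rainy": "stay in",
    },
}

for rule in tree_to_rules(tree):
    print(rule)
# IF outlook = sunny AND windy = yes THEN stay in
# IF outlook = sunny AND windy = no THEN play
# IF outlook = rainy THEN stay in
```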

  • 26

    This makes rule-based classifiers highly interpretable compared to other methods like neural networks or support vector machines.

    Interpretability

  • 27

    The rules generated by this method are easy to understand and explain.

    Interpretability

  • 28

    Since rules are explicit, users can easily track how decisions are made, which is crucial for applications in sensitive fields like medical diagnosis or finance.

    Transparency

  • 29

    The system can be extended by adding more rules or adjusting the existing rules, allowing for greater adaptability to changes in the environment or new data.

    Flexibility

  • 30

    Rule-based systems often evaluate only a small subset of rules for each instance, reducing overall complexity.

    Efficiency

  • 31

    This happens when many specific rules are generated that capture noise rather than general patterns.

    Overfitting

  • 32

    Some rules may overlap and can complicate the classification process.

    Rule Redundancy

  • 33

    Rule-based classifiers may struggle with data that has intricate patterns, which often call for more sophisticated models like ensemble methods or deep learning.

    Limited Performance on Complex Data

  • 34

    Applications of Rule-Based Classifiers

    Medical Diagnosis, Fraud Detection, Customer Segmentation

  • 35

    Businesses can classify customers based on behavioral data to tailor marketing strategies.

    Customer Segmentation

  • 36

    In banking and finance, rules can be used to detect fraudulent activities based on suspicious patterns.

    Fraud Detection

  • 37

    Rule-based classifiers can generate transparent decision-making models to assist healthcare professionals in diagnosing diseases.

    Medical Diagnosis

  • 38

    This offers an interpretable and intuitive approach to classifying data by deriving simple IF-THEN rules from training data.

    Rule-Based Classification

  • 39

    While this method excels in domains that require transparency and ease of understanding, it may struggle with complex or noisy datasets.

    Rule-Based Classification

  • 40

    With complex data, it is often used in conjunction with other classification techniques for enhanced performance in real-world applications.

    Rule-Based Classification

  • 41

    A process used by companies to turn raw data into useful information.

    Data Mining

  • 42

    an essential tool that allows us to turn raw data into actionable insights

    Data Mining

  • 43

    6 Data Mining Tasks

    Classification, Clustering, Regression, Association Rule Learning, Anomaly Detection, Summarization

  • 44

    It's about predicting the category or class of a data point based on past data

    Classification

  • 45

    An example of this is: Predicting whether an email is spam or not spam (spam filtering).

    Classification

  • 46

    The algorithm is trained on labeled data (data with known categories) and learns to assign new data points to one of these categories.

    Classification

  • 47

    Applications of classification

    Email Filtering, Medical Diagnosis, Sentiment Analysis, Image Recognition

  • 48

    Spam detection

    Email Filtering

  • 49

    Classifying diseases based on symptoms and test results.

    Medical Diagnosis

  • 50

    Determining whether a piece of text expresses positive, negative, or neutral sentiment.

    Sentiment Analysis

  • 51

    Classifying images into categories (e.g., identifying objects in photos).

    Image Recognition

  • 52

    It involves grouping similar data points together based on their characteristics without predefined labels.

    Clustering

  • 53

    An example of this is: Grouping customers based on their shopping habits.

    Clustering

  • 54

    Data points that are similar to each other are clustered into groups, helping to discover natural structures within the data.

    Clustering

  • 55

    Clustering Algorithms

    K-means, Hierarchical Clustering, DBSCAN, Gaussian Mixture Models

  • 56

    Partitions data into a fixed number of clusters based on distance to centroids.

    K-Means
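
A bare-bones sketch of the K-means loop, assuming the initial centroids are supplied by the caller (real implementations usually pick them randomly or with a seeding heuristic). The function name, the fixed iteration count, and the sample points are illustrative choices.

```python
def k_means(points, centroids, iterations=10):
    """Assign points to the nearest centroid, then move each centroid to its cluster mean."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Squared Euclidean distance to every centroid.
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(tuple(sum(dim) / len(cluster) for dim in zip(*cluster)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, centroids=[(0, 0), (10, 10)])
print(clusters[0])  # [(1, 1), (1, 2), (2, 1)]
print(clusters[1])  # [(8, 8), (9, 8), (8, 9)]
```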

  • 57

    Builds a tree of clusters based on similarity, allowing for a hierarchy of clusters.

    Hierarchical Clustering

  • 58

    Groups together points that are closely packed, marking points in low-density regions as outliers.

    DBSCAN

  • 59

    DBSCAN stands for?

    Density-Based Spatial Clustering of Applications with Noise

  • 60

    Assumes data points are generated from a mixture of several Gaussian distributions.

    Gaussian Mixture Models

  • 61

    It is used to predict a numerical value based on the relationship between variables

    Regression

  • 62

    An example of this is: Predicting house prices based on factors like location, size, and age.

    Regression

  • 63

    The algorithm tries to fit a line (or curve) that best describes the relationship between variables to predict continuous values.

    Regression

  • 64

    Common Regression Algorithms

    Linear Regression, Polynomial Regression, Ridge and Lasso Regression, Decision Trees and Random Forests, Support Vector Regression

  • 65

    Models the relationship between dependent and independent variables as a straight line.

    Linear Regression
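
For a single independent variable, the straight line can be fitted with the ordinary least squares formulas. The sketch below assumes one feature and noise-free toy data, and the `fit_line` name is made up for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data lying exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # 2.0 1.0
```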

  • 66

    Extends linear regression by fitting a polynomial equation to the data.

    Polynomial Regression

  • 67

    Regularization techniques that prevent overfitting by adding a penalty for large coefficients.

    Ridge and Lasso Regression

  • 68

    Can also be used for regression tasks by predicting values based on tree structures.

    Decision Trees and Random Forests

  • 69

    Uses support vector machines for regression tasks.

    Support Vector Regression

  • 70

    Applications of Regression

    Financial Forecasting, Sales Prediction, Risk Assessment, Marketing Response Modeling

  • 71

    It is about finding relationships between variables in a large dataset.

    Association Rule Learning

  • 72

    An example of this is: Market basket analysis, where you discover that customers who buy bread are likely to also buy butter

    Association Rule Learning

  • 73

    It identifies sets of items that frequently appear together in the data (like discovering frequent itemsets).

    Association Rule Learning
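
The bread-and-butter example can be quantified with support and confidence, two standard measures in association rule learning that the cards do not name explicitly. The baskets below are invented, and the helper functions are a sketch rather than a full frequent-itemset algorithm such as Apriori.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Toy market baskets, invented for illustration only.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
]
print(support(baskets, {"bread", "butter"}))                 # 0.5
print(round(confidence(baskets, {"bread"}, {"butter"}), 2))  # 0.67
```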

  • 74

    Applications of Association Rule Learning

    Market basket analysis to improve product placement and cross-selling strategies, Recommender systems in e-commerce, Customer segmentation and behavior analysis, Web usage mining to understand navigation patterns.

  • 75

    It identifies data points that deviate significantly from the normal pattern.

    Anomaly Detection

  • 76

    An example of this is: Detecting fraudulent credit card transactions.

    Anomaly Detection

  • 77

    In anomaly detection, the algorithm learns what normal data looks like, and anything that falls far outside this pattern is flagged as what?

    Anomaly
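
One simple way to "learn what normal looks like" is to estimate the mean and standard deviation of known-normal values, then flag new values that lie many standard deviations away (a z-score test). This is just one possible technique, and the threshold of 3 and the sample amounts are assumptions.

```python
from statistics import mean, stdev

def fit_normal(values):
    """Learn a simple model of normal data: its mean and standard deviation."""
    return mean(values), stdev(values)

def is_anomaly(value, mu, sigma, threshold=3.0):
    """Flag a value whose z-score exceeds the threshold."""
    return abs(value - mu) / sigma > threshold

# Hypothetical transaction amounts considered normal.
normal_amounts = [20, 25, 22, 30, 27, 24, 26, 23]
mu, sigma = fit_normal(normal_amounts)

print(is_anomaly(28, mu, sigma))   # False
print(is_anomaly(500, mu, sigma))  # True
```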

  • 78

    3 types of Anomalies

    Point Anomalies, Contextual Anomalies, Collective Anomalies

  • 79

    A single data point that deviates significantly from the rest of the dataset. For example, a sudden spike in credit card transactions could indicate fraud.

    Point Anomalies

  • 80

    Anomalies that are only considered unusual in a specific context. For example, a high temperature reading is normal in summer but anomalous in winter.

    Contextual Anomalies

  • 81

    A group of data points that collectively deviate from the expected pattern, even if individual points are not anomalies. For instance, a sudden increase in network traffic over a short period may indicate a DDoS attack.

    Collective Anomalies

  • 82

    Applications of Anomaly Detection

    Fraud detection in banking and e-commerce, Network intrusion detection to identify malicious activity, Fault detection in manufacturing processes or machinery, Health monitoring to detect unusual patterns in patient data, Quality control in production processes.

  • 83

    This creates a compact representation of the data, providing a summary of key information

    Summarization

  • 84

    An example of this is: Summarizing a dataset by showing averages, counts, and other statistics.

    Summarization
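
A dataset summary of this kind might look like the following sketch, which uses only the Python standard library. The choice of statistics and the `summarize` name are illustrative.

```python
from statistics import mean, median

def summarize(values):
    """Reduce a list of numbers to a few descriptive statistics."""
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }

print(summarize([3, 1, 4, 1, 5]))
# {'count': 5, 'mean': 2.8, 'median': 3, 'min': 1, 'max': 5}
```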

  • 85

    This task reduces the complexity of the data by generating overviews and simplified reports.

    Summarization

  • 86

    2 types of summarization

    Extractive Summarization, Abstractive Summarization

  • 87

    Involves selecting key sentences or phrases directly from the original text to create a summary.

    Extractive Summarization

  • 88

    Techniques may include ranking sentences based on their importance using algorithms like TextRank or TF-IDF.

    Extractive Summarization
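
The sentence-ranking idea can be sketched with plain word frequencies. Note that this is a simpler stand-in for TextRank or TF-IDF, not an implementation of either: each sentence is scored by the average corpus frequency of its words, and the top-scoring sentences are returned in their original order. The function name and the sample text are hypothetical.

```python
import re
from collections import Counter

def extractive_summary(text, n=1):
    """Return the n highest-scoring sentences, kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    top = sorted(sentences, key=score, reverse=True)[:n]
    return [s for s in sentences if s in top]

text = (
    "Data mining finds patterns in data. "
    "The weather was nice. "
    "Mining data reveals useful patterns."
)
print(extractive_summary(text, n=1))  # ['Data mining finds patterns in data.']
```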

  • 89

    Involves generating new sentences that convey the main ideas of the text, potentially using different wording than the original.

    Abstractive Summarization

  • 90

    Often utilizes advanced machine learning models, such as transformer-based architectures (e.g., BERT, GPT).

    Abstractive Summarization

  • 91

    Applications of Summarization

    News aggregation services that provide concise articles, Summarizing research papers or reports for quick understanding, Document summarization in legal and business contexts, Enhancing user experience in chatbots by providing brief responses.

  • 92

    This helps in decision-making (e.g., identifying risks).

    Classification

  • 93

    It helps in segmenting customers or products for targeted marketing.

    Clustering

  • 94

    It helps in forecasting trends and making predictions.

    Regression

  • 95

    It helps businesses understand customer behavior.

    Association Rule Learning

  • 96

    This improves security by identifying irregularities.

    Anomaly Detection

  • 97

    It simplifies data interpretation by providing high-level insights.

    Summarization