A. Data Completeness
Identify missing values in a dataset.
Check for incomplete records across all columns.
Highlight null entries that may affect analysis.
Detect partially filled fields in a dataset.
Assess the proportion of missing data per column.
Suggest strategies to impute missing values.
Flag records missing key identifiers.
Identify patterns in missing data.
Evaluate how missing data could bias results.
Detect rows where missing values are represented inconsistently (for example, blanks, "N/A", and placeholder codes).
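
As an illustration of the completeness checks above, here is a minimal Python sketch that computes the share of missing values per column. It assumes the dataset is a list of dictionaries; the function names and the set of strings treated as missing are hypothetical choices, not a standard.

```python
import math

# Strings treated as "missing". This set is an assumption; adjust it per dataset.
MISSING_STRINGS = {"", "NA", "N/A", "null", "None"}


def is_missing(value):
    """Return True if a value should count as missing."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    if isinstance(value, str):
        return value.strip() in MISSING_STRINGS
    return False


def missing_ratio_per_column(rows):
    """Return {column: fraction of rows where the column is missing or absent}."""
    columns = sorted({key for row in rows for key in row})
    total = len(rows)
    ratios = {}
    for col in columns:
        missing = sum(1 for row in rows if is_missing(row.get(col)))
        ratios[col] = missing / total if total else 0.0
    return ratios


sample_rows = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": "", "age": None},
    {"id": 3, "age": 51},                      # "email" key absent entirely
    {"id": None, "email": "N/A", "age": 29},   # missing key identifier
]
print(missing_ratio_per_column(sample_rows))
```

Columns with a high ratio are candidates for imputation or removal, and rows where a key identifier such as id is missing can be flagged with is_missing directly.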
B. Data Consistency
Identify inconsistent formats in date columns.
Detect duplicate records.
Find discrepancies in categorical values.
Check for inconsistent capitalization in text fields.
Validate consistent use of measurement units.
Identify inconsistent encoding across datasets.
Detect conflicting entries for the same entity.
Highlight inconsistent naming conventions.
Check for repeated records with different timestamps.
Detect anomalies in sequential identifiers.
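
One way to approach the date-format check above is to count how many values match each candidate format. The sketch below is only an illustration: the list of candidate formats is an assumption, and ambiguous values such as 01/02/2024 are assigned to whichever listed format matches first.

```python
from collections import Counter
from datetime import datetime

# Candidate formats to test, in priority order. This list is an assumption.
CANDIDATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%m-%d-%Y"]


def detect_date_format(value):
    """Return the first candidate format that parses the value, or None."""
    for fmt in CANDIDATE_FORMATS:
        try:
            datetime.strptime(value.strip(), fmt)
            return fmt
        except ValueError:
            continue
    return None


def date_format_counts(values):
    """Count how many values fall under each detected format."""
    return Counter(detect_date_format(v) for v in values)


dates = ["2024-01-15", "15/01/2024", "2024-02-01", "Jan 5 2024"]
print(date_format_counts(dates))
```

More than one key in the result suggests the column mixes formats; a None key marks values that matched no candidate and need manual review.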
C. Data Accuracy
Verify numerical data against reference sources.
Check for typographical errors in text fields.
Validate addresses or geolocations.
Assess correctness of email formats.
Verify phone number formats by region.
Detect inaccurate product codes or IDs.
Validate currency or monetary values.
Identify improbable dates or ages.
Check financial data for calculation errors.
Compare historical data against expected trends.
D. Data Validity
Validate entries against allowed values.
Check date ranges for logical consistency.
Detect invalid numerical ranges.
Identify text fields containing forbidden characters.
Validate categorical variables against a predefined list.
Detect invalid Boolean entries.
Identify entries violating data type rules.
Check geographical data for invalid locations.
Validate JSON or XML structures in data fields.
Assess compliance with schema rules.
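
The validity checks above can be sketched as a small rule-based validator. The field names (status, age, is_member), the allowed values, and the age range below are hypothetical examples rather than rules from any particular schema.

```python
# Hypothetical rules for illustration only.
ALLOWED_STATUS = {"active", "inactive", "pending"}
AGE_RANGE = (0, 120)


def validate_record(record):
    """Return a list of human-readable rule violations for one record."""
    errors = []

    status = record.get("status")
    if not isinstance(status, str) or status not in ALLOWED_STATUS:
        errors.append("status: not in allowed values")

    age = record.get("age")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(age, bool) or not isinstance(age, int):
        errors.append("age: must be an integer")
    elif not AGE_RANGE[0] <= age <= AGE_RANGE[1]:
        errors.append("age: outside allowed range")

    if not isinstance(record.get("is_member"), bool):
        errors.append("is_member: must be a Boolean")

    return errors


print(validate_record({"status": "Active", "age": 150, "is_member": "yes"}))
```

Returning every violation instead of stopping at the first makes it easier to summarize which rules fail most often across a dataset.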
E. Data Uniqueness
Identify duplicate IDs or keys.
Detect repeated names or identifiers.
Validate uniqueness in transactional records.
Check uniqueness across multi-column combinations.
Flag repeated customer email addresses.
Identify duplicate product SKUs.
Validate unique order numbers.
Detect recurring reference codes.
Check for repeated event timestamps.
Identify duplicate entries in master datasets.
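
For the uniqueness checks above, including multi-column combinations, a minimal sketch is to group row positions by the key tuple and keep the groups with more than one member. The function name and sample column names are hypothetical.

```python
from collections import defaultdict


def find_duplicate_keys(rows, key_columns):
    """Map each repeated key tuple to the list of row indexes where it occurs."""
    positions = defaultdict(list)
    for index, row in enumerate(rows):
        key = tuple(row.get(col) for col in key_columns)
        positions[key].append(index)
    return {key: idxs for key, idxs in positions.items() if len(idxs) > 1}


orders = [
    {"order_id": 100, "customer": "c1", "sku": "A"},
    {"order_id": 101, "customer": "c1", "sku": "B"},
    {"order_id": 102, "customer": "c1", "sku": "A"},
]
print(find_duplicate_keys(orders, ["customer", "sku"]))  # {('c1', 'A'): [0, 2]}
print(find_duplicate_keys(orders, ["order_id"]))         # {}
```

The same function covers single-column checks (IDs, SKUs, email addresses) by passing a one-element list of columns.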
F. Data Timeliness
Check for outdated records.
Assess latency in data updates.
Detect missing timestamps.
Validate chronological order of events.
Identify records with stale data.
Assess how recent the data entries are.
Detect inconsistencies in temporal granularity.
Highlight late submissions in time-sensitive datasets.
Evaluate impact of delayed updates on analysis.
Detect gaps in continuous time series data.
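
Gap detection in a time series, the last item above, can be sketched by sorting the timestamps and comparing each consecutive pair against the expected interval. The expected interval is an assumption the caller must supply; the function name is hypothetical.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, expected_step):
    """Return (previous, current) pairs whose spacing exceeds expected_step."""
    ordered = sorted(timestamps)
    gaps = []
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > expected_step:
            gaps.append((previous, current))
    return gaps


start = datetime(2024, 1, 1)
# Hourly readings with hours 3 and 4 missing.
readings = [start + timedelta(hours=h) for h in (0, 1, 2, 5, 6)]
for previous, current in find_gaps(readings, timedelta(hours=1)):
    print("gap between", previous, "and", current)
```

Sorting first means the check also works on records that arrive out of chronological order, although that disorder may itself be worth flagging separately.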
G. Data Integrity
Detect referential integrity violations.
Check foreign key relationships between tables.
Identify orphan records in relational datasets.
Validate hierarchical consistency in organizational data.
Detect broken links between related tables.
Assess data integrity in merged datasets.
Check primary key violations.
Identify inconsistent parent-child relationships.
Validate linked datasets against master records.
Detect anomalies caused by data merges.
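
The orphan-record check above can be sketched as a set lookup: collect the parent keys, then list child rows whose foreign key is not among them. Table and column names here are hypothetical, and this sketch treats a null foreign key as an orphan, which is an assumption (many relational databases permit null foreign keys).

```python
def find_orphans(child_rows, parent_rows, foreign_key, primary_key):
    """Return child rows whose foreign key matches no parent primary key."""
    parent_keys = {row[primary_key] for row in parent_rows}
    # Assumption: a missing or None foreign key counts as an orphan.
    return [row for row in child_rows if row.get(foreign_key) not in parent_keys]


customers = [{"customer_id": 1}, {"customer_id": 2}]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 3},     # no such customer
    {"order_id": 12, "customer_id": None},  # null reference
]
print(find_orphans(orders, customers, "customer_id", "customer_id"))
```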
H. Data Reliability
Evaluate variance in repeated measurements.
Check for inconsistent sampling methods.
Assess reproducibility of results.
Identify unstable data sources.
Detect data with fluctuating accuracy.
Validate reliability of external datasets.
Check consistency across multiple sources.
Detect unreliable time series entries.
Assess sensor data quality for errors.
Identify human-entered data prone to mistakes.
I. Data Relevance
Detect obsolete columns or features.
Assess relevance of data for analysis tasks.
Identify redundant information.
Check for irrelevant records.
Flag data outside target scope.
Detect variables that do not impact outcomes.
Evaluate importance of new features in datasets.
Identify low-information columns.
Detect entries that reduce dataset quality.
Assess alignment with business objectives.
J. Data Accuracy in Labels (for ML datasets)
Detect mislabeled data in training sets.
Identify inconsistent class assignments.
Assess label accuracy using a reference set.
Detect overlapping or conflicting labels.
Validate annotation quality.
Flag incomplete or ambiguous labels.
Evaluate inter-annotator agreement.
Detect misaligned labels in multi-class datasets.
Assess labeling consistency across dataset versions.
Identify noise in target variable annotations.
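
Inter-annotator agreement, mentioned above, is often measured with Cohen's kappa for two annotators: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance. The sketch below is a minimal implementation of that formula; the function name and sample labels are illustrative.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over classes of the product of marginal frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical class throughout
    return (observed - expected) / (1 - expected)


annotator_1 = ["cat", "cat", "dog", "dog"]
annotator_2 = ["cat", "dog", "dog", "dog"]
print(cohens_kappa(annotator_1, annotator_2))  # 0.5
```

In the example, observed agreement is 0.75 and chance agreement is 0.5, giving kappa = 0.5. Values near 0 indicate agreement no better than chance.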
K. Outliers and Anomalies
Detect numerical outliers.
Identify categorical anomalies.
Highlight statistical deviations.
Detect unexpected spikes in time series data.
Find improbable combinations of features.
Assess extreme values affecting analysis.
Identify data entry mistakes causing outliers.
Detect anomalies in geospatial data.
Highlight outliers in multi-dimensional datasets.
Detect sudden changes in trends.
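
For numerical outliers, one common rule of thumb is the interquartile range (IQR) fence: flag values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR. The sketch below uses Python's statistics.quantiles; the multiplier 1.5 is a convention rather than a requirement, and the function name is hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values lying outside the [Q1 - k*IQR, Q3 + k*IQR] fences."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


measurements = [10, 12, 11, 13, 12, 11, 10, 95]
print(iqr_outliers(measurements))  # [95]
```

A flagged value is only a candidate: it may be a data-entry mistake or a genuine extreme, so it should be reviewed before being removed.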
L. Data Distribution
Evaluate uniformity of data distribution.
Identify skewed numerical features.
Detect class imbalance in categorical data.
Check for normality in statistical variables.
Assess distribution shifts across time periods.
Detect sampling bias in datasets.
Compare distributions across multiple sources.
Evaluate representativeness of datasets.
Detect overrepresented or underrepresented groups.
Check feature correlations for anomalies.
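
Class imbalance, one of the distribution checks above, can be summarized by each class's share of the data and by the ratio between the largest and smallest class. This is a minimal sketch with hypothetical function names; what ratio counts as problematic depends on the task.

```python
from collections import Counter


def class_shares(labels):
    """Return {class: fraction of all labels}."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def imbalance_ratio(labels):
    """Return the size of the largest class divided by the smallest."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


labels = ["spam"] * 2 + ["ham"] * 8
print(class_shares(labels))     # {'spam': 0.2, 'ham': 0.8}
print(imbalance_ratio(labels))  # 4.0
```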
M. Data Formatting and Standardization
Detect inconsistent capitalization in text data.
Check for mixed measurement units.
Identify variations in date formats.
Standardize currency symbols.
Detect inconsistent decimal separators.
Validate standard abbreviations.
Detect mixed language entries.
Check formatting of identification numbers.
Identify inconsistent naming conventions.
Ensure standardized column headers.
N. Data Traceability
Check source information for each record.
Track changes in dataset versions.
Validate audit trails for critical data.
Assess ability to reproduce dataset generation.
Detect missing source references.
Ensure traceability of computed metrics.
Validate provenance of external datasets.
Check for missing documentation.
Assess lineage of merged data.
Ensure reproducibility of preprocessing steps.
O. Data Governance Compliance
Validate adherence to privacy regulations (GDPR, CCPA).
Check for sensitive personal information.
Detect unauthorized data sharing.
Assess compliance with industry standards.
Identify data retention policy violations.
Validate secure storage of confidential information.
Check data anonymization consistency.
Detect PII leakage in publicly accessible datasets.
Assess compliance with organizational policies.
Validate ethical use of datasets.
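
As a rough illustration of the PII checks above, free-text fields can be scanned with regular expressions. The two patterns below (an email address and a US Social Security number-like string) are simplified assumptions that will produce both false positives and false negatives; they are a first-pass screen, not a compliance tool.

```python
import re

# Simplified, illustrative patterns. Real PII detection needs broader coverage.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn_like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan_for_pii(text):
    """Return {pattern name: list of matches} for patterns found in the text."""
    found = {}
    for name, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[name] = matches
    return found


print(scan_for_pii("Contact jane.doe@example.com, SSN 123-45-6789."))
print(scan_for_pii("No personal data here."))  # {}
```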
