Classification of data
Data is classified into the following types:
- Structured Data
- Semi-Structured Data
- Unstructured Data
Structured Data
Structured data refers to data that is organized and formatted in a consistent manner, following a specific data model. This type of data is typically stored in databases, spreadsheets, and tables, making it easy to search, retrieve, and analyze.
Key characteristics of structured data include:
- Organized Format: Structured data is organized into rows and columns, where each column represents a specific attribute or field, and each row represents a record or entry.
- Easy to Query: Due to its well-defined structure, structured data can be queried using database query languages like SQL.
- Efficient Storage: Structured data is often stored in relational databases, which optimize storage and retrieval processes for structured information.
Examples of structured data include:
- Employee records in an HR database (columns: employee ID, name, position, salary).
- Sales transactions in an e-commerce database (columns: order ID, customer ID, product ID, quantity, price).

The employee records above could be stored as the following table:

| Employee ID | Name | Position | Salary |
|---|---|---|---|
| 1 | Akshay | Software Developer | 50,000 |
| 2 | Manish | Web Developer | 50,000 |

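Because the rows and columns are fixed, a table like the one above can be queried directly with SQL. The following is a minimal sketch using Python's built-in `sqlite3` module. The table name `employees`, the snake_case column names, the helper `names_by_position`, and the salaries stored as plain integers are assumptions made for illustration, not part of the original example.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions
# that mirror the employee table shown above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "employee_id INTEGER PRIMARY KEY, name TEXT, position TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "Akshay", "Software Developer", 50000),
        (2, "Manish", "Web Developer", 50000),
    ],
)


def names_by_position(connection, position):
    """Return the names of all employees holding the given position."""
    rows = connection.execute(
        "SELECT name FROM employees WHERE position = ? ORDER BY employee_id",
        (position,),
    ).fetchall()
    return [name for (name,) in rows]


web_developers = names_by_position(conn, "Web Developer")
total_salary = conn.execute("SELECT SUM(salary) FROM employees").fetchone()[0]
print(web_developers, total_salary)
```

Every row has the same columns, so filters and aggregates such as `WHERE` and `SUM` apply uniformly without any per-record checks.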
Semi-Structured Data
Semi-structured data is information that does not reside in a relational database or Excel spreadsheet but still has some organizational properties, such as tags or keys that label its elements. With some effort, it can be stored in a relational database. Examples are JSON and XML.
Key characteristics of semi-structured data include:
- Flexible Schema: Semi-structured data does not require a predefined schema like structured data. Instead, each data entry can have varying attributes and fields.
- Variability: Different data entries in semi-structured data can have different attributes, and the same attribute might not exist in all entries.
- Readable by Humans and Machines: Semi-structured data is typically human-readable because it uses tags and identifiers to label its elements. The same labels also let machines parse it for data analysis.
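The flexible schema described above, and the effort needed to fit such data into a relational table, can be sketched in a few lines of Python. The records, their field names, and the helper `to_rows` are hypothetical and exist only for illustration.

```python
import json

# Two hypothetical records: both have "id" and "name", but each also
# carries a field the other one lacks.
raw = """
[
  {"id": 1, "name": "Akshay", "position": "Software Developer"},
  {"id": 2, "name": "Manish", "email": "manish@example.com"}
]
"""
records = json.loads(raw)

# A relational table needs one fixed set of columns, so take the union
# of every key that appears in any record.
columns = sorted({key for record in records for key in record})


def to_rows(records, columns):
    """Flatten records into fixed-width rows, using None for missing fields."""
    return [tuple(record.get(column) for column in columns) for record in records]


rows = to_rows(records, columns)
print(columns)
for row in rows:
    print(row)
```

The `None` placeholders are the cost of forcing variable records into a fixed set of columns. In a relational database they would typically become `NULL` values.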
Examples of semi-structured data formats include:
- XML (eXtensible Markup Language): XML uses tags to define data elements. Example:

      <book>
        <title>Introduction to Data Science</title>
        <author>John Smith</author>
        <year>2022</year>
      </book>

- JSON (JavaScript Object Notation): JSON represents data as key-value pairs and supports nesting. It is widely used for APIs and web services. Example:

      {
        "book": {
          "title": "Introduction to Data Science",
          "author": "John Smith",
          "year": 2022
        }
      }

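The XML and JSON book examples describe the same record, and both can be parsed with Python's standard library. The sketch below is one possible way to read them. The helper names `book_from_xml` and `book_from_json` are assumptions, and the XML year is converted to an integer by hand because XML element text is always a string.

```python
import json
import xml.etree.ElementTree as ET

xml_text = """
<book>
  <title>Introduction to Data Science</title>
  <author>John Smith</author>
  <year>2022</year>
</book>
"""

json_text = """
{
  "book": {
    "title": "Introduction to Data Science",
    "author": "John Smith",
    "year": 2022
  }
}
"""


def book_from_xml(text):
    """Parse the <book> element into a plain dictionary."""
    root = ET.fromstring(text.strip())
    return {
        "title": root.findtext("title"),
        "author": root.findtext("author"),
        "year": int(root.findtext("year")),  # element text is a string
    }


def book_from_json(text):
    """Parse the JSON document and return the nested "book" object."""
    return json.loads(text)["book"]


xml_book = book_from_xml(xml_text)
json_book = book_from_json(json_text)
print(xml_book == json_book)
```

JSON carries basic types such as numbers directly, while plain XML leaves type conversion to the reader unless a schema is applied.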
Unstructured Data
Unstructured data refers to data that lacks a predefined structure or does not fit neatly into a traditional tabular or relational format. This type of data can be more challenging to process and analyze compared to structured or semi-structured data.
Key characteristics of unstructured data include:
- Lack of Structure: Unstructured data does not have a predefined schema or consistent format. It can vary widely in terms of content, length, and organization.
- Multiple Formats: Unstructured data can take various forms, such as text, images, audio, video, social media posts, emails, documents, and more.
- Human-Centric: Unstructured data is usually created by people for other people to read, view, or hear, rather than formatted for direct machine processing.
Examples of unstructured data include:
- Textual Data: Emails, social media posts, articles, blogs, and free-form text content.
- Images: Photographs, scanned documents, screenshots, and other visual data.
- Audio: Voice recordings, podcasts, and sound files.
- Video: Recorded videos, live streams, and multimedia content.
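Since unstructured data has no schema, any structure has to be extracted from the content itself. The sketch below shows this for a short piece of free-form text: it pulls out email addresses with a deliberately simplified regular expression and counts word frequencies. The sample message, the pattern, and the helper names `extract_emails` and `word_counts` are all assumptions for illustration. Real text, and other formats such as images or audio, generally need far more involved processing.

```python
import re
from collections import Counter

# Hypothetical free-form message; nothing in it is labeled or typed.
text = (
    "Hi team, the quarterly report is attached. "
    "Please send feedback to priya@example.com or lee@example.org by Friday. "
    "The report covers sales and the support backlog."
)

# Simplified pattern: good enough for this sample, not a full email validator.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return every substring that looks like an email address."""
    return EMAIL_PATTERN.findall(text)


def word_counts(text):
    """Count lowercase words, treating anything non-alphabetic as a separator."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


emails = extract_emails(text)
counts = word_counts(text)
print(emails)
print(counts.most_common(3))
```

Unlike the structured case, nothing guarantees that the next message contains an email address at all, so code that consumes unstructured data has to tolerate missing or unexpected content.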