Classification of data

Data is classified into the following types:

  1. Structured Data
  2. Semi-Structured Data
  3. Unstructured Data

Structured Data

Structured data refers to data that is organized and formatted in a consistent manner, following a specific data model. This type of data is typically stored in databases, spreadsheets, and tables, making it easy to search, retrieve, and analyze.

Key characteristics of structured data include:
  1. Organized Format: Structured data is organized into rows and columns, where each column represents a specific attribute or field, and each row represents a record or entry.
  2. Easy to Query: Due to its well-defined structure, structured data can be queried using database query languages like SQL.
  3. Efficient Storage: Structured data is often stored in relational databases, which optimize storage and retrieval processes for structured information.

Examples of structured data include:
  1. Employee records in an HR database (columns: employee ID, name, position, salary).
  2. Sales transactions in an e-commerce database (columns: order ID, customer ID, product ID, quantity, price).

Employee records

  employee ID   name     position             salary
  1             Akshay   Software Developer   50,000
  2             Manish   Web Developer        50,000
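
The "Easy to Query" point can be illustrated with a minimal sketch that uses Python's built-in sqlite3 module. The table name employees and the column names are assumptions chosen to match the employee records example, and the salaries are stored as plain integers because the source does not state a currency.

```python
import sqlite3

# In-memory database, so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")

# Hypothetical table and column names, modeled on the employee records example.
conn.execute(
    "CREATE TABLE employees ("
    "employee_id INTEGER PRIMARY KEY, name TEXT, position TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "Akshay", "Software Developer", 50000),
        (2, "Manish", "Web Developer", 50000),
    ],
)

# Every column is a named attribute, so a query can filter on it directly.
web_developers = conn.execute(
    "SELECT name, salary FROM employees WHERE position = ?",
    ("Web Developer",),
).fetchall()
print(web_developers)  # [('Manish', 50000)]

conn.close()
```

Because every row shares the same columns, the query needs no per-record checks for missing fields.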

Semi-Structured Data

Semi-structured data is information that does not reside in a relational database or spreadsheet but still has some organizational properties, such as tags or keys that label its elements. With some effort, it can be stored in a relational database. Examples are JSON and XML.

Key characteristics of semi-structured data include:
  1. Flexible Schema: Unlike structured data, semi-structured data does not require a predefined schema. Instead, each data entry can have varying attributes and fields.
  2. Variability: Different data entries in semi-structured data can have different attributes, and the same attribute might not exist in all entries.
  3. Readable by Humans and Machines: Semi-structured data is typically human-readable due to its use of tags and identifiers. However, it can also be processed by machines for data analysis.
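
The flexible schema and variability described above can be sketched in Python as follows. The records and the helper name to_rows are hypothetical, made up for illustration. The sketch shows one possible way to fit entries with differing attributes into a fixed set of columns, which is the kind of effort needed before such data can go into a relational table.

```python
# Hypothetical records: the second entry has no "year" and adds "edition".
records = [
    {"title": "Introduction to Data Science", "author": "John Smith", "year": 2022},
    {"title": "Example Title", "author": "Example Author", "edition": 2},
]


def to_rows(records):
    """Flatten records with varying keys into a fixed set of columns."""
    # Collect every attribute seen in any record, in first-seen order.
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    # An attribute missing from a record becomes None (NULL in a database).
    rows = [tuple(record.get(column) for column in columns) for record in records]
    return columns, rows


columns, rows = to_rows(records)
print(columns)  # ['title', 'author', 'year', 'edition']
for row in rows:
    print(row)
```

The None values mark the gaps that a fixed schema forces into view. Semi-structured formats simply omit the attribute instead.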

Examples of semi-structured data formats include:
  1. XML (eXtensible Markup Language): XML uses tags to define data elements.
     Example:

    <book>
        <title>Introduction to Data Science</title>
        <author>John Smith</author>
        <year>2022</year>
    </book>

  2. JSON (JavaScript Object Notation): JSON represents data as key-value pairs and supports nesting. It is widely used for APIs and web services.
     Example:

    {
        "book": {
            "title": "Introduction to Data Science",
            "author": "John Smith",
            "year": 2022
        }
    }
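
To show that both formats are machine-readable, here is a minimal sketch that parses the two book examples above with Python's standard library (xml.etree.ElementTree and json). The variable names are illustrative assumptions.

```python
import json
import xml.etree.ElementTree as ET

xml_text = """
<book>
    <title>Introduction to Data Science</title>
    <author>John Smith</author>
    <year>2022</year>
</book>
"""

json_text = """
{
    "book": {
        "title": "Introduction to Data Science",
        "author": "John Smith",
        "year": 2022
    }
}
"""

# XML: each child element of <book> becomes a tag -> text pair.
xml_book = ET.fromstring(xml_text)
xml_record = {child.tag: child.text for child in xml_book}

# JSON: the nested "book" object maps directly onto a Python dict.
json_record = json.loads(json_text)["book"]

print(xml_record["year"], type(xml_record["year"]).__name__)    # 2022 str
print(json_record["year"], type(json_record["year"]).__name__)  # 2022 int
```

One difference worth noting: element text in XML is always parsed as a string, while JSON distinguishes numbers from strings.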
    

Unstructured Data

Unstructured data refers to data that lacks a predefined structure and does not fit neatly into a traditional tabular or relational format. This type of data is more challenging to process and analyze than structured or semi-structured data.

Key characteristics of unstructured data include:
  1. Lack of Structure: Unstructured data does not have a predefined schema or consistent format. It can vary widely in terms of content, length, and organization.
  2. Multiple Formats: Unstructured data can take various forms, such as text, images, audio, video, social media posts, emails, documents, and more.
  3. Human-Centric: Unstructured data is usually created by humans for other humans to read, view, or hear, rather than for direct processing by software.

Examples of unstructured data include:
  1. Textual Data: Emails, social media posts, articles, blogs, and free-form text content.
  2. Images: Photographs, scanned documents, screenshots, and other visual data.
  3. Audio: Voice recordings, podcasts, and sound files.
  4. Video: Recorded videos, live streams, and multimedia content.
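
Because unstructured text has no fields to query, analysis usually starts by deriving some structure from it. The following is a minimal sketch of one such step, counting word frequencies with Python's standard library. The email text is hypothetical, and word counting is only one of many possible approaches.

```python
import re
from collections import Counter

# Hypothetical email body, made up for illustration.
email_body = (
    "Hi team, the invoice for March is attached. "
    "Please review the invoice and reply by Friday."
)

# Lowercase the text and pull out runs of letters as words.
words = re.findall(r"[a-z']+", email_body.lower())

# The counts form a small structured view (word -> frequency) of the free text.
counts = Counter(words)
print(counts["invoice"])  # 2
print(len(words))         # 16
```

Images, audio, and video need comparable but far heavier preprocessing before they can be analyzed in this way.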