1. What is Data?
Data is any collection of facts, values, or measurements that can be recorded, stored, and processed by computers or humans.
- Simple examples: Numbers (42), words (“hello”), dates (2025-08-06), Booleans (true/false, yes/no), files, images, videos, etc.
- In technology: Data is the raw material for software, analytics, machine learning, and business intelligence.
2. Types of Data
Data can be classified in several ways. The most common are:
A. By Structure
Type | Description | Example |
---|---|---|
Structured | Organized in fixed fields/columns, like a table | Databases, Excel spreadsheets |
Semi-Structured | Has some organization but not a strict schema | JSON, XML, CSV, log files |
Unstructured | No pre-defined format or schema | Images, videos, emails, PDFs |
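As a rough illustration of the three structure types above, the following Python sketch handles one small sample of each. The table name `sales`, the sample records, and the email text are all made up for the example; only the standard library is used.

```python
import json
import sqlite3

# Structured: a table with fixed, typed columns (hypothetical "sales" table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "Alice", 100.0), (2, "Bob", 250.0)],
)
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
conn.close()

# Semi-structured: records have keys, but fields can differ or nest per record.
json_text = '[{"id": 1, "tags": ["new"]}, {"id": 2, "address": {"city": "Pune"}}]'
records = json.loads(json_text)
all_keys = sorted({key for record in records for key in record})

# Unstructured: free text with no schema; any structure has to be inferred.
email_body = "Hi team, the order arrived late but the product is great."
word_count = len(email_body.split())

print(total)       # sum over a known column
print(all_keys)    # union of keys seen across records
print(word_count)  # a crude measure taken from raw text
```

The structured data can be queried by column name right away, the semi-structured records need their keys inspected first, and the unstructured text offers nothing beyond the raw characters until some parsing is applied.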
B. By Nature/Content
Type | Description | Example |
---|---|---|
Numeric | Numbers, integers, decimals | 100, 3.14, -7 |
Textual | Words, sentences, text blocks | “Customer name”, “Review: great product” |
Categorical | Labels or categories | Red/Green/Blue, Male/Female |
Boolean | True/False, Yes/No, 0/1 | true, false, 1, 0 |
Date/Time | Dates, timestamps | 2025-08-06, 12:30 PM |
Spatial | Locations, coordinates | GPS points, maps |
Multimedia | Images, audio, video, graphics | profile_pic.jpg, song.mp3, video.mp4 |
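One way to picture the content types above is to map each to a Python type. This is a minimal sketch, not a standard mapping: the `Observation` class, the `Color` enum, and all field names and values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Tuple


class Color(Enum):
    """Categorical: a fixed set of labels."""
    RED = "red"
    GREEN = "green"
    BLUE = "blue"


@dataclass
class Observation:
    amount: float                  # Numeric
    review: str                    # Textual
    color: Color                   # Categorical
    in_stock: bool                 # Boolean
    recorded_at: datetime          # Date/Time
    location: Tuple[float, float]  # Spatial: (latitude, longitude)
    photo: bytes                   # Multimedia: raw bytes of an image file


obs = Observation(
    amount=3.14,
    review="great product",
    color=Color.RED,
    in_stock=True,
    recorded_at=datetime(2025, 8, 6, 12, 30),
    location=(18.52, 73.86),
    photo=b"\x89PNG",  # placeholder bytes, not a complete image
)
print(obs.recorded_at.isoformat())
```

Choosing a specific type per field makes the difference between, say, a categorical label and free text explicit, which matters later for validation and analysis.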
C. By How It Arrives
Type | Description | Example |
---|---|---|
Batch | Collected and processed in chunks | Daily sales report, nightly backups |
Streaming | Arrives continuously and is processed in (or near) real time | Website clicks, sensor data, live chat |
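The difference between batch and streaming can be sketched in a few lines of Python. The function names and the sample amounts are illustrative assumptions; a real streaming system would read from a live feed rather than a list.

```python
from typing import Iterable, Iterator, List


def batch_total(sales: List[float]) -> float:
    """Batch: the whole chunk is available before processing starts."""
    return sum(sales)


def streaming_totals(events: Iterable[float]) -> Iterator[float]:
    """Streaming: update and emit a running total as each event arrives."""
    running = 0.0
    for amount in events:
        running += amount
        yield running


daily_sales = [10.0, 20.0, 5.0]  # hypothetical sample amounts

batch_result = batch_total(daily_sales)
stream_results = list(streaming_totals(iter(daily_sales)))

print(batch_result)    # one answer after all data is in
print(stream_results)  # an updated answer after every event
```

The batch function produces a single result once the data set is complete, while the generator yields an intermediate result per event, which is the property that makes streaming suitable for live dashboards and alerts.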
3. Sources of Data
Data can come from almost anywhere. Here are typical sources in tech and business:
Source | Description | Example |
---|---|---|
Transactional | Systems that record day-to-day business transactions | Sales databases, banking systems |
Operational | Logs/events from running systems | Web server logs, app logs, error logs |
External APIs | Data from third-party services | Weather APIs, social media APIs, payment APIs |
Manual Entry | Human input | Online forms, surveys, spreadsheets |
IoT/Sensors | Physical devices measuring things | Thermometers, cameras, GPS trackers |
Files/Blobs | Uploaded or shared files | CSVs, Excel, PDFs, videos |
Web Scraping | Data extracted from websites | Product prices, news headlines |
Public Data Sets | Open government or community data | Census data, COVID-19 stats, Wikipedia dumps |
Streaming Services | Platforms that deliver live feeds and events | Apache Kafka, Amazon Kinesis, Azure Event Hubs |
Summary Table
Type | Examples |
---|---|
Structured | SQL databases, Excel tables |
Semi-Structured | JSON, XML, CSV |
Unstructured | Images, emails, audio, PDFs |
Numeric | 1, 2.5, -99 |
Textual | “Hello”, “Feedback” |
Categorical | “Red”, “Male”, “Success” |
Boolean | true/false, 0/1 |
Date/Time | 2024-05-23, 17:45 |
Batch | Daily ETL jobs |
Streaming | Real-time clickstream, IoT |
Sources | Databases, APIs, Logs, Sensors, Files, Scraping |