Unstructured Data Mining
Course Code 620089
Semester 2025-1
Schedule Tue3/Thu1/Thu2
Credits 3/3
Year 3
Department Business Administration
Instructor: 김용희
Email: yhkim1981@sunmoon.ac.kr
Course Description
This course teaches methods for mining unstructured data, which is at the core of big data. Because modern data environments use structured and unstructured data together, students first review the basics of structured data analysis and then extend them to unstructured data mining. Topics include Python programming basics, web crawling for data collection, text mining, sentiment analysis, and social media analysis.
Learning Objectives
- Review and deepen understanding of structured data analysis techniques
- Understand the characteristics and analytical methodologies of unstructured data
- Acquire skills in unstructured data collection and preprocessing using Python
- Learn key unstructured data analysis techniques such as text mining, sentiment analysis, and social media analysis
- Understand integrated analysis methods for structured and unstructured data
- Strengthen data analysis capabilities through practice-oriented projects
Evaluation
| Component | Weight |
|---|---|
| Midterm | 25% |
| Final | 25% |
| Project | 25% |
| Attendance | 15% |
| Participation | 10% |
Weekly Schedule
| Week | Topic | Details |
|---|---|---|
| 1 | Course Introduction and Big Data Concepts | Course overview, big data concepts and importance, differences between structured/unstructured data |
| 2 | Python Programming Basics (1) | Python syntax fundamentals, data structures, NumPy basics |
| 3 | Python Programming Basics (2) | Pandas, Matplotlib, Seaborn data visualization |
| 4 | Advanced Structured Data Analysis | Descriptive statistics, exploratory data analysis, correlation analysis |
| 5 | Introduction to Web Data Collection | Understanding web structure, Requests, BeautifulSoup |
| 6 | Practical Web Data Collection | Public data, Open APIs, web scraping ethics |
| 7 | Text Preprocessing and Analysis Basics | Text preprocessing, KoNLPy usage |
| 8 | Midterm Exam | |
| 9 | Gaining Insights from Text | Word frequency analysis, word clouds, TF-IDF |
| 10 | Understanding Opinions through Sentiment Analysis | Sentiment analysis, review analysis, social media sentiment analysis |
| 11 | Discovering Hidden Topics in Documents | Topic modeling, news article analysis |
| 12 | Understanding Data through Network Relationships | Network analysis, graph theory, social media influence analysis |
| 13 | Analyzing Multiple Data Types Together | Text-numeric integrated analysis, image data basics |
| 14 | Data Storytelling Workshop | Data storytelling, effective visualization |
| 15 | Team Project Presentation and Final Exam | Final presentations and evaluation |
Textbook
Instructor-prepared lecture slides (PPT)