Yun Huang

Research Interests • Education • Projects and Publications • Invited Talks and Tutorial • Awards • Other Fun Projects

PhD
Intelligent Systems Program
School of Computing and Information
University of Pittsburgh
[CV]

Information Sciences Building
135 North Bellefield Ave., Room 2A04,
Pittsburgh, PA 15213

yuh43@pitt.edu
huangyun.ai@gmail.com

News: I defended my PhD dissertation recently (July 5, 2018) and I am looking for collaborative projects.

I am passionate about building intelligent tutoring systems that promote engaging and robust learning. I work toward this goal through student modeling that combines machine learning and cognitive psychology. I value simplicity, generality and interpretability in building solutions. I see myself as a problem-driven researcher and aim to become a creative problem-solving researcher.

My advisor is Dr. Peter Brusilovsky. My dissertation committee members are Dr. Marek Druzdzel (UPitt SCI), Dr. Christian Schunn (UPitt LRDC) and Dr. Kenneth Koedinger (CMU HCII). Dr. Jose P. González-Brenes mentored my early projects. I maintain our student modeling GitHub repository under ml-smores.

Previously, from 2010 to 2012, I did research on Natural Language Processing and applications of Machine Learning at Beijing University of Posts and Telecommunications (a top institution in Information Science in China) and the Institute of Computing Technology, Chinese Academy of Sciences (a top institution in Computer Science in China).

Research Interests

Student Modeling, Cognitive Modeling, Skill Modeling

Intelligent Tutoring Systems, Educational Data Mining, Machine Learning and Artificial Intelligence in Education

Education

08/2012 - Now	Ph.D. in Intelligent Systems, Intelligent Systems Program, School of Computing and Information, University of Pittsburgh, Pittsburgh, PA, USA
08/2012 - 12/2015	M.S. in Intelligent Systems, Intelligent Systems Program, School of Computing and Information, University of Pittsburgh, Pittsburgh, PA, USA
09/2007 - 06/2011	B.E. in Intelligence Science and Technology (equivalent to Artificial Intelligence), School of Computer Science, Beijing University of Posts and Telecommunications (BUPT), Beijing, China

Projects and Publications [Google Scholar]

My work can be classified into three types: 1) model building, 2) model application, and 3) model behavior analytics. They can also be classified by topics:

	Feature-Aware Student Knowledge Tracing (More than 70 citations in total by 12/2017)
		09/2013~now, Personalized Adaptive Web Systems Lab, University of Pittsburgh
		Huang, Y., González-Brenes, J. P., and Brusilovsky, P. The FAST toolkit for Unsupervised Learning of HMMs with Features. In: The Machine Learning Open Source Software workshop at the 32nd Intl. Conf. on Machine Learning (ICML-MLOSS 2015), Lille, France, 2015. [code]
		González-Brenes, J. P., Huang, Y., and Brusilovsky, P. General Features in Knowledge Tracing to Model Multiple Subskills, Temporal Item Response Theory, and Expert Knowledge. In: The 7th Intl. Conf. on Educational Data Mining (EDM 2014). (Co-first authors. Nominated for the Best Paper Award.) [paper] [code] [slide] [tutorial slide]
		Khajah, M. M., Huang, Y., González-Brenes, J. P., Mozer, M. C., and Brusilovsky, P. Integrating Knowledge Tracing and Item Response Theory: A Tale of Two Frameworks. In: The 4th Intl. Workshop on Personalization Approaches in Learning Environments (PALE 2014) in the 22nd Conf. on User Modeling, Adaptation and Personalization (UMAP 2014). (Co-first authors.) [paper] [slide]
		González-Brenes, J. P., Huang, Y., and Brusilovsky, P. FAST: Feature-Aware Student Knowledge Tracing. In: NIPS 2013 Workshop on Data Driven Education (NIPS 2013), Nevada, USA. [paper]
	Multiple Skill Student Modeling and Performance Prediction
		08/2013~now, Personalized Adaptive Web Systems Lab, University of Pittsburgh
		Huang, Y., Guerra-Hollstein, J., Barria-Pineda, J., and Brusilovsky, P. Learner Modeling for Integration Skills. In: The 25th Conference on User Modeling, Adaptation and Personalization (UMAP 2017) (pp. 85-93). [paper] [slide]
		Huang, Y., Guerra-Hollstein, J., and Brusilovsky, P. Modeling Skill Combination Patterns for Deeper Knowledge Tracing. In: The 6th Intl. Workshop on Personalization Approaches in Learning Environments (PALE 2016) in the 24th Conf. on User Modeling, Adaptation and Personalization (UMAP 2016). [paper]
		Huang, Y., and Brusilovsky, P. Towards Modeling Chunks in a Knowledge Tracing Framework for Students’ Deep Learning. In: The 9th Intl. Conf. on Educational Data Mining (EDM 2016) Doctoral Consortium (pp. 666-668). [paper]
		Huang, Y. Deeper Knowledge Tracing by Modeling Skill Application Context for Better Personalized Learning. In: The 24th Conference on User Modeling, Adaptation and Personalization (UMAP 2016) Doctoral Consortium (pp. 325-328). [paper]
		Huang, Y., Guerra-Hollstein, J., and Brusilovsky, P. A Data-Driven Framework of Modeling Skill Combinations for Deeper Knowledge Tracing. In: The 9th Intl. Conf. on Educational Data Mining (EDM 2016) (pp. 593-594). [paper]
		Huang, Y., Xu, Y., and Brusilovsky, P. Doing More with Less: Student Modeling and Performance Prediction with Reduced Content Models. In: The 22nd Conference on User Modeling, Adaptation and Personalization (UMAP 2014). (short presentation for full paper) [paper] [slide]
		Sahebi, S., Huang, Y., and Brusilovsky, P. Predicting Student Performance in Solving Parameterized Exercises. In: Proceedings of 12th Intl. Conf. on Intelligent Tutoring Systems (ITS 2014), Honolulu, USA, 2014. (short paper) [paper]
		Sahebi, S., Huang, Y., and Brusilovsky, P. Parameterized Exercises in Java Programming: Using Knowledge Structure for Performance Prediction. In: Proceedings of the 2nd Workshop on AI-supported Education for Computer Science in the 12th Intl. Conf. on Intelligent Tutoring Systems (ITS-AIEDCS 2014), Honolulu, USA, 2014. [paper]
	Making Student Models Reliable and Actionable
		09/2014~now, Personalized Adaptive Web Systems Lab, University of Pittsburgh
		05/2014~08/2015, Pearson's School Research Innovation Network (Summer Intern)
		Huang, Y., González-Brenes, J. P., Kumar, R., Brusilovsky, P. A Framework for Multifaceted Evaluation of Student Models. In: Proceedings of the 8th Intl. Conf. on Educational Data Mining (EDM 2015), Madrid, Spain. (Full paper) [paper] [slide]
		Huang, Y., González-Brenes, J. P., Brusilovsky, P. Challenges of Using Observational Data to Determine the Importance of Example Usage. In: Proceedings of the 17th Intl. Conf. on Artificial Intelligence in Education (AIED 2015), Madrid, Spain. (Short paper) [paper]
		González-Brenes, J. P., Huang, Y. Your model is predictive— but is it useful? Theoretical and Empirical Considerations of a New Paradigm for Adaptive Tutoring Evaluation. In: Proceedings of the 8th Intl. Conf. on Educational Data Mining (EDM 2015), Madrid, Spain, 2015. (Full paper) [paper]
		González-Brenes, J. P., Huang, Y. The Leopard Framework: Towards understanding educational technology interventions with a Pareto Efficiency Perspective. In: The ICML 2015 Workshop on Machine Learning for Education (ICML 2015), Lille, France, 2015. [paper]
		González-Brenes, J. P., Huang, Y. Using Data from Real and Simulated Learners to Evaluate Adaptive Tutoring Systems. In: Proceedings of 2nd Workshop on Simulated Learners at the 17th Intl. Conf. on Artificial Intelligence in Education (AIED 2015), Madrid, Spain, 2015 (pp. 31-34). [paper]
		González-Brenes, J. P., Huang, Y. The White Method: Towards Automatic Evaluation Metrics for Adaptive Tutoring Systems. In: NIPS 2014 Workshop on Human Propelled Machine Learning (NIPS 2014), Montreal, Canada, December 13, 2014.
	Textbook-based Student Modeling and Skill Modeling
		09/2015~now, Personalized Adaptive Web Systems Lab, University of Pittsburgh
		Thaker, K., Huang, Y., Brusilovsky, P., and He, D. Dynamic Knowledge Modeling with Heterogeneous Activities for Adaptive Textbooks. In: The 11th International Conference on Educational Data Mining, July 2018 (pp. 592-595). [paper] (Co-first authors.)
		Labutov, I., Huang, Y., Brusilovsky, P., and He, D. Semi-Supervised Techniques for Mining Learning Outcomes and Prerequisites. In: The 23rd SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2017) (pp. 907-915). [paper]
		Huang, Y., Yudelson, M., Han, S., He, D., and Brusilovsky, P. A Framework for Dynamic Knowledge Modeling in Textbook-Based Learning. In: The 24th Conference on User Modeling, Adaptation and Personalization (UMAP 2016) (pp. 141-150). [paper] [slide]
		Meng, R., Han, S., Huang, Y., He, D., and Brusilovsky, P. Knowledge-based Content Linking for Online Textbooks. In: Proceedings of the 2016 IEEE/WIC/ACM International Conference on Web Intelligence, pp. 13-16. IEEE Computer Society (WI 2016).
	Network Analysis for Student Models and Open Student Model
		09/2014~now, Personalized Adaptive Web Systems Lab, University of Pittsburgh
		Barria-Pineda, J., Guerra-Hollstein, J., Huang, Y., and Brusilovsky, P. Concept-Level Knowledge Visualization For Supporting Self-Regulated Learning. In: The 22nd International Conference on Intelligent User Interfaces (IUI 2017) (pp. 141-144). [paper]
		Guerra, J., Huang, Y., Hosseini, R., Brusilovsky, P. Graph Analysis of Student Model Networks. In: The 1st International Workshop on Graph-Based Educational Data Mining at the 8th Intl. Conf. on Educational Data Mining (EDM-GEDM 2015). [paper]
		Guerra, J., Huang, Y., Hosseini, R., Brusilovsky, P. Exploring the Effects of Open Social Student Model Beyond Social Comparison. In: The 4th Workshop on Intelligent Support for Learning in Groups at the 17th Intl. Conf. on Artificial Intelligence in Education (AIED-ISLG 2015) (pp. 19-24). [paper]
	Russian Morphological Segmentation Based on a Small-scale Russian and Chinese Bilingual Dictionary (Bachelor's Thesis)
		10/2010~06/2011, Institute of Computing Technology, Chinese Academy of Sciences
		Huang, Y., Jiang, W., Wang, Z., Zhu, J., Lv, Y., and Liu, Q. Russian Morphological Segmentation Based on a Small-scale Russian and Chinese Bilingual Dictionary. In: The 7th China Workshop on Machine Translation (CWMT 2011) (pp. 185-193). [paper]
	Language Model, Lexical Analysis and Machine Translation
		10/2010~04/2012, Institute of Computing Technology, Chinese Academy of Sciences
		08/2010, Institute of Automation, Chinese Academy of Sciences
		Lexical Analysis and Machine Translation of Morphologically Rich Languages: (1) Conducted code recognition and conversion, and named entity recognition and translation, for morphologically rich languages. (2) Studied statistical lexical analysis, both supervised and unsupervised, of morphologically rich languages and assisted in incorporating rules into a statistical Mongolian lexical analysis system. Experimented with Mongolian-to-Chinese and Russian-to-Chinese machine translation systems at multiple granularities using popular software (Moses, GIZA).
		Evaluation of Machine Translation Systems: Zhao, H., Lv, Y., Ben, G., Huang, Y., and Liu, Q. Summary on CWMT2011 MT Translation Evaluation. In: Journal of Chinese Information Processing, Vol. 26, No. 1, Jan., 2012. [paper]
		Searching for a Colloquial-Style Open Corpus and Studying Adaptive Language Models: Implemented effective crawlers, extracted useful information with regular expressions, revised a language model program, and proposed a three-layer adaptive language model to make full use of available resources.
	Word Sense Induction Using Cluster Ensemble
		05/2010~06/2010, Center of Intelligence Science and Technology, Beijing University of Posts and Telecommunications
		Ranked 1st in the National Word Sense Induction (WSI) task in CIPS-ACL SIGHAN 2010, Chief Team Member
		(1) Submitted 4 systems that ranked 1st to 4th, with the 1st using cluster ensemble and feature combination.
		(2) Proposed and implemented effective feature extraction methods to address the data sparseness problem, such as combining unigrams and bigrams in different window lengths with different weighting methods. Implemented the LAC clustering system (ranked 2nd) and worked to improve it (by deriving a new distance formula, etc.).
		Zhang, B., Sun, J.n, Deng, L., Huang, Y., Li, J., Liu, Z., Zuo, P. Word Sense Induction using Cluster Ensemble. In: Proceedings of CIPS-SIGHAN Joint Conference on Chinese Language Processing (CLP 2010) (pp. 420–427). Beijing, China, 2010. [paper]

Invited Talks and Tutorial

	Huang, Y. The FAST Model for Efficiently Incorporating General Features to Infer Student Knowledge: Applications, Evaluations and Issues. At the Tepper School of Business, Carnegie Mellon University (CMU) (2016). [abstract]
	González-Brenes, J., Huang, Y., and Brusilovsky, P. The Fast Model: Integrating Learning Science and Measurement. In: National Council on Measurement in Education (NCME 2016). [program]
	Huang, Y. Fast and Meaningful Inference of Student Knowledge from Data. In: The 1st Workshop on Mining and Visualization of Social Networks and Data, Informatics Institute, Universidad Austral de Chile (2015).
	González-Brenes, J., Chang, K., Yudelson, M., Bergner, Y., Huang, Y. Student Modeling Applications, Recent Developments & Toolkits (SMART tutorial). In conjunction with the 8th International Conference on Educational Data Mining (EDM 2015). [tutorial]

Awards

	Andrew Mellon Predoctoral Fellowship Awards, University of Pittsburgh, 2016-2017.
	Andrew Mellon Predoctoral Fellowship Awards, University of Pittsburgh, 2012-2013.
	First-Class Scholarship (top 5%), Beijing University of Posts and Telecommunications, 2009-2010.
	First-Class Scholarship (top 5%), Beijing University of Posts and Telecommunications, 2008-2009.
	Second-Class Scholarship (top 10%), Beijing University of Posts and Telecommunications, 2007-2008.

Other Fun Projects

I have worked on three types of projects: 1) education-oriented, 2) Machine Learning-oriented and 3) Natural Language Processing-oriented.

	The Future Education --- A Design Incorporating Multimodal Information Processing
		This is an imaginative proposal for my undergraduate Multimodal Information Processing course. I set out core principles for a future educational system: being ubiquitous, connecting to real life, inspiring creativity and maintaining students’ health. I proposed ways to realize it through comprehensive human-computer interaction and virtual environments (using technology like Microsoft Surface, embedded systems and multimedia interaction) as well as intelligent software (with biological features and emotion recognition) able to communicate with, make suggestions to and gather experience from anyone at any time. Interestingly, after I came to the U.S., I found that some of these ideas are being realized. The dream of a future where everyone has an equal chance to be educated to their full potential, and to enjoy it fully, motivates me!
	Sequence Mining and Feature Engineering in Predicting Student Learning Experience Patterns
		This is a project for my first-year Machine Learning graduate course at UPitt. We designed approaches that use sequential pattern mining to analyze sequences of responses to parameterized Java programming exercises of different levels of complexity, and from students of different strengths. We conducted feature engineering on pattern features, then trained and evaluated classifiers that use historical patterns to predict students' patterns on future practice. It later turned into a full paper at the Educational Data Mining conference (EDM) 2014.
	Machine Learning Algorithms Implementation
		I have been implementing ML algorithms since my undergraduate years. I implemented the following algorithms, with improvements to specific designs concerning accuracy, learning speed and data sparseness: Named Entity Recognizer using linear-chain CRF and HMM, Gibbs Sampling for SVM, Decision Tree, Naïve Bayes, Back Propagation Artificial Neural Network, AdaBoost, Genetic Algorithm, HAC clustering, EM clustering, and several feature selection methods.
	Efficient Structure Learning of SNP-Gene Network
		This is a project for my third-year Probabilistic Graphical Model graduate course at CMU. We investigated the effect of adding prior knowledge to existing structure learning algorithms for SNP-gene network estimation, and used a new mixed graphical model to estimate the SNP-gene network. We investigated two hypotheses in order to find the best model formulation for eQTL datasets. For the first hypothesis, whether a joint model is better than a two-stage model, we found that using prior knowledge estimated by a high-performance algorithm can improve prediction accuracy dramatically while keeping computation time similar or lower for the two-stage model, making it more advantageous than a joint model. For the second hypothesis, there was no clear evidence that a mixed graphical model is a better choice than a Gaussian graphical model.
	Deep Sentiment-Topic Modeling for Predicting and Diagnosing Review Ratings
		This is a project for my second-year Natural Language Processing graduate course at UPitt. We predicted and diagnosed Yelp restaurant review ratings with deep sentiment-topic modeling. After preprocessing, including spelling correction of the informal writing, we used deep sentiment analysis combined with part-of-speech tagging to find the useful information in the reviews, and then adopted supervised Latent Dirichlet Allocation and Support Vector Regression models to predict the review rating. We found that although deeper language feature engineering does not improve prediction performance, it allows the topic model to extract more concise, cohesive topic representations of reviews (as seen in a t-SNE 2D embedding). Also, topic-level features improve the interpretability of feature vectors and alleviate over-fitting more effectively than word-level features.
	An Intelligent Oral Command Recognition System
		This is a project for my third-year Natural Language Processing undergraduate course. We implemented a part-of-speech tagging component using the Viterbi algorithm and a segmentation component using dynamic programming. The system can decompose a complicated oral command into basic commands with a joint statistical and rule-based framework.
	Information Retrieval Related Projects
		The Influence of Segmentation on IR is a project for my third-year Information Retrieval undergraduate course. We ran experiments on different segmentation methods (considering query word length, proportion of noise, etc.) and on different combinations of segmentation in queries and documents, based on the available IR system Lemur. We drew interesting conclusions from the experiment results.
		Comparison of Different Retrieval Models in IR is another project in the course. We compared different retrieval models (BM25, PL2) in experiments with different parameter settings. We proposed a new weighting model combining the advantages of BM25, KLD and DFR.
