Readings In Data Mining - Association Rule Mining

CSc 80000 Section 5

 

 

Susan P. Imberman Ph.D.

Assistant Professor

Computer Science Department

College of Staten Island, CUNY

Voice Mail: 718-982-3273

Department Office: 718-982-2850

Email: imberman at mail dot csi dot cuny dot edu

Home Page: www.cs.csi.cuny.edu/~imberman/

 

SYLLABUS      CLICK HERE

 

LECTURE NOTES

Lecture 1 KDD Beginnings

 

Lecture 2 - Why do Statisticians "hate" us?!

 

Lecture 3 - Apriori

 

Lecture 4 - Student Presentations   If you present and would like me to link to or display your presentation, let me know.

 

Lecture 5 - Incremental Association Rules

 

Lecture 6 - Incremental Association Rules - UWEP/NUWEP

 

Lecture 7 - Student Presentations

 

FOR PRESENTATION II

Your second presentation will be on an application of association rules. The application can deal with general association rules or with one of the more "special" types we have discussed this semester, such as incremental, temporal, constraint-based, or quantitative rules. Each person will have 10 minutes to present. For possible topics, you may use the KDD conference industrial track, the IEEE International Conference on Data Mining (ICDM), or www.kdnuggets.com. Applications that you have worked on yourself are welcome too, as long as presenting them doesn't violate any proprietary contracts. Have fun :)

 

FOR THOSE WHO LIKE TO PLAY

WEKA  http://www.cs.waikato.ac.nz/ml/weka/

 

Christian Borgelt's version of Apriori

 

READINGS

 

Apriori:

R. Srikant, R. Agrawal: "Mining Generalized Association Rules", Proc. of the 21st Int'l Conference on Very Large Databases, Zurich, Switzerland, Sep. 1995. Expanded version available as IBM Research Report RJ 9963, June 1995.

R. Agrawal, H. Mannila, R. Srikant, H. Toivonen and A. I. Verkamo: "Fast Discovery of Association Rules", Advances in Knowledge Discovery and Data Mining, Chapter 12, AAAI/MIT Press, 1995.

R. Agrawal, R. Srikant: "Fast Algorithms for Mining Association Rules", Proc. of the 20th Int'l Conference on Very Large Databases, Santiago, Chile, Sept. 1994. Expanded version available as IBM Research Report RJ9839, June 1994.

R. Agrawal, T. Imielinski, A. Swami: "Mining Associations between Sets of Items in Massive Databases", Proc. of the ACM-SIGMOD 1993 Int'l Conference on Management of Data, Washington D.C., May 1993, 207-216.

 

Improving Apriori:

FP-Growth - J. Han, J. Pei, and Y. Yin, "Mining Frequent Patterns without Candidate Generation" (PDF), (Slides), Proc. 2000 ACM-SIGMOD Int. Conf. on Management of Data (SIGMOD'00), Dallas, TX, May 2000.

Dynamic Hashing and Pruning (DHP)    Jong Soo Park, Ming-Syan Chen, and Philip S. Yu, "An effective hash-based algorithm for mining association rules," Proceedings of the 1995 ACM SIGMOD, pp. 175-186, San Jose, May 1995.

Sampling   Hannu Toivonen. Sampling large databases for association rules. In 22nd International Conference on Very Large Databases (VLDB'96), pages 134-145, Mumbai, India, September 1996. Morgan Kaufmann.

Partition   Ashok Savasere, Edward Omiecinski, and Shamkant Navathe. An efficient algorithm for mining association rules in large databases. In Proceedings of the 21st VLDB Conference, pages 432-443, Zurich, Switzerland, 1995.

 

Incremental Association Rule Algorithms:

FUP - David W. Cheung, J. Han, V. Ng, and C.Y. Wong, Maintenance of Discovered Association Rules in Large Databases: An Incremental Updating Technique. Proc. 12th IEEE International Conference on Data Engineering (ICDE-96), New Orleans, Louisiana, U.S.A., February 1996.

 

FUP2 - David W. Cheung, S.D. Lee, B. Kao, A General Incremental Technique for Updating Discovered Association Rules. Proc. International Conference On Database Systems For Advanced Applications (DASFAA-97), Melbourne, Australia, April, 1997.

 

UWEP - Necip Fazil Ayan, Abdullah Uz Tansel, and Erol Arkun. "An Efficient Algorithm to Update Large Itemsets with Early Pruning". ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining (SIGKDD'99), San Diego, California, August 1999.

Negative Borders - Shiby Thomas, Sreenath Bodagala, Khaled Alsabti, Sanjay Ranka. An Efficient Algorithm for the Incremental Updation of Association Rules in Large Databases. In Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (KDD 97), Newport Beach, California, August 1997.

NUWEP - Susan P. Imberman, Abdullah Uz Tansel, Eric Pacuit. "An Efficient Method For Finding Emerging Large Itemsets". The Third Workshop on Mining Temporal and Sequential Data, ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2004.

 

Sliding Window Filtering - C.-H. Lee, C.-R. Lin and M.-S. Chen, "Sliding-Window Filtering: An Efficient Algorithm for Incremental Mining," Proc. of the ACM 10th International Conference on Information and Knowledge Management (CIKM-01), November 5-10, 2001, pp. 263-270.

 

More ….