Description
Finding Prominent Streaks in NBA Boxscores
1. Description of Task
You code should accomplish the following tasks:
(1) Read the text file 1991-2004-nba.dat. In this file, each line is the boxscore of an NBA player in an NBA game.
The records are already sorted by players and then by dates, i.e., all the boxscores of a player are in
consecutive rows, in chronological order.
You only need to use the first column (player ID) and the last column (number of points by the player in a game)
of this file.
We provide a skeleton file “P3_skeleton.py” to you. It already has the code for reading from the data file.
(2) Compute the prominent streaks in this dataset. For each player, the sequence has all the points the player
made in his games. There are multiple sequences, one for each player. Each prominent streak is a subsequence of a player’s sequence. It must not be dominated by any other streak in any player’s sequence.
For the concepts and algorithms of prominent streaks, refer to our lecture vidoes, slides, as well as the following
two papers.
Xiao Jiang, Chengkai Li, Ping Luo, Min Wang, and Yong Yu. Prominent Streak Discovery in Sequence Data. In
Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
(KDD), pages 1280-1288, San Diego, California, USA, August 2011.
(http://ranger.uta.edu/~cli/pubs/2011/streak-kdd11-jllwy-jun11.pdf (http://ranger.uta.edu/~cli/pubs/2011/streakkdd11-jllwy-jun11.pdf))
Gensheng Zhang, Xiao Jiang, Ping Luo, Min Wang, and Chengkai Li. Discovering General Prominent Streaks in
Sequence Data. In ACM Transactions on Knowledge Discovery from Data (TKDD), 8(2): 9:1-9:37, June 2014.
(http://ranger.uta.edu/~cli/pubs/2014/streak-tkdd-jllwy-sept13.pdf (http://ranger.uta.edu/~cli/pubs/2014/streaktkdd-jllwy-sept13.pdf))
2. Expected Output
Your code should return the following results. Each prominent streak should be displayed in the format of (player
ID, beginning index of the streak, length of the streak, minimum value in the streak). The beginning index
provides the position of the left end of the streak. In a player’s seequence, the index of the first element is 0.
Note that the prominent streaks in the output can be in any order, depending on the particular way an
implementation finds the prominent streaks.
In a typical run, our solution code finishes computing prominent streaks in less than half a second on our
computer. (Reading the data file itself takes 20-30 seconds.) When we test your code on the same computer,
your code is expected to achieve the same efficiency, in order to be considered efficiently implemented. Note
that an implementation of the brute-force baseline method took several mminutes to finish.
In [ ]: Reading the data file takes 22.284526109695435 seconds.
Computing prominent streaks takes 0.49631690979003906 seconds.
[(‘BRYANKO01’, 457, 4, 42), (‘BRYANKO01’, 457, 9, 40), (‘BRYANKO01’, 453
, 13, 35), (‘BRYANKO01’, 453, 14, 32), (‘BRYANKO01’, 453, 16, 30), (‘IVE
RSAL01’, 305, 27, 26), (‘IVERSAL01’, 554, 2, 51), (‘IVERSAL01’, 550, 45,
21), (‘IVERSAL01’, 550, 57, 20), (‘JACKSJI01’, 0, 1185, 0), (‘JAMISAN0
1’, 107, 2, 51), (‘JORDAMI01’, 196, 17, 27), (‘MALONKA01’, 176, 159, 14
), (‘MALONKA01’, 72, 263, 13), (‘MALONKA01’, 72, 357, 12), (‘MALONKA01’,
482, 96, 17), (‘MALONKA01’, 459, 119, 16), (‘MALONKA01’, 430, 149, 15),
(‘MALONKA01’, 72, 527, 11), (‘MALONKA01’, 24, 575, 10), (‘MALONKA01’, 2
4, 758, 7), (‘MALONKA01’, 0, 866, 2), (‘MCGRATR01’, 380, 34, 24), (‘ONEA
SH01’, 373, 74, 19), (‘ONEASH01’, 353, 94, 18), (‘ONEASH01’, 0, 858, 6),
(‘ROBINDA01’, 229, 1, 71), (‘STOCKJO01’, 0, 932, 1)]
3. What to Submit
You are required to submit a single .py file of your code. You are expected to use the code in P3_skeleton.py.
You algorithm should be in function ”prominent_streaks(sequences)”. You have the freedom to define any other
functions that you deem necessary. You shouldn’t change any existing code in the file.
4. Grading Rubrics
Your program will be evaluated on correctness, efficiency, and code quality.
Make sure to thoroughly understand the grading rubrics in file “rubrics_P3.txt”.
In [ ]: