
IEMS 5730 Assignment Help: C++ and Java Programming

Date: 2024-03-12



IEMS 5730 Spring 2024 Homework 2
Release date: Feb 23, 2024
Due date: Mar 11, 2024 (Monday) 11:59:00 pm
We will discuss the solution soon after the deadline. No late homework will be accepted!
Every Student MUST include the following statement, together with his/her signature in the
submitted homework.
I declare that the assignment submitted on Elearning system is original
except for source material explicitly acknowledged, and that the same or
related material has not been previously submitted for another course. I
also acknowledge that I am aware of University policy and regulations on
honesty in academic work, and of the disciplinary guidelines and
procedures applicable to breaches of such policy and regulations, as
contained in the website
http://www.cuhk.edu.hk/policy/academichonesty/.
Signed (Student_________________________) Date:______________________________
Name_________________________________ SID_______________________________
Submission notice:
● Submit your homework via the elearning system.
● All students are required to submit this assignment.
General homework policies:
A student may discuss the problems with others. However, the work a student turns in must
be created COMPLETELY by oneself ALONE. A student may not share ANY written work or
pictures, nor may one copy answers from any source other than one’s own brain.
Each student MUST LIST on the homework paper the name of every person he/she has
discussed or worked with. If the answer includes content from any other source, the
student MUST STATE THE SOURCE. Failure to do so is cheating and will result in
sanctions. Copying answers from someone else is cheating even if one lists their name(s) on
the homework.
If there is information you need to solve a problem, but the information is not stated in the
problem, try to find the data somewhere. If you cannot find it, state what data you need,
make a reasonable estimate of its value, and justify any assumptions you make. You will be
graded not only on whether your answer is correct, but also on whether you have done an
intelligent analysis.
Submit your output, explanation, and your commands/scripts in one SINGLE PDF file.
Q1 [20 marks + 5 Bonus marks]: Basic Operations of Pig
You are required to perform some simple analysis using Pig on the n-grams dataset of
Google books. An ‘n-gram’ is a phrase with n words. The dataset lists all n-grams present in
books from books.google.com along with some statistics.
In this question, you will use only the Google Books bigram (1-gram) data. Please go to References
[1] and [2] to download the two datasets. Each line in these two files has the following format
(TAB separated):
bigram year match_count volume_count
An example for 1-grams would be:
circumvallate 1978 335 91
circumvallate 1979 261 95
This means that in 1978 (1979), the word "circumvallate" occurred 335 (261) times overall,
in 91 (95) distinct books.
(a) [Bonus 5 marks] Install Pig in your Hadoop cluster. You can reuse your Hadoop
cluster from IEMS 5730 HW#0 and refer to the following link to install Pig 0.17.0 on
the master node of your Hadoop cluster:
http://pig.apache.org/docs/r0.17.0/start.html#Pig+Setup
Submit the screenshot(s) of your installation process.
If you choose not to do the bonus question in (a), you can use any well-installed Hadoop
cluster, e.g., the IE DIC, or the Hadoop cluster provided by the Google Cloud/AWS [5, 6, 7]
to complete the following parts of the question:
(b) [5 marks] Upload these two files to HDFS and join them into one table.
(c) [5 marks] For each unique bigram, compute its average number of occurrences per
year. In the above example, the result is:
circumvallate (335 + 261) / 2 = 298
Notes: The denominator is the number of years in which that word has appeared.
Assume the data set contains all the 1-grams in the last 100 years, and the above
records are the only records for the word ‘circumvallate’. Then the average value is:
(335 + 261) / 2 = 298,
instead of
(335 + 261) / 100 = 5.96
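The averaging rule in the note above can be checked on a small sample before writing the Pig script. The following is a minimal Python sketch, not the required Pig solution. The function name `average_per_year` and the in-memory sample rows are illustrative assumptions; the sketch also assumes each line has exactly the four TAB-separated fields described earlier.

```python
from collections import defaultdict

def average_per_year(lines):
    """Return {bigram: average match_count over the years in which it appears}."""
    totals = defaultdict(int)   # sum of match_count per bigram
    years = defaultdict(set)    # distinct years seen per bigram
    for line in lines:
        bigram, year, match_count, _volume_count = line.rstrip("\n").split("\t")
        totals[bigram] += int(match_count)
        years[bigram].add(int(year))
    # The denominator is the number of distinct years with a record, not a fixed span.
    return {b: totals[b] / len(years[b]) for b in totals}

sample = [
    "circumvallate\t1978\t335\t91",
    "circumvallate\t1979\t261\t95",
]
print(average_per_year(sample))  # {'circumvallate': 298.0}
```

Counting distinct years with a set keeps the denominator correct even if the same year appears in both input files after they are combined.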
(d) [10 marks] Output the 20 bigrams with the highest average number of occurrences
per year along with their corresponding average values sorted in descending order. If
multiple bigrams have the same average value, write down any one you like (that is,
break ties as you wish).
You need to write a Pig script to perform this task and save the output into HDFS.
Hints:
● This problem is very similar to the word counting example shown in the lecture notes
of Pig. You can use the code there and just make some minor changes to perform
this task.
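To sanity-check the ranking step in part (d) on a small sample, the selection of the highest averages can be sketched in Python as follows. This is an illustration only and not the Pig script the question asks for; the function name `top_n` and the sample averages are made up for the example.

```python
import heapq

def top_n(averages, n=20):
    """Return up to n (bigram, average) pairs, highest average first."""
    return heapq.nlargest(n, averages.items(), key=lambda item: item[1])

# Hypothetical averages, used only to exercise the function.
averages = {"alpha": 12.5, "beta": 298.0, "gamma": 40.0, "delta": 40.0}
for bigram, avg in top_n(averages, n=3):
    print(f"{bigram}\t{avg}")
```

Ties are not given any special treatment here, which matches the statement that ties may be broken arbitrarily.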
Q2 [20 marks + 5 bonus marks]: Basic Operations of Hive
In this question, you are asked to repeat Q1 using Hive and then compare the performance
between Hive and Pig.
(a) [Bonus 5 marks] Install Hive on top of your own Hadoop cluster. You can reuse your
Hadoop cluster from IEMS 5730 HW#0 and refer to the following link to install Hive
2.3.8 on the master node of your Hadoop cluster.
https://cwiki.apache.org/confluence/display/Hive/GettingStarted
Submit the screenshot(s) of your installation process.
If you choose not to do the bonus question in (a), you can use any well-installed Hadoop
cluster, e.g., the IE DIC, or the Hadoop cluster provided by the Google Cloud/AWS [5, 6, 7].
(b) [20 marks] Write a Hive script to perform exactly the same task as that of Q1 with
the same datasets stored in the HDFS. Rerun the Pig script in this cluster and
compare the performance between Pig and Hive in terms of overall run-time and
explain your observation.
Hints:
● Hive will store its tables on HDFS, and those locations need to be bootstrapped
(-p creates missing parent directories and does not fail if the directory already exists):
$ hdfs dfs -mkdir -p /tmp
$ hdfs dfs -mkdir -p /user/hive/warehouse
$ hdfs dfs -chmod g+w /tmp
$ hdfs dfs -chmod g+w /user/hive/warehouse
● While working with the interactive shell (or otherwise), you should first test on a small
subset of the data instead of the whole data set. Once your Hive commands/scripts
work as desired, you can then run them on the complete data set.
Q3 [30 marks + 10 Bonus marks]: Similar Users Detection in
the MovieLens Dataset using Pig
Similar-user detection, which aims to group users with similar interests, behaviors,
actions, or general patterns, has drawn a lot of attention in the machine learning field. In this
homework, you will implement a similar-users-detection algorithm for an online movie rating
system. Basically, users who give similar scores to the same movies may have common
tastes or interests and can be grouped as similar users.
To detect similar users, we need to calculate the similarity between each user pair. In this
homework, the similarity between a given pair of users (e.g. A and B) is measured as the
total number of movies both A and B have watched divided by the total number of
movies watched by either A or B. The following is the formal definition of similarity: Let
M(A) be the set of all the movies user A has watched. Then the similarity between user A
and user B is defined as:
$$\mathrm{Similarity}(A, B) = \frac{|M(A) \cap M(B)|}{|M(A) \cup M(B)|} \qquad (**)$$
where |S| means the cardinality of set S.
(Note: if |𝑀(𝐴)∪𝑀(𝐵)| = 0, we set the similarity to be 0.)
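The definition (**) together with the empty-union rule can be written as a short Python sketch. It is meant only to make the formula concrete on toy data; the function name `similarity` and the sample movie sets are assumptions for illustration.

```python
def similarity(movies_a, movies_b):
    """Similarity (**): |M(A) ∩ M(B)| / |M(A) ∪ M(B)|, or 0 when the union is empty."""
    union = movies_a | movies_b
    if not union:
        return 0.0
    return len(movies_a & movies_b) / len(union)

# Toy movie sets: two movies in common, five movies in total.
a = {1, 2, 3}
b = {2, 3, 4, 5}
print(similarity(a, b))  # 0.4
```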
The following figure illustrates the idea:
Two datasets [3][4] with different sizes are provided by MovieLens. Each user is represented
by its unique userID and each movie is represented by its unique movieID. The format of the
data set is as follows:
<userID>, <movieID>
Write a program in Pig to detect the TOP K similar users for each user. You can use the
cluster you built for Q1 and Q2 or you can use the IE DIC or one provided by the Google
Cloud/AWS [5, 6, 7].
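Before writing the Pig program, the overall data flow can be tried out on a handful of records. The Python sketch below is one possible way to do this and is not the required Pig solution: it groups users by movie, counts shared movies per user pair, and then ranks neighbours using the similarity defined in (**). The function names, the toy ratings, and the tie-breaking by smaller userID are all assumptions. Pairs with no movie in common (similarity 0) are never generated, so a user who shares no movie with anyone is absent from the result.

```python
from collections import Counter, defaultdict
from itertools import combinations

def common_movie_counts(ratings):
    """ratings: iterable of (userID, movieID). Return {(u, v): shared movies} with u < v."""
    viewers = defaultdict(set)            # movieID -> set of users who watched it
    for user, movie in ratings:
        viewers[movie].add(user)
    counts = Counter()
    for users in viewers.values():
        for pair in combinations(sorted(users), 2):
            counts[pair] += 1             # this movie is shared by both users
    return counts

def top_k_similar(ratings, k):
    """Return {user: [(similarity, other_user), ...]} with at most k entries per user."""
    ratings = set(ratings)                # drop duplicate (user, movie) rows
    watched = Counter(user for user, _movie in ratings)   # |M(u)|
    neighbours = defaultdict(list)
    for (u, v), common in common_movie_counts(ratings).items():
        # |M(u) ∪ M(v)| = |M(u)| + |M(v)| - |M(u) ∩ M(v)|
        sim = common / (watched[u] + watched[v] - common)
        neighbours[u].append((sim, v))
        neighbours[v].append((sim, u))
    return {u: sorted(c, key=lambda x: (-x[0], x[1]))[:k] for u, c in neighbours.items()}

# Toy (userID, movieID) records.
ratings = [(1, 10), (1, 11), (1, 12), (2, 11), (2, 12), (2, 13), (3, 12)]
print(common_movie_counts(ratings))
print(top_k_similar(ratings, k=1))
```

Grouping by movie first avoids comparing every pair of users directly, which matters because most pairs share no movie at all.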
(a) [10 marks] For each pair of users in the dataset [3] and [4], output the number of
movies they have both watched.
For your homework submission, you need to submit i) the Pig script and ii) the
list of the 10 pairs of users having the largest number of movies watched by
both users in the pair within the corresponding dataset. The format of your
answer should be as follows: