
Date: 2024-04-30



COM4511/COM6511 Speech Technology - Practical Exercise -
Keyword Search
Anton Ragni
Note that for any module assignment full marks will only be obtained for outstanding performance that
goes well beyond the questions asked. The marks allocated for each assignment are 20%. The marks will be
assigned according to the following general criteria. For every assignment handed in:
1. Fulfilling the basic requirements (5%)
Full marks will be given to fulfilling the work as described, in source code and results given.
2. Submitting high quality documentation (5%)
Full marks will be given to a write-up that is at the highest standard of technical writing and illustration.
3. Showing good reasoning (5%)
Full marks will be given if the experiments and the outcomes are explained to the best standard.
4. Going beyond what was asked (5%)
Full marks will be given for interesting ideas on how to extend work that are well motivated and
described.
1 Background
The aim of this task is to build and investigate the simplest form of a keyword search (KWS) system, which makes it possible to find information
in large volumes of spoken data. The figure below shows an example of a typical KWS system, which consists of an index and
a search module. The index provides a compact representation of spoken data. Given a set of keywords, the search module
queries the index to retrieve all possible occurrences ranked according to likelihood. The quality of a KWS system is assessed based
on how accurately it can retrieve all true occurrences of keywords.
[Figure: a typical KWS system. Keywords are passed to the search module, which queries the index and returns results.]
A number of index representations have been proposed and examined for KWS. Most popular representations are derived
from the output of an automatic speech recognition (ASR) system. Various forms of output have been examined. These differ
in terms of the amount of information retained regarding the content of spoken data. The simplest form is the most likely word
sequence or 1-best. Additional information such as start and end times, and recognition confidence may also be provided for
each word. Given a collection of 1-best sequences, the following index can be constructed
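$$
\begin{array}{ll}
w_1 & (f_{1,1}, s_{1,1}, e_{1,1}) \;\ldots\; (f_{1,n_1}, s_{1,n_1}, e_{1,n_1}) \\
w_2 & (f_{2,1}, s_{2,1}, e_{2,1}) \;\ldots\; (f_{2,n_2}, s_{2,n_2}, e_{2,n_2}) \\
\vdots & \\
w_N & (f_{N,1}, s_{N,1}, e_{N,1}) \;\ldots\; (f_{N,n_N}, s_{N,n_N}, e_{N,n_N})
\end{array}
\tag{1}
$$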
where $w_i$ is a word, $n_i$ is the number of times word $w_i$ occurs, $f_{i,j}$ is the file in which word $w_i$ occurs for the $j$-th time, and $s_{i,j}$ and $e_{i,j}$
are the start and end times of that occurrence. Searching such an index for single-word keywords can be as simple as finding the correct row (e.g. $k$)
and returning all possible tuples $(f_{k,1}, s_{k,1}, e_{k,1}), \ldots, (f_{k,n_k}, s_{k,n_k}, e_{k,n_k})$.
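As an illustration of the index in (1), the following minimal Python sketch maps each word to its list of (file, start, end) tuples. The input layout (a list of `(file, start, end, word)` tuples), the function names `build_index` and `lookup`, and the lower-casing of words are assumptions made for this example, not requirements of the task.

```python
from collections import defaultdict


def build_index(words):
    """Map each word to the (file, start, end) tuples where it occurs.

    `words` is an iterable of (file, start, end, word) tuples; this input
    layout is an assumption made for the sketch.
    """
    index = defaultdict(list)
    for file_name, start, end, word in words:
        # Lower-casing is an assumption so that lookups are case-insensitive.
        index[word.lower()].append((file_name, start, end))
    return dict(index)


def lookup(index, keyword):
    """Return all occurrences of a single-word keyword (empty list if unseen)."""
    return index.get(keyword.lower(), [])


# Hypothetical 1-best words from two files.
one_best = [
    ("7654", 11.34, 11.54, "YES"),
    ("7654", 12.00, 12.34, "YOU"),
    ("7654", 13.30, 13.80, "CAN"),
    ("7655", 0.50, 0.72, "YES"),
]
index = build_index(one_best)
print(lookup(index, "yes"))
```

Finding the row for a keyword is then a single dictionary lookup, which is why single-word search over such an index is cheap.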
The search module is expected to retrieve all possible keyword occurrences. If the ASR system made no mistakes, such a module
could be created rather trivially. To account for possible retrieval errors, the search module provides each potential occurrence
with a relevance score. Relevance scores reflect confidence in a given occurrence being relevant. Occurrences with extremely
low relevance scores may be eliminated. If these scores are accurate, each eliminated occurrence reduces the number of
false alarms; if they are not, eliminating occurrences increases the number of misses. What counts as an extremely low score is not easy
to determine. Multiple factors may affect a relevance score: confidence score, duration, word confusability, word context,
keyword length. Therefore, simple relevance scores, such as those based on confidence scores, may have a wide dynamic range
and may be incomparable across different keywords. In order to ensure that relevance scores are comparable among different
keywords, they need to be calibrated. A simple calibration scheme is called sum-to-one (STO) normalisation
$$
\hat{r}_{i,j} = \frac{r_{i,j}^{\gamma}}{\sum_{l=1}^{n_i} r_{i,l}^{\gamma}}
\tag{2}
$$
where $r_{i,j}$ is the original relevance score for the $j$-th occurrence of the $i$-th keyword and $\gamma$ is a scale that either sharpens or
flattens the distribution of relevance scores. More complex schemes have also been examined. Given a set of occurrences with
associated relevance scores, there are several options available for eliminating spurious occurrences. One popular approach
is thresholding. Given a global or keyword-specific threshold, any occurrence whose score falls below it is eliminated. Simple calibration
schemes such as STO require thresholds to be estimated on a development set and adjusted to different collection sizes. More
complex approaches such as Keyword Specific Thresholding (KST) yield a fixed threshold across different keywords and
collection sizes.
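The calibration and thresholding steps can be sketched in a few lines of Python. The sketch assumes the common form of STO normalisation, in which each score of a keyword is raised to the power γ and divided by the sum of all such powered scores for the same keyword. The function names and the example scores are hypothetical.

```python
def sto_normalise(scores, gamma=1.0):
    """Sum-to-one normalisation of the relevance scores of one keyword.

    Assumes the form r**gamma / sum(r**gamma); gamma > 1 sharpens the
    distribution and gamma < 1 flattens it.
    """
    powered = [s ** gamma for s in scores]
    total = sum(powered)
    if total == 0:
        return [0.0 for _ in scores]
    return [p / total for p in powered]


def apply_threshold(occurrences, scores, threshold):
    """Keep only the occurrences whose score is not below the threshold."""
    return [occ for occ, s in zip(occurrences, scores) if s >= threshold]


# Hypothetical confidence-based scores for three occurrences of one keyword.
raw = [0.9, 0.6, 0.3]
flat = sto_normalise(raw, gamma=1.0)
sharp = sto_normalise(raw, gamma=2.0)
kept = apply_threshold(["occ1", "occ2", "occ3"], flat, threshold=0.3)
print(flat, sharp, kept)
```

Because the normalised scores of each keyword sum to one, a keyword with many hypothesised occurrences gets smaller scores per occurrence, which is one reason the threshold has to be tuned on a development set.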
Accuracy of KWS systems can be assessed in multiple ways. Standard approaches include precision (proportion of relevant retrieved occurrences among all retrieved occurrences) and recall (proportion of relevant retrieved occurrences among all
relevant occurrences), mean average precision and term weighted value. A collection of precision and recall values computed
for different thresholds yields a precision-recall (PR) curve. The area under the PR curve (AUC) provides a threshold-independent summary statistic for comparing different retrieval approaches. The mean average precision (mAP) is another popular,
threshold-independent, precision-based metric. Consider a KWS system returning 3 correct and 4 incorrect occurrences arranged according to relevance score as follows: ✓ , ✗ , ✗ , ✓ , ✓ , ✗ , ✗ , where ✓ stands for a correct occurrence and ✗ stands
for an incorrect occurrence. The precision at each rank (from 1 to 7) is 1/1, 1/2, 1/3, 2/4, 3/5, 3/6, 3/7. If the number of true correct
occurrences is 3, the mean average precision for this keyword is (1/1 + 2/4 + 3/5)/3 = 0.7. A collection-level mAP can be computed by averaging
keyword-specific mAPs. Once a KWS system operates at a reasonable AUC or mAP level it is possible to use term weighted
value (TWV) to assess accuracy of thresholding. The TWV is defined by
 
(3)
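where $k \in K$ is a keyword, $P_{\mathrm{miss}}$ and $P_{\mathrm{fa}}$ are the probabilities of miss and false alarm, $\theta$ is the threshold, and $\beta$ is a penalty assigned to false alarms.
These probabilities can be computed by
$$
P_{\mathrm{miss}}(k, \theta) = \frac{N_{\mathrm{miss}}(k, \theta)}{N_{\mathrm{correct}}(k)}
\tag{4}
$$
$$
P_{\mathrm{fa}}(k, \theta) = \frac{N_{\mathrm{fa}}(k, \theta)}{N_{\mathrm{trial}}(k)}
\tag{5}
$$
where $N_{\langle\mathrm{event}\rangle}$ is the number of events of the given type. The number of trials is given by
$$
N_{\mathrm{trial}}(k) = T - N_{\mathrm{correct}}(k)
\tag{6}
$$
where $T$ is the duration of speech in seconds.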
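The definitions above can be turned into a short Python sketch. It assumes the usual form of TWV, one minus the keyword average of P_miss + β·P_fa, with P_miss, P_fa and N_trial computed as in (4) to (6). The input layout (a dictionary of per-keyword counts), the function name `twv` and the choice to skip keywords without reference occurrences are assumptions made for illustration.

```python
def twv(counts, duration, beta=20.0):
    """Term weighted value at one fixed threshold.

    `counts` maps keyword -> (n_correct, n_miss, n_fa), where n_correct is
    the number of true occurrences. `duration` is T in seconds.
    """
    total = 0.0
    n_keywords = 0
    for n_correct, n_miss, n_fa in counts.values():
        if n_correct == 0:
            # P_miss is undefined here; skipping the keyword is an assumption.
            continue
        p_miss = n_miss / n_correct
        p_fa = n_fa / (duration - n_correct)  # N_trial = T - N_correct
        total += p_miss + beta * p_fa
        n_keywords += 1
    if n_keywords == 0:
        raise ValueError("no keyword has reference occurrences")
    return 1.0 - total / n_keywords


# Hypothetical counts for two keywords over 1000 seconds of speech.
counts = {"yes": (10, 2, 5), "dog walking": (4, 1, 0)}
value = twv(counts, duration=1000.0, beta=20.0)
print(round(value, 4))
```

Since N_trial is typically much larger than N_correct, a single false alarm costs far less than a single miss unless β is large, which is the effect the parameter β controls.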
2 Objective
Given a collection of 1-bests, write code that retrieves all possible occurrences of the keywords in the provided keyword list. Describe the search
process including index format, handling of multi-word keywords, criterion for matching, relevance score calibration and
threshold setting methodology. Write code to assess retrieval performance using reference transcriptions according to the AUC,
mAP and TWV criteria using β = 20. Comment on the differences between these criteria, including the impact of the parameter β.
Start and end times of hypothesised occurrences must be within 0.5 seconds of true occurrences to be considered for matching.
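One possible reading of the 0.5 second matching rule is sketched below in Python. It assumes that both the start and the end time of a hypothesis must lie within the tolerance of the corresponding reference times, that the file names must agree, and that each true occurrence can be matched by at most one hypothesis. These choices and the function names are assumptions; other interpretations are defensible.

```python
TOLERANCE = 0.5  # seconds, as stated in the task


def is_match(hyp, ref, tolerance=TOLERANCE):
    """hyp and ref are (file, start, end) tuples."""
    return (
        hyp[0] == ref[0]
        and abs(hyp[1] - ref[1]) <= tolerance
        and abs(hyp[2] - ref[2]) <= tolerance
    )


def label_hypotheses(hyps, refs, tolerance=TOLERANCE):
    """Return one boolean per hypothesis: True if it matches an unused reference.

    Hypotheses are assumed to be sorted by decreasing relevance score so that
    the best-scoring hypothesis claims a reference first.
    """
    used = set()
    labels = []
    for hyp in hyps:
        hit = False
        for i, ref in enumerate(refs):
            if i not in used and is_match(hyp, ref, tolerance):
                used.add(i)
                hit = True
                break
        labels.append(hit)
    return labels


# Hypothetical reference and hypothesised occurrences of one keyword.
refs = [("2345", 1.0, 1.4)]
hyps = [("2345", 1.2, 1.7), ("2345", 1.1, 1.5), ("2345", 3.0, 3.4)]
labels = label_hypotheses(hyps, refs)
print(labels)
```

The resulting list of correct and incorrect labels, in rank order, is the input needed by precision-based metrics such as mAP.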
3 Marking scheme
Two critical elements are assessed: retrieval (65%) and assessment (35%). Note: Even if you cannot complete this task as a
whole you can certainly provide a description of what you were planning to accomplish.
1. Retrieval
1.1 Index Write code that can take the provided CTM files (and any other file you deem relevant) and create indices in
your own format. For example, if the Python language is used then the execution of your code may look like
python index.py dev.ctm dev.index
where dev.ctm is a CTM file and dev.index is an index.
Marks are distributed based on handling of multi-word keywords
• Efficient handling of single-word keywords
• No ability to handle multi-word keywords
• Inefficient handling of multi-word keywords
• Or efficient handling of multi-word keywords
1.2 Search Write code that can take the provided keyword file and index file (and any other file you deem relevant)
and produce a list of occurrences for each provided keyword. For example, if the Python language is used then the
execution of your code may look like
python search.py dev.index keywords dev.occ
where dev.index is an index, keywords is a list of keywords, and dev.occ is a list of occurrences for each
keyword.
Marks are distributed based on handling of multi-word keywords
• Efficient handling of single-word keywords
• No ability to handle multi-word keywords
• Inefficient handling of multi-word keywords
• Or efficient handling of multi-word keywords
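One way to support multi-word keywords, sketched here in Python, is to store the position of every word within its file so that consecutive words can be checked with dictionary lookups. The index layout, the function names and the decision to ignore the length of pauses between words are all assumptions of this sketch, not a prescribed design.

```python
from collections import defaultdict


def build_positional_index(words):
    """words: (file, start, end, word) tuples in time order within each file.

    Returns word -> {(file, position): (start, end)}, where position is the
    running word number inside the file.
    """
    index = defaultdict(dict)
    counters = defaultdict(int)
    for file_name, start, end, word in words:
        pos = counters[file_name]
        counters[file_name] += 1
        index[word.lower()][(file_name, pos)] = (start, end)
    return index


def search(index, keyword):
    """Return (file, start, end) for every occurrence of a keyword of any length."""
    tokens = keyword.lower().split()
    if not tokens:
        return []
    hits = []
    for (file_name, pos), (start, _) in index.get(tokens[0], {}).items():
        end = None
        for offset, token in enumerate(tokens):
            span = index.get(token, {}).get((file_name, pos + offset))
            if span is None:
                break  # the next word of the keyword does not follow here
            end = span[1]
        else:
            hits.append((file_name, start, end))
    return hits


# Hypothetical 1-best words from one file.
words = [
    ("7654", 11.34, 11.54, "YES"),
    ("7654", 12.00, 12.34, "YOU"),
    ("7654", 13.30, 13.80, "CAN"),
    ("7654", 14.00, 14.20, "YES"),
]
pos_index = build_positional_index(words)
print(search(pos_index, "you can"))
```

The cost of a search grows with the number of occurrences of the first word rather than with the size of the collection. A fuller solution would probably also reject matches whose words are separated by a long pause.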
1.3 Description Provide a technical description of the following elements
• Index file format
• Handling multi-word keywords
• Criterion for matching keywords to possible occurrences
• Search process
• Score calibration
• Threshold setting
2. Assessment Write code that can take the provided keyword file, the list of found keyword occurrences and the corresponding reference transcript file in STM format and compute the metrics described in the Background section. For
instance, if the Python language is used then the execution of your code may look like
python <metric>.py keywords dev.occ dev.stm
where <metric> is one of precision-recall, mAP and TWV, keywords is the provided keyword file, dev.occ is the
list of found keyword occurrences and dev.stm is the reference transcript file.
Hint: In order to simplify assessment, consider converting the reference transcript from STM file format to CTM file format.
Using the indexing and search code above, obtain a list of true occurrences. The list of found keyword occurrences can then
be assessed more easily by comparing it with the list of true occurrences rather than the reference transcript file in STM
file format.
2.1 Implementation
• AUC Integrate an existing implementation of AUC computation into your code. For example, for the Python
language such an implementation is available in the sklearn package.
• mAP Write your own implementation or integrate any freely available one.
• TWV Write your own implementation or integrate any freely available one.
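For the mAP item, a minimal Python sketch is given below. It reproduces the worked example from the Background section (3 correct and 4 incorrect occurrences, 3 true occurrences) and averages the per-keyword values into a collection-level figure. The function names and the second keyword in the example are hypothetical.

```python
def average_precision(labels, n_relevant):
    """labels: booleans in rank order (True = correct occurrence).

    n_relevant is the number of true occurrences of the keyword, so missed
    occurrences lower the result.
    """
    hits = 0
    total = 0.0
    for rank, correct in enumerate(labels, start=1):
        if correct:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / n_relevant if n_relevant else 0.0


def mean_average_precision(per_keyword):
    """per_keyword: list of (labels, n_relevant) pairs, one per keyword."""
    values = [average_precision(labels, n) for labels, n in per_keyword]
    return sum(values) / len(values) if values else 0.0


# The ranking from the Background section: correct at ranks 1, 4 and 5.
ranked = [True, False, False, True, True, False, False]
ap = average_precision(ranked, n_relevant=3)
# A second, hypothetical keyword with a single correct hit at rank 1.
collection_map = mean_average_precision([(ranked, 3), ([True], 1)])
print(round(ap, 3), round(collection_map, 3))
```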
2.2 Description
• AUC Plot the precision-recall curve. Report the AUC value. Discuss performance in the high precision and low
recall area. Discuss performance in the high recall and low precision area. Suggest which keyword search
applications might be interested in good performance specifically in those two areas (either high precision
and low recall, or high recall and low precision).
• mAP Report the mAP value. Report the mAP value for each keyword length (1-word, 2-word, etc.). Compare and
discuss differences in mAP values.
• TWV Report the TWV value. Report the TWV value for each keyword length (1-word, 2-word, etc.). Compare and
discuss differences in TWV values. Plot TWV values for a range of threshold values. Report the maximum TWV
value or MTWV. Report the actual TWV value or ATWV obtained with the method used for threshold selection.
• Comparison Describe the use of AUC, mAP and TWV in the development of your KWS approach. Compare
these metrics and discuss their advantages and disadvantages.
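To make the precision-recall curve concrete, the Python sketch below computes one (recall, precision) point per distinct score threshold and integrates the curve with the trapezoidal rule. It is a dependency-free stand-in for the sklearn routines mentioned above, not a replacement for them. The starting point (recall 0, precision 1), the function names and the example scores are assumptions.

```python
def pr_curve(scores, labels, n_relevant):
    """Return (recall, precision) points, one per distinct score threshold."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = []
    hits = 0
    for rank, (score, correct) in enumerate(ranked, start=1):
        hits += int(correct)
        # Emit a point only after the last occurrence sharing this score.
        if rank == len(ranked) or ranked[rank][0] != score:
            points.append((hits / n_relevant, hits / rank))
    return points


def trapezoid_auc(points):
    """Area under the PR curve, starting from the assumed point (0, 1)."""
    area = 0.0
    prev_recall, prev_precision = 0.0, 1.0
    for recall, precision in points:
        area += (recall - prev_recall) * (precision + prev_precision) / 2.0
        prev_recall, prev_precision = recall, precision
    return area


# Hypothetical scores for the ranking used in the Background section.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3]
labels = [True, False, False, True, True, False, False]
points = pr_curve(scores, labels, n_relevant=3)
area = trapezoid_auc(points)
print(points[0], round(area, 4))
```

The early points of the curve describe the high precision and low recall region, and the final points describe the high recall and low precision region discussed in the AUC item.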
4 Hand-in procedure
All outcomes, however complete, are to be submitted jointly in the form of a package file (zip/tar/gzip) that includes
directories for each task which contain the associated required files. Submission will be performed via MOLE.
5 Resources
Three resources are provided for this task:
• 1-best transcripts in NIST CTM file format (dev.ctm, eval.ctm). The CTM file format consists of multiple records
of the following form
<F> <H> <T> <D> <W> <C>
where <F> is an audio file name, <H> is a channel, <T> is a start time in seconds, <D> is a duration in seconds, <W> is a
word, <C> is a confidence score. Each record corresponds to one recognised word. Any blank lines or lines starting with
;; are ignored. An excerpt from a CTM file is shown below
7654 A 11.34 0.2 YES 0.5
7654 A 12.00 0.34 YOU 0.7
7654 A 13.30 0.5 CAN 0.1
• Reference transcript in NIST STM file format (dev.stm, eval.stm). The STM file format consists of multiple records
of the following form
<F> <H> <S> <T> <E> <L> <W>...<W>
where <S> is a speaker, <E> is an end time, <L> is a topic, <W>...<W> is a word sequence. Each record corresponds to
one manually transcribed segment of an audio file. An excerpt from an STM file is shown below
2345 A 2345-a 0.10 2.03 <soap> uh huh yes i thought
2345 A 2345-b 2.10 3.04 <soap> dog walking is a very
2345 A 2345-a 3.50 4.59 <soap> yes but it’s worth it
Note that exact start and end times for each word are not available. Use uniform segmentation as an approximation. The
duration of speech in dev.stm and eval.stm is estimated to be 57474.2 and 25694.3 seconds.
• Keyword list keywords. Each keyword contains one or more words as shown below
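The CTM and STM records described in this section could be read along the following lines. The Python sketch parses CTM records and spreads the words of each STM segment evenly over the segment, which is one reading of the uniform segmentation suggested above. Treating lines that start with ;; as comments in STM files as well, assuming that the <L> field is always a single token, and the function names are assumptions.

```python
def parse_ctm(lines):
    """Yield (file, channel, start, end, word, confidence) per CTM record."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;"):
            continue  # blank lines and comments are ignored
        file_name, channel, start, dur, word, conf = line.split()[:6]
        start, dur = float(start), float(dur)
        yield file_name, channel, start, start + dur, word, float(conf)


def stm_to_words(lines):
    """Yield (file, channel, start, end, word) using uniform segmentation."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;"):
            continue  # assumption: same comment convention as CTM
        fields = line.split()
        file_name, channel, _speaker, start, end = fields[:5]
        words = fields[6:]  # fields[5] is the <L> field, e.g. <soap>
        if not words:
            continue
        start, end = float(start), float(end)
        step = (end - start) / len(words)  # equal share of the segment per word
        for i, word in enumerate(words):
            yield file_name, channel, start + i * step, start + (i + 1) * step, word


# Records taken from the excerpts above.
ctm_words = list(parse_ctm(["7654 A 11.34 0.2 YES 0.5", ";; a comment", ""]))
stm_words = list(stm_to_words(["2345 A 2345-a 0.10 2.03 <soap> uh huh yes i thought"]))
print(ctm_words[0], stm_words[2])
```

Once the reference words carry approximate times, they can be indexed and searched with the same code as the 1-best output, as the hint in the Assessment section suggests.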