Student Thesis
| Thesis ID: | 6508 | |
| Author ID: | 1120100806 | |
| Upload time: | 2014/6/12 21:48:50 | |
| Chinese title: | 一种基于齐普夫定律的识别语料中高低词频分界点的新方法及其应用 | |
| English title: | A New Method to Identify the Boundary Between High-Frequency and Low-Frequency Words in Corpus Based on Zipf’s Law and Application | |
| Supervisor: | 徐建华 | |
| Chinese keywords (translated): | Zipf’s law; high-frequency words; low-frequency words; dividing point; environmental pollution remediation | |
| English keywords: | Zipf’s law; high-frequency words; low-frequency words; dividing point; polluted environment remediation | |
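| Chinese abstract (translated): | Identifying the high-frequency words in a corpus is foundational work for scientometric research, and the number of high-frequency words directly affects the results. Although scientometrics is now widely applied across disciplines and has produced many results, an objective, scientific method for determining which high-frequency words to analyse is still lacking. Most researchers rely on subjective judgement, and some use the h-index or g-index method. None of these methods has a theoretical basis. Pao and Sun each derived a method for separating high-frequency from low-frequency words from Zipf’s second law. Although these two methods are grounded in Zipf’s law, the “same-frequency word theory” they rely on in application is a subjective judgement that its authors neither observed nor verified; it lacks support and is open to doubt. In addition, all of the above methods perform poorly in practice: they yield either too many or too few high-frequency words, the counts differ widely from year to year, and the results are too unstable to use. Zipf’s law is a fundamental law of scientometrics. Although researchers have studied many aspects of it, no clear and consistent understanding has formed of one key issue in the law, the constant C. This study takes as its corpora the papers published over ten years in two disciplines, scientometrics and environmental pollution remediation, both of which the author knows well but which differ greatly in nature. By analysing the word-frequency distribution of the scientometrics corpus for each year and applying Zipf’s law to determine C, the study found some regularities in how C varies and on that basis proposes a new method for identifying the boundary between high-frequency and low-frequency words in a corpus. Validation on the corpora of both disciplines shows that the new method is clearly more scientific and more general than existing methods and is worth promoting. The study then applied the new method to both disciplines and traced their development over the ten years, which further verified the method’s advancement and applicability. The main conclusions are as follows. 1. For a specific corpus, C is a parameter that varies with the size of the corpus. This finding confirms Zipf’s judgement that C is a parameter but refutes his inference that 0 < C < 0.1. Validation shows that C is affected by the vocabulary size and word-frequency distribution of the corpus and follows a fluctuating upward trend, with no obvious regularity in its range of values. 2. Compared with other methods, the new method established in this study has clear advantages in both the number and the stability of the high-frequency words it identifies, and it is not constrained by the size or nature of the corpus. Tests on corpora of different natures drawn from the two disciplines show that the method is general: it applies not only to continuous text composed of titles, keywords and similar information, but also to discontinuous text composed of keywords, whereas the “Pao method” and the “Sun method” apply only to continuous text. In short, compared with existing methods, the method created in this study has a clear advantage in identifying the boundary between high-frequency and low-frequency words in a corpus and is worth promoting. 3. Applying the new method to scientometrics shows that the field has formed a series of mature, stable research topics, including citation analysis, journal evaluation, research output evaluation and discipline evaluation. Bibliometric indicators arose alongside evaluation and became research hotspots in scientometrics: the early hotspot was the impact factor, and the later hotspots were new indicators such as the h-index and g-index, which are deepening research in the field. 4. Applying the new method to environmental pollution remediation shows that soil is the main medium studied, heavy metals and polycyclic aromatic hydrocarbons are the important pollutants studied, and phytoremediation, bioremediation and electrokinetic remediation are the main remediation technologies. With economic development and advances in detection technology, new pollutants keep emerging and being discovered, or the potential hazards of known pollutants gradually gain attention, which drives researchers to keep improving remediation technologies to deal with new pollutants. Remediation technology itself also continues to develop, as shown by the combination of multiple remediation techniques and continual innovation in remediation materials. The continual emergence of new pollutants and new remediation methods gives research on environmental pollution remediation both great practical significance and lasting vitality. The study also makes recommendations for environmental pollution remediation research in China based on its results. The main innovations of this study are as follows. 1. The study redefines the group characteristics of high-frequency words in a corpus, offers a new approach to determining high-frequency words, and extends the range of application of Zipf’s law, providing a reference for related research. 2. Validation shows that, compared with existing methods, the new method proposed here for identifying the boundary between high-frequency and low-frequency words is clearly superior and can serve scientometric researchers at large. Scientometrics needs a unified, scientific and objective method to standardise research data, and the method created in this study meets that need well. If the method is accepted and promoted by researchers at large, it will have a considerable impact on scientometrics, will help standardise the application of bibliometrics, and will promote the application of scientometrics on a wider scale. 3. The study systematically analyses the research output of environmental pollution remediation from a scientometric perspective, and its results provide a valuable reference for implementing the large-scale environmental remediation projects about to be launched in China. | |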
| English abstract: | Scientometric methods are widely used across disciplines and have produced a large body of related papers. Identifying high-frequency words is the basic step in locating research focuses with scientometric methods, and the number of high-frequency words directly influences the results. Yet how to distinguish the dividing point between high-frequency and low-frequency words objectively and effectively is a question that still puzzles researchers, or is ignored by most of them. Most researchers judge the dividing point from their own experience. Some use the h-index or g-index method. All of these methods lack a theoretical foundation. Pao and Sun each proposed a method for finding the dividing point based on Zipf’s second law. The author acknowledges the work of Pao and Sun but disagrees with their “same-frequency word” theory, which rests on the researchers’ own supposition and has no theoretical foundation. Moreover, with all of the above methods the number of high-frequency words is too large or too small and unstable from year to year, which makes them difficult to use. Zipf’s law has been regarded as one of the fundamental findings of scientometrics since the middle of the last century. Although many researchers have studied the law, its meaning is far from clear; in particular, what exactly the constant “C” is still puzzles researchers. This study chose two subjects for its corpora, scientometrics and polluted environment remediation, because the author knows both well and because the two corpora differ greatly in character, which allows the universality of the new method to be demonstrated. 
A total of 934 papers on scientometrics published from 2002 to 2011 were retrieved as the corpus for developing the new method. The constant in Zipf’s law was analysed using this corpus. Some regularities in the value of C were identified, and a new method based on Zipf’s first law was then proposed to identify the boundary between high-frequency and low-frequency words in a corpus. Tests on corpora from both scientometrics and polluted environment remediation showed that the new method is advantageous and universal and is worth applying. The study therefore applied the new method to scientometrics and polluted environment remediation and traced the development of the two subjects over the past ten years, which further demonstrated the method’s advantages and applicability. The main conclusions of this study are: 1. The value of C in Zipf’s law is a parameter rather than a constant, and it fluctuates with the scale of the corpus. This conclusion accords with Zipf’s opinion that C is a parameter but disagrees with his opinion that C lies in the range 0 < C < 0.1. 2. Compared with the other methods, the new method has advantages in both the quantity and the stability of the high-frequency words it identifies, and it is not affected by the scale or character of the corpus. The new method also showed its universality when tested on both the scientometrics corpus, which consists of titles and abstracts, and the polluted environment remediation corpus, which consists only of keywords. The methods of Pao and Sun apply only to corpora consisting of titles and abstracts. All in all, compared with the other methods, the new method shows an obvious advantage in judging the dividing point between high-frequency and low-frequency words in a corpus and is worthy of promotion. 3. 
Applying the new method to the scientometrics corpus, we found that after ten years of development scientometrics had formed some basic research issues, for example impact factors, citation analysis and research performance, and that the field was still developing. Some newer research issues, for example co-citation analysis and the h-index, were leading scientometrics to go deeper. 4. Applying the new method to the polluted environment remediation corpus, we found that soil was the main medium and that heavy metals and PAHs were the main contaminants studied by researchers. Phytoremediation, electrokinetic remediation and bioremediation were the main remediation methods. As the economy and detection technology developed, various contaminants were detected or attracted researchers’ attention, so remediation methods were continually improved to deal with the pollution. Research on polluted environment remediation showed a trend towards combining different remediation methods and innovating in remediation agents. The emergence of new contaminants and new remediation methods kept research on polluted environment remediation booming. Some countermeasures based on the research results were put forward at the end. The main innovative points of this study are: 1. This study redefined the group characteristics of high-frequency words and provided a new idea for identifying them. It also extended the application of Zipf’s law, providing a reference for related research. 2. Compared with the other methods, the new method showed an obvious advantage in judging the dividing point between high-frequency and low-frequency words in a corpus. Scientometrics currently needs a scientific and advanced method to standardise its data, and the new method of this study meets this need. 
If the new method is accepted by most researchers, it will affect scientometrics greatly and can promote the application of scientometrics on a larger scale. 3. The countermeasures based on the research results for polluted environment remediation provide a valuable reference for related researchers. | |
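The abstract compares several ways of cutting a word-frequency list into high-frequency and low-frequency words. As a rough illustration only, the Python sketch below computes the baseline quantities it mentions: the Zipf product C at each rank, an h-index cutoff and a g-index cutoff. It is not the thesis's new method, which the abstract does not specify. Two points are assumptions rather than statements from the record: C is taken as rank times relative frequency (the usual reading of Zipf's first law), and the transition-point formula T = (-1 + sqrt(1 + 8 * I1)) / 2, commonly attributed to Donohue and derived from Zipf's second law, stands in for the Pao and Sun variants, whose exact forms are not given here. All function names and the toy token list are hypothetical.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return word counts sorted from most frequent to least frequent."""
    return sorted(Counter(tokens).values(), reverse=True)


def zipf_c_values(counts):
    """C_r = r * n_r / N for each rank r (assumed form of Zipf's first law)."""
    total = sum(counts)
    return [rank * n / total for rank, n in enumerate(counts, start=1)]


def h_index_cutoff(counts):
    """Largest h such that h words each occur at least h times."""
    h = 0
    for rank, n in enumerate(counts, start=1):
        if n < rank:
            break
        h = rank
    return h


def g_index_cutoff(counts):
    """Largest g such that the top g words together occur at least g*g times."""
    g, running = 0, 0
    for rank, n in enumerate(counts, start=1):
        running += n
        if running >= rank * rank:
            g = rank
    return g


def transition_frequency(counts):
    """T = (-1 + sqrt(1 + 8 * I1)) / 2, where I1 is the number of words seen once.

    Assumption: this is the Donohue-style formula, used here as a stand-in.
    """
    i1 = sum(1 for n in counts if n == 1)
    return (-1 + math.sqrt(1 + 8 * i1)) / 2


# Hypothetical toy corpus; a real run would use titles, abstracts or keywords.
tokens = (
    "citation citation citation citation citation "
    "index index index index journal journal journal "
    "impact impact factor zipf corpus keyword"
).split()

counts = rank_frequency(tokens)
print("counts:", counts)
print("C by rank:", [round(c, 3) for c in zipf_c_values(counts)])
print("h-index cutoff:", h_index_cutoff(counts))
print("g-index cutoff:", g_index_cutoff(counts))
print("transition frequency:", round(transition_frequency(counts), 3))
```

On a real corpus, words ranked at or above the h or g cutoff, or with a count above the transition frequency, would be treated as high-frequency. Running the same functions on each year's corpus is one way to see the year-to-year instability the abstract describes.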