|Title||Exploring the correlation of biomedical article keywords to MeSH terms|
|Publication Type||Conference Paper|
|Year of Publication||2006|
The exponential growth in the availability of biomedical information has posed the need to solve retrieval issues raised in huge sequence/biomedical article repositories. Biomedical article databases, like PubMed, are huge repositories of useful biological information given in natural language form and thus not easily processed by computers. Medical Subject Headings (MeSH) terms have been proposed to facilitate the process of electronically retrieving biomedical articles, which are semantically related. However, most of the classification algorithms, used for information retrieval, require numeric representations of either the keywords or the MeSH terms of the articles. These representations are essentially vectors of variables forming large multivariate numerical datasets. In order to combine the information from keyword datasets and MeSH datasets, this paper proposes a multivariate statistical approach which can quantify their relationships and reveal the underlying correlation. The basis of this approach is a mathematical technique, called non-linear canonical correlation analysis (NLCCA). NLCCA can assemble information from several datasets by building a model describing the whole of the data. The method was applied to a large number of articles from PubMed. Certain statistics obtained from the analysis showed that the degree of correlation between MeSH terms and keywords is high. The method results in the reduction of data dimensionality, containing in one dataset with new variables significant information of the original data. These results are very important for the efficient description and visualization of the data in order to explore their structure.
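To make the idea of numeric keyword and MeSH representations more concrete, the following is a minimal sketch and not the paper's method. It encodes a handful of made-up articles as binary term vectors and applies ordinary linear canonical correlation analysis, a simpler relative of NLCCA, to the two resulting datasets. The toy articles, the vocabularies, the helper names `binary_matrix` and `linear_cca`, and the ridge parameter `reg` are all assumptions introduced for illustration.

```python
import numpy as np

def binary_matrix(docs, vocab):
    """Rows are articles, columns are terms; 1.0 marks a term present in an article."""
    index = {term: j for j, term in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for i, terms in enumerate(docs):
        for term in terms:
            X[i, index[term]] = 1.0
    return X

def linear_cca(X, Y, reg=1e-3):
    """Linear CCA with a small ridge term (hypothetical helper, not NLCCA).

    Returns the canonical correlations and the weight matrices for X and Y.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Regularized covariance blocks; the ridge keeps them positive definite.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    # Whiten the cross-covariance with Cholesky factors: M = Lx^-1 Sxy Ly^-T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    M = np.linalg.solve(Lx, Sxy) @ np.linalg.inv(Ly).T
    # Singular values of M are the (regularized) canonical correlations.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    A = np.linalg.solve(Lx.T, U)      # weights applied to keyword columns
    B = np.linalg.solve(Ly.T, Vt.T)   # weights applied to MeSH columns
    return s, A, B

# Made-up example data: author keywords and MeSH terms for six articles.
keyword_vocab = ["gene", "protein", "cancer"]
mesh_vocab = ["Genes", "Proteins", "Neoplasms"]
keyword_docs = [["gene", "protein"], ["gene"], ["protein", "cancer"],
                ["cancer"], ["gene", "cancer"], ["protein"]]
mesh_docs = [["Genes", "Proteins"], ["Genes", "Proteins"], ["Proteins", "Neoplasms"],
             ["Neoplasms"], ["Genes", "Neoplasms"], ["Proteins"]]

X = binary_matrix(keyword_docs, keyword_vocab)
Y = binary_matrix(mesh_docs, mesh_vocab)
corrs, A, B = linear_cca(X, Y)

# Project both datasets onto the first canonical direction (one new variable each).
keyword_scores = (X - X.mean(axis=0)) @ A[:, 0]
mesh_scores = (Y - Y.mean(axis=0)) @ B[:, 0]
print("canonical correlations:", np.round(corrs, 3))
```

In this sketch, canonical correlations close to 1 would indicate that the keyword and MeSH representations carry largely the same information, and the projected scores play the role of the lower-dimensional variables the abstract mentions. NLCCA itself additionally rescales categorical variables non-linearly, which this linear version does not attempt.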