MEDLINE
Structure
MEDLINE® (Medical Literature Analysis and Retrieval System Online) is a database of predominantly biomedical bibliographic citations maintained by the U.S. National Library of Medicine (NLM).[2] Each citation includes bibliographic data, an abstract if available, links to the full text of the article, and keywords. The keywords are indexed with the NLM's Medical Subject Headings (MeSH®).[3]
The National Library of Medicine is investigating whether indexing with MeSH terms can be either fully or semi-automated.[4]
Methods to improve searching MEDLINE
There is much ongoing research into improving MEDLINE search results.
Research methods for comparative studies
In comparing search strategies, there are two experimental methods.
- If a complete test collection of articles is available that is already divided into articles meeting inclusion criteria and articles not meeting criteria, then each strategy is compared for its ability to identify the articles meeting criteria (sensitivity) and to exclude the articles not meeting criteria (specificity). Sensitivity is also called "recall" by some authors.[5]
- If a partial test collection is available that consists only of articles meeting inclusion criteria (for example, articles meeting inclusion criteria for ACP Journal Club,[6] articles included in a systematic review of a clinical topic, or articles in an annotated bibliography[7]), then the sensitivity is again the proportion of relevant articles identified by the strategy. However, the specificity is not computable. Instead, one of several related measures is calculated. These measures are all based on the positive predictive value (PPV) of the strategy. Like the PPV of a diagnostic test, the PPV of a search strategy rises and falls with the prevalence of relevant articles in the collection and thus is not stable across prevalences.[8]
- Precision is "the proportion of retrieved articles that meet criteria" and thus is the same as the PPV.[9]
- Hit curve "is the number of important articles among the first n results."[10][11]
- Number Needed to Read (NNR) is "how many papers in a journal have to be read to find one of adequate clinical quality and relevance."[12][13][8][14] Of note, the NNR has been proposed as a metric to help libraries decide which journals to subscribe to.[12]
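The measures above can be made concrete with a short calculation. The following Python sketch is illustrative only: the counts are invented, the function and class names are hypothetical, and NNR is taken here as the reciprocal of precision, which is an assumption about how the measure is operationalized rather than part of the definition quoted above. The two example collections share the same sensitivity and specificity but differ in the prevalence of relevant articles, which is enough to change the precision and the NNR.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """2x2 counts for one search strategy run against a labelled test collection."""
    tp: int  # relevant articles retrieved
    fp: int  # irrelevant articles retrieved
    fn: int  # relevant articles missed
    tn: int  # irrelevant articles correctly excluded


def sensitivity(c):
    """Proportion of relevant articles that were retrieved (recall)."""
    return c.tp / (c.tp + c.fn)


def specificity(c):
    """Proportion of irrelevant articles that were excluded."""
    return c.tn / (c.tn + c.fp)


def precision(c):
    """Proportion of retrieved articles that are relevant (PPV)."""
    return c.tp / (c.tp + c.fp)


def number_needed_to_read(c):
    """Assumption: NNR computed as 1 / precision, i.e. retrieved / relevant retrieved."""
    return (c.tp + c.fp) / c.tp


def hit_curve(ranked_ids, important_ids):
    """hits[n - 1] is the number of important articles among the first n results."""
    hits, running = [], 0
    for article_id in ranked_ids:
        running += article_id in important_ids
        hits.append(running)
    return hits


# Invented example: 1000 articles, sensitivity and specificity both 0.9.
common = Counts(tp=90, fp=90, fn=10, tn=810)  # 100 relevant articles (10%)
rare = Counts(tp=9, fp=99, fn=1, tn=891)      # 10 relevant articles (1%)

for label, c in (("common", common), ("rare", rare)):
    print(label, sensitivity(c), specificity(c),
          round(precision(c), 3), number_needed_to_read(c))

# Hit curve for a hypothetical ranked result list.
print(hit_curve(["a", "b", "c", "d"], {"a", "c"}))
```

With these invented counts, the precision falls from 0.5 to about 0.083 and the NNR rises from 2 to 12 when relevant articles become ten times rarer, even though sensitivity and specificity are unchanged.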
Filters (hedges)
A MEDLINE filter (or hedge) is a Boolean combination of search terms, both textwords and MeSH terms, optimized to retrieve articles of a particular type. For example, one filter identifies randomized controlled trials. Many MEDLINE filters have been developed by the Hedges team[15] supported by a grant from the National Library of Medicine.[16]
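As a rough illustration of how a filter is combined with a topic search, the Python sketch below assembles a Boolean query string. The terms are invented for illustration and are not a validated Hedges team filter; the helper names are hypothetical. The bracketed suffixes are assumed to be PubMed-style field tags ([mh] for MeSH heading, [tw] for textword, [pt] for publication type), and other interfaces to MEDLINE use different syntax.

```python
def any_of(terms):
    """Join alternative terms with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(topic_terms, filter_terms):
    """Combine a topic block and a filter block with AND."""
    return any_of(topic_terms) + " AND " + any_of(filter_terms)


# Illustrative terms only, mixing MeSH terms and textwords.
topic = ["hypertension[mh]", "high blood pressure[tw]"]
rct_filter = ["randomized controlled trial[pt]", "randomized[tw]", "placebo[tw]"]

query = build_query(topic, rct_filter)
print(query)
```

Keeping the filter as a separate block means the same tested combination can be reused unchanged with any topic.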
Relevancy ranking
Although MEDLINE is usually searched for exact matches using Boolean terms, relevancy ranking has been studied. In an early comparison, relevancy ranking performed well; however, the Boolean version of MEDLINE did not fully use MeSH terms.[17][18]
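The difference between the two approaches can be sketched in a few lines of Python. This is a toy example under stated assumptions: the three documents are invented, the score is a raw count of query-term occurrences, and it is not the method used in the studies cited above. A Boolean AND search returns an unordered set of exact matches, while a ranked search orders every partially matching document by score.

```python
from collections import Counter

# Invented mini-collection of article titles.
docs = {
    "d1": "aspirin reduces stroke risk in randomized trial",
    "d2": "stroke rehabilitation outcomes",
    "d3": "aspirin aspirin dosing and stroke prevention after stroke",
}


def tokens(text):
    """Lower-case whitespace tokenization (deliberately simplistic)."""
    return text.lower().split()


def boolean_and(query, collection):
    """Return the ids of documents containing every query term."""
    terms = set(tokens(query))
    return {doc_id for doc_id, text in collection.items()
            if terms <= set(tokens(text))}


def ranked(query, collection):
    """Return ids ordered by total query-term occurrences, highest first."""
    terms = tokens(query)
    scores = {}
    for doc_id, text in collection.items():
        counts = Counter(tokens(text))
        score = sum(counts[t] for t in terms)
        if score > 0:
            scores[doc_id] = score
    # Ties are broken by document id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


print(boolean_and("aspirin stroke", docs))
print(ranked("aspirin stroke", docs))
```

Here the Boolean search omits d2 because it lacks one query term, whereas the ranked search keeps it at the bottom of the list.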
Citation analysis or PageRank
Studies have reached conflicting results on the value of ranking search results by citation counts or PageRank.[14][11][7]
Machine learning
Machine learning methods, in which the search engine seeks articles that most resemble the included articles, may be more accurate than Boolean methods.[6]
Methods to access MEDLINE
There are many third-party interfaces for searching MEDLINE, such as OVID.[19] The National Library of Medicine's own search interface is PubMed (http://pubmed.gov).
PubMed
PubMed (http://pubmed.gov) is the National Library of Medicine's own free Internet access to MEDLINE. PubMed has been freely available since its first search was performed by Vice President Al Gore during a press conference in the US Capitol on June 26, 1997.[1] On a typical day, PubMed receives over 2 million queries.[20]
PubMed is hosted by the Entrez Search and Retrieval System of the National Center for Biotechnology Information[21] (NCBI) branch of the NLM.[22] The hardware hosting Entrez has been described.[23]
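Entrez can also be queried by programs through NCBI's E-utilities web service, which is not covered by the sources cited in this article. The Python sketch below only builds a search URL and does not send a request. The endpoint and the parameter names (db, term, retmax, retmode) are given as understood from NCBI's public E-utilities documentation and should be checked against it before use; the function name is hypothetical.

```python
from urllib.parse import urlencode

# Assumed ESearch endpoint of NCBI's E-utilities; verify against NCBI documentation.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, retmax=20):
    """Build (but do not fetch) a PubMed search URL for the given query string."""
    params = {
        "db": "pubmed",     # search the PubMed database
        "term": term,       # Boolean query, URL-encoded by urlencode
        "retmax": retmax,   # maximum number of identifiers to return
        "retmode": "json",  # ask for a JSON response
    }
    return ESEARCH + "?" + urlencode(params)


url = esearch_url("hypertension[mh] AND randomized controlled trial[pt]", retmax=5)
print(url)
```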
EBMSearch
EBMSearch (http://ebmsearch.org/) uses machine learning to rank articles.[6]
References
- ↑ 1.0 1.1 National Center for Biotechnology Information. NCBI News - August 1997. Retrieved on 2007-11-09.
- ↑ National Library of Medicine. MEDLINE Fact Sheet. Retrieved on 2007-11-09.
- ↑ National Library of Medicine. Medical Subject Headings (MeSH®) Fact Sheet. Retrieved on 2007-11-09.
- ↑ National Library of Medicine. Indexing Initiative. Retrieved on 2007-11-25.
- ↑ Hersh, William R. (2003). Information retrieval: a health and biomedical perspective. Berlin: Springer. ISBN 0-387-95522-4.
- ↑ 6.0 6.1 6.2 Aphinyanaphongs Y, Tsamardinos I, Statnikov A, Hardin D, Aliferis CF (2005). "Text categorization models for high-quality article retrieval in internal medicine". J Am Med Inform Assoc 12 (2): 207–16. DOI:10.1197/jamia.M1641. PMID 15561789.
- ↑ 7.0 7.1 Herskovic JR, Bernstam EV (2005). "Using incomplete citation data for MEDLINE results ranking". AMIA Annu Symp Proc: 316–20. PMID 16779053.
- ↑ 8.0 8.1 Bachmann LM, Coray R, Estermann P, Ter Riet G (2002). "Identifying diagnostic studies in MEDLINE: reducing the number needed to read". J Am Med Inform Assoc 9 (6): 653–8. PMID 12386115.
- ↑ Haynes RB, Wilczynski NL (2004). "Optimal search strategies for retrieving scientifically strong studies of diagnosis from Medline: analytical survey". BMJ 328 (7447): 1040. DOI:10.1136/bmj.38068.557998.EE. PMID 15073027.
- ↑ Herskovic JR, Iyengar MS, Bernstam EV (2007). "Using hit curves to compare search algorithm performance". J Biomed Inform 40 (2): 93–9. DOI:10.1016/j.jbi.2005.12.007. PMID 16469545.
- ↑ 11.0 11.1 Bernstam EV, Herskovic JR, Aphinyanaphongs Y, Aliferis CF, Sriram MG, Hersh WR (2006). "Using citation data to improve retrieval from MEDLINE". J Am Med Inform Assoc 13 (1): 96–105. DOI:10.1197/jamia.M1909. PMID 16221938.
- ↑ 12.0 12.1 Toth B, Gray JA, Brice A (2005). "The number needed to read-a new measure of journal value". Health Info Libr J 22 (2): 81–2. DOI:10.1111/j.1471-1842.2005.00568.x. PMID 15910578.
- ↑ McKibbon KA, Wilczynski NL, Haynes RB (2004). "What do evidence-based secondary journals tell us about the publication of clinically important articles in primary healthcare journals?". BMC Med 2: 33. DOI:10.1186/1741-7015-2-33. PMID 15350200.
- ↑ 14.0 14.1 Haase A, Follmann M, Skipka G, Kirchner H (2007). "Developing search strategies for clinical practice guidelines in SUMSearch and Google Scholar and assessing their retrieval performance". BMC Med Res Methodol 7: 28. DOI:10.1186/1471-2288-7-28. PMID 17603909.
- ↑ Hedges Team. Search Strategies. Retrieved on 2007-11-25.
- ↑ CRISP - Computer Retrieval of Information on Scientific Projects, Abstract Display. Retrieved on 2007-11-25.
- ↑ Hersh WR, Hickam DH (1992). "A comparison of retrieval effectiveness for three methods of indexing medical literature". Am. J. Med. Sci. 303 (5): 292–300. PMID 1580316.
- ↑ Hersh WR, Hickam DH, Haynes RB, McKibbon KA (1994). "A performance and failure analysis of SAPHIRE with a MEDLINE test collection". J Am Med Inform Assoc 1 (1): 51–60. PMID 7719787.
- ↑ Anonymous. MEDLINE® - Ovid's MEDLINE. Retrieved on 2007-11-09.
- ↑ Herskovic JR, Tanaka LY, Hersh W, Bernstam EV (2007). "A day in the life of PubMed: analysis of a typical day's query log". J Am Med Inform Assoc 14 (2): 212–20. DOI:10.1197/jamia.M2191. PMID 17213501.
- ↑ National Library of Medicine. The National Center for Biotechnology Information Programs and Activities Fact Sheet. Retrieved on 2007-11-10.
- ↑ Ostell, J. The Entrez Search and Retrieval System. Retrieved on 2007-11-10.
- ↑ Canese, K; Jentsch, J; Myers, C. Database Management and Hardware. Retrieved on 2007-11-10.