A Network of Life Science Concepts

PubMed is a large database maintained by the National Center for Biotechnology Information (NCBI) that includes descriptive data and abstracts for some 18 million scientific articles from journals related to the life sciences.

MeSH (Medical Subject Headings) is a controlled vocabulary of medical and biomedical research concepts (as distinct from "terms"). MeSH comprises 16 concept "trees" (along with some uncategorized concepts), which organize concepts into hierarchies (more properly, directed acyclic graphs).

Approximately 17 million (16,981,052) of the 18 million PubMed articles included in the PubMed component of the Bio2RDF repository are annotated, using approximately 25K (24,574) different MeSH concepts.

If we consider two MeSH concepts to be "connected" when both are used to annotate the same article, then we can build a Network of Life Science Concepts (NLSC). A variety of interfaces may be built to navigate or explore such a network, or to use it to launch searches into PubMed.
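The definition of "connected" above can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the `annotations` mapping, the article IDs, and the function name `build_network` are hypothetical and are not taken from the NLSC implementation.

```python
from collections import Counter
from itertools import combinations

def build_network(annotations):
    """Count, for every unordered pair of concepts, the articles annotated with both."""
    edge_counts = Counter()
    for concepts in annotations.values():
        # set() drops repeated concepts; sorted() gives each pair one canonical order
        for a, b in combinations(sorted(set(concepts)), 2):
            edge_counts[(a, b)] += 1
    return edge_counts

# Hypothetical toy data: article ID -> MeSH concepts used to annotate that article
annotations = {
    "article1": ["DNA_Damage", "Tumor_Suppressor_Protein_p53", "Neoplasms"],
    "article2": ["DNA_Damage", "Tumor_Suppressor_Protein_p53"],
    "article3": ["Neoplasms", "Glioblastoma"],
}

network = build_network(annotations)
print(network[("DNA_Damage", "Tumor_Suppressor_Protein_p53")])
```

Each key of `network` is an edge between two concepts, and its value is the number of articles in which the two concepts co-occur.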

There are 2 interfaces:

Note that the list of connected concepts is in decreasing order by "connection strength", which is intended to identify the concepts most "specifically" connected by co-occurrence in PubMed papers. A connected concept that is related to the target concept among many other concepts is given a smaller weight than one that is related to only a few other concepts.
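The page does not state how connection strength is computed. The numbers on the example 2-D result page further down are, however, consistent with a raw strength equal to the co-occurrence count divided by the sum of the row total and the figure shown in parentheses after the target concept (25435), with the normalized value being the raw value divided by the largest raw value in the listing. The sketch below reproduces that arithmetic; the formula is an inference from the printed values, not a documented definition, and the function names are hypothetical.

```python
def raw_strength(co_occurrence, row_total, target_total):
    """Inferred formula (an assumption): co-occurrence over the two totals combined."""
    return co_occurrence / (row_total + target_total)

def normalize(raw_by_name):
    """Scale raw strengths so that the strongest connection equals 1."""
    top = max(raw_by_name.values())
    return {name: raw / top for name, raw in raw_by_name.items()}

# Figures copied from the example 2-D page: name -> (co-occurrence, row total)
target_total = 25435
rows = {
    "DNA_Damage": (2372, 455082),
    "Li-Fraumeni_Syndrome": (110, 5148),
    "Genomic_Instability": (92, 19913),
}

raws = {name: raw_strength(co, total, target_total) for name, (co, total) in rows.items()}
normalized = normalize(raws)
for name in rows:
    print(f"{name}: {normalized[name]:.5f} ({raws[name]:.3e})")
```

Under this reading, a concept with a very large row total (one that co-occurs with many other concepts) is pushed down the list even when its co-occurrence count with the target is high, which matches the stated intent of favoring "specific" connections.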

Options exist to rank listed concepts in other orders, to restrict listed concepts to specific MeSH Tree categories, and to limit the number of connected concepts that are listed. Links are provided to help the user move easily from the first (2-D) interface to the second (3-D).
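The listing options described above (ordering, category restriction, and a row limit) might be sketched as follows. The row layout, the function name `list_related`, and the option names other than "decreasing_weight" (which appears on the example page) are assumptions made for illustration.

```python
def list_related(rows, categories=None, order="decreasing_weight", limit=None):
    """Filter, rank, and truncate a list of related-concept rows."""
    if categories is not None:
        # Keep rows that belong to at least one requested MeSH Tree category
        rows = [r for r in rows if r["categories"] & set(categories)]
    sort_keys = {
        "decreasing_weight": lambda r: -r["weight"],
        "decreasing_count": lambda r: -r["co_occurrence"],  # hypothetical option name
        "alphabetical": lambda r: r["name"],                # hypothetical option name
    }
    ranked = sorted(rows, key=sort_keys[order])
    return ranked if limit is None else ranked[:limit]

# Values copied from the example 2-D page
rows = [
    {"name": "DNA_Damage", "categories": {"C", "G"}, "co_occurrence": 2372, "weight": 1.00000},
    {"name": "Li-Fraumeni_Syndrome", "categories": {"C"}, "co_occurrence": 110, "weight": 0.72863},
    {"name": "Genomic_Instability", "categories": {"C", "G"}, "co_occurrence": 92, "weight": 0.41098},
    {"name": "Colorectal_Neoplasms", "categories": {"C"}, "co_occurrence": 792, "weight": 0.35827},
]

g_only = [r["name"] for r in list_related(rows, categories={"G"})]
by_count = [r["name"] for r in list_related(rows, order="decreasing_count", limit=2)]
print(g_only)
print(by_count)
```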

Example result page from the first, 2-D, interface

Here is an example page from the 2-D interface. Note that this is not intended to be a "live" example; some links will not work.

To initiate a PubMed search, one can choose either a "pubmed2D" or "majors2D" link from this page (when it is live). To explore concepts linked to the target concept, "p53", one may select any "explore2D" link next to another concept, and the result will be a list of concepts linked to THAT concept.
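How the "pubmed2D" and "majors2D" links build their searches is not described on this page. One plausible way to express the same two searches is a PubMed query that combines both concepts with the standard `[MeSH Terms]` or `[MeSH Major Topic]` field tags, sent through the NCBI E-utilities `esearch` endpoint. The sketch below only constructs the query string and URL (it makes no network request); the function names are hypothetical, and converting underscores to spaces is an assumption about how the page's concept labels map to MeSH headings.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def mesh_query(concepts, majors_only=False):
    """Build a PubMed query requiring every concept as a MeSH annotation."""
    field = "MeSH Major Topic" if majors_only else "MeSH Terms"
    parts = []
    for concept in concepts:
        heading = concept.replace("_", " ")  # assumed label-to-heading conversion
        parts.append(f'"{heading}"[{field}]')
    return " AND ".join(parts)

def esearch_url(concepts, majors_only=False):
    """Return an E-utilities esearch URL for the query (nothing is fetched here)."""
    params = {"db": "pubmed", "term": mesh_query(concepts, majors_only)}
    return ESEARCH_URL + "?" + urlencode(params)

pair = ["Tumor_Suppressor_Protein_p53", "DNA_Damage"]
print(mesh_query(pair))
print(mesh_query(pair, majors_only=True))
print(esearch_url(pair))
```

Passing three concepts to `mesh_query` gives the analogous query for the 3-D interface's "pubmed3D" and "majors3D" links.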

A Network of Life Science Concepts

Here listing concepts related by PubMed paper co-occurrence to the target concept
"Tumor_Suppressor_Protein_p53"

(Showing 20 related concepts in decreasing_weight order)

Tumor_Suppressor_Protein_p53 (25435)
(Click for NCBI MeSH definition)
Related concepts and their tree category(s)

(Click a concept name to see the NCBI definition of that MeSH concept,
"pubmed2D" to search PubMed for papers annotated with both terms,
"majors2D" to restrict the search to topics considered to be each paper's major focus,
"explore2D" to see the next level of related concepts, and
"explore3D" to find concepts related to BOTH the target and row concepts.)
Concept (links) | Tree category(s) | Article counts: co-occurrence (row total) | Connection strength: normalized (raw)
DNA_Damage (pubmed2D) (majors2D) (explore2D) (explore3D) | Diseases [C]; Phenomena and Processes [G] | 2372 (455082) | 1.00000 (4.936e-03)
Li-Fraumeni_Syndrome (pubmed2D) (majors2D) (explore2D) (explore3D) | Diseases [C] | 110 (5148) | 0.72863 (3.597e-03)
Carcinoma,_Ductal,_Breast (pubmed2D) (majors2D) (explore2D) (explore3D) | Diseases [C] | 295 (103978) | 0.46178 (2.280e-03)
Papillomavirus_Infections (pubmed2D) (majors2D) (explore2D) (explore3D) | Diseases [C] | 283 (108421) | 0.42829 (2.114e-03)
Genomic_Instability (pubmed2D) (majors2D) (explore2D) (explore3D) | Diseases [C]; Phenomena and Processes [G] | 92 (19913) | 0.41098 (2.029e-03)
Ataxia_Telangiectasia (pubmed2D) (majors2D) (explore2D) (explore3D) | Diseases [C] | 123 (35295) | 0.41029 (2.025e-03)
Carcinoma,_Endometrioid (pubmed2D) (majors2D) (explore2D) (explore3D) | Diseases [C] | 88 (18229) | 0.40828 (2.015e-03)
Cell_Transformation,_Neoplastic (pubmed2D) (majors2D) (explore2D) (explore3D) | Diseases [C] | 1144 (554411) | 0.39968 (1.973e-03)
Precancerous_Conditions (pubmed2D) (majors2D) (explore2D) (explore3D) | Diseases [C] | 500 (241208) | 0.37987 (1.875e-03)
Colorectal_Neoplasms (pubmed2D) (majors2D) (explore2D) (explore3D) | Diseases [C] | 792 (422388) | 0.35827 (1.769e-03)
Carcinoma,_Non-Small-Cell_Lung (pubmed2D) (majors2D) (explore2D) (explore3D) | Diseases [C] | 483 (248074) | 0.35774 (1.766e-03)
Endometrial_Neoplasms (pubmed2D) (majors2D) (explore2D) (explore3D) | Diseases [C] | 253 (119538) | 0.35353 (1.745e-03)
Chromosomal_Instability (pubmed2D) (majors2D) (explore2D) (explore3D) | Diseases [C]; Phenomena and Processes [G] | 65 (11897) | 0.35272 (1.741e-03)
Carcinoma,_Transitional_Cell (pubmed2D) (majors2D) (explore2D) (explore3D) | Diseases [C] | 330 (164286) | 0.35236 (1.739e-03)
Cystadenocarcinoma,_Serous (pubmed2D) (majors2D) (explore2D) (explore3D) | Diseases [C] | 75 (18997) | 0.34195 (1.688e-03)
Glioblastoma (pubmed2D) (majors2D) (explore2D) (explore3D) | Diseases [C] | 262 (134904) | 0.33102 (1.634e-03)
Barrett_Esophagus (pubmed2D) (majors2D) (explore2D) (explore3D) | Diseases [C] | 119 (49630) | 0.32115 (1.585e-03)
Aneuploidy (pubmed2D) (majors2D) (explore2D) (explore3D) | Diseases [C]; Phenomena and Processes [G] | 215 (114582) | 0.31107 (1.536e-03)
Carcinoma,_Squamous_Cell (pubmed2D) (majors2D) (explore2D) (explore3D) | Diseases [C] | 1558 (1010745) | 0.30460 (1.504e-03)
Cell_Transformation,_Viral (pubmed2D) (majors2D) (explore2D) (explore3D) | Diseases [C]; Phenomena and Processes [G] | 295 (180013) | 0.29088 (1.436e-03)
Carcinoma_in_Situ (pubmed2D) (majors2D) (explore2D) (explore3D) | Diseases [C] | 230 (136213) | 0.28824 (1.423e-03)
(Table based on 24574 MeSH concepts used to annotate 16,981,052 PubMed papers.)

 

Example 3-D result page from a 10% sample of PubMed

To see a list of concepts related to BOTH the 2-D target concept and one of that target's row concepts, one may select the "explore3D" link next to the appropriate row concept. An example result is provided below.

Three "Article Counts" are provided for each row concept. The "co-occurrence count" is the number of articles from the PubMed sample annotated with both target concepts as well as the associated row concept. A "pair-wise total" counts the articles in which the row concept co-occurs with one of the target concepts; there is one such total for each of the two targets.
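The three counts can be illustrated with a small sketch. It assumes that each pair-wise total is simply the number of articles annotated with both the row concept and that target; the data, the dictionary keys, and the function name `three_way_counts` are hypothetical.

```python
def three_way_counts(annotations, target_a, target_b):
    """For each row concept, count articles shared with A, with B, and with both."""
    counts = {}
    for concepts in annotations.values():
        tagged = set(concepts)
        has_a, has_b = target_a in tagged, target_b in tagged
        for row in tagged - {target_a, target_b}:
            entry = counts.setdefault(row, {"co_occurrence": 0, "with_a": 0, "with_b": 0})
            entry["with_a"] += has_a              # pair-wise total: row with target A
            entry["with_b"] += has_b              # pair-wise total: row with target B
            entry["co_occurrence"] += has_a and has_b
    # Only rows that co-occur with BOTH targets are listed
    return {row: c for row, c in counts.items() if c["co_occurrence"] > 0}

# Hypothetical toy data: article ID -> MeSH concepts
annotations = {
    "article1": ["Tumor_Suppressor_Protein_p53", "DNA_Damage", "Neoplasms"],
    "article2": ["Tumor_Suppressor_Protein_p53", "Neoplasms"],
    "article3": ["DNA_Damage", "Neoplasms"],
    "article4": ["Tumor_Suppressor_Protein_p53", "DNA_Damage", "Stroke"],
    "article5": ["Neoplasms", "Anemia"],
}

result = three_way_counts(annotations, "Tumor_Suppressor_Protein_p53", "DNA_Damage")
for row, c in sorted(result.items()):
    print(f'{row}: {c["co_occurrence"]} ({c["with_a"]}/{c["with_b"]})')
```

The printed layout mirrors the "co-occurrence (A/B)" form used in the example 3-D table.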

At the present time it is not possible to continue traversing the Network from this page, so no "explore" links appear.

 

A Network of Life Science Concepts (10% sample)

Here showing concepts related to BOTH of the "target" concepts:
"Tumor_Suppressor_Protein_p53" and "DNA_Damage"

(Including related concepts from MeSH Tree Categories:  C
in decreasing_weight order above a minimum normalized connection strength of 0.01)

"Tumor_Suppressor_Protein_p53" and "DNA_Damage" (4182)
(Click for NCBI MeSH definition)
Related concepts and their tree category(s)

(Click a concept name to see the NCBI definition of the concept,
"pubmed3D" to search PubMed for papers annotated with all three terms, and
"majors3D" to restrict that search to topics considered to be each paper's major focus.)
Concept (links) | Tree category(s) | Article counts: co-occurrence (pair-wise totals: this row with each of A and B) | Connection strength: normalized (raw)
Gingivitis (pubmed3D) (majors3D) | Diseases [C] | 1 (9/9) | 1.00000 (1.000e-01)
Congenital_Abnormalities (pubmed3D) (majors3D) | Diseases [C] | 1 (21/8) | 0.64516 (6.452e-02)
Retroviridae_Infections (pubmed3D) (majors3D) | Diseases [C] | 1 (16/16) | 0.58824 (5.882e-02)
Bacterial_Infections (pubmed3D) (majors3D) | Diseases [C] | 1 (9/30) | 0.48780 (4.878e-02)
Teratocarcinoma (pubmed3D) (majors3D) | Diseases [C] | 1 (30/17) | 0.40816 (4.082e-02)
Anemia (pubmed3D) (majors3D) | Diseases [C] | 1 (40/16) | 0.34483 (3.448e-02)
Burkitt_Lymphoma (pubmed3D) (majors3D) | Diseases [C] | 2 (88/25) | 0.34188 (3.419e-02)
Remission,_Spontaneous (pubmed3D) (majors3D) | Diseases [C]; Phenomena and Processes [G] | 1 (29/29) | 0.33333 (3.333e-02)
Progeria (pubmed3D) (majors3D) | Diseases [C] | 1 (20/46) | 0.29412 (2.941e-02)
Leukemia,_Experimental (pubmed3D) (majors3D) | Diseases [C]; Techniques and Equipment [E] | 1 (16/51) | 0.28986 (2.899e-02)
Hypoxia,_Brain (pubmed3D) (majors3D) | Diseases [C] | 1 (25/45) | 0.27778 (2.778e-02)
Liver_Cirrhosis (pubmed3D) (majors3D) | Diseases [C] | 1 (56/15) | 0.27397 (2.740e-02)
Stroke (pubmed3D) (majors3D) | Diseases [C] | 1 (39/33) | 0.27027 (2.703e-02)
Leukemia,_Myelogenous,_Chronic,_BCR-ABL_Positive (pubmed3D) (majors3D) | Diseases [C] | 2 (83/67) | 0.25974 (2.597e-02)
Arteriosclerosis (pubmed3D) (majors3D) | Diseases [C] | 1 (47/28) | 0.25974 (2.597e-02)
Brain_Ischemia (pubmed3D) (majors3D) | Diseases [C] | 1 (25/51) | 0.25641 (2.564e-02)
Testicular_Neoplasms (pubmed3D) (majors3D) | Diseases [C] | 2 (103/51) | 0.25316 (2.532e-02)
Gastrointestinal_Neoplasms (pubmed3D) (majors3D) | Diseases [C] | 1 (75/6) | 0.24096 (2.410e-02)
Arthritis,_Rheumatoid (pubmed3D) (majors3D) | Diseases [C] | 1 (47/38) | 0.22989 (2.299e-02)
Neurodegenerative_Diseases (pubmed3D) (majors3D) | Diseases [C] | 1 (47/48) | 0.20619 (2.062e-02)
Infarction,_Middle_Cerebral_Artery (pubmed3D) (majors3D) | Diseases [C] | 1 (78/25) | 0.19048 (1.905e-02)
Neoplasms (pubmed3D) (majors3D) | Diseases [C] | 35 (1892/1717) | 0.19027 (1.903e-02)
(Table based on 23857 MeSH concepts used to annotate 16,981,052 PubMed papers.)
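As with the 2-D page, the formula behind the 3-D connection strengths is not given. The values in the table above are consistent with a raw strength equal to the co-occurrence count divided by the sum of that count and the mean of the two pair-wise totals, normalized by the largest raw value in the listing. The sketch below checks that reading against three rows; the formula is an inference from the printed numbers, and the function name is hypothetical.

```python
def raw_strength_3d(co_occurrence, total_with_a, total_with_b):
    """Inferred formula (an assumption): co / (co + mean of the pair-wise totals)."""
    return co_occurrence / (co_occurrence + (total_with_a + total_with_b) / 2)

# Figures copied from the example 3-D page: name -> (co-occurrence, total A, total B)
rows = {
    "Gingivitis": (1, 9, 9),
    "Burkitt_Lymphoma": (2, 88, 25),
    "Neoplasms": (35, 1892, 1717),
}

raws_3d = {name: raw_strength_3d(*values) for name, values in rows.items()}
top = max(raws_3d.values())
normalized_3d = {name: raw / top for name, raw in raws_3d.items()}
for name in rows:
    print(f"{name}: {normalized_3d[name]:.5f} ({raws_3d[name]:.3e})")
```

With so few co-occurring articles in a 10% sample, a single shared article is enough to put a rarely used concept such as Gingivitis at the top of the list, while a broad concept such as Neoplasms ranks last despite 35 shared articles.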

The Pervasive Technology Institute at Indiana University provided computing facilities for constructing interfaces for:

(A non-functioning mock-up of the start page for the 2-D search is here; the 3-D start page is similar, but includes form fields to specify a second MeSH concept to use in the search.)