Introduction to the Dictionary of Natural Products Online

This introduction and additional information are available as a PDF file.

Introduction to the DNP database

The Chapman & Hall/CRC Chemical Database is a structured database holding information on chemical substances. It includes descriptive and numerical data on chemical, physical and biological properties of compounds; systematic and common names of compounds; literature references; structure diagrams and their associated connection tables. The Dictionary of Natural Products Online is a subset of this database and includes all compounds contained in the Dictionary of Natural Products (Main Work and Supplements).

The Dictionary of Natural Products (DNP) is the only comprehensive and fully-edited database on natural products. It arose as a daughter product of the well-known Dictionary of Organic Compounds (DOC) which, since its inception in the 1930s, has through successive editions always been a leading source of natural product information.

In the early 1980s, following the publication of the Fifth Edition of DOC, the first to be founded on database methods, the Editors and contributors for the various classes of natural products embarked on a programme of enlargement, rationalisation and classification of the natural product entries, while at the same time keeping the coverage up-to-date. In 1992 the results of this major project, which had grown to match DOC in size, were separately published in both book (7 volumes) and CD-ROM format, leaving DOC with coverage of only the most widely distributed and/or practically important natural products. DNP compilation has since continued unabated by a combination of an exhaustive survey of current literature and of historical sources such as reviews to pick up minor natural products and items of data previously overlooked.

The compilation of DNP is undertaken by a team of academics and freelancers who work closely with the in-house editorial staff at Chapman & Hall. Each contributor specialises in a particular natural product class (e.g. alkaloids) and is able to reorganise and classify the data in the light of new research so as to present it in the most consistent and logical manner possible. Thus the compilation team is able to reconcile errors and inconsistencies.

The resulting on-line version represents an extremely well organised dictionary documenting virtually every known natural product.

A valuable feature of the design is that closely related natural products (e.g. where one is a glycoside or simple ester of another) are organised into the same entry, thus simplifying and bringing out the underlying structural and biosynthetic relationships of the compounds. Structure diagrams are drawn and numbered in the most consistent way according to best stereochemical and biogenetic relationships. In addition, every natural product is indexed by structural/biogenetic type under one of more than 1000 headings, allowing the rapid location of all compounds in the category, even where they have undergone biogenetic modification and no longer share exactly the same skeleton.

There is extensive (but not complete) coverage of natural products of unknown structure, and the coverage of these is currently being enhanced by various retrospective searches.

Data presentation and organisation

Derivatives and variants

In the database, closely related compounds are grouped together to form an entry. Stereoisomers and derivatives of a parent compound are all listed under one entry. The compounds in the Dictionary of Natural Products are grouped together into approximately 40,000 entries. The structure of an entry is shown below.

  Entry (parent compound)
      Derivatives
      Variants (stereoisomers or other closely-related compounds)
          Derivatives of the variant

A simple entry covers one compound, with no derivatives or variants. A composite entry will start with the entry compound, then may have:

  • one or more derivatives at entry level
  • one or more variants of the entry
  • one or more derivatives of the variant.
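
The entry layout described above can be pictured as a small nested data structure. The following is only an illustrative sketch: the class and field names are assumptions made for this example and do not reflect how the database is actually stored.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Derivative:
    name: str


@dataclass
class Variant:
    # For example a stereoisomer such as "(R)-form"
    name: str
    derivatives: List[Derivative] = field(default_factory=list)


@dataclass
class Entry:
    # The parent compound that heads the entry
    name: str
    derivatives: List[Derivative] = field(default_factory=list)
    variants: List[Variant] = field(default_factory=list)

    def is_simple(self) -> bool:
        """A simple entry has no derivatives and no variants."""
        return not self.derivatives and not self.variants

    def compound_count(self) -> int:
        """Count every compound covered by the entry, parent included."""
        return (
            1
            + len(self.derivatives)
            + len(self.variants)
            + sum(len(v.derivatives) for v in self.variants)
        )


# Hypothetical composite entry, for illustration only
entry = Entry(
    name="Parent compound",
    derivatives=[Derivative("Ac")],
    variants=[Variant("(R)-form", [Derivative("Me ether")])],
)
```

Here `entry.is_simple()` is false and `entry.compound_count()` counts four compounds: the parent, one entry-level derivative, one variant and one derivative of that variant.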

Variants may include stereoisomers, e.g. (R)-form, endo-form, and members of a series of natural products with closely related structures, such as antibiotic complexes.

For example, Trienomycins are often treated as variants although their structures may be more varied.

Derivatives may include hydrates, complexes, salts, classical organic derivatives, substitution products and oxidation products etc. Derivatives may exist on more than one functional group of an entry compound. The following techniques are among those used to bring together related substances in the same entry:

  1. Glycosides are given as derivatives of the parent aglycone, except for those glycosides which have an extensive literature in their own right (e.g., Digoxin).
  2. Acyl derivatives are extremely common and are listed under the parent compound, again unless the derivative has an extensive literature of its own.
  3. N-Alkyl and O-Alkyl derivatives such as methyl ethers of phenols are similarly given under the parent compound.


Data Types

The format of a typical entry is given in Fig. 1, and shows the individual types of data that may be present in an entry.


Chemical names and synonyms

All the names discussed below can be searched using the Chemical Name field. Compounds have been named so as to facilitate access to their factual data by keeping the nomenclature as simple as possible, whilst still adhering to good practice as determined by IUPAC (the International Union of Pure and Applied Chemistry). A great deal of care has been taken to achieve this aim as nearly as possible. Some intentional departures from IUPAC terminological principles are occasionally made to clarify the nomenclature of natural products. For example, compounds containing both lactone and -COOH groups are often named using two principal functional groups:




Fig. 1. Sample entry from database


  1. There are many instances in the primary literature of compounds being named in ways which are gross violations of good IUPAC practice, e.g., where the substituents are ordered non-alphabetically. These have been corrected.
  2. The number of trivial names used for acylating substituents has been kept to a minimum but the following are used throughout.
  3. Many other trivial appellations have from time to time appeared in the literature for other acyl groups (e.g., Senecioyl = 3-methyl-2-butenoyl, Feruloyl = 3-(4-hydroxy-3-methoxyphenyl)-2-propenoyl or 4-hydroxy-3-methoxycinnamoyl) but the systematic forms are usually employed except in a few cases where the shortened form is used to abbreviate a very long and unwieldy derivative descriptor as much as possible (e.g., for some of the complex flavonoid glycosides).

  4. The term prenyl for the common 3-methyl-2-butenyl substituent, (H3C)2C=CHCH2-, is used throughout.
  5. Names which are known to be duplicated within the chemical literature (not necessarily within DNP) are marked with the sign.


CAS Registry Numbers

CAS Registry Numbers are identifying numbers allocated to each distinctly definable chemical substance indexed by the Chemical Abstracts Service since 1965 (plus retrospective allocation of numbers by CAS to compounds from the sixth and seventh collective index periods). The numbers have no chemical significance but they provide a label for each substance independent of any system of nomenclature.

In DNP, much effort has been expended to ensure that accurate CAS numbers are given for as many substances as possible.
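
Although CAS Registry Numbers carry no chemical meaning, their final digit is a check digit, which gives a quick way to catch transcription errors when copying numbers into a search. The sketch below is not part of DNP and the function name is hypothetical; it checks only the layout (two to seven digits, two digits, one check digit) and the checksum.

```python
import re


def is_valid_cas(cas: str) -> bool:
    """Return True if `cas` has the CAS RN layout and a correct check digit."""
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", cas)
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Weight the digits 1, 2, 3, ... counting from the right-hand end
    total = sum(int(d) * w for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == check
```

A number that passes this test is well formed, but the test says nothing about whether the number has been assigned to the compound in question.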

If a CAS number is not given for a particular compound, it may be (a) because CAS have not allocated one, (b) very occasionally, because an editorial decision cannot be made as to the correct number to cite, or (c) because the substance was added to the DNP database at a late stage in the compilation process, in which case the number will probably be added to the database soon.

At the foot of the DNP entry, immediately before the references, may be shown additional registry numbers. These are numbers which have been recognised by the DNP editors or contributors as belonging to the entry concerned but which cannot be unequivocally assigned to any of the compounds covered by the entry. Their main use will be in helping those who need to carry out additional searches, especially online searches in the CAS or other databases, and who will be able to obtain additional hits using these numbers. Clearly, discretion is needed in their use for this purpose.

Additional registry numbers may arise for a variety of reasons:

  1. A number may refer to stereoisomers or other variants of the main entry compound or its derivatives, which may or may not be mentioned in the entry but for which no physical properties or other useful information is available. For example, the DNP entry for Carlic acid [56083-49-9] states that it has so far been obtained in solution as a mixture of (E) and (Z)-forms. The additional registry numbers given are those of the (E) and (Z) isomers [67381-73-1] and [67381-74-2].
  2. A CAS number may refer to a mixture, in which case it is added to the DNP entry referring to the most significant component. It may refer to a hydrate, salt, complex, etc. which is not described in detail in the DNP entry.
  3. Replaced numbers, duplicate numbers and other numbers arising from CAS indexing procedure or, occasionally, from errors or inconsistencies by CAS, are also reported. For example, the DNP entry scyllo-Inositol [488-59-5] contains an additional registry number for D-scyllo-Inositol [41546-32-1]. Since scyllo-Inositol is a meso-compound, the number is erroneous. More generally, CAS frequently replace a given number with one that more accurately represents what they now know about a substance, and the replaced number remains on their files and is given in DNP as an additional number.
  4. In the case of compounds with more than one stereogenic centre, additional registry numbers frequently refer to levels of stereochemical description which cannot be assigned to a particular stereoisomer described in the entry.

    For example, the CHCD entry for 2-Amino-3-hydroxy-3-phenylpropanoic acid (β-Hydroxyphenylalanine, 9CI) has a general CAS number [1078-17-7] and CAS numbers for all four optically active diastereoisomers [7352-06-9, 32946-42-2, 109120-55-0, 6524-48-4] as well as the two possible racemates [2584-74-9] [2584-75-0]. However, among the additional registry numbers quoted are the following:

    [7687-36-7] - number for erythro-β-Hydroxyphenylalanine
    [50897-27-3] - number for β-Hydroxy-L-phenylalanine
    [68296-26-4] - number for β-Hydroxy-D-phenylalanine
    [39687-93-9] - general number for the methyl ester, hydrochloride which cannot be placed under any of the individual stereoisomers of this compound described in the entry.
  5. Numbers may refer to derivatives similar to those described in the DNP entry for which no data is available, or which have not yet been added to the entry.
  6. Some DNP entries refer to families of compounds, such as the entry for Calcitonin where only the porcine and human variants are described in detail. The additional registry numbers given in this entry are those of a number of other species variants which appear to have been identified according to CAS but for which no attempt has been made to collate full data for DNP.


Diagrams

In each entry display there is a single diagram which applies to the parent entry. Separate diagrams are not given for variants or derivatives.

Every attempt has been made to present the structures of chemical substances as accurately as possible according to current best practice and IUPAC recommendations. The formulae are drawn as consistently as possible between closely related structures. Thus, for example, sugars have been standardised as Haworth formulae and, wherever possible in complex structures, the rings are oriented in the standard Haworth manner so that structural comparisons can quickly be made. In formulae the pseudoatom abbreviations Me, Et and Ac for methyl, ethyl and acetyl respectively are used only when attached to a heteroatom. Ph is used throughout whether attached to carbon or to a heteroatom. Other pseudoatom abbreviations such as Prⁱ for isopropyl and Bz for benzoyl are not used in DNP.

Care must be taken with the numbering of natural products, as problems may arise due to differences in systematic and non-systematic schemes. Biogenetic numbering schemes which are generally favoured in DNP may not always be contiguous, e.g., where one or more carbon atoms have been lost during biogenesis.

Structures for derivatives can be viewed in Structure Search, but remember that these structures are generated from connection tables and may not always be oriented consistently.


Stereochemical conventions

Where the absolute configuration of a compound is known or can be inferred from the published literature without undue difficulty, this is indicated. Where only one stereoisomer is referred to in the text, the structural diagram indicates that stereoisomer. Wherever possible, stereostructures are described using the Cahn-Ingold-Prelog sequence-rule (R,S) and (E,Z) conventions but, in cases where these are cumbersome or inapplicable, alternatives such as the α,β-system are used instead. Alternative designations are frequently presented in such cases.

The structure diagrams for compounds containing one or two chiral centres are given in DNP as Fischer-type diagrams showing the stereochemistry unequivocally. True Fischer diagrams in which the configuration is implied by the North-South-East-West positions of the substituents are widespread in the literature; they are quite unambiguous but need to be used with caution by the inexperienced. They cannot be reoriented without the risk of introducing errors.

Where only the relative configuration of a compound containing more than one chiral centre is known, the symbols (R*) and (S*) are used, the lowest-numbered chiral centre being arbitrarily assigned the symbol (R*). For racemic modifications of compounds containing more than one chiral centre the symbols (RS) and (SR) are used, with the lowest-numbered chiral centre being arbitrarily assigned the symbol (RS). The racemate of a compound containing one chiral centre only is described in DNP as (±)-.

In comparing CAS descriptors with those given in DNP, it is important to remember that the order of presentation of the chirality labels in CAS is itself based on the sequence rule priority and not on any numbering scheme. For example, the CAS descriptor for the structure illustrated is [S-(R*,S*)].

The relative stereochemical label (R*,S*) is first applied with the R* applying to the chiral centre of higher priority (C-3). The absolute stereochemical descriptor (S)- is then applied changing R* to S for the chiral centre of higher priority and S* to R for the chiral centre of lower priority (C-2). For further details, see the current CAS Index Guide.
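
The two-step reading of a CAS descriptor described above can be sketched in a few lines. This is a minimal illustration of the rule as stated in this section, not an implementation of the full CAS Index Guide; the function name and argument layout are assumptions.

```python
from typing import List


def resolve_cas_descriptor(absolute: str, relative: List[str]) -> List[str]:
    """Turn a CAS-style descriptor such as [S-(R*,S*)] into absolute labels.

    `relative` lists the starred labels in sequence-rule priority order,
    highest-priority centre first, with the stars removed.
    """
    flip = {"R": "S", "S": "R"}
    if relative[0] == absolute:
        # The first centre already matches, so every label stands as written
        return list(relative)
    # Otherwise the compound is the mirror image: invert every centre
    return [flip[label] for label in relative]


# [S-(R*,S*)]: the higher-priority centre becomes S, the lower one R
labels = resolve_cas_descriptor("S", ["R", "S"])
```

The returned labels follow sequence-rule priority order, so they still have to be mapped onto the atom numbering (here C-3, then C-2) before they can be compared with a DNP name.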

For simplicity, the enantiomers of bridged-ring compounds, such as camphor, are described simply as (+)- and (-)-. Although camphor has two chiral centres, steric restraints mean that only one pair of enantiomers can be prepared.

For further information on the (R,S)-system, see Cahn, R.S. et al., J. Chem. Soc., 1951, 612; Experientia, 1956, 12, 81; Angew. Chem. Int. Ed. Engl., 1966, 5, 383.

Where appropriate, alternative stereochemical descriptors may be given using the D, L or α,β-systems. For a fuller description of these systems, consult The Organic Chemist's Desk Reference (Chapman & Hall, 1995).


Molecular formula and molecular weight

The elements in the molecular formula are given according to the Hill convention (C, H, then other elements in alphabetical order). The molecular weights given are formula weights (or more strictly, molar masses in daltons) and are rounded to one place of decimals. In the case of some high molecular mass substances such as proteins the value quoted may be that taken from an original literature source and may be an aggregate molar mass.
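
As a rough illustration of the Hill ordering and the one-decimal rounding described above, the following sketch formats a formula from element counts and sums a formula weight. The function names are hypothetical, the atomic masses are approximate values supplied for the example, and the carbon-free branch follows the standard Hill rule, which this section does not spell out.

```python
from typing import Dict


def hill_formula(counts: Dict[str, int]) -> str:
    """Format element counts in Hill order: C, H, then the rest alphabetically."""
    if "C" in counts:
        leading = [el for el in ("C", "H") if el in counts]
        rest = sorted(el for el in counts if el not in ("C", "H"))
        order = leading + rest
    else:
        # Standard Hill rule for carbon-free formulae: strictly alphabetical
        order = sorted(counts)
    return "".join(el + (str(counts[el]) if counts[el] > 1 else "") for el in order)


def formula_weight(counts: Dict[str, int], masses: Dict[str, float]) -> float:
    """Sum atomic masses and round to one decimal place."""
    return round(sum(masses[el] * n for el, n in counts.items()), 1)


# Approximate standard atomic masses, supplied here for illustration
MASSES = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

# 2-Amino-3-hydroxy-3-phenylpropanoic acid, entered in arbitrary order
counts = {"O": 3, "N": 1, "H": 11, "C": 9}
```

With these inputs, `hill_formula(counts)` gives `C9H11NO3` and `formula_weight(counts, MASSES)` gives 181.2.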

Molecular formulae are included in DNP for all derivatives which are natural products and so are readily searchable, whether they are documented as derivatives or have their own individual entry. Molecular formulae are not in general given for salts, hydrates or complexes (e.g. picrates) nor for most "characterisation" derivatives such as acetates and methyl ethers of complex natural products.

Where a derivative appears to have been characterised only as a salt, the properties of the salt may be given under the heading for the derivative. In such cases the data is clearly labelled, e.g., Mp 179° (as hydrochloride).


Source

The taxonomic names for organisms given throughout are in general those given in the primary literature. Standardisation of minor orthographical variations has been carried out. Data in this field may be searched under Source/Synthesis or All Text. Standards used are: Brummitt, R.K. (1992) Vascular Plant Families and Genera, Royal Botanic Gardens, Kew; Willis, J.C. (1973) A Dictionary of the Flowering Plants, Cambridge University Press, Cambridge; Gozmany, L. (1990) Seven Language Thesaurus of European Animals, Chapman & Hall, London; Chemical Abstracts Service.


Importance/use

Care has been taken to make the information given on the importance and uses of chemical substances as accurate as possible. Data in this field may be searched under Use/Importance or All Text.


Type of Compound

All natural products are classified under one of more than 1050 headings according to structural type, e.g., daucane sesquiterpenoid, pyrrolizidine alkaloid, withanolide. Each structural type is assigned a type of compound code, e.g., VG0300, VX0150. Type of compound words and type of compound codes may both be searched in Menu and Command search.

The full type of compound code index is given in Table 3, page 128 of the printed User Manual, and in the Description of Natural Product Structures that follows, each descriptive paragraph is followed by its Type of Compound code(s).


Physical Data

Appearance

Natural products are considered to be colourless unless otherwise stated. Where the compound contains a chromophore which would be expected to lead to a visible colour, but no colour is mentioned in the literature, the DNP entry will mention this fact if it has been noticed by the contributor.

An indication of crystal form and of recrystallisation solvent is often given but these are imprecise items of data; most organic compounds can be crystallised from several solvent systems and the crystal form often varies. In the case of the small number of compounds where crystal behaviour has been intensively studied (e.g. pharmaceuticals), it is found that polymorphism is a very common phenomenon and there is no reason to believe that it is not widespread among organic compounds generally.

Melting points and boiling points

The policy followed in the case of conflicting data is as follows:

  1. Where the literature melting points are closely similar, only one figure (the highest or most probable) is quoted.
  2. Where two or more melting points are recorded and differ by several degrees (the most likely explanation being that one sample was impure), the lower figure is given in parentheses, thus: 139° (134-135°).
  3. Where quoted figures differ widely and polymorphism, incorrect identity or some similar cause seems the most likely explanation, both figures are quoted without parentheses, thus: Mp 142°, Mp 205-206°.
  4. Known cases of polymorphism or double melting point are noted.

Boiling point determination is less precise than that of melting points, and conflicting boiling point data is not usually reported except when there appears to be a serious discrepancy between the different authors.
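
The first three melting point rules can be sketched as a small decision function. The policy above is editorial and gives no numerical cut-offs, so the `close` and `wide` thresholds below are assumptions chosen purely for illustration, as is the function name; the sketch also handles single values only, not ranges, and always picks the highest figure in the first case.

```python
def format_melting_points(low: float, high: float,
                          close: float = 2.0, wide: float = 20.0) -> str:
    """Format two conflicting literature melting points (in degrees C).

    `close` and `wide` are assumed thresholds; the text gives no numbers.
    """
    gap = high - low
    if gap <= close:
        # Rule 1: closely similar values, quote only the highest
        return f"Mp {high:g}°"
    if gap < wide:
        # Rule 2: a few degrees apart, lower value goes in parentheses
        return f"Mp {high:g}° ({low:g}°)"
    # Rule 3: widely different values, quote both without parentheses
    return f"Mp {low:g}°, Mp {high:g}°"
```

In practice the choice between the second and third cases rests on the contributor's judgement of the likely cause of the discrepancy, which a fixed threshold can only approximate.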


Optical rotations

These are given whenever possible, and normally refer to what the DNP contributor believes to be the best-characterised sample of highest chemical and optical purity. Where available an indication of the optical purity (op) or enantiomeric excess (ee) of the sample measured now follows the specific rotation value.

Specific rotations are dimensionless numbers and the degree sign which was formerly universal in the literature has been discontinued.

Densities and refractive indexes

Densities and refractive indexes are now of less importance for the identification of liquids than has been the case in the past, but are quoted for common or industrially important substances (e.g. monoterpenoids), or where no boiling point can be found in the literature.

Densities and refractive indexes are not quoted where the determination appears to refer to an undefined mixture of stereoisomers.

Solubilities

Solubilities are given only where the solubility is unusual. Typical organic compounds are soluble in the usual organic solvents such as ether and chloroform, and virtually insoluble in water. The presence of polar groups (OH, NH2 and especially COOH, SO3H, NR+) increases water solubility.

pKa values

pKa values are given for both acids and bases. The pKb of a base can be obtained by subtracting its pKa from 14.17 (at 20°) or from 14.00 (at 25°).
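
The conversion can be written out as a one-line calculation. This is a minimal sketch using the two constants quoted above; the function and dictionary names are hypothetical.

```python
# Values quoted in the text for the two reference temperatures (deg C)
PKW = {20: 14.17, 25: 14.00}


def pkb_from_pka(pka: float, temperature: int = 25) -> float:
    """Return the pKb of a base from the pKa listed for it."""
    return round(PKW[temperature] - pka, 2)
```

For a base listed with pKa 9.25, this gives a pKb of 4.75 at 25° and 4.92 at 20°.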

Spectroscopic data

Spectroscopic data such as UV wavelengths and extinction coefficients are given only where the spectrum is a main point of interest, or where the compound is unstable and has been identified only by spectroscopic data.

In many other cases, spectroscopic data can be rapidly located through the references quoted.


Hazard and toxicity information

General

Toxicity and hazard information is highlighted by a hazard sign, and has been selected to assist in risk assessments for experimental, manufacturing and manipulative procedures with chemicals.

The field of safety testing is a complex, difficult and rapidly expanding one, and while as much care as possible has been taken to ensure the accuracy of reported data, the Dictionary must not be considered a comprehensive source on hazard data. The function of the reported hazard data is to alert the user to possible hazards associated with the use of a particular compound. The absence of such data cannot be taken as an indication of safety in use, nor does the omission of hazard data in DNP imply an absence of this data from the literature, and the Publishers cannot be held responsible for any inaccuracies in the reported information. Widely recognised hazards are included, however, and where possible key toxicity reviews are identified in the references. Further advice on the storage, handling and disposal of chemicals is given in The Organic Chemist's Desk Reference.

Finally, it should be emphasised that any chemical has the potential for harm if it is carelessly used. For many newly isolated materials, hazardous properties may not be apparent or may not have been cited in the literature. In addition, the toxicity of some very reactive chemicals may not have been evaluated for ethical reasons, and these substances in particular should be handled with caution.

RTECS® Accession Numbers*

Many entries in DNP contain one or more RTECS® Accession Numbers. Possession of these numbers allows users to locate toxicity information on relevant substances from the NIOSH Registry of Toxic Effects of Chemical Substances, which is a compendium of toxicity data extracted from the scientific literature. For each Accession Number, the RTECS® database provides the following data when available: substance prime name and synonyms; date when the substance record was last updated; CAS Registry Number; molecular weight and formula; reproductive, tumorigenic and toxic dose data; and citations to aquatic toxicity ratings, IARC reviews, ACGIH Threshold Limit Values, toxicological reviews, existing Federal standards, the NIOSH criteria document program for recommended standards, the NIOSH current intelligence program, the NCI Carcinogenesis Testing Program, and the EPA Toxic Substances Control Act inventory. Each data line and citation is referenced to the source from which the information was extracted.


Bibliographic References

The selection of references is made with the aim of facilitating entry into the literature for the user who wishes to locate more detailed information about a particular compound. Thus, in general, recent references are preferred to older ones, particularly for chiral compounds where optical purity and absolute configuration may have been determined relatively recently. The number of references quoted cannot therefore be taken as an indication of the relative importance of a compound, and the references quoted for important substances may not be the most significant historically.

References are given in date order except for references to spectroscopic library collections, which sort at the top of the list, and those to hazard/toxicity sources which sort at the bottom.
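
The ordering rule can be expressed as a two-part sort key. The sketch below is illustrative only: the `Reference` type, the `kind` labels and the placeholder citations are assumptions, and ordering by date inside the first and last groups is also an assumption, since the text does not say how those groups are ordered internally.

```python
from typing import List, NamedTuple


class Reference(NamedTuple):
    citation: str
    year: int
    kind: str = "general"  # hypothetical labels: "spectral", "general", "hazard"


# Spectroscopic library collections first, hazard/toxicity sources last
GROUP_ORDER = {"spectral": 0, "general": 1, "hazard": 2}


def sort_references(refs: List[Reference]) -> List[Reference]:
    """Order references by group, then by date within each group."""
    return sorted(refs, key=lambda r: (GROUP_ORDER[r.kind], r.year))


# Placeholder references, not real citations
refs = [
    Reference("hazard source", 1990, "hazard"),
    Reference("paper B", 2001),
    Reference("spectral library", 1995, "spectral"),
    Reference("paper A", 1987),
]
ordered = [r.citation for r in sort_references(refs)]
```

Here `ordered` lists the spectral library first, then the two papers in date order, then the hazard source.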

The content of most references is indicated by means of suffixes, known as reference tags. A list of the most common ones is given in Table 4, p. 145 of the printed User Manual. For references describing a minor natural product which has been included in DNP as a derivative of a parent compound, the reference tag may be the identifying name of the natural product, e.g. (Laciniatoside II).

Some reference suffixes are now given in boldface type, where the editors consider the reference to be particularly important, for example the best synthesis giving full experimental details and often claiming a higher yield than previously reported methods.

In some entries, minor items of information, particularly the physical properties of derivatives, may arise from references not cited in the entry.


Journal abbreviations

In general these are uniform with the Chemical Abstracts Service Source Index (CASSI) listing except for a short list of very common journals:

DNP ABBREVIATION                            CASSI
Acta Cryst. (and sections thereof)          Acta Crystallogr. (and sections thereof)
Annalen                                     Justus Liebigs Ann. Chem.
Chem. Comm.                                 J. Chem. Soc., Chem. Commun.
J.A.C.S.                                    J. Am. Chem. Soc.
J.C.S. (and various subsections thereof)    J. Chem. Soc. (and various subsections thereof)
J. Het. Chem.                               J. Heterocycl. Chem.
J.O.C.                                      J. Org. Chem.
Tet. Lett.                                  Tetrahedron Lett.
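
When DNP references are passed to a tool that expects CASSI abbreviations, the exceptions listed above can be handled with a simple lookup. This is a hedged sketch: the names are hypothetical, only exact matches on the base abbreviation are translated, and section or subsection suffixes are deliberately not handled.

```python
# DNP abbreviation -> CASSI abbreviation, taken from the table above
DNP_TO_CASSI = {
    "Acta Cryst.": "Acta Crystallogr.",
    "Annalen": "Justus Liebigs Ann. Chem.",
    "Chem. Comm.": "J. Chem. Soc., Chem. Commun.",
    "J.A.C.S.": "J. Am. Chem. Soc.",
    "J.C.S.": "J. Chem. Soc.",
    "J. Het. Chem.": "J. Heterocycl. Chem.",
    "J.O.C.": "J. Org. Chem.",
    "Tet. Lett.": "Tetrahedron Lett.",
}


def to_cassi(journal: str) -> str:
    """Translate a DNP journal abbreviation to its CASSI form.

    Titles outside the exception list are returned unchanged, since
    they are in general already uniform with CASSI.
    """
    return DNP_TO_CASSI.get(journal, journal)
```

For example, `to_cassi("J.A.C.S.")` returns `J. Am. Chem. Soc.`, while a title such as `Experientia` passes through untouched.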



Entry under review

The database is continually updated. When an entry is undergoing revision at the time of an on-line release (for example by the addition of further derivatives or references), this is indicated by a message at the head of the entry.



*RTECS® Accession Numbers are compiled and distributed by the National Institute for Occupational Safety and Health Service of the U.S. Department of Health and Human Services of The United States of America. All rights reserved. (1996)


 
DNP 23.1 Copyright © 2014 Taylor & Francis Group
All Rights Reserved
(CDP2)