Chemical Big Data for Chemical AI

“No Data, No AI.”

Open-source programs for AI development are widely available, but large-scale, high-quality chemical data is rare. Instantly access over 3 trillion data entries for more than 1 billion chemical compounds, offering high-quality chemical datasets essential for effective AI training. Explore the details below.

Download Full Data List
  • Refined Experimental Property Data
  • Reliability-Proven Predicted Property Data
  • Molecular Structure and Identifier
  • Spectroscopic Data
  • Quantum Chemical Data
  • Molecular Descriptor (2-Dimensional) Data
  • How to Obtain Chemical Big Data

Refined Experimental Property Data

  • We provide refined experimental property data across thermo-physicochemical, thermodynamic, transport, and pharmaceutical properties. This data serves as a benchmark for cross-verifying predictions and improving model robustness. Below is the list of available refined experimental data:

    Property Name   Number of Compounds   Number of Data Points
    Constant Properties
    Absolute Entropy of Ideal Gas at 298.15 K and 1 bar 1,864 1,864
    Acentric Factor 1,857 1,857
    Critical Compressibility Factor 1,355 1,355
    Critical Pressure 3,157 3,157
    Critical Temperature 2,668 2,668
    Critical Volume 2,468 2,468
    Dipole Moment 9,707 9,707
    Electron Affinity 200 200
    Enthalpy (Heat) of Formation for Ideal Gas at 298.15 K 1,960 1,960
    Enthalpy (Heat) of Fusion at Melting Point 3,930 3,930
    Flash Point 3,728 3,728
    Gibbs Energy of Formation for Ideal Gas at 298.15 K and 1 bar 1,823 1,823
    Heat (Enthalpy) of Vaporization at 298.15 K 2,875 2,875
    Heat (Enthalpy) of Vaporization at Normal Boiling Point 1,210 1,210
    Ionization Potential 6,079 6,079
    Liquid Density at Normal Boiling Point 1,813 1,813
    Liquid Molar Volume at 298.15 K 8,140 8,140
    Lower Flammability Limit Temperature 1,067 1,067
    Lower Flammability Limit Volume Percent 391 391
    Magnetic Susceptibility 1,205 1,205
    Net Standard State Enthalpy (Heat) of Combustion at 298.15 K 2,163 2,163
    Normal Boiling Point 35,489 35,489
    Parachor 965 965
    Polarizability 380 380
    Radius of Gyration 1,367 1,367
    Refractive Index 60,974 60,974
    Solubility Parameter at 298.15 K 1,494 1,494
    Standard State Absolute Entropy at 298.15 K and 1 bar 1,105 1,105
    Standard State Enthalpy (Heat) of Formation at 298.15 K and 1 bar 3,506 3,506
    Standard State Gibbs Energy of Formation at 298.15 K and 1 bar 1,170 1,170
    Upper Flammability Limit Temperature 1,421 1,421
    Upper Flammability Limit Volume Percent 972 972
    Melting Point 3,583 3,583
    LogP (Octanol-Water Partition Coefficient) 29,199 29,199
    LogS (Water Solubility) 7,443 7,443
    Temperature Dependent Properties
    Heat Capacity of Ideal Gas 1,105 19,353
    Heat Capacity of Liquid 919 17,093
    Heat of Vaporization 1,371 35,941
    Liquid Density 2,770 19,686
    Second Virial Coefficient 485 14,829
    Surface Tension 1,195 11,984
    Thermal Conductivity of Gas 889 8,936
    Thermal Conductivity of Liquid 1,480 18,084
    Vapor Pressure of Liquid 2,939 41,257
    Viscosity of Gas 1,256 12,557
    Viscosity of Liquid 1,841 30,367
  • The experimental data was collected to validate the reliability of our QSQN technology. Because raw experimental data frequently contains measurement errors, we implemented a systematic refinement process involving basic analysis, statistical filtering, and similarity analysis. For more details, see our refinement example using the normal boiling point as a case study.
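The source does not specify which statistical filter is used in the refinement process. As an illustration only, the sketch below assumes a median/MAD (modified z-score) rule applied to repeated measurements of one compound's normal boiling point. The function name `filter_outliers`, the threshold of 3.5, and the sample values are all hypothetical.

```python
from statistics import median

def filter_outliers(values, threshold=3.5):
    """Keep values whose modified z-score (median/MAD based) is within threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No usable spread estimate; keep every value rather than guess.
        return list(values)
    # 0.6745 scales the MAD so the score is comparable to a standard z-score.
    return [v for v in values if 0.6745 * abs(v - med) / mad <= threshold]

# Hypothetical repeated boiling point measurements (K) for a single compound
reported = [351.4, 351.5, 351.6, 351.3, 378.0]
refined = filter_outliers(reported)
print(refined)
```

A median-based rule is chosen here because a single gross error (378.0 K in the sample) shifts the mean and standard deviation far more than it shifts the median.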

Reliability-Proven Predicted Property Data

  • We provide thermo-physicochemical, thermodynamic, transport, and pharmaceutical property data for over 1 billion compounds as listed below.

    Property Name   QSQN Number of Compounds   QSQN Average Accuracy (%)*   QN Number of Compounds   QN Average Accuracy (%)*
    Absolute Entropy of Ideal Gas at 298.15 K and 1 bar 4+ Million 95.90% 1+ Billion 98.67%
    Acentric Factor 4+ Million 93.70% 1+ Billion 94.45%
    Activity Score for GPCR Ligands 4+ Million N.APP. N.AVA. N.AVA.
    Activity Score for Ion Channel Modulators 4+ Million N.APP. N.AVA. N.AVA.
    Activity Score for Kinase Inhibitors 4+ Million N.APP. N.AVA. N.AVA.
    Activity Score for Nuclear Receptor Ligands 4+ Million N.APP. N.AVA. N.AVA.
    Critical Compressibility Factor 4+ Million 94.90% 1+ Billion 91.70%
    Critical Pressure 4+ Million 94.86% 1+ Billion 93.00%
    Critical Temperature 4+ Million 95.49% 1+ Billion 97.14%
    Critical Volume 4+ Million 94.31% 1+ Billion 97.15%
    Dipole Moment 4+ Million 92.20% N.AVA. N.AVA.
    Drug-Likeness 4+ Million N.APP. N.AVA. N.AVA.
    Electron Affinity 4+ Million 90.41% N.AVA. N.AVA.
    Enthalpy (Heat) of Formation for Ideal Gas at 298.15 K 4+ Million 96.15% 1+ Billion 90.15%
    Enthalpy (Heat) of Fusion at Melting Point 4+ Million 96.77% N.AVA. N.AVA.
    Flash Point 4+ Million 95.30% 1+ Billion 99.94%
    Ghose-Crippen Molar Refractivity 4+ Million N.APP. N.AVA. N.AVA.
    Ghose-Crippen Octanol-Water Partition Coeff. (logP) 4+ Million N.APP. N.AVA. N.AVA.
    Gibbs Energy of Formation for Ideal Gas at 298.15 K and 1 bar 4+ Million 95.04% N.AVA. N.AVA.
    Heat (Enthalpy) of Vaporization at 298.15 K 4+ Million 95.47% 1+ Billion 78.44%
    Heat (Enthalpy) of Vaporization at Normal Boiling Point 4+ Million 95.77% 1+ Billion 97.31%
    Heat Capacity of Ideal Gas 4+ Million 96.70% N.AVA. N.AVA.
    Heat Capacity of Liquid 4+ Million 99.28% N.AVA. N.AVA.
    Heat of Vaporization 4+ Million 98.67% N.AVA. N.AVA.
    Ionization Potential 4+ Million 95.41% N.AVA. N.AVA.
    Lipinski Alert Index 4+ Million N.APP. N.AVA. N.AVA.
    Liquid Density 4+ Million 99.09% N.AVA. N.AVA.
    Liquid Density at Normal Boiling Point 4+ Million 95.98% 1+ Billion 98.92%
    Liquid Molar Volume at 298.15 K 4+ Million 97.21% 1+ Billion 94.82%
    LogP (Octanol-Water Partition Coefficient) 4+ Million 96.72% N.AVA. N.AVA.
    LogS (Water Solubility) 4+ Million 96.28% 1+ Billion 86.61%
    Lower Flammability Limit Temperature 4+ Million 96.43% 1+ Billion 98.92%
    Lower Flammability Limit Volume Percent 4+ Million 95.08% 1+ Billion 66.33%
    Magnetic Susceptibility 4+ Million 94.68% 1+ Billion 85.64%
    Moriguchi Octanol-Water Partition Coeff. (logP) 4+ Million N.APP. N.AVA. N.AVA.
    Net Standard State Enthalpy (Heat) of Combustion at 298.15 K 4+ Million 95.87% 1+ Billion 86.94%
    Normal Boiling Point 4+ Million 95.02% 1+ Billion 99.69%
    Number of Acceptor Atoms for H-bonds (N,O) 4+ Million N.APP. N.AVA. N.AVA.
    Number of Donor Atoms for H-bonds (N,O) 4+ Million N.APP. N.AVA. N.AVA.
    Melting Point 4+ Million 84.90% 1+ Billion 83.08%
    Parachor 4+ Million 97.18% 1+ Billion 94.41%
    Polarizability 4+ Million 91.02% 1+ Billion 84.13%
    Radius of Gyration 4+ Million N.APP. N.AVA. N.AVA.
    Refractive Index 4+ Million 95.99% 1+ Billion 84.11%
    Second Virial Coefficient 4+ Million 91.94% N.AVA. N.AVA.
    Solubility Parameter at 298.15 K 4+ Million 95.88% 1+ Billion 98.94%
    Standard State Absolute Entropy at 298.15 K and 1 bar 4+ Million 97.49% 1+ Billion 94.52%
    Standard State Enthalpy (Heat) of Formation at 298.15 K and 1 bar 4+ Million 94.87% 1+ Billion 79.53%
    Standard State Gibbs Energy of Formation at 298.15 K and 1 bar 4+ Million 94.14% N.AVA. N.AVA.
    Surface Tension 4+ Million 93.59% N.AVA. N.AVA.
    Thermal Conductivity of Gas 4+ Million 90.08% N.AVA. N.AVA.
    Thermal Conductivity of Liquid 4+ Million 91.82% N.AVA. N.AVA.
    Upper Flammability Limit Temperature 4+ Million 94.22% 1+ Billion 96.66%
    Upper Flammability Limit Volume Percent 4+ Million 95.43% 1+ Billion 92.96%
    van der Waals Area 4+ Million N.APP. N.AVA. N.AVA.
    van der Waals Reduced Volume 4+ Million N.APP. N.AVA. N.AVA.
    Vapor Pressure of Liquid 4+ Million 97.86% N.AVA. N.AVA.
    Viscosity of Gas 4+ Million 98.37% N.AVA. N.AVA.
    Viscosity of Liquid 4+ Million 89.62% N.AVA. N.AVA.
  • The property data listed above is produced using our proprietary QSQN and QN technologies, backed by 41 patents, and has been validated for reliability. The compounds, composed of C, H, N, O, S, F, Cl, Br, I, Si, P, and/or As, span a wide range of structures and compositions.
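Before requesting data, it can help to check whether a compound's elements fall within the twelve listed above. The following is a minimal sketch, assuming simple molecular formulas without charges or isotope labels. The helper names are hypothetical, and passing the check only means the element set matches, not that the compound is actually in the dataset.

```python
import re

# Element set stated in the text above
ALLOWED = {"C", "H", "N", "O", "S", "F", "Cl", "Br", "I", "Si", "P", "As"}

def elements_in_formula(formula):
    """Return the set of element symbols in a simple formula such as 'C2H6O'."""
    return set(re.findall(r"[A-Z][a-z]?", formula))

def is_covered(formula):
    """True if the formula is non-empty and uses only the allowed elements."""
    elements = elements_in_formula(formula)
    return bool(elements) and elements <= ALLOWED

for formula in ["C2H6O", "C6H5Cl", "C4H12Si", "NaCl"]:
    print(formula, is_covered(formula))
```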

Molecular Structure and Identifier

  • We offer comprehensive molecular structures and identifiers for over 1 billion compounds as listed below.

    Structure & Identifier Number of Compounds
    Optimized 3D Structure Data (Mol) File 4+ Million
    3D Structure Data (Mol) File 1+ Billion
    2D Structure Data (Mol) File 1+ Billion
    SMILES String 1+ Billion
    InChI 1+ Billion
    InChIKey 1+ Billion
  • The optimized 3D structures are produced using a high-quality quantum chemical computation process. We perform conformer analysis to select the lowest energy structure, followed by geometry optimization with the DFT-B3LYP functional and 6-31G basis set. All optimized structures are verified to ensure the absence of imaginary frequencies.
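The final verification step can be illustrated with a short check. Quantum chemistry programs conventionally report imaginary vibrational frequencies as negative wavenumbers, so a structure at a true energy minimum has no negative entries. This is a sketch under that convention; the function name, the tolerance parameter, and the sample frequencies are hypothetical.

```python
def has_no_imaginary_frequencies(frequencies_cm1, tolerance=0.0):
    """True if no frequency is imaginary (reported as a negative wavenumber).

    `tolerance` allows small negative values, caused by numerical noise,
    to be accepted; the default accepts none.
    """
    return all(f >= -tolerance for f in frequencies_cm1)

# Hypothetical frequency lists in cm^-1
minimum_like = [1595.0, 3657.0, 3756.0]
saddle_like = [-512.3, 410.2, 1203.7]

print(has_no_imaginary_frequencies(minimum_like))
print(has_no_imaginary_frequencies(saddle_like))
```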

Spectroscopic Data

  • We provide comprehensive spectroscopic data for over 4 million compounds, with every compound featuring key information for molecular identification and analysis. The following spectroscopic data is available:

    • Infrared (IR) Spectroscopy:

      Detailed vibrational frequencies and intensities essential for identifying functional groups and molecular structures. IR spectroscopy data is also available in JDX format (JCAMP-DX, the JCAMP spectroscopic data exchange format), ensuring easy integration into various analysis tools and workflows.

    • Nuclear Magnetic Resonance (NMR) Spectroscopy:

      ¹H, ¹³C, ¹⁵N, ¹⁷O, and ³³S NMR data, providing insights into molecular environments, bonding interactions, and chemical connectivity.

    • Vibrational Circular Dichroism (VCD) Spectroscopy:

      Provides chiral-sensitive data that enhances molecular structure determination, particularly useful for studying the stereochemistry of organic compounds.

  • These datasets are derived from high-quality quantum chemical computations as outlined in our proprietary QSQN technology.
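To show how a JDX file might be read, here is a minimal sketch of a JCAMP-DX reader. It assumes uncompressed `(X++(Y..Y))` data with evenly spaced X values and relies on the `FIRSTX`, `LASTX`, and `YFACTOR` header records. Compressed encodings, which real files often use, are not handled. The sample file content is invented for illustration and is not taken from the dataset.

```python
def parse_jdx(text):
    """Parse labeled records and uncompressed (X++(Y..Y)) data from JCAMP-DX text."""
    header, raw_y = {}, []
    in_data = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("##"):
            # Labeled data record of the form ##LABEL=value
            label, _, value = line[2:].partition("=")
            label = label.strip().upper()
            header[label] = value.strip()
            in_data = label == "XYDATA"
        elif in_data:
            # First number is the X of the line's first point; the rest are Y values.
            tokens = line.split()
            raw_y.extend(float(t) for t in tokens[1:])
    first_x, last_x = float(header["FIRSTX"]), float(header["LASTX"])
    y_factor = float(header.get("YFACTOR", 1.0))
    n = len(raw_y)
    step = (last_x - first_x) / (n - 1) if n > 1 else 0.0
    xs = [first_x + i * step for i in range(n)]
    ys = [y * y_factor for y in raw_y]
    return header, xs, ys

# Hypothetical file content, for illustration only
SAMPLE_JDX = """##TITLE=hypothetical compound
##JCAMP-DX=4.24
##DATA TYPE=INFRARED SPECTRUM
##XUNITS=1/CM
##YUNITS=TRANSMITTANCE
##XFACTOR=1.0
##YFACTOR=0.5
##FIRSTX=400.0
##LASTX=404.0
##NPOINTS=5
##XYDATA=(X++(Y..Y))
400.0 2.0 1.8 1.6
403.0 1.4 1.2
##END=
"""

header, xs, ys = parse_jdx(SAMPLE_JDX)
print(header["TITLE"], xs, ys)
```

For production use, an established JCAMP-DX reader is likely a better choice than a hand-written parser, since the format allows several compressed data encodings.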

Quantum Chemical Data

  • We provide an extensive collection of quantum chemical data for over 4 million compounds, critical for developing AI models in areas like molecular design, drug discovery, and material science. The data listed below represents only a subset of the complete data (Download full data list).

  • Quantum Chemical Computation Data

    • Vibrational Frequency Data:

      Provides key insights into molecular vibrations, essential for understanding molecular dynamics.

    • Total Energy:

      Critical for determining the stability and potential reactivity of a molecule.

    • Molecular Orbital Energies (HOMO, LUMO):

      Important indicators of a molecule's ability to donate or accept electrons, useful for predicting chemical reactivity.

    • Mulliken Charges, Cartesian Coordinates, and Force Constants:

      Important computational outputs used for electronic structure analysis and geometric optimization.

  • Quantum Chemical Descriptors (3-Dimensional Molecular Descriptors)

    • HOMO-LUMO Energy Gap:

      A widely used metric for predicting molecular stability and chemical reactivity.

    • Nucleophilic and Electrophilic Reactivity Indices:

      Critical for understanding how a molecule interacts with other species in nucleophilic (electron-donating) and electrophilic (electron-accepting) reactions.

    • Atomic Charges and Molecular Dipole:

      Key indicators of charge distribution within the molecule, influencing how it interacts in chemical and biological environments.

  • Electrostatic Descriptors (3-Dimensional Molecular Descriptors)

    Electrostatic properties are crucial for understanding a molecule’s behavior in different environments, such as solvent interactions:

    • Max and Min Partial Charges:

      Provide insight into the distribution of charge across a molecule, helping predict areas of electron density.

    • Polarity Parameter:

      Measures the overall polarity of a molecule, which is important for predicting solubility and molecular interactions.

    • Surface Area Metrics:

      Includes Total Molecular Surface Area, Partial Positive and Negative Surface Areas, and Charge-Weighted Surface Areas—essential for understanding molecular interactions like binding affinity.

  • Quantum Chemical Computation Result File

    • FCHK File:

      A file that contains comprehensive quantum chemical computation results, providing a complete dataset for each molecule that can be further analyzed or integrated into AI models.
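As a small worked example of one descriptor above, the HOMO-LUMO energy gap is the difference between the lowest unoccupied and highest occupied orbital energies. The sketch below assumes a closed-shell molecule with orbital energies in hartree. The function name and the sample energies are hypothetical and do not come from the dataset.

```python
HARTREE_TO_EV = 27.211386  # conversion factor, rounded

def homo_lumo_gap(orbital_energies, n_occupied):
    """Return E(LUMO) - E(HOMO) for a closed-shell molecule.

    orbital_energies: orbital energies in hartree, in any order
    n_occupied: number of doubly occupied orbitals
    """
    energies = sorted(orbital_energies)
    if not 0 < n_occupied < len(energies):
        raise ValueError("need at least one occupied and one virtual orbital")
    homo = energies[n_occupied - 1]
    lumo = energies[n_occupied]
    return lumo - homo

# Hypothetical orbital energies (hartree) with three occupied orbitals
energies = [-10.2, -0.51, -0.33, 0.07, 0.21]
gap_hartree = homo_lumo_gap(energies, n_occupied=3)
gap_ev = gap_hartree * HARTREE_TO_EV
print(round(gap_hartree, 4), round(gap_ev, 3))
```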

Molecular Descriptor (2-Dimensional) Data

  • Molecular descriptors mathematically represent the properties of molecules and play a crucial role in improving the performance of AI models. They enable AI systems to more accurately understand and predict the structural, chemical, and geometrical characteristics of molecules in fields such as chemical research, material science, and drug design.

  • We offer over 3,000 2-dimensional molecular descriptors for more than 1 billion compounds, organized into 20 distinct categories, providing detailed insights into molecular behavior for AI-driven research. Below are the descriptor categories and the corresponding number of available descriptors.

    Descriptor Category   Number of Descriptors
    Constitutional Descriptors 43
    Ring Descriptors 32
    Topological Descriptors 75
    Walk And Path Counts 46
    Connectivity Indices 37
    Information Indices 48
    2D Matrix-Based Descriptors 550
    2D Autocorrelation Indices 213
    Burden Eigenvalue Descriptors 96
    P_VSA-Like Descriptors 45
    ETA Indices 23
    Edge Adjacency Indices 324
    Functional Group Counts 155
    Atom-Centred Fragments 115
    Atom-Type E-State Indices 169
    CATS 2D Descriptors 150
    2D Atom Pair Descriptors 1,596
    Charge Descriptors 15
    Molecular Properties 20
    Drug-Like Indices 27
  • Each descriptor category provides a unique perspective on the molecular structure and properties. For a detailed list of descriptors within each category, a separate file is available for download (Download full data list).
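The per-category counts can be cross-checked against the stated totals of 20 categories and over 3,000 descriptors. The snippet below simply transcribes the counts from the table into a dictionary. The variable names are illustrative.

```python
# Descriptor counts transcribed from the table above
DESCRIPTOR_COUNTS = {
    "Constitutional Descriptors": 43,
    "Ring Descriptors": 32,
    "Topological Descriptors": 75,
    "Walk And Path Counts": 46,
    "Connectivity Indices": 37,
    "Information Indices": 48,
    "2D Matrix-Based Descriptors": 550,
    "2D Autocorrelation Indices": 213,
    "Burden Eigenvalue Descriptors": 96,
    "P_VSA-Like Descriptors": 45,
    "ETA Indices": 23,
    "Edge Adjacency Indices": 324,
    "Functional Group Counts": 155,
    "Atom-Centred Fragments": 115,
    "Atom-Type E-State Indices": 169,
    "CATS 2D Descriptors": 150,
    "2D Atom Pair Descriptors": 1596,
    "Charge Descriptors": 15,
    "Molecular Properties": 20,
    "Drug-Like Indices": 27,
}

n_categories = len(DESCRIPTOR_COUNTS)
total_descriptors = sum(DESCRIPTOR_COUNTS.values())
largest = max(DESCRIPTOR_COUNTS, key=DESCRIPTOR_COUNTS.get)
print(n_categories, total_descriptors, largest)
```

The listed counts sum to 3,779 across 20 categories, consistent with the "over 3,000" figure, and the 2D Atom Pair Descriptors account for the largest share.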

How to Obtain Our Chemical Big Data

  • We understand that each customer’s data needs are unique. We therefore let you choose only the specific data you need from our 30+ billion datasets, so that what you receive is tailored to your exact requirements.

  • Simply contact us at contact@cc-dps.com and let us know the type of data and compounds you need, the quantity of data points or compounds, and your preferred format for receiving the data (e.g., as a file, integrated database, API, or other specific formats). Once we receive your inquiry, we will provide you with a tailored quote based on your specific requirements.

  • Feel free to reach out to us anytime—we’re here to help you get the data you need for your AI development.
