


Bob Belford, University of Arkansas at Little Rock
Jordi Cuadros, IQS Universitat Ramon Llull
Andrew Cornell, University of Arkansas at Little Rock
Tanya Gupta, South Dakota State University
Ehren Bucholtz, University of Health Sciences and Pharmacy in St. Louis


This paper is about IUPAC project 2018-012-3-024, the InChI OER, an Open Education Resource designed to provide educators and other interested parties with resources, training material and information related to InChI, the International Chemical Identifier. The InChI OER is an extension of the InChI Trust website that allows people to share and find resources related to InChI. Although the InChI OER is of value to a wide variety of people, this paper seeks to reach out to chemical educators and provide them with an understanding of InChI and its role in the practice of science.



  1. Why Should Chemical Educators Know About InChI?
  2. What is InChI?
  3. What is the InChI OER?
  4. Exemplar of InChI OER; Safety & the Stockroom.
  5. Future of InChI & the InChI OER


Why Should Chemical Educators Know About InChI?

Today's students will be entering a world and workplace where decision making and the practice of science will become ever more intertwined with big data, and yet this topic is noticeably missing from the undergraduate chemistry curriculum. In 2009 Microsoft Research published The Fourth Paradigm, an anthology of essays on data science in memory of the data scientist Jim Gray. The introduction, "Jim Gray on eScience: A Transformed Scientific Method", was based on a 2007 talk he gave in Mountain View, California, which defined four paradigms of science that give us a perspective from which to approach the topic of InChI.

  • 1st Paradigm: Empirical Science (thousands of years old)
  • 2nd Paradigm: Theoretical Science (centuries old)
  • 3rd Paradigm: Computational Science (decades old)
  • 4th Paradigm: eScience/data exploration (years old)


Roughly 220 years earlier, Lavoisier stated in his 1789 book "The Elements of Chemistry, in a New Systematic Order, Containing all the Modern Discoveries":

“As ideas are preserved and communicated by means of words, it necessarily follows that we cannot improve the language of any science, without at the same time improving the science itself; neither can we, on the other hand, improve a science without improving the language or nomenclature which belongs to it.”1

At the turn of the second millennium, it was clear that a new form of chemical representation was evolving that aligned with the third and fourth paradigms of science, and could be used by machines to both store data and perform complex computations. Digitized molecular representation essentially took three different forms: molecular graphs, line notations and index numbers. A molecular graph was the most data-intensive format, which at a minimum contained two data tables, an atom table (identifying the atoms in a molecule) and a connection table (identifying the bonds). Line notations used rules to code the information of the molecular graph into a one-dimensional string of characters, with SMILES (Simplified Molecular Input Line Entry System) being the most common. Issues of canonicalization arose as different software would often generate different SMILES for the same molecule (caffeine was shown to have up to 4,160 different SMILES strings2), and there were even issues with canonical SMILES, as the string depended on the canonicalization algorithm. The last and simplest way of representing a molecule on a computer was an index number or alphanumeric string like a Chemical Abstracts Service (CAS) Registry Number, which required a lookup table and had no structural information coded into it.
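To make the idea of a molecular graph concrete, here is a minimal Python sketch of an atom table and a connection table for ethanol. The `Bond` class and the helper names (`hill_formula`, `neighbors`) are illustrative assumptions and do not correspond to any particular file format or cheminformatics package.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Bond:
    """One row of a connection table (hypothetical layout for illustration)."""
    a: int      # index of the first atom in the atom table
    b: int      # index of the second atom in the atom table
    order: int  # 1 = single, 2 = double, 3 = triple


# Atom table for ethanol, CH3-CH2-OH, with hydrogens listed explicitly.
atoms = ["C", "C", "O", "H", "H", "H", "H", "H", "H"]

# Connection table: which atoms are bonded, and with what bond order.
bonds = [
    Bond(0, 1, 1), Bond(1, 2, 1),                  # C-C and C-O
    Bond(0, 3, 1), Bond(0, 4, 1), Bond(0, 5, 1),   # methyl hydrogens
    Bond(1, 6, 1), Bond(1, 7, 1),                  # methylene hydrogens
    Bond(2, 8, 1),                                 # hydroxyl hydrogen
]


def hill_formula(atom_table):
    """Build a molecular formula in Hill order (C, then H, then alphabetical)."""
    counts = Counter(atom_table)
    if "C" in counts:
        ordered = [s for s in ("C", "H") if s in counts]
        ordered += sorted(s for s in counts if s not in ("C", "H"))
    else:
        ordered = sorted(counts)
    return "".join(s + (str(counts[s]) if counts[s] > 1 else "") for s in ordered)


def neighbors(index, bond_table):
    """Return the sorted atom indices bonded to the atom at `index`."""
    found = []
    for bond in bond_table:
        if bond.a == index:
            found.append(bond.b)
        elif bond.b == index:
            found.append(bond.a)
    return sorted(found)
```

Note that the atom numbering here is arbitrary: listing the same atoms in a different order gives a different pair of tables for the same molecule, which is the canonicalization problem described above.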

Lavoisier's statement was prophetic, and as the chemical sciences evolved beyond empirical and theoretical paradigms there was an interdependent and concurrent need for the chemical nomenclature to evolve. That is, Fourth Paradigm Science could not effectively contribute to scientific discovery if communication across databases and software agents was impeded by the fact they used different identifiers to describe the same chemical. IUPAC is the international authority on chemical nomenclature and terminology, whose recommendations are made public through the IUPAC Color Books3 and the journal Pure and Applied Chemistry.4 IUPAC recognized the need to extend the realm of standardized nomenclature into computer representations and in March of 2000 organized a meeting at the US National Academy of Sciences to explore this.5 They subsequently teamed up with the US National Institute of Standards and Technology (NIST), which was already working on the problem, and in January of 2001 the IUPAC-International Chemical Identifier (IChI) project was initiated (IUPAC Project 2000-025-1-800).6 In 2004 this was changed to InChI (International Chemical Identifier) with the initial release occurring in 2009. In 2010 the not-for-profit InChI Trust was formed, which works with the IUPAC Division VIII InChI Subcommittee to develop, test and advance the functionality of InChI in an effort to align with the needs of the evolving sciences.

There were essentially two goals with the initial release of InChI. The first was to enable communication across databases and software agents. The objective was not to replace the identifiers a database used, but to allow each identifier to be converted to an InChI; thus, if two databases used different identifiers for the same chemical, the data could be aligned because both identifiers convert to the same InChI. The second was to create an open-source, standard canonical representation that anyone could use, where only one InChI would be generated for a specific molecule. This is not to say that you cannot have a canonical SMILES, but SMILES strings are only canonical if the same algorithm generated them. Thus, by providing a unique canonical identifier to which the different (often proprietary) identifiers of specific databases and software agents can be converted, the InChI enables data exploration in the chemical sciences. As Lavoisier so aptly stated above, "neither can we, on the other hand, improve a science without improving the language or nomenclature which belongs to it", and InChI can improve the practice of the chemical sciences in the age of big data.


What is InChI?

The InChI Trust has posted the following series of videos describing InChI to YouTube:

  1. What on Earth is InChI?
  2. The Birth of InChI
  3. The Googlable InChIKey.
  4. InChI and the Islands.

InChI is a structure-based textual chemical identifier designed to be unique and to encode substance identity at different levels of granularity. Anyone who knows the structure of a chemical entity can use the free software provided by the InChI Trust to obtain its InChI (see Figure 1).

Additionally, a hashed representation of the InChI, the InChIKey, is also computed. This representation is more compact (27 characters) and may be more appropriate for indexing and searching.

Figure 1: Windows version of the InChI software.

The capability of the InChI to represent a chemical at different levels of detail is based on its layered structure. The Standard InChI has five core layers (and several sublayers within the core layers). It starts with "InChI=1S/", which indicates the version of the InChI being used, in this case version 1; "S" stands for standard. After the prefix, the layers and sublayers are encoded, separated by forward slashes, "/". An example is shown in Figure 2.


Figure 2: The main layers for a standard InChI of [(R)-carboxy(chloro)methyl]azanium, the protonated form of 2-(35Cl)chloro-R-glycine. Note each layer or sublayer is separated by a forward slash.

The five layers in the standard InChI are a main layer, which includes the chemical formula, the connectivity, and the position of the hydrogens; a charge layer, which encodes changes in charge or protonation; a stereochemical layer, where double bond (Z/E) stereochemistry and tetrahedral stereochemistry are reflected; an isotopic layer, to indicate isotopic substitution; and a fixed-hydrogen layer, to locate mobile hydrogens if required. Additional layers, like a reconnected layer, used when metals are present, or a polymer layer, devoted to encoding polymeric entities, are also considered in version 1.05 of the InChI documentation7.
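The layered structure can be explored with a few lines of Python. The sketch below splits an InChI string on its forward slashes and labels each segment by its one-letter prefix. The function name `split_inchi` and the wording of the labels are our own assumptions; the example string is the standard InChI commonly listed for L-alanine. This only reads an existing string and is not a substitute for the InChI software, which is what generates and validates identifiers.

```python
# One-letter prefixes that introduce InChI layers and sublayers.
# The descriptive labels are our own wording, chosen for readability.
LAYER_NAMES = {
    "c": "connections",
    "h": "hydrogens",
    "q": "charge",
    "p": "protonation",
    "b": "double-bond stereo",
    "t": "tetrahedral stereo",
    "m": "stereo inversion flag",
    "s": "stereo type",
    "i": "isotopic",
    "f": "fixed hydrogens",
    "r": "reconnected",
}


def split_inchi(inchi):
    """Split an InChI string into (label, value) pairs, in order of appearance.

    A list is returned rather than a dict because some prefixes (for
    example "h") can appear again after an isotopic or fixed-hydrogen layer.
    """
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("expected a string starting with 'InChI='")
    version, *segments = inchi[len(prefix):].split("/")
    layers = [("version", version)]
    for position, segment in enumerate(segments):
        if not segment:
            continue
        if position == 0 and not segment[0].islower():
            # The chemical formula is the only segment without a prefix letter.
            layers.append(("formula", segment))
        else:
            label = LAYER_NAMES.get(segment[0], "unrecognized (" + segment[0] + ")")
            layers.append((label, segment[1:]))
    return layers


l_alanine = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
for label, value in split_inchi(l_alanine):
    print(f"{label:>22}: {value}")
```

Comparing the output for two stereoisomers, or for a neutral and a charged form of the same compound, shows which layers change and which stay the same.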

Table 1 shows a set of InChIs for related compounds to illustrate these layers and show how they help in reflecting on chemical identity. Isomers share the chemical formula sublayer, but while structural isomers differ in the connections or hydrogens sublayers, stereoisomers do not. Stereoisomers, charged species and isotopomers have identical main layers and are differentiated by the additional layers of the InChI.

Table 1. InChI for a set of related chemical entities.



Entity (PubChem CID)

InChI, InChI Key

but-2-enoic acid (19499)



methyl prop-2-enoate (7294)




but-3-enoic acid (32743)




(E)-but-2-enoate (6971246)



(Z)-but-2-enoic acid (643792)



(3,4-13C2)but-2-enoate (153704929)



The InChILayersExplorer (fig. 3) is an Excel spreadsheet, available at the InChI OER, that allows one to explore and exemplify the layers in the InChI. In the image below a deuterated thalidomide was processed, and you can see the code for 5 layers.

Figure 3: Screenshot of InChILayersExplorer that can be obtained in the InChI OER.  This one shows the layers of a deuterated thalidomide (red insert).


What is the InChI OER?

There is a type of online OER (Open Education Resource) that is discipline or topic specific and takes advantage of Web 2.0 Content Management Systems (CMS) to advance the teaching and learning of chemistry through the sharing of instructional resources. Many of these are community based, and some examples include the ASDL (Analytical Sciences Digital Library), IONiC/VIPEr (Virtual Inorganic Pedagogical Electronic Resources), OrganicERs (Organic Education Resources) and XCITR (Exploring Chemical Information Teaching Resources). The InChI OER is similar to these and is integrated into the InChI Trust website, which is built on the WordPress Web 2.0 CMS, and can be accessed through the "RESOURCES" tab on the Trust's website (figure 4). The InChI OER is being developed by an IUPAC Task Group (project 2018-012-3-024) involving support through the InChI Trust, the IUPAC Committee on Publications and Cheminformatics Data Standards, the IUPAC Committee on Chemistry Education and IUPAC Division VIII, Chemical Nomenclature and Structure Representation. The InChI OER task group consists predominantly of educators who are working with the Division VIII InChI Subcommittee and the InChI Trust in an effort to bring about a greater awareness of InChI and its impact on the practice of science.

Figure 4: The "resources" tab on the header of InChI Trust webpage has a link to the InChI OER.

Like other Web 2.0 instructional resource sites, the InChI OER allows people to share and find resources. When content is uploaded to the OER it is tagged, and those tags are then exposed to the public as a filter that can be used to find material within the site. There are two types of tags, "content type" tags and "InChI" tags. The "content type" tag distinguishes open access content from non-open access content, and only open access content is shown by default. We felt it necessary to create the non-OER option as there are valuable non-OER publications related to InChI, and these can also be searched with the InChI taxonomy tags by simply clicking the non-OER radio button (clicking both gives you access to all content). A link to the content is provided if the content is non-OER or published in an open access journal; otherwise the content is uploaded directly to the InChI Trust website and available to the public through the InChI OER (all content uploaded to the site is OER).

When the InChI OER loads you see a tag filter on the left and a display of hits on the right, and you need to click the non-OER button if you want to see all content cataloged by the site. You can scroll through the InChI tags and filter the items by clicking on a tag ("<ctrl> click" allows a Boolean "AND" filter of multiple tags). Figure 5 shows the hits returned for a search of "spreadsheet".
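For readers curious about what a Boolean "AND" tag filter does, the following Python sketch models it with sets. The catalog records, their tags and the function name `filter_resources` are hypothetical and only illustrate the logic; they are not taken from the InChI OER site or its WordPress implementation.

```python
# Hypothetical catalog records; titles echo resources named in this paper,
# but the tag assignments are invented for illustration.
catalog = [
    {"title": "InChILayersExplorer", "tags": {"spreadsheet", "Excel"}, "oer": True},
    {"title": "ChemNames2LCSS", "tags": {"spreadsheet", "Google Sheet", "safety"}, "oer": True},
    {"title": "A paywalled InChI article", "tags": {"safety"}, "oer": False},
]


def filter_resources(records, selected_tags, include_non_oer=False):
    """Return titles of records carrying ALL selected tags (Boolean AND).

    Only open content is returned unless include_non_oer is True, which
    mirrors the default described in the text.
    """
    wanted = set(selected_tags)
    return [
        record["title"]
        for record in records
        if wanted <= record["tags"] and (record["oer"] or include_non_oer)
    ]


print(filter_resources(catalog, ["spreadsheet"]))
print(filter_resources(catalog, ["spreadsheet", "safety"]))
print(filter_resources(catalog, ["safety"], include_non_oer=True))
```

The subset test `wanted <= record["tags"]` is what makes this an AND filter: a record is kept only if every selected tag is present.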

Figure 5: The InChI OER tag filter and "hits" when the tag "spreadsheet" is chosen.

Choosing the content from figure 5 titled “InChILayersExplorer”, the OER will bring up a page dedicated to that specific resource (the spreadsheet in figure 3). Depending on the resource, different items may be shown on the page, but all will have an information box (figure 6 for the InChILayersExplorer). Some will contain links to download files, while others may link to the original host that published the content. The last piece of information is the set of tags, located at the bottom, with which that material has been tagged. Clicking a tag will pull up all other resources tagged with the same term.

Figure 6: The information box associated with the InChILayersExplorer. Note you can download the spreadsheet in figure 3 from the third field in the box.

Exemplar of the InChI OER: Application to Safety and the Stockroom

In this section we will look at how one can go to the InChI OER and learn how to create a Google Sheet that will take your chemical stockroom inventory and connect it to PubChem Laboratory Chemical Safety Summaries (LCSS).8,9 PubChem is a public compound database and one of the largest "big data" repositories in the chemical sciences, which on November 27, 2020 had information on 111,458,063 chemical compounds from 762 data sources.10 PubChem aggregates data from multiple sources in a chemical compound's summary page and maintains the provenance of that data by linking to the original source. The PubChem LCSS are a subset of the information in a chemical's compound summary page focused on hazard and safety information, as exemplified by this LCSS for Benzene. PubChem LCSS are modeled after the LCSS format described in the National Research Council publication Prudent Practices in the Laboratory: Handling and Management of Chemical Hazards.11

Most chemists query the web for information through a Graphical User Interface (GUI), like a search interface on a webpage in their web browser, but you can also navigate the web through calls in a spreadsheet to an Application Programming Interface (API). If one goes to the InChI OER and selects the tags "Google Sheet" and "safety", the content is filtered for Google Sheets related to safety, where one can find and make a copy of the "ChemNames2LCSS" sheet (fig. 7). This spreadsheet allows users to paste a list of up to 1000 chemicals (think stockroom inventory) and instantly generate links to the LCSS of the chemicals. This sheet is a simple way a practicing chemist can use Big Data in their laboratory and is based on the semantic framework of an RDF (Resource Description Framework) triple (subject, predicate, object) relationship.12 Here the subject is the name of a chemical on your spreadsheet, the predicate is "has LCSS of" and the object of that query is that chemical's LCSS, in the form of a link to the LCSS.
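The same kind of API call can be sketched outside a spreadsheet. The Python below builds a PubChem PUG REST request that asks for the InChI and InChIKey of a chemical name, builds an LCSS link from a PubChem CID, and packages the result as a (subject, predicate, object) triple. This is our own sketch under stated assumptions, not the script inside ChemNames2LCSS: the function names are hypothetical, and the LCSS link pattern is assumed from the way PubChem compound pages expose that view.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def name_to_properties_url(name):
    """PUG REST URL requesting the InChI and InChIKey for a chemical name."""
    encoded = quote(name, safe="")  # e.g. spaces become %20
    return f"{PUG_REST}/compound/name/{encoded}/property/InChI,InChIKey/JSON"


def lcss_url(cid):
    """Link to the LCSS view of a compound page (assumed URL pattern)."""
    return f"https://pubchem.ncbi.nlm.nih.gov/compound/{cid}#datasheet=LCSS"


def lcss_triple(name, cid):
    """Express 'name has LCSS of link' as a (subject, predicate, object) triple."""
    return (name, "has LCSS of", lcss_url(cid))


def fetch_json(url, timeout=30):
    """Fetch and decode a JSON response; requires network access."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Building the request and the triple needs no network access.
print(name_to_properties_url("ethyl alcohol"))
print(lcss_triple("benzene", 241))  # 241 is the PubChem CID for benzene
```

Calling `fetch_json(name_to_properties_url("benzene"))` on a machine with internet access returns the record PubChem holds for that name. When processing a long inventory it is good practice to pause between requests so as not to overload the service.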

Initially, when you load a list of words they are in red, which means there is no LCSS connected to that word (Figure 7).

Figure 7: Screen capture of the Google Sheet "ChemNames2LCSS" showing three tabs.

As the Google Sheet processes the list some of the words turn black, meaning that word is a unique chemical and there is now an LCSS connected to that chemical (figure 8). If the word becomes shaded it means it is redundant to a name already linked, and if it stays red, it was not identified as a chemical.

Figure 8: The InputChemicalsList after processing.  Of the 10 entries, only 6 are unique chemicals.

If you click the "OrderedStockroom" tab you get a unique list of chemicals in alphabetical order with links to the LCSS (Fig. 9).

Figure 9: "Ordered Stockroom" tab provides alphabetized list of the identified 6 unique chemicals with links to LCSS.

The question becomes, what was the role of the InChI in this endeavor? One of the first steps in processing data is to clean it up; for example, a stockroom inventory will have multiple listings of the same chemical, along with content that may not be a chemical. If you look at the PubChemDataCollection tab (figure 10) you see a list of InChIs (column C), which identify each unique chemical. The script used the InChI to identify the unique chemicals, which were then sorted in alphabetical order and listed in the middle tab (fig. 9) with a link to the PubChem LCSS. Note how row 7 (Erlenmeyer flask) produced an error and could not be processed, and so this item stayed red (fig. 8).
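The clean-up step can be illustrated with a short Python sketch that uses the InChI as the key for spotting duplicates. The small `lookup` dictionary stands in for the PubChem name-to-InChI query and is an assumption made so the example runs offline, and the function name `unique_by_inchi` is hypothetical. This is not the script used in the Google Sheet.

```python
# Stand-in for a PubChem name lookup (keys are lower-cased names).
lookup = {
    "water": "InChI=1S/H2O/h1H2",
    "ethanol": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
    "ethyl alcohol": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
    "acetone": "InChI=1S/C3H6O/c1-3(2)4/h1-2H3",
    "2-propanone": "InChI=1S/C3H6O/c1-3(2)4/h1-2H3",
    "benzene": "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H",
}


def unique_by_inchi(names, name_to_inchi):
    """Sort inventory entries into unique chemicals, duplicates and unresolved.

    Two names are treated as the same chemical when they map to the same
    InChI; the first name encountered is the one that is kept.
    """
    first_name_for = {}  # InChI -> first name seen with that InChI
    duplicates, unresolved = [], []
    for name in names:
        inchi = name_to_inchi.get(name.lower())
        if inchi is None:
            unresolved.append(name)      # would stay red in the sheet
        elif inchi in first_name_for:
            duplicates.append(name)      # would be shaded in the sheet
        else:
            first_name_for[inchi] = name
    unique = sorted(first_name_for.values(), key=str.lower)
    return unique, duplicates, unresolved


inventory = ["Water", "ethanol", "acetone", "ethyl alcohol",
             "2-propanone", "Erlenmeyer flask", "benzene"]
unique, duplicates, unresolved = unique_by_inchi(inventory, lookup)
print("unique:", unique)
print("duplicates:", duplicates)
print("unresolved:", unresolved)
```

Because the comparison is made on the InChI rather than on the name, synonyms such as "ethanol" and "ethyl alcohol" collapse to a single entry without any hand-built synonym list.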

Figure 10: The PubChemDataCollection tab was used to process the data. Note, there are many more columns to this tab.

The use of InChI thus allowed us to identify redundant entries and synonyms, but as we will see in the next section on the future of InChI, additional layers like the description of mixtures (MInChI) are also of critical importance, as the phase and concentration of a chemical often influence its properties and safety concerns. That is, the digital representation of chemicals is an ongoing endeavor of IUPAC and the InChI Trust, and the layered nature of InChI allows continual development of its functions and features.


Future of InChI & the InChI OER

Part 1: Future work of InChI OER

There are currently over 100 items in the InChI OER and, beyond identifying and creating new content related to InChI, the next stage is to refine the taxonomy to make the content more discoverable. Right now the taxonomy is tiered, with 8 broad categories.

  1. InChI Algorithm & Description
  2. InChI Application
  3. InChI Development/Modifications
  4. Audience
  5. Curricular Materials
  6. Content Type
  7. File Type 
  8. Language

The taxonomy is a controlled vocabulary and users cannot add their own terms. Please contact Bob Belford if you have any comments on the taxonomy or wish to contribute content to the InChI OER.


Part 2: Future work of InChI

The IUPAC Subcommittee on InChI and the InChI Trust are involved with multiple projects to advance the functionality of InChI.  Here is a list of several recent projects to give you a feel for the type of work being done and the directions InChI is moving in.

  1. RInChI (Reaction InChI): International chemical identifier for reactions (RInChI)
  2. MInChI (Mixtures InChI): Capturing mixture composition: an open machine-readable format for representing mixed substances
  3. Coordination InChI for inorganics: now with stereochemistry (blog post by Dr. Alex M. Clark)
  4. Tautomerism: Toward a Comprehensive Treatment of Tautomerism in Cheminformatics Including in InChI V2
  5. QR Codes: QR Code InChI


Future Education Opportunities

Faculty who are interested in learning more about InChI and cheminformatics in general are encouraged to participate in the Fall 2021 Cheminformatics OLCC that is sponsored by the ACS CHED CCCE. OLCCs are intercollegiate courses that have been run by the CCCE for almost a quarter of a century, and may indeed be the oldest ongoing online course in the chemical sciences. The goal of an OLCC is to allow schools to offer courses they cannot normally offer by bringing in external experts online. You are encouraged to read the 2016 CCCE Newsletter article Twentieth Anniversary of the OLCC.

The material for the Fall 2019 Cheminformatics OLCC can be found in the LibreText OER, and a paper on the past three Cheminformatics OLCCs, "Teaching Cheminformatics through a Collaborative Intercollegiate Online Chemistry Course (OLCC)", has been accepted by the Journal of Chemical Education and is now under technical review (we will update this article with a link as soon as it is web-published). Please contact Bob Belford, Ehren Bucholtz or Sunghwan Kim if you are interested in learning more about the Fall 2021 Cheminformatics OLCC.


We would like to thank the following for their help and support in our efforts: Steve Heller, Richard Kidd, Jonathan Goodman, Leah McEwen, Ralph Stuart, Nathan Brown, Tina Qin, Vincent Scalfani, Martin Walker, Steven Wathen.

  1. The Project Gutenberg eBook of Elements of Chemistry, by Mr Lavoisier.
  2. July 2014 – NextMove Software.
  3. Color Books. IUPAC | International Union of Pure and Applied Chemistry
  4. Pure and Applied Chemistry. De Gruyter
  5. What on Earth is InChI? IUPAC 100
  6. Project Details IUPAC-International Chemical Identifier.
  7. Stein, Stephen E., Heller, Stephen R., Tchekhovskoi, Dmitrii V., Pletnev, Igor V. (2017) IUPAC International Chemical Identifier (InChI). InChI version 1, Software version 1.05. Technical Manual. [electronic].
  8. PubChem. Laboratory Chemical Safety Summary (LCSS) views now available in PubChem. PubChem Blog (2015).
  9. PubChem Laboratory Chemical Safety Summary | DivCHED CCCE: Committee on Computers in Chemical Education.
  10. PubChem Statistics.
  11. National Research Council. Prudent Practices in the Laboratory: Handling and Management of Chemical Hazards, Updated Version. (2011). doi:10.17226/12654
  12. Semantic triple. Wikipedia (2020).


12/07/20 to 12/09/20


In addition to the stockroom use case described here of recognizing duplicates of chemicals with different names, the PubChem connection can be used to collect a wide variety of safety information. For example, PubChem includes raw data, GHS classifications and more complex information that is often not available on Safety Data Sheets. Automated collection of specific information such as flashpoint or chemical incompatibilities can be very helpful in conducting safety reviews of both teaching and research lab work.

I wonder if anyone in the group has explored this opportunity? I have some ideas for how this might be used in a teaching lab setting to help students understand appropriate safety precautions associated with lab work.

Bob Belford's picture

Hi Ralph,

I just realized the ConfChem list was not subscribed to the paper and so your comment was never sent, and I am responding to it now.

You are right, PubChem has a lot of data that can be useful in many ways. One of the "challenges" is that you can get contradictory "raw data" for the same chemical, as PubChem takes data from multiple sources, and those sources may not align. In fact I believe one could say that for a chemical to be validated in PubChem it needs to be "InChI-able", that is, you need to be able to generate an InChI for it. In the sixth paper of this Newsletter we will discuss some of the safety material in our online lab, and this link should allow you to make a copy of the SDS worksheet, and if you go to question 8 (and 9), you can see how I tried to bring up the topic of data provenance.

The other comment I would make is that many people may not realize there is both computed and raw data in PubChem. If you go to the record for benzene, section 3.1 gives the computed values, and below that are the experimental ones. But there are issues with experimental data, and students need to be able to "think" about data, and not believe it is correct just because they found it online.


Hi Bob, Jordi,

The CHEMNames2LCSS is pretty slick! We typically include safety information in the lab book (for freshmen anyway), but this would be a nice mechanism to include information literacy in the labs by having students look up data.



Natalie Ulrich's picture

*furiously taking notes Jason :)

>One of the "challenges" is that you can get contradictory "raw data" for the same chemical, as PubChem takes data from multiple sources, and those sources may not align.

Good point. I suspect that this is an opportunity to discuss the concept of uncertainties in even fundamental data. Hazmat responders are taught to consult three sources of information before acting on a specific property. My observation is that chemists tend to take the first value they find and run with that...

I taught a course in Environmental Health Chemistry for many years before I retired and recently published a paper in JCE with an outline. I am now retired, but if I were still teaching, the references to PubChem would be an excellent addition. If anyone would like to look at what I did, the reference is

The paper that I referenced is unfortunately behind a paywall at ACS. If you are interested and send me a request, I will be happy to send you a copy of the draft version.

rpendarvis's picture

I used SDS in testing in a forensic science course.  The question was a fictitious nursing home poisoning scenario some years ago.  The french fries were found to have a strange odor and containers of Chevron Techron were found in the pantry next to the soybean oil. Given SDS for both, some still could not identify the cause.  I suppose it was information overload.

Bob Belford's picture

Hi All,

We had created a super quick (less than a minute) survey to try and get a bearing on people's awareness of InChI.  We would be grateful if you could respond, and do so from the perspective of before you read the paper.  Here is the link, and it is anonymous.

Thank you,


Milind Khadilkar's picture

Hi all, I am a software developer and entrepreneur from Mumbai, India, not a Chemistry person.  My experience with InChI was first around 2004 or so (I am writing all this as I remember...I could be wrong) before it was released and then again in 2011. I was the man facilitating coordination between a Chemistry team (doing journal based research) and a software development team and the collaborative effort was to create a storage-efficient software repository for data about moieties and a fast search engine for moieties. Incidentally, I am not a Chemist, but was in my role because I remembered my school and undergraduate Chemistry and was willing to tackle it. InChI was to be the master component of the software, but that decision was taken by the software team. The Chemistry team had not heard of InChI. So one of my tasks was to interpret InChI for both the Chemistry and software teams.

While doing that, I realized that the InChI layers could have significance in the learning or teaching of Chemistry too. I explored it with some success (limited by my total lack of common sense aspects of Chemistry) with a group of students. I got maybe 100 people sensitized to the existence of InChI, but my idea of using InChI for Chemistry education got a lukewarm response from undergraduate teachers. A few years later I also tried to take InChI to chemists from the industry. There too, most had not heard of it (in 2011-12), and even after I gave them an introduction with possible applications in molecular dossiers (MSDS???) and in inventory management it did not interest them. Also, they were not enthused with the idea of InChI as an educational tool. (Note that, at that time, probably even now, practicing Chemists from industry use common names in their communication, not the IUPAC nomenclature, which they regarded as a "theoretical" device).

However, for one aspect, there was some interest, viz., the canonical numbering of atoms within InChI. There, the fact that carbon atoms in a chain could be numbered uniquely through a purely mathematical process interested some teachers, but there I could not participate fruitfully in their Chemistry discussions related to the Chemical significance, if any, of the numbers.

I would very much like to know whether InChI is being used to teach Chemistry in any way, or rather, whether it can be, whether it ought to be. I am not referring to Gray's 4th paradigm here, but to a possible use in conceptual understanding.

During those days, we were thwarted by InChI not helping us in (a) tautomers, (b) Markush structures (c) identification of "chemically significant" rings [The Chemists felt that the rings that could be got from an InChI string were significant only from a "diagrammatic" point of view] (d) Regiochemistry and most importantly (e) non molecular moieties. Also, documentation was not easily available in the early days. By 2011 that was not a cause for complaint.

There was also a discussion on whether the layered structure was stable, whether it could survive across versions, as InChI tries to assimilate more and more variations. Apparently, it has survived beautifully, possibly come out stronger.

Please excuse me if the above makes no sense. I am not a Chemistry person, and this was long, long ago. My only contact with Chemistry over the years has been through ConfChem (Thanks, Rebelford!)

Bob Belford's picture

Hi Milind,

I have thought a lot about what you are asking. I am not sure "reading" an InChI is of value, but using InChI in digital labs to explore chemical structure, and the nature of "what is a molecule", could result in some very high level learning. I am not an organic chemist, but I know that in this time of COVID-19 many organic labs had to run at half capacity (split over two weeks to maintain social distancing) and it seems that an off-week remote digital lab could use cheminformatics to advance organic. If you go to the InChI OER and do a search of "Organic Chemistry" and "MS-Word", you get two Word documents that you can download, and that may give someone some ideas.

I think the deepest learning comes with tackling the issues of representing molecules on computers, and the ongoing projects of the InChI Trust would be a great place to advance learning and develop valuable dry labs. 

Something we did not approach in the article was the extensible nature of InChI, that is, you can create your own layers to capture some form of knowledge that is associated with the structure of a molecule.  I mean something as simple as water is H2O seems pretty straight forward, right? …. and then I ask my students if they think the molar mass of water in the Arkansas river changes from the headwater in the Rockies to Little Rock?  They always say no, it is a constant, and are amazed when I show them that it does.  Well, water is a mixture of isotopomers, and the ratio varies by altitude.  So in principle, students could be asked, what kind of information would they need to capture the difference in molar mass of water as it flows from the Rockies to the sea? That is, what kind of information would they need to represent to make their own layer? (I am not saying to make a layer, but what is the information that would be needed?)

My point is that to capture the essence of "what is a molecule" on a computer we really do have to do a deep dive into what we mean by a "molecule", and so lots of opportunities at fundamental levels of understanding could arise.

Hi Bob and Milind, 

I agree with Bob that reading an InChI may not be super important for a new chemist; it was designed to be computer readable, not human readable. I do teach organic chemistry, and this semester I developed a couple of labs where students used ChemDoodle 2D and 3D for drawing and visualizing molecules. Then I showed them that the interface for reaching out to PubChem and ChemSpider had issues, especially when searching on InChI, so I then wrote a lab about using the PubChem and ChemSpider webpages directly. Students were able to search with InChI, but from a readability perspective they preferred SMILES, so I then showed them the issues that SMILES has with not being canonical across databases.

I have been playing with optical structure recognition software, and one of the benefits that I have found with the InChIKey is that its separate blocks make it easy to compare stereoisomers. The first 14 characters encode connectivity, and the next 10 characters contain other information, including stereochemistry. I was using this to determine if students drew the correct stereoisomer of a molecule. It is very easy to identify if the connectivity is correct and award partial credit for the molecule, and then full credit if the next 10 characters of the InChIKey also matched. So this is kind of an extension that I use for InChIKey and my databases, but not really an extension of the InChI itself.
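The comparison described above could be sketched in Python as follows. The function names and the 0.5 partial-credit weight are hypothetical choices for illustration, not the code actually used for grading; the example keys are the standard InChIKeys commonly listed for L-alanine, D-alanine and glycine.

```python
def split_inchikey(key):
    """Split a 27-character InChIKey into its 14-, 10- and 1-character blocks."""
    blocks = key.split("-")
    if len(key) != 27 or [len(block) for block in blocks] != [14, 10, 1]:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    return tuple(blocks)


def grade_structure(student_key, answer_key, partial=0.5):
    """Score a drawn structure against the expected one using InChIKeys.

    Returns 1.0 when the first two blocks match, `partial` when only the
    14-character connectivity block matches, and 0.0 otherwise. The final
    one-character block is ignored here, as in the comparison described above.
    """
    student_conn, student_rest, _ = split_inchikey(student_key)
    answer_conn, answer_rest, _ = split_inchikey(answer_key)
    if student_conn != answer_conn:
        return 0.0
    if student_rest != answer_rest:
        return partial
    return 1.0


L_ALANINE = "QNAYBMKLOCPYGJ-REOHGIRMSA-N"
D_ALANINE = "QNAYBMKLOCPYGJ-UWTATZPHSA-N"
GLYCINE = "DHMQDGOQFOQNFH-UHFFFAOYSA-N"

print(grade_structure(L_ALANINE, L_ALANINE))  # same stereoisomer
print(grade_structure(D_ALANINE, L_ALANINE))  # same connectivity, other enantiomer
print(grade_structure(GLYCINE, L_ALANINE))    # different molecule
```

Since the InChIKey is a hash, a match is strong evidence rather than proof of identity, so comparing the full InChI strings is a sensible follow-up check in borderline cases.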

I am also interested in the canonical numbering of atoms in the molecules. In the next update of the OLCC I hope to include numbering algorithms (Morgan). It helps students to see that there are problems with comparing mol files with canonicalization. If anything, my approach to this is that much of the stored computer data can look haphazard, which makes it less human readable. While the goal is to create systems that read data for us, it is humans who need to verify the data anyway. That seems hard to do if we have a difficult time parsing the data ourselves. Maybe that is why students prefer reading SMILES to InChI, even though there are inherent problems with SMILES.



Milind Khadilkar's picture

Thanks, Bob, Ehren,
I speak from the Pulpit of High Ignorance.

I feel that while "human readability" was not THE primary design goal of InChI, an InChI string is human readable (as against InChIKey, with its pros and cons) and it seems to be one of its strengths. At least our software team learned a lot browsing through it and the Chemistry team too had their own "Aha!" moments.
In my earlier post I probably did not word my thoughts clearly. While reading an InChI string has its uses, I meant to refer to the layered nature of InChI itself, not the layers of the InChI string of a specific compound. The fact that this layering structure was chosen (over competing proposals, I guess) implies that it represents the collective wisdom of many eminent chemists. However, this layering is not reflected (could be my ignorance...) in the pedagogy of undergraduate Organic Chemistry textbooks and curricula.
This layering can be considered as giving rise to an alternate (possibly impractical, possibly non-illuminating, possibly misleading) classification of organic compounds, which could provide an additional insight to students, which I thought could be as beneficial as, say, the insight given by judicious use of polar coordinates in some geometry problems.

I found, for example, that those who knew the layered structure of InChI got a bit of a different understanding when they were later introduced to functional groups. But I just taught a bunch of students two years out of high school and that was certainly not a proper sample, certainly not proper educational research. Over the past two years I have had an exposure to the methodology of educational research, which back then I had no knowledge of.

I am not a Chemistry teacher, nor do I know Chemistry well. However, in those days of my close association with Chemistry (it was delightful!) I strongly felt that the alternate insight offered by InChI layers was valuable. I certainly could have been wrong then, and of course, now.

To summarize, I was wondering whether an appreciation of the layered structure underlying InChI would result in a better adoption of organic chemistry by undergraduate students.

I also wonder why the canonical numbering by itself is not widely used in Chemistry. It could make some discussions and explanations more succinct and unambiguous.

Thanks again!

Hi Milind,

Thanks for your comments. The whole idea behind the InChI OER is to allow people to discover and share material related to InChI. If you, or anyone else, has material you would like to share, I would be glad to try and help you clean it up and post it to the OER.  Several of the members of the task group are organic chemists and I would think they would be glad to provide feedback. The only "requirement" is that it be OER, and people be allowed to reuse and modify as they need (you can require attribution).

Please contact me in private if you would like to contribute to the OER. I extend this invitation to anyone who has educational material they would like to contribute, even after the discussion period of this paper. That is the whole idea of this paper: it is not really about InChI, but about the InChI OER, a place where educators can share and discover uses and applications of InChI.

