NLP and Text Mining
Are you working on an NLP project where you’re mining the text of articles to which the university subscribes as an institution? Talk to me about it. There are very restrictive policies that MANY publishers have put into their contracts to discourage text mining.
****Warning – The next paragraphs are going to be me getting on my soapbox (again)****
Personally, I think this information hoarding is an unfair practice that restricts the advancement of science. I am certain that the authors of these works never intended for their research findings not to be propagated. And while I realize that we live and thrive in a capitalistic society, it gives me little comfort that we are being asked to pay, pay, pay, pay…especially for research that has mostly been funded by our own tax dollars. We have every right to be able to access information readily. This is the fundamental reason I am a librarian. Everybody has a right to information. It is the single most valuable commodity in the world. It has the ability to empower anybody.
The NIH Public Access Policy is a nice step toward information nirvana, but we still have a long way to go. It’s not reasonable to expect that a researcher can easily cull through 18 million articles even with the most advanced search engines available. Human indexing and algorithms on abstracts alone cannot achieve the precision that we seek. A computational solution against the full text is not only reasonable but necessary.
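To make the abstract-versus-full-text point concrete, here is a toy sketch. The article records, titles, and query term are entirely invented for illustration; the point is simply that a finding mentioned only in the body of a paper is invisible to abstract-only search:

```python
# Toy illustration (hypothetical data): abstract-only search vs. full-text search.
# Both records and the query term "rhabdomyolysis" are invented for this sketch.

articles = [
    {
        "title": "Statin therapy outcomes",
        "abstract": "We report cardiovascular outcomes of statin therapy.",
        # The key finding appears only in the body, not the abstract:
        "full_text": ("We report cardiovascular outcomes of statin therapy. "
                      "In a secondary analysis, rhabdomyolysis occurred in two patients."),
    },
    {
        "title": "Rhabdomyolysis case series",
        "abstract": "A case series of rhabdomyolysis after crush injury.",
        "full_text": "A case series of rhabdomyolysis after crush injury in adults.",
    },
]

def search(records, term, field):
    """Return titles of records whose given field mentions the term."""
    return [r["title"] for r in records if term.lower() in r[field].lower()]

abstract_hits = search(articles, "rhabdomyolysis", "abstract")
fulltext_hits = search(articles, "rhabdomyolysis", "full_text")

print(len(abstract_hits))   # abstract-only search finds 1 article
print(len(fulltext_hits))   # full-text search finds both articles
```

Scaled up to millions of articles, that missed hit is exactly the kind of finding a researcher never discovers when mining is limited to abstracts.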
Librarians struggle with publishers all of the time. Libraries pay soooooooooo much money for access. How can we possibly be expected to pay even more for information that we have already paid for? There is a limit to how much fruit the money tree can bear. There is a limit to how much librarians can bear.
Come on publishers. Do the right thing here. Doesn’t the public deserve it?