Topic: Making the scanning of books a community effort (Read 3185 times)
Rob Farnell
Apprentice
Posts: 1
Making the scanning of books a community effort
« on: March 18, 2008, 04:33:17 am »
Obviously there is a copyright question mark over this, but let's just forget about that for a second.
I read in the Amazon.com post in this forum that you feel it may take a year to scan enough books to build a sensible database. Would it not be possible to have some sort of request-and-provide list, so that people could submit books they think others may like, or provide books that others want to compare? I can think of a number of books that I want to compare but that you or others may not consider important, and this would be a way to do it.
Rob
lora ..
Apprentice
Posts: 1
Re: Making the scanning of books a community effort
« Reply #1 on: March 18, 2008, 07:26:58 am »
I regularly scan books I have to bring back to the library, if I'm not able to buy them somewhere (or if they are too expensive), and I know of at least one other person who does that as well... I think I'm probably not alone in this, and I would definitely be willing to share my scans for this purpose.
The books I scan are almost all non-fiction, though; I don't know whether other people are scanning fiction, or whether BookLamp will be limited to fiction. But if you can get a lot of people to scan a book every once in a while, you can probably build a pretty decent database from it!
Rick Weller
Apprentice
Posts: 1
Re: Making the scanning of books a community effort
« Reply #2 on: March 18, 2008, 11:14:24 am »
Or how about developing a utility that can be run on a home user's PC, which would let them scan whatever books they own, dump out the graph information, and upload it to your website?
That way there are no copyright issues, as a person is only scanning the contents of the book to create a graph, and only the graph is uploaded to your database.
Jill ONeill
Apprentice
Posts: 1
Re: Making the scanning of books a community effort
« Reply #3 on: March 18, 2008, 12:55:08 pm »
Not to be difficult, but there are definite legal implications about scanning a text if you do not hold the copyright to that text. Publishers get *very* uneasy about that kind of activity, even if it is intended solely for private use.
I have no idea how the parties behind this site have addressed that aspect, but would hope they've spoken to someone who can identify the potential legal pitfalls.
That said, is there any way that you might be able to partner with a library that is already participating in the scanning of texts for either Google Books or the Open Content Alliance? The libraries would have tremendous interest in your service, and approaching an entity such as Columbia University (which has both a publishing arm and a library) might be a good beginning.
Aaron Stanton
Project Manager
Core Team
Posts: 281
Re: Making the scanning of books a community effort
« Reply #4 on: March 18, 2008, 02:08:50 pm »
It's impossible to discuss this issue without talking about copyright - for one, I think it would be a very bad idea to ask users to scan and then submit copyrighted information. I think that's certainly begging for trouble.
The system itself uses the full-text of the book to create the equivalent of a digital review, and then separates the review from the full text. No copyrighted material is ever published, and short of creating difficulty if we decide to update our tracked metrics, there's no reason to retain the text of a book after review. Once the review is complete (graphs generated, etc), the material that is housed on the server is entirely the generated review content and no copyrighted material at all. Even if the server were somehow hacked, there would be no possibility of copyrighted material being distributed.
The issue of copyright is very sensitive, in no small part because of publishers' complaints about Google Book Search. Part of BookLamp's strength is that it has the potential to make a full-text database useful to both users and publishers without ever needing to publish any amount of copyrighted text.
I consider protecting the actual copyrighted data of a book to be one of our highest priorities, hence the concern. Having users scan and submit data creates a series of issues. There are different approaches that could be considered, but they'd have to be considered carefully, and done right.
Aaron
Ricardo Sanchez
Apprentice
Posts: 3
Re: Making the scanning of books a community effort
« Reply #5 on: March 18, 2008, 11:34:09 pm »
I'm a forum newbie, so hopefully this hasn't already been suggested.
Obviously publishers will be very concerned about uploading text or scanned pictures of pages to a website for analysis. My suggestion is this: what if the actual analysis did not occur on your end but was instead moved to the users' side? I assume you have some sort of utility that you use to convert scanned pages into book data (graphs, summaries, etc), so what if you made that utility downloadable? A user could scan a book, run your utility on it, then upload the resultant file for integration into your database. Thus, copyrighted material stays on the user's computer and only the metrics are uploaded to your system.
Admittedly, this will shift the burden to the users of your system, but with a consistent, motivated user base this shouldn't be a problem (I'd love to punch in a Timothy Zahn book or two). In effect, you would be making the data acquisition open source while keeping any algorithms and databases proprietary.
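To make the idea concrete, here is a rough Python sketch of what such a client-side utility might do. Everything in it is an assumption for illustration: the function names, the three counts, and the JSON layout are made up, and BookLamp's real metrics are not public. The point is only that the text is read locally and the file to upload holds nothing but numbers.

```python
import json
import re

WORD = r"[A-Za-z']+"  # crude word pattern, good enough for a sketch


def analyze_text(text: str) -> dict:
    """Compute a few illustrative raw counts from a book's full text."""
    words = re.findall(WORD, text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Treat anything between straight double quotes as dialog.
    quoted = re.findall(r'"[^"]*"', text)
    dialog_words = sum(len(re.findall(WORD, q)) for q in quoted)
    return {
        "word_count": len(words),
        "sentence_count": len(sentences),
        "dialog_word_count": dialog_words,
    }


def build_upload_payload(title: str, text: str) -> str:
    """Return the JSON to upload: the title and the counts, never the text."""
    return json.dumps({"title": title, "metrics": analyze_text(text)}, sort_keys=True)
```

A user would run `build_upload_payload` on their own scan and send only the resulting string; the scanned text itself never leaves their machine.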
Therin
Master
Posts: 117
Re: Making the scanning of books a community effort
« Reply #6 on: March 19, 2008, 12:14:38 am »
Hey Ricardo,
I'm not a techie, but I think the problem with the idea as you suggest it is that if they ever want to adjust the utility, even just fine-tune it, then without the original text to reapply the algorithms to, all the books would need to be scanned again.
Plus it opens the possibility of individual people facing copyright breaches for having digital versions of the books on their computers.
Randy McGirr
Apprentice
Posts: 5
Re: Making the scanning of books a community effort
« Reply #7 on: March 19, 2008, 09:26:43 am »
Would it be possible to make it easy for publishers to send only the results of a book analysis to booklamp? Why not make it easy for publishers to run a PDF for one of their books through an analysis program? This would get around the copyright issue and it could literally take less than five minutes for a publisher to run the analysis for a book and send it off to booklamp... thus ensuring a little more publicity for the book. Also if the publishers cooperated in this manner, it would save someone the bother of scanning a book... which I imagine to be a very time-consuming process!
Stephen Rollins
Perfect Master
Posts: 281
Re: Making the scanning of books a community effort
« Reply #8 on: March 19, 2008, 05:02:48 pm »
I think that having the publishers send in the results would certainly be highly convenient, not to mention easy on copyrights, but it might make it possible for someone to skew the results before they get added to the BookLamp database. I don't know why you'd want to do that, but hey.
Really, though, if you're going to be able to submit a paper or something you've written up yourself, there's no reason that the same sort of thing can't be done with books. Maybe the results could be sent directly to BookLamp as the book is being evaluated? Then there'd be no data-skewing issues.
Ricardo Sanchez
Apprentice
Posts: 3
Re: Making the scanning of books a community effort
« Reply #9 on: March 19, 2008, 11:30:47 pm »
I agree that the best case scenario would be that book data come straight from the publishers, but unfortunately I don't see that happening unless this site becomes extremely well populated with potential customers. Even if this occurs, I would think that a publisher would only be really motivated to scan in its best sellers and leave its lesser known authors behind, which unfortunately would defeat the entire purpose of a book suggestion system.
As far as publishers attempting to "skew" results, I doubt that this would be a serious problem because a) people like different styles of book (if a publisher pushed their "dialog" results up they would be alienating readers who like more action-packed books), and b) as the BookLamp video demonstrates, a person who has read the book can glance at the graphs and identify points of change in the book (such as the Jurassic Park example). If the data is skewed, readers will notice and can warn other users that this publisher might be less than honest. This would be bad publicity for a book publisher.
Therin, unfortunately I can't really address the possibility of individual copyright infringement lawsuits, since I'm not a lawyer. However, with regard to the downloadable utility idea, I would think that the data gathered by the tool is static in relation to the book. In other words, the tool records things that do not change, such as number of scenes, verb density, and dialog density. Unless BookLamp decided to add a whole new metric, any fine-tuning of the algorithm should be applicable to the existing data without needing to rescan the book.
This discussion is difficult since we do not know precisely what sort of data is recorded, and hence don't know how "scalable" it is.
Aaron Stanton
Project Manager
Core Team
Posts: 281
Re: Making the scanning of books a community effort
« Reply #10 on: March 25, 2008, 01:13:04 pm »
As for the copyright discussion, there's an item in the FAQ about copyright, so I'll just point you to that:
http://beta.booklamp.org/forum/faq.html
As for the possibility of distributing a utility to publishers or readers to transmit the raw data instead of the copyrighted work, we've discussed this possibility from the beginning. There are a number of challenges, but none impossible to overcome. My main concern would be ever asking someone to do something that is questionable in terms of copyright; I can see a number of substantial issues that could arise from such a thing. I think the idea of building a system for the publisher to upload data to the system is far more appealing, and could logically be a benefit to both readers and publishers.
From a technical standpoint, it would present problems if we decided to introduce a new metric, though not necessarily a new formula. Many of the formulas could be recalculated without needing a publisher to resubmit a book. Still, that would mean that any substantial new feature that required something we didn't account for at the beginning - of which there have been several over the development of the project - would lessen the usefulness of old data.
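The difference between a new formula and a new metric can be shown with a small Python sketch. The field names, the numbers, and both formulas are hypothetical stand-ins, not the project's actual metrics: a revised formula only reads counts that are already stored, while a new metric may need a count that was never collected.

```python
# Raw counts as they might have been stored for one book (made-up values).
stored = {"word_count": 90000, "dialog_word_count": 27000}


def dialog_density(raw: dict) -> float:
    """Original formula: share of words that appear inside dialog."""
    return raw["dialog_word_count"] / raw["word_count"]


def dialog_score(raw: dict) -> int:
    """A revised formula over the same raw counts, rescaled to 0-100."""
    return round(100 * raw["dialog_word_count"] / raw["word_count"])


def missing_inputs(required: list, raw: dict) -> list:
    """List the raw counts a metric needs that were never stored."""
    return [name for name in required if name not in raw]
```

Here `dialog_score(stored)` works on old data without a resubmission, whereas `missing_inputs(["word_count", "verb_count"], stored)` reports `verb_count` as absent, so a verb-based metric would need the book to be processed again.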
As for tweaking and skewing results - the formulas are detailed enough that I think it would be difficult to deliberately skew a book, even if you knew exactly what the formulas were and how they were weighted.
Aaron
pmslewis
Apprentice
Posts: 2
Re: Making the scanning of books a community effort
« Reply #11 on: April 20, 2008, 05:32:02 pm »
Since you've started by focusing on a genre you're passionate and knowledgeable about -- science fiction -- perhaps you could approach one of the science fiction publishing houses to partner with you on a proof of concept. I know you will want to branch out into other genres eventually, but you have a lot of opportunities while you're small to find partnerships that will help you refine your concept while you are adding content. BantamDoubledayDell can wait.