How much can you quote from a crawled document? Can you republish the entire crawl? What can you do under "fair use" of copyrighted material, and what can't you do? Can you make a solid case that your publication truly contains only factual information? If BigCo dislikes having its name associated with the study, can you protect yourself by limiting your publication to "nominative use" of its trademarks? What is the practical risk of someone raising a stink if the legality of your usage is ambiguous? Who actually holds copyright on the crawled documents?
You have a lot of rights and you can do a lot. Understanding those rights and where they end lets you do more, and with confidence.
So I think I was just being unimaginative about "scraping"; I wouldn't have thought to save quotes or prose, just things like word counts, processed results (e.g. sentiment analysis), pricing, etc. In that case most of those questions shouldn't come up, but yes, I can see where other uses are less simple.