So using their numbers (max 36 'units', 15 QPS per unit, 25 GB per unit), that is a maximum search corpus of 540M documents or 900 GB at 540 QPS, for roughly $4,500 per month at the 50% preview discount (so $9K/month at full price). Does anyone know if the 36-host limit arises from a requirement that all units be in the same rack?
I'm a Program Manager on the Azure Search team, and I'm going to correct your numbers a bit. Even though you can have a maximum of 36 search units, the number of partitions you can create (currently) is 12. Partitions, by the way, are what you increase to grow the number of documents. With this limit of 12 partitions, the maximum size of an index is actually 180M documents or 300 GB (not 900 GB as you stated). So far, the vast majority of customers we have been working with fit well below these limits, and in fact most of them fit into the single-partition (15M documents / 25 GB) range.
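A quick sketch of the capacity arithmetic described above, assuming the preview limits stated in the comment (12 partitions max, 15M documents and 25 GB per partition):

```python
# Preview-era limits as stated in the comment above (assumptions, not an API).
MAX_PARTITIONS = 12
DOCS_PER_PARTITION = 15_000_000
GB_PER_PARTITION = 25

# Maximum index size is just partitions times per-partition capacity.
max_docs = MAX_PARTITIONS * DOCS_PER_PARTITION  # 180M documents
max_gb = MAX_PARTITIONS * GB_PER_PARTITION      # 300 GB
```

This is why the grandparent's 900 GB figure was off: it multiplied by the 36-unit ceiling rather than the 12-partition ceiling.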
For the very few customers we have talked to who need more than this, we can actually allocate a much larger system with much higher limits. We have an azuresearch_contact email address on the pricing page (http://azure.microsoft.com/en-us/pricing/details/search/) with more details if you need this.
To your other question about racks and search units: you can think of a search unit as a dedicated Azure VM for your usage. Each additional search unit you create is an additional VM for your use, and each VM has a certain amount of capacity it can handle. If your needs grow beyond what you can get with a single search unit, you can move the dial up, whether that is increasing replica count to add more QPS / high availability or increasing partitions to add more documents / faster data ingestion. The number of search units you have is calculated as replicas x partitions, where each search unit (during public preview) costs $125 US / month. By the way, a single replica can handle about 15 QPS, which for most customers is more than enough. Even so, the ability to scale up and down is pretty important to a lot of people. Imagine Black Friday in the US, where a retailer gets hammered with searches yet only wants to allocate increased replicas for that day to handle the increased query load. There is a bit more information on this here: http://azure.microsoft.com/en-us/documentation/articles/sear...
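The billing and throughput model above is simple enough to sketch directly; the figures ($125/unit/month preview pricing, ~15 QPS per replica) are the ones stated in the comment, and the function names are just for illustration:

```python
# units = replicas x partitions, as described in the comment above.
def search_units(replicas: int, partitions: int) -> int:
    return replicas * partitions

# Preview pricing: $125 US per search unit per month.
def monthly_cost(replicas: int, partitions: int, unit_price: int = 125) -> int:
    return search_units(replicas, partitions) * unit_price

# Rough throughput: ~15 QPS per replica (a stated average, not a guarantee).
def approx_qps(replicas: int, qps_per_replica: int = 15) -> int:
    return replicas * qps_per_replica
```

So a Black Friday bump from 1 replica to 3 replicas on a 4-partition index triples your approximate QPS, but also triples your bill while it lasts (12 units instead of 4).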
It does help Liam, thanks. I'm coming at this from a web search perspective. Checking our crawler we have about 16M documents from Wikipedia indexed, which would presumably fit inside your single partition. The 'hot' crawl (things that change with a frequency <= 7 days) is a lot bigger than that though :-)
I'm guessing your target market is folks that want to corral their documents (sort of like the Google appliance, but in the cloud)? What is your privacy policy on that? (Lawyers, for example, have a lot of documents but rarely put them in the cloud.) And when you say 15 QPS, what is the SLA? Is that at the 50th percentile? 95th? 99th? I've noticed it seems to be hard to pin down in ElasticSearch.
ChuckMcM, you are absolutely right. Nailing down QPS rates is an incredibly tough thing, not just for Azure Search but for most search engines I am aware of. Things like the number of facets and the complexity of queries all play a part in what QPS rate a search engine can serve up. When we say ~15 QPS, we try to point out that this is based on an average index among the ones we have seen from our typical customers. Certainly some customers may see way more QPS on a single search unit and others will see less.
The main markets (or scenarios) that we target with Azure Search are eCommerce retail, user-generated content sites (such as a recipe site or Hacker News) and internal organizational apps. The interesting thing about internal organizational apps is that we are seeing more and more users finding that search is a natural way to navigate and explore their data. Users are typically far more knowledgeable about using search to explore their data, thanks to engines like Google and Bing, than they are with, say, SQL.
We actually don't have an official SLA yet for this preview. One of the goals of this public preview is to determine what we can realistically promise for our v1 release.
Yes, privacy is a thing for sure. It is interesting that you say lawyers, because we have had a number of companies in the law field that have wanted to use Azure Search. Things like indexing of case documents are quite popular from what I have seen. In many of these examples (and especially with healthcare), privacy, or more specifically encryption at rest, as well as compliance (such as HIPAA) often become critical. As of today we have neither: we don't have encryption at rest and we do not have HIPAA compliance for Azure Search. Of course, this will be a goal, and I guess we need to start somewhere. Encryption as it relates to search is actually going to be a really hard thing to do properly, so that will be an interesting thing for the future.
By the way, Wikipedia is one of the datasets we often test with our service. Feel free to ping me, as we have a loader for the Wikipedia dataset that I could look into sharing with you if you would like to play with it and Azure Search. My email address is my YCombinator username + microsoft.com.
Configurable analysis is a cornerstone of search functionality and is a huge portion of Apache Lucene (and therefore of both Solr and ElasticSearch).
So, in my eyes, this offering has not outgrown the pure web-search domain yet.
(edit) Which is strange, because in another comment they do say they use ElasticSearch under the covers. I even thought that the API interface looked somewhat similar to ES.
The purpose of starting with this "Simple query syntax" was to try to keep things as simple and straightforward as possible for both the developer and the users of search. We have only exposed the query syntax that we have found that customers we have been working with so far have needed. I am sure there will be more, and as you say, since the core is ElasticSearch, if the demand for things such as configurable analysis is there, we can certainly look to expose it in our API. For things like this it would be great if you could cast a vote in our UserVoice page (http://feedback.azure.com/forums/263029-azure-search). By the way, this is a great place to go to see the feedback and suggestions from customers we have been working with so far.
My main problem with this is that a standard instance is $125/mo for anything beyond the free limits (10k documents, 50 MB). It would be great to see pricing that followed, say, Azure Websites or SQL pricing: $20-40/mo for smaller instances (smaller by either search volume or index size).
I mean, after all, if SQL Azure supported Full Text Indexes, this wouldn't be critical either.
Agreed, the jump from $0 to $125 (preview pricing) is a large one, and I can say we are definitely considering something in the middle ground. I am curious: what types of features would you be willing to give up for a lower price point? For example, one option might be for us to consider using smaller VMs, but that would greatly reduce the document count (perhaps 1M docs max), the query rate (perhaps max 1 QPS), and/or possibly limit the ability to support high availability.
Do these sound like reasonable things to give up for this lower price point? Do you have alternate ideas?
From my perspective, I think limiting the query rate is better than limiting the number of documents you can index. In my current case, I have millions of "documents" (in the Lucene sense) but relatively low usage. Obviously, my long-term goal is to increase the usage. So being able to pay to index a lot of documents while limiting the search resources (i.e., VM size, but not storage size) would be the best way to scale.
In theory, the more users I have the more $ I have to scale the search.
The solution I'm currently using (mostly because SQL Azure doesn't support Full-Text Search) is Lucene.NET (which is still on a very old version but supposedly 4.8 is coming) and AzureDirectory (which leverages Blob storage). It's clunky, but it works... at least for the scale I use it at currently. I would love to be able to use Azure Search and scale it up again just like with all my other services.
This comes out at a time when I've finally decided to start digging into ElasticSearch. Since it's a side project, I don't have to worry about scaling and management issues. So just from the standpoint of functionality, is there any advantage to this Search-as-a-Service over an ElasticSearch cluster on Azure Virtual Machines?
I am a Program Manager for Azure Search, so you will have to take this response as perhaps being a little one-sided. I think you will find that getting ElasticSearch up and running in Azure is very easy. But as to your point about not worrying about scaling or management issues when using ElasticSearch on Azure VMs, I don't agree. There are still many things you will need to be concerned about. For example, Azure will periodically do VM updates and patches, which will cause your VMs to occasionally be restarted. This of course would affect your search availability, which means that if you cannot permit downtime, you need to think about replicas across VMs and how to manage availability groups so that the patches are applied in a way that avoids downtime. Then you have to think about how you are going to handle partitions, and how you will shard your index across machines in the case where you need more than one VM to accommodate the amount of data in your index. All of these are things that we in Azure Search take on for you and make easy to scale up/down, whether you need more replicas for higher QPS or higher availability, or more partitions to allow for greater numbers of documents or faster data ingestion. There is actually quite a lot more to managing a search service (even in a cloud-hosted IaaS environment), but hopefully this gives you a few things to think about.
A colleague of mine also experimented with ElasticSearch on Azure. His server got hacked and was shut down by Microsoft after they discovered it was generating large amounts of traffic and participating in DoS attacks. (There was a vulnerability, and many ElasticSearch servers were hacked a few months back.)
So I'd argue that using Azure Search Service, being managed, will free you from worries of having to manage and update yet another technology.
I am a Program Manager for the Azure Search team. I am glad to hear you think our service looks promising. You are right about searching across indexes. This is something we heard often from a number of customers we worked with before today's announcement. There are often ways of working around it, for example by merging content into a single index, but that is obviously not workable for everyone, so you are also correct that this is really just a matter of time.
I'm wondering what the target market for that is?