Hacker News | eneuman's comments

You’re more than welcome! I really appreciate the kind words.

If you have any ideas for improvements, missing features, or run into any issues, don't hesitate to share!


Thank you for the input! To be honest, I don’t use Dask often, and as a regular Pandas user, I don’t feel the most qualified to comment—but here we go.

Can this be merged into Pandas?

I’d be honored if something I built got incorporated into Pandas! That said, keeping aiopandas as a standalone package has the advantage of working with older Pandas versions, which is useful for workflows where upgrading isn’t feasible. I also can’t speak to the downstream implications of adding this directly into Pandas.

Pandas does not install tqdm by default.

That makes sense, and aiopandas doesn’t require tqdm either. You can pass any class with __init__, update, and close methods as the tqdm argument, and it will work the same. Keeping dependencies minimal helps avoid unnecessary breakage.
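As a rough illustration of that duck-typed interface, here is a minimal sketch of a class exposing only __init__, update, and close. The names PrintProgress, run_with_progress, and double are hypothetical; the driver is a stand-in written for this example, not aiopandas code, and the exact arguments aiopandas passes to the constructor are an assumption.

```python
import asyncio


class PrintProgress:
    """Hypothetical tqdm stand-in: only __init__, update, and close."""

    def __init__(self, total=None, **kwargs):
        # Extra keyword arguments (desc, unit, ...) are accepted and ignored.
        self.total = total
        self.done = 0
        self.closed = False

    def update(self, n=1):
        # Called once per finished item.
        self.done += n
        print(f"{self.done}/{self.total} done")

    def close(self):
        self.closed = True


async def run_with_progress(coros, progress_cls):
    """Hypothetical driver (not aiopandas): await each coroutine, report progress."""
    bar = progress_cls(total=len(coros))
    results = []
    try:
        for coro in coros:
            results.append(await coro)
            bar.update(1)
    finally:
        bar.close()
    return results, bar


async def double(x):
    await asyncio.sleep(0)  # placeholder for real async work
    return 2 * x
```

Anything with the same three methods, such as a logger-backed counter, could be swapped in the same way.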

What about Dask?

I’m not a regular Dask user, so I can’t comment much on its internals. Dask already supports async coroutines (Dask Async API), but aiopandas targets simple async API calls or LLM requests: it is meant to be a lightweight extension of Pandas rather than a full-scale parallelization framework. If you’re already using Dask, it probably covers most of what you need. If you just want to add async support to Pandas without additional complexity, aiopandas might be the simpler option.
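For context, the "simple async API calls" use case can be sketched with plain asyncio and no Pandas, Dask, or aiopandas at all. This is only an illustration of the pattern, not how aiopandas is implemented; fake_api_call, async_map, and max_concurrency are hypothetical names.

```python
import asyncio


async def fake_api_call(text):
    """Hypothetical stand-in for an HTTP or LLM request."""
    await asyncio.sleep(0.01)
    return text.upper()


async def async_map(func, values, max_concurrency=5):
    """Apply an async function to each value, at most max_concurrency at a time."""
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(value):
        async with sem:
            return await func(value)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(bounded(v) for v in values))


results = asyncio.run(async_map(fake_api_call, ["a", "b", "c"]))
```

The semaphore is the piece that keeps a large column of requests from hitting a rate-limited API all at once.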


Fair benchmarks would justify merging aiopandas into pandas. Benchmark grid axes: aiopandas, dtype_backend="pyarrow", dask-cudf

pandas pyarrow docs: https://pandas.pydata.org/docs/dev/user_guide/pyarrow.html

/? async pyarrow: https://www.google.com/search?q=async+pyarrow

/? repo:apache/arrow async language:Python : https://github.com/search?q=repo%3Aapache%2Farrow+async+lang... :

test_flight_async.py https://github.com/apache/arrow/blob/main/python/pyarrow/tes...

pyarrow/src/arrow/python/async.h: https://github.com/apache/arrow/blob/main/python/pyarrow/src... : "Bind a Python callback to an arrow::Future."

--

dask-cudf: https://docs.rapids.ai/api/dask-cudf/stable/ :

> Neither Dask cuDF nor Dask DataFrame provide support for multi-GPU or multi-node execution on their own. You must also deploy a dask.distributed cluster to leverage multiple GPUs. We strongly recommend using Dask-CUDA to simplify the setup of the cluster, taking advantage of all features of the GPU and networking hardware.

cudf.pandas > FAQ > "When should I use cudf.pandas vs using the cuDF library directly?" https://docs.rapids.ai/api/cudf/stable/cudf_pandas/faq/#when... :

> cuDF implements a subset of the pandas API, while cudf.pandas will fall back automatically to pandas as needed.

> Can I use cudf.pandas with Dask or PySpark?

> [Not at this time, though you can change the dask df to e.g. cudf, which does not implement the full pandas dataframe API]

--

dask.distributed docs > Asynchronous Operation; re Tornado or asyncio: https://distributed.dask.org/en/latest/asynchronous.html#asy...

--

tqdm.dask, tqdm.notebook: https://github.com/tqdm/tqdm#ipythonjupyter-integration

  import time
  from tqdm.notebook import trange, tqdm

  # trange(10) is shorthand for tqdm(range(10)) and renders a notebook progress bar
  for n in trange(10):
      time.sleep(1)
--

But then there are TPUs, instead of or in addition to async GPUs:

TensorFlow TPU docs: https://www.tensorflow.org/guide/tpu


Thanks! I originally built this to scratch an itch I had, so I’m really glad you find it useful too. If you have any ideas for improvements or missing features, feel free to suggest them — or even open a PR!


To get the ball rolling, https://careers.reef.pl/ tests Senior Python developers upfront.


You can also search for other similar Show HN posts offering to make it easier to search through Who's hiring: https://payperrun.com/%3E/search?displayParams={%22q%22:%22S...

(There are quite a few, you might want to filter by date!)


Hey everyone, I just made this thread easier to search through here:

https://payperrun.com/%3E/search?displayParams={%22q%22:%22D...

It uses LLM embeddings to sort posts by semantic proximity, but you can also filter out posts with [case-insensitive] comma-separated values (click on the filter button and add, for example, "US-Only, On-Site" to the "not contains" input).

It's pretty crude but I hope it helps!

(I'll set up an update job tomorrow morning)
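To make the ranking and filtering described above concrete, here is a minimal sketch, assuming each post already has an embedding vector. It is not the site's actual code: the function names, the toy two-dimensional vectors, and the sample posts are all made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(posts, query_vec, not_contains=""):
    """Drop posts containing any excluded term, then sort by similarity."""
    # Comma-separated, case-insensitive exclusion terms.
    terms = [t.strip().lower() for t in not_contains.split(",") if t.strip()]
    kept = [p for p in posts if not any(t in p["text"].lower() for t in terms)]
    return sorted(
        kept,
        key=lambda p: cosine_similarity(p["vec"], query_vec),
        reverse=True,
    )


# Toy data: real embeddings would have hundreds of dimensions.
posts = [
    {"text": "Acme | Remote | Python", "vec": [0.9, 0.1]},
    {"text": "Initech | US-only | Java", "vec": [0.8, 0.2]},
    {"text": "Globex | on-site | Python", "vec": [1.0, 0.0]},
    {"text": "Hooli | Remote | Go", "vec": [0.1, 0.9]},
]
ranked = search(posts, [1.0, 0.0], not_contains="US-Only, On-Site")
```

With the filter applied, only the two remote posts remain, ordered by how close their vectors are to the query vector.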


This question has been asked a few times, so lots of interesting comments to read!

- Ask HN: What do you wish you had known before you turned 40? https://news.ycombinator.com/item?id=9092246

- Ask HN: What do you wish you had done/known in your 30s? https://news.ycombinator.com/item?id=30782994

- Ask HN: What's your best advice for someone turning 30 today? https://news.ycombinator.com/item?id=26068320

- Ask HN: How would you wish you had invested your money if you were 30 again? https://news.ycombinator.com/item?id=13179385

- Ask HN: What's Your Biggest Regret? https://news.ycombinator.com/item?id=33118584

[Shameless plug: I found all these with the LLM-embedding-based search engine I launched today: https://payperrun.com/%3E/search?displayParams={%22q%22:%22A...

It's much better than HN's default search: https://hn.algolia.com/?q=Ask+HN%3A+What+do+you+regret+doing... ]


You've probably already seen levelsio's tweet: https://twitter.com/levelsio/status/1457315274466594817

I also like one of his old posts: https://levels.io/12-startups-12-months/

Yeah, building something people want (and are willing to pay for) is pretty hard, and trying to sell early and often is a good way to reduce your market risk. I'm more on the "build something I want" side: as soon as I have something I like, I iterate on the communication and go-to-market more than on the product. I think this is more aligned with founders like Brian Chesky, who launched (and failed) multiple times but kept going because they really believed in their idea (though I'm still on the failing part).

There are many Ask HN posts about launching which you might find useful: https://payperrun.com/%3E/search?displayParams={%22q%22:%22A...


(Btw, the Ask HN search I shared is part of the search service I launched earlier today. I think the results are better than HN's default search: https://hn.algolia.com/?q=Ask+HN%3A+how+to+launch%3F)


Yup! I’ve seen them both. It really is a grind to get to the point where something actually sticks. The biggest skill I think I need is just shipping really fast.


There have been a few attempts at a crowdsourced-rank search engine (which is similar to what you're suggesting: people indexing the content), but it seems to be a tough cookie. Most of the examples of similar ideas I could find on ProductHunt or Show HN seem dead:

https://payperrun.com/%3E/search?displayParams={%22q%22:%22c...

(Btw, I just launched this LLM-embedding-based search service, which lets you check whether a startup idea has already been tried or has failed.)

I don't know if this idea has a higher death rate than the baseline, but my guess is Google/PageRank is good enough for most use-cases, and then if you want quality sources, you can just follow them on YouTube, Twitter, Instagram, etc. Wait, maybe I shouldn't try to compete with Google?


Thank you for the feedback!

If there are any other domains you'd like to see, let me know :)


I'm thinking Crunchbase would give you a comprehensive view into Silicon Valley (and adjacent) company data, but the data is behind an API.

