The primary reason for the size was the amount of memory: the hardware address space was 64kB. And since it's a DOS, it had a full set of system calls for file and directory handling.
Apple DOS 3.3 usually lived at $9600-$BFFF, the top of RAM on the semi-standard 48K Apple II, so the whole thing was 10,752 bytes. That included a lookup table (LUT) to assist with converting between disk nibbles and bytes, I/O routines to read, write, and enumerate files and text streams, and a disk initialization routine.
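The 10,752-byte figure is just the size of that address range, counting both endpoints:

$$\mathrm{BFFF}_{16} - \mathrm{9600}_{16} + 1 = \mathrm{2A00}_{16} = 2 \cdot 16^3 + 10 \cdot 16^2 = 8192 + 2560 = 10{,}752$$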
The cold boot code lived in 256 bytes of ROM on the Slot 6 disk controller, memory-mapped to $C600-$C6FF. It built another copy of the LUT, read the first sector(s) of track 0 into RAM starting at $0800, and jumped to that code.
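The LUT exists because of the disk encoding: DOS 3.3 uses "6-and-2" encoding, in which every 6-bit value is written as one of 64 disk bytes the drive can read back reliably. Below is a sketch of how such a table can be generated, assuming the commonly described selection rule (high bit set, plus, within bits 0-6, at least one pair of adjacent 1 bits and at most one pair of adjacent 0 bits). As far as I know, the boot ROM applies an equivalent rule in 6502 code rather than storing the table. The helper and table names are hypothetical.

```python
def is_valid_nibble(b: int) -> bool:
    """Hypothetical helper: may byte b appear on disk as a 6-and-2 data nibble?"""
    if not 0x80 <= b <= 0xFF:
        return False                        # disk bytes always have the high bit set
    low = b & 0x7F                          # the rule is applied to bits 0-6
    adjacent_ones = low & (low << 1)        # bit i set if bits i and i-1 are both 1
    zeros = ~low & 0x7F
    adjacent_zeros = zeros & (zeros << 1)   # bit i set if bits i and i-1 are both 0
    return adjacent_ones != 0 and bin(adjacent_zeros).count("1") <= 1


# 6-bit value (0-63) -> disk byte, in ascending order of disk byte.
ENCODE = [b for b in range(0x80, 0x100) if is_valid_nibble(b)]
# Disk byte -> 6-bit value: the direction the boot code needs when reading.
DECODE = {b: value for value, b in enumerate(ENCODE)}

print(len(ENCODE), hex(ENCODE[0]), hex(ENCODE[-1]))  # 64 0x96 0xff
```

Note that $AA and $D5, the bytes used in sector address and data marks, fall out of the rule automatically because they have no adjacent 1 bits.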
Every disk had to be at least partially readable by this 256-byte bootstrap code. This was why every disk-based copy protection scheme on the Apple II was doomed: disassemble whatever it loaded, and you could learn the scheme's secrets.
The original DOS was slow because it did an extra buffer copy while processing 256-byte sectors. Lots of people produced aftermarket fast DOS versions that removed the large initialization routine and replaced it with smarter in-place reading/writing code. This allowed the system to be ready to read or write the next interleaved sector on a track, rather than having to wait for the disk to complete a full rotation. I remember Diversi-DOS and DavidDOS. The video game Sheila included a fast DOS as well; my high school friends and I extracted it from the game and used it as our "daily driver" DOS.
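To see why the extra copy hurt so much, here is a toy timing model. It is a sketch with hypothetical names, and the sector placement is a simple skew rather than DOS 3.3's actual interleave table. Time is counted in sector-times (1/16 of a revolution), and after each sector is read the CPU is busy for `busy` sector-times. If the next logical sector has already passed under the head while the CPU was busy, you wait almost a full revolution for it to come back.

```python
SECTORS_PER_TRACK = 16  # DOS 3.3 format: 16 sectors per track


def layout(interleave: int) -> list:
    """Hypothetical skew: place consecutive logical sectors `interleave`
    physical slots apart, sliding forward when a slot is already taken."""
    pos = [-1] * SECTORS_PER_TRACK
    slot = 0
    for logical in range(SECTORS_PER_TRACK):
        while slot in pos:
            slot = (slot + 1) % SECTORS_PER_TRACK
        pos[logical] = slot
        slot = (slot + interleave) % SECTORS_PER_TRACK
    return pos


def sector_times_to_read_track(interleave: int, busy: int) -> int:
    """Sector-times needed to read logical sectors 0..15 in order when each
    read is followed by `busy` sector-times of CPU work."""
    pos = layout(interleave)
    now = 0  # head is at the start of physical slot (now % 16); slot 0 is under it
    for logical in range(SECTORS_PER_TRACK):
        wait = (pos[logical] - now) % SECTORS_PER_TRACK  # spin until it arrives
        now += wait + 1  # then read the sector itself
        now += busy      # then process it; the disk keeps turning meanwhile
    return now


for interleave in (1, 2):
    print(interleave, sector_times_to_read_track(interleave, busy=1))
# 1 257
# 2 33
```

With one sector-time of processing, a 1:1 layout costs about 16 revolutions per track, while a 2:1 layout needs about two. Cutting the per-sector work, as the fast DOSes did by dropping the extra buffer copy, is what keeps the code ready for the next sector in the interleave.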
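Various routines in DOS 3.3 became de facto APIs, so you'd see random programs JSRing to $Axxx to do something with the disk. You could also send text commands to DOS 3.3 by printing a line that started with Ctrl-D (CHR$(4)), thanks to a brilliant pair of hooks, the output and input vectors CSW ($36-$37) and KSW ($38-$39), that let DOS intercept characters going to the screen and coming from the keyboard. The monitor's character-output routine, COUT at $FDED, jumped through CSW; the stock screen routine it normally reached was COUT1 at $FDF0. Crucially, Applesoft BASIC used those two hooks, so BASIC was integrated with DOS that way.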
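A rough Python model of the output side of that trick: COUT always goes through a vector, so DOS can splice its own routine in front of the screen driver, swallow lines that begin with Ctrl-D as commands, and pass everything else through. This mirrors the Applesoft idiom `PRINT CHR$(4);"CATALOG"`. The class and method names are hypothetical, only CSW is modeled (not KSW), and real DOS does considerably more, such as redirecting output to open files.

```python
from typing import Callable, List

CTRL_D = "\x04"   # CHR$(4): a line starting with this is a DOS command
CR = "\r"         # the Apple II ends lines with a carriage return


class ToyApple:
    """Hypothetical model of the monitor's output path."""

    def __init__(self) -> None:
        self.screen: List[str] = []
        # Output hook, standing in for the CSW vector at $36-$37.
        self.csw: Callable[[str], None] = self.cout1

    def cout1(self, ch: str) -> None:
        """Stock screen output, standing in for COUT1 at $FDF0."""
        self.screen.append(ch)

    def cout(self, ch: str) -> None:
        """Standing in for COUT at $FDED: always goes through the hook."""
        self.csw(ch)

    def print_line(self, text: str) -> None:
        for ch in text + CR:
            self.cout(ch)


class ToyDos:
    """Hypothetical DOS-style intercept spliced into the output hook."""

    def __init__(self, machine: ToyApple) -> None:
        self.next_hook = machine.csw   # remember whoever was hooked before us
        machine.csw = self.intercept   # and put ourselves in front
        self.at_line_start = True
        self.collecting = False
        self.buffer = ""
        self.commands: List[str] = []

    def intercept(self, ch: str) -> None:
        if self.collecting:
            # Inside a Ctrl-D line: buffer it as a command instead of printing it.
            if ch == CR:
                self.commands.append(self.buffer)
                self.buffer = ""
                self.collecting = False
                self.at_line_start = True
            else:
                self.buffer += ch
            return
        if self.at_line_start and ch == CTRL_D:
            self.collecting = True
            return
        self.at_line_start = ch == CR
        self.next_hook(ch)   # ordinary output passes through to the screen


apple = ToyApple()
dos = ToyDos(apple)
apple.print_line("HELLO")
apple.print_line(CTRL_D + "CATALOG")
print("".join(apple.screen).replace(CR, "\n"), end="")   # HELLO
print(dos.commands)                                       # ['CATALOG']
```

Because the hook is just a pointer, anything that prints through COUT, Applesoft included, gets the DOS command interpreter for free.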