Optimization
David Douard edited this page 2023-01-25 15:16:55 +01:00

Here are strategies for optimizing SeaweedFS.

Use LevelDB

When starting a volume server, you can specify the index type. The default is memory. Lookups are fast once the volume server is running, but startup can take a long time because the file indexes must be loaded into memory.

Starting with weed volume -index=leveldb switches the index to leveldb. The volume server starts up much faster, at the cost of slightly slower file access. Compared to network speed, the extra cost is small in most cases.

Note: there are 3 flavors of leveldb indexes for the volume server:

  • leveldb: small memory footprint (4MB total, 1 write buffer, 2 block buffers)
  • leveldbMedium: medium memory footprint (8MB total, 2 write buffers, 4 block buffers)
  • leveldbLarge: large memory footprint (12MB total, 4 write buffers, 8 block buffers)

Preallocate volume file disk spaces

On some Linux file systems, e.g., XFS, ext4, and Btrfs, SeaweedFS can optionally preallocate disk space for the volume files. This keeps file data on contiguous blocks, which can improve performance when files are large and may span multiple extents.

To enable disk space preallocation, start the master with these options on a Linux OS with a supporting file system.

  -volumePreallocate
    	Preallocate disk space for volumes.
  -volumeSizeLimitMB uint
    	Master stops directing writes to oversized volumes. (default 30000)

Increase concurrent writes

By default, SeaweedFS grows volumes automatically. For example, for volumes without replication, 7 writable volumes are allocated at the same time.

If you want to distribute writes across more volumes, you can instruct the SeaweedFS master to do so via this URL.

curl "http://localhost:9333/vol/grow?count=12&replication=001"

This will assign 12 volumes with 001 replication. Since 001 replication means 2 copies of the same data, this will actually consume 24 physical volumes.
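
The arithmetic above can be sketched in Python. This is only an illustration: it assumes that each digit of the replication string counts extra copies, so the number of copies is 1 plus the sum of the digits (consistent with 001 meaning 2 copies). The function name physical_volumes is hypothetical and not part of SeaweedFS.

```python
def physical_volumes(count: int, replication: str) -> int:
    """Estimate the physical volumes consumed by growing `count` volumes.

    Assumption: each digit of the replication string is a number of extra
    copies, so "001" means 1 + 0 + 0 + 1 = 2 copies of the data.
    """
    if len(replication) != 3 or not replication.isdigit():
        raise ValueError("replication must be three digits, e.g. '001'")
    copies = 1 + sum(int(digit) for digit in replication)
    return count * copies


print(physical_volumes(12, "001"))  # 12 volumes x 2 copies
print(physical_volumes(7, "000"))   # no replication: one copy each
```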

Volumes can also be pre-created with other parameters. Quote the URL so that the shell does not interpret the & characters:

curl "http://localhost:9333/vol/grow?count=12&collection=benchmark"
curl "http://localhost:9333/vol/grow?count=12&dataCenter=dc1"
curl "http://localhost:9333/vol/grow?count=12&dataCenter=dc1&rack=rack1"
curl "http://localhost:9333/vol/grow?count=12&dataCenter=dc1&rack=rack1&dataNode=node1"
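
If these requests are issued from a script, building the query string programmatically avoids quoting mistakes. The following is a minimal sketch, assuming only the parameter names shown above; the helper grow_url is hypothetical, and it only builds the URL without contacting a master.

```python
from urllib.parse import urlencode


def grow_url(master: str, count: int, **params: str) -> str:
    """Build a /vol/grow URL; parameters with empty values are left out."""
    query = {"count": count}
    query.update({key: value for key, value in params.items() if value})
    return f"http://{master}/vol/grow?{urlencode(query)}"


url = grow_url("localhost:9333", 12, dataCenter="dc1", rack="rack1")
print(url)
```

The resulting string can be passed to curl inside quotes, or to any HTTP client, once a master is running at that address.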

Another way to change the volume growth strategy is to edit the master.toml generated by weed scaffold -config=master. Adjust the following section:

[master.volume_growth]
copy_1 = 7                # create 1 x 7 = 7 actual volumes
copy_2 = 6                # create 2 x 6 = 12 actual volumes
copy_3 = 3                # create 3 x 3 = 9 actual volumes
copy_other = 1            # create n x 1 = n actual volumes

Increase concurrent reads

As with writes, more volumes will increase read concurrency.

Increasing the replication also helps. With the same data stored on multiple servers, more servers can serve reads of that data.

Add more hard drives

More hard drives will give you better write/read throughput.

Increase user open file limit

SeaweedFS usually opens only a few actual disk files, but network requests also use file descriptors and may exceed the default open file limit, which is usually 1024. For production, you will need root permission to raise the limit to something higher, e.g., "ulimit -n 10240".
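
To check the limit before starting a server, a small script can read it with Python's standard resource module (Unix only). This is a sketch: it reports the limits of the Python process itself, which match a weed process only if both are started from the same environment, and the threshold 10240 is simply the example value from above.

```python
import resource


def open_file_limits() -> tuple[int, int]:
    """Return the (soft, hard) open file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def limit_too_low(soft: int, wanted: int = 10240) -> bool:
    """True if the soft limit is finite and below the wanted value."""
    return soft != resource.RLIM_INFINITY and soft < wanted


soft, hard = open_file_limits()
print(f"soft={soft} hard={hard} too_low={limit_too_low(soft)}")
```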

Memory consumption

For each volume server, two things affect the memory requirements.

By default, the volume server uses an in-memory index to achieve O(1) disk reads. Roughly 20 bytes are needed for each file. If one 30GB volume holds 1 million files averaging 30KB each, the index for that volume costs about 20MB of memory. You can also use a leveldb index to reduce memory consumption and speed up startup. To use it, run "weed server -volume.index=[memory|leveldb|leveldbMedium|leveldbLarge]" or "weed volume -index=[memory|leveldb|leveldbMedium|leveldbLarge]".

The other factor is the total size of the files being read at the same time. Serving 1000 concurrent reads of 100KB files needs 100MB of memory.

Large files uploaded through the filer are automatically chunked; the default chunk size is set by weed filer -maxMB=32. So if your system is going to support 1000 concurrent reads of such files, 1000 x 32MB of memory is needed.
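
These estimates are simple multiplications, sketched below for quick capacity planning. The function names are hypothetical, the 20 bytes per file figure is the rough number quoted above, and decimal units (1MB = 1000KB = 1,000,000 bytes) are assumed to match the round numbers in the text.

```python
BYTES_PER_INDEX_ENTRY = 20  # rough per-file figure for the in-memory index


def index_memory_mb(file_count: int) -> float:
    """Approximate in-memory index size in MB for a number of files."""
    return file_count * BYTES_PER_INDEX_ENTRY / 1_000_000


def read_memory_mb(concurrent_reads: int, size_kb: int) -> float:
    """Approximate memory in MB held by data being read at the same time."""
    return concurrent_reads * size_kb / 1000


print(index_memory_mb(1_000_000))    # 1 million files in one volume
print(read_memory_mb(1000, 100))     # 1000 concurrent reads of 100KB files
print(read_memory_mb(1000, 32_000))  # 1000 concurrent reads of 32MB chunks
```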

Insert with your own keys

File id generation is simple enough that you can generate file keys in your own way.

A file key has 3 parts:

  • volume id: the id of a volume with free space
  • needle id: a monotonically increasing and unique number
  • file cookie: a random number, which you can customize in whichever way you want

You can ask the master server directly to assign a file key and then replace the needle id part with your own unique id, e.g., a user id.

You can also get each volume's free space from the server status.

curl "http://localhost:9333/dir/status?pretty=y"

Once you know which volumes have free space, you can use your own file ids. Just make sure the file key format is compatible.

The assigned file cookie can also be customized, as long as it is 32-bit.
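
As an illustration of a compatible format, the sketch below composes and splits file ids. It assumes the layout seen in ids such as 3,01637037d6: the volume id, a comma, the needle id in hexadecimal, then the cookie as exactly 8 hexadecimal digits. Writing the needle id as whole bytes (an even number of hex digits) is also an assumption, and format_fid and parse_fid are hypothetical helper names. Verify the format against ids returned by your own master before relying on it.

```python
def format_fid(volume_id: int, needle_id: int, cookie: int) -> str:
    """Compose a file id as <volume>,<needle id hex><cookie as 8 hex digits>."""
    if not 0 <= cookie < 2**32:
        raise ValueError("cookie must fit in 32 bits")
    if needle_id < 0:
        raise ValueError("needle id must be non-negative")
    needle_hex = format(needle_id, "x")
    if len(needle_hex) % 2:  # assumption: needle id is written as whole bytes
        needle_hex = "0" + needle_hex
    return f"{volume_id},{needle_hex}{cookie:08x}"


def parse_fid(fid: str) -> tuple[int, int, int]:
    """Split a file id into (volume id, needle id, cookie)."""
    volume, key = fid.split(",")
    if len(key) <= 8:
        raise ValueError("file id is too short to hold a needle id and cookie")
    return int(volume), int(key[:-8], 16), int(key[-8:], 16)


fid = format_fid(3, 0x01, 0x637037D6)
print(fid, parse_fid(fid))
```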

Volume Server Efficiency

A strictly monotonically increasing needle id is not necessary, but keeping needle ids in a mostly increasing order helps keep the in-memory data structure efficient.

If the needle id has to be random, you can use leveldb as the index when starting the volume server.

Upload large files

If files are large and the network is slow, the server will take a long time to read the file. In that case, increase the "-idleTimeout=30" setting on the volume server, which cuts off the connection if the upload takes longer than the limit.

Upload large files with Auto Split/Merge

If the file is large, it is better to upload it this way:

weed upload -maxMB=64 the_file_name

This will split the file into data chunks of 64MB each and upload them separately. The file ids of all the data chunks are saved in an additional meta chunk, and the meta chunk's file id is returned.

To download the file, run:

weed download the_meta_chunk_file_id

The meta chunk contains the list of file ids, one per line. So if you want to process the chunks in parallel, you can download the meta chunk and handle each data chunk directly.
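
A minimal sketch of that parallel processing, assuming the meta chunk has already been downloaded as plain text with one file id per line. The names parse_meta_chunk and process_chunks are hypothetical, the file ids are made up, and handle stands in for whatever you do with each data chunk (for example, downloading it from its volume server).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def parse_meta_chunk(text: str) -> list[str]:
    """Return the data chunk file ids listed one per line in a meta chunk."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def process_chunks(meta_text: str, handle: Callable[[str], object],
                   workers: int = 4) -> list:
    """Apply `handle` to every data chunk file id in parallel, keeping order."""
    fids = parse_meta_chunk(meta_text)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handle, fids))


meta = "3,01637037d6\n4,02a1b2c3d4\n"  # made-up file ids for illustration
print(process_chunks(meta, str.upper))
```

The results come back in the same order as the file ids in the meta chunk, so the chunks can be reassembled in sequence.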

Collection as a Simple Name Space

When assigning file ids,

curl "http://master:9333/dir/assign?collection=pictures"
curl "http://master:9333/dir/assign?collection=documents"

will also create a "pictures" collection and a "documents" collection if they do not exist yet. Each collection has its own dedicated volumes, and collections never share a volume.

The data files on disk have the collection name as a prefix, e.g., "pictures_1.dat", "documents_3.dat".
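
The naming pattern can be written as a one-line rule, sketched here. The helper volume_data_file is hypothetical, and the behavior for an empty collection name (no prefix at all) is an assumption rather than something stated above.

```python
def volume_data_file(collection: str, volume_id: int) -> str:
    """Data file name for a volume, prefixed by its collection when one is set."""
    prefix = f"{collection}_" if collection else ""  # assumption: no prefix if empty
    return f"{prefix}{volume_id}.dat"


print(volume_data_file("pictures", 1))
print(volume_data_file("documents", 3))
```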

If you need to delete a collection later, see https://github.com/seaweedfs/seaweedfs/wiki/Master-Server-API#delete-collection

Optimize for unstable networks

By default, if a read request is slow because of network problems, it will block other requests, such as writes.

Set volume.hasSlowRead to true. This prevents slow reads from blocking other requests, but the P99 latency of large file reads will increase.

Increasing volume.readBufferSizeMB (for example, to the filer.maxMB size) lets read requests take fewer locks, which can fix the P99 latency problem caused by volume.hasSlowRead.

Logging

When going to production, you will want to collect the logs. SeaweedFS uses glog. Here are some examples:

weed -v=2 master
weed -logdir=. volume