IRC log of musicbrainz-devel on 2011-06-27

Timestamps are in UTC.

00:20:35 [voiceinsideyou]
voiceinsideyou has joined #musicbrainz-devel
02:09:05 [bitmap]
bitmap has joined #musicbrainz-devel
06:20:27 [dinog]
dinog has joined #musicbrainz-devel
06:25:22 [warp]
hello!
06:35:04 [Leftmost]
Leftmost has joined #musicbrainz-devel
06:49:33 [Leftmost]
Leftmost has joined #musicbrainz-devel
06:49:38 [Leftmost]
Leftmost has joined #musicbrainz-devel
07:02:02 [reosarevok]
reosarevok has joined #musicbrainz-devel
07:10:49 [Leftmost`]
Leftmost` has joined #musicbrainz-devel
07:12:58 [Leftmost`]
Leftmost` has joined #musicbrainz-devel
07:13:41 [ruaok]
ruaok has joined #musicbrainz-devel
07:24:56 [warp]
ruaok: late!
07:27:12 [ruaok]
late?
07:27:13 [ruaok]
for what?
07:27:38 [warp]
your time of day.
07:27:53 [ruaok]
only 12:30am.
07:27:57 [warp]
I'm assuming you're in the US
07:27:59 [ruaok]
I just got home and I'm not cracked out.
07:28:07 [ruaok]
it's reason to celebrate. :)
07:28:11 [warp]
haha
07:28:15 [nikki]
oh hi ruaok
07:28:22 [ruaok]
hi nikki
07:28:25 [nikki]
there's been reports of the search indexes not updating
07:28:33 [ruaok]
confirmed. :(
07:28:40 [nikki]
oh?
07:28:53 [ruaok]
yeah, a similar problem to last time.
07:29:04 [ruaok]
I need to get ijabz some info so he can look into it.
07:30:46 [reosarevok]
warp: Track parser understands "16. Życie jest hajem - 3:42" as 16 / Życie jest hajem - / 3:42
07:30:51 [reosarevok]
(without taking - out)
07:30:59 [reosarevok]
Could that be fixed if I make a ticket for it?
07:31:04 [reosarevok]
Or is there some reason for it?
07:31:41 [reosarevok]
* reosarevok imagines it *can* happen that - is legitimately at the end of the track name, but…
07:32:15 [warp]
reosarevok: but... ?
07:32:26 [reosarevok]
But it will be very rare
07:32:37 [warp]
reosarevok: just as rare as completely omitting the artist :P
07:32:45 [reosarevok]
Compared to the huge amount of copied tracklists with - before the track length
07:32:52 [reosarevok]
So maybe we could get a checkbox or something?
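[Editor's sketch] The behaviour reosarevok describes, and the checkbox he proposes, could look roughly like the following. This is only an illustration in C under stated assumptions: `demo_track`, `demo_parse_track` and the `strip_dash` flag are hypothetical names, and this is not the MusicBrainz track parser. With `strip_dash` off, the title keeps its trailing "-" as reported above; with it on, a dash left dangling before a detected track length is dropped.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type for this sketch only. */
typedef struct {
    int number;        /* leading track number */
    char title[256];   /* title bytes, NUL-terminated (UTF-8 passes through) */
    int seconds;       /* duration in seconds, or -1 if none was found */
} demo_track;

/* Parse a line such as "16. Title - 3:42". If strip_dash is nonzero and a
 * duration was found, a separator dash left at the end of the title is
 * removed as well. Returns 1 on success, 0 if there is no leading number. */
int demo_parse_track(const char *line, int strip_dash, demo_track *out)
{
    assert(line != NULL && out != NULL);

    char *rest;
    long number = strtol(line, &rest, 10);
    if (rest == line) return 0;
    out->number = (int)number;
    while (*rest == '.' || *rest == ' ') rest++;

    size_t len = strlen(rest);
    out->seconds = -1;

    /* A duration is digits, ':' and exactly two digits at the end of the line. */
    const char *colon = strrchr(rest, ':');
    if (colon != NULL && strlen(colon + 1) == 2 &&
        isdigit((unsigned char)colon[1]) && isdigit((unsigned char)colon[2])) {
        const char *start = colon;
        while (start > rest && isdigit((unsigned char)start[-1])) start--;
        if (start < colon) {
            out->seconds = atoi(start) * 60 + atoi(colon + 1);
            len = (size_t)(start - rest);
        }
    }

    /* Trim trailing spaces, then optionally the dangling separator dash. */
    while (len > 0 && rest[len - 1] == ' ') len--;
    if (strip_dash && out->seconds >= 0 && len > 0 && rest[len - 1] == '-') {
        len--;
        while (len > 0 && rest[len - 1] == ' ') len--;
    }

    /* Truncation is byte-based and may cut a multi-byte character. */
    if (len >= sizeof out->title) len = sizeof out->title - 1;
    memcpy(out->title, rest, len);
    out->title[len] = '\0';
    return 1;
}
```

As noted in the discussion, a title that legitimately ends in "-" would lose it when the flag is on, which is why the sketch makes it optional.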
07:36:24 [warp]
I've never seen such a tracklist, so I'm not convinced of the "huge amount" of tracklists having "- before the track length".
07:36:40 [warp]
obviously I'm biased because I don't listen to all the music which exists
07:37:40 [warp]
still, I think it would be a good idea to gather some proper data on how copy/pasted tracklists from many sources are handled by the track parser.
07:37:40 [reosarevok]
* reosarevok has seen lots, including at least one page for Spanish hip hop demos that hosts lots and lots of stuff which has it in every one
07:37:49 [reosarevok]
:)
07:38:02 [reosarevok]
I can live with it, but yeah, it might be good
07:38:11 [reosarevok]
Maybe we can make a blog post about that?
07:38:28 [reosarevok]
Like "is there any issue you've found with the way the track parser works?"
07:38:37 [nikki]
heh...
07:38:39 [warp]
that's too loose
07:38:49 [reosarevok]
Yeah, maybe
07:38:49 [nikki]
in mason we had an option for detecting track times
07:38:52 [reosarevok]
* reosarevok just woke up
07:39:06 [reosarevok]
But I guess people might see this kind of stuff as annoyances but not bugs (because they're not) and not report them
07:39:10 [nikki]
or something like that... I never really used it, I rarely have a tracklist with times
07:39:17 [reosarevok]
So maybe we have to ask
07:39:20 [reosarevok]
Yeah, we had that
07:39:27 [reosarevok]
* reosarevok did use it some times
07:39:27 [ruaok]
ok, problem reported.
07:39:34 [nikki]
* nikki wants it to detect "M" before track numbers
07:39:43 [warp]
reosarevok: I think it would be more important to gather a good list of sources first.
07:39:52 [ruaok]
it's in cdstubs, so I took those out of the build rotation. everything else should be up to date in about 4.5 hours.
07:41:38 [nikki]
warp: you mean a list of examples or?
07:42:33 [warp]
nikki: I mostly know and use bandcamp, amazon and discogs. but there are many more places with lots of tracklists where people copy/paste from, I would like to have a list of them.
07:42:57 [nikki]
itunes, all the major japanese labels...
07:43:02 [nikki]
vgmdb
07:43:14 [nikki]
* nikki is obviously biased towards jp sources ;)
07:44:21 [Leftmost]
* Leftmost doesn't suppose nikki knows a source for Japanese CDs that might sell them for less than USD 30.
07:44:28 [warp]
I wish we could just include importers for sites like that in musicbrainz itself :)
07:44:44 [nikki]
Leftmost: what're you looking for?
07:44:57 [Leftmost]
A couple Thee Michelle Gun Elephant CDs.
07:45:12 [warp]
Leftmost: if it's popular enough in other asian countries korean or hongkong editions may exist and will be much cheaper.
07:45:32 [warp]
also, Thee Michelle Gun Elephant \o/
07:45:42 [ijabz]
ijabz has joined #musicbrainz-devel
07:45:54 [nikki]
* nikki wonders which ones
07:46:39 [Leftmost]
Sabrina Heaven mostly, although there was another I wanted which now escapes me... sec.
07:48:05 [Leftmost]
I think it was Gear Blues.
07:52:04 [nikki]
hm. nope :/
07:52:12 [nikki]
unless you want them second hand off ebay or something
07:55:19 [Leftmost]
I could deal with that. Last time I checked, they were still pretty steep though. I can check again.
07:58:26 [ijabz]
ruaok: index fail was simply because there seems to be a cdstub with no artist, is that valid?
07:58:56 [ruaok]
shouldn't be, really.
07:59:06 [ruaok]
can you please enter a bug for that and then add a work-around?
07:59:17 [ruaok]
I can deploy that in the morning.
07:59:49 [ruaok]
one sec. a release or track artist?
08:00:04 [ruaok]
the release artist can be blank to indicate a va release. but then all tracks should have artists.
08:01:06 [ijabz]
release_raw.artist
08:01:20 [ijabz]
I'm just fixing the code now, should only take a few minutes
08:01:34 [ruaok]
yes, that should be valid.
08:01:42 [ruaok]
not sure why that wasn't a problem before.
08:01:49 [ijabz]
Right, no nor am i
08:09:00 [ijabz]
Actually I won't rush, if you're not deploying until later today
08:09:28 [ruaok]
good thinking. I'm too tired to do anything.
08:09:38 [ruaok]
I should be going to bed. I've been yearning for my own bed for days now.
08:09:44 [ruaok]
what's keeping me now? :)
08:17:04 [ijabz]
u still here, u need someone else there to nag you into going to bed
08:17:29 [ruaok]
maybe that's why I live alone. :)
08:17:48 [ijabz]
hehe
08:18:44 [ijabz]
Some say leaders can survive on 3 or 4 hours sleep
08:20:53 [nikki]
nikki has joined #musicbrainz-devel
08:21:12 [ruaok]
well, I'm out
08:21:14 [ruaok]
nn everyone!
09:04:28 [Leftmost]
Leftmost has joined #musicbrainz-devel
09:04:35 [Leftmost]
Leftmost has joined #musicbrainz-devel
09:23:00 [ijabz]
warp: hi, could you help me out with something please
09:23:47 [warp]
ijabz: possibly
09:23:51 [ijabz]
Could you run SELECT * FROM release_raw where release_raw.artist is null on server
09:26:43 [warp]
I get one result on the production servers.
09:28:24 [warp]
ijabz: http://paste.pocoo.org/show/dev1IYYBnhiiM06fYV0o
09:28:40 [pecastro]
pecastro has joined #musicbrainz-devel
09:29:17 [ijabz]
thanks, that sounds right, this is what caused the indexes to not build.
09:29:53 [ijabz]
Ok, supplementary question, Rob said it was valid to have a null artist for various artist releases, but if that is the case why do we only have one such release?
09:30:26 [warp]
because most people will enter "Various" or "Various Artists" as the release artist for such releases.
09:30:47 [warp]
SELECT count(*) FROM release_raw where release_raw.artist ilike 'Various%'; gives 7903 rows.
09:31:24 [warp]
there's even 80 with just "VA".
09:31:27 [ijabz]
but both are valid ?
09:31:48 [ijabz]
s/both/all
09:32:05 [warp]
well, these are cdstubs. the whole point of having them is for data which hasn't been edited properly.
09:32:31 [warp]
so I'm not sure there is a concept of "valid" with them :)
09:32:31 [ijabz]
ok, thankyou sir
09:33:03 [warp]
I wouldn't mind making the release artist mandatory in these situations though. users can always enter "Various Artists" for those.
09:34:07 [ijabz]
right, so I'll raise an improvement issue for you on this then shall I
09:34:23 [warp]
sure, that's fine.
09:35:03 [warp]
I'll have to check with ruaok though, because he said null should be allowed there. Even if I think that's rubbish I'll discuss it with him before making changes :)
09:35:44 [reosarevok]
I would say if nobody is using it, the reasoning for having it becomes null
09:36:02 [warp]
reosarevok: hey, at least one person used it :D
09:46:53 [pecastro]
Hi guys... Just confirming something... currently there's no "infrastructure" to have separate READ|WRITE queries to the DB, right?
10:00:51 [warp]
pecastro: that sounds correct.
10:02:14 [pecastro]
warp: Ok. I just wanted to be absolutely sure. Here we're deploying the slave against a postgres with replication and the DBAs thought we were going to use the slaves.
10:02:18 [pecastro]
warp: tkx.
10:07:15 [voiceinsideyou]
voiceinsideyou has joined #musicbrainz-devel
12:22:42 [adhawkins]
adhawkins has joined #musicbrainz-devel
12:29:34 [ijabz]
ijabz has joined #musicbrainz-devel
12:49:24 [ijabz]
ijabz has left #musicbrainz-devel
13:00:11 [zazi]
zazi has joined #musicbrainz-devel
13:29:55 [adhawkins]
Any devs here?
13:37:16 [voiceinsideyou]
voiceinsideyou has joined #musicbrainz-devel
14:06:40 [warp]
adhawkins: yes.
14:09:08 [adhawkins]
I'm getting close to being able to release libmb4
14:09:23 [adhawkins]
Want to put some information in the docs as to location of bug reporting
14:10:29 [adhawkins]
There appears to be a libmusicbrainz category in tickets.musicbrainz.org
14:10:33 [adhawkins]
Is that the one to use?
14:10:53 [adhawkins]
Also I guess it'd be good if I were automatically assigned bugs that are reported in that category?
14:16:45 [warp]
adhawkins: using libmusicbrainz on tickets.musicbrainz.org sounds good to me, do you have an account on tickets.musicbrainz.org?
14:17:56 [adhawkins]
Yes, I think so
14:18:09 [adhawkins]
Yes, I do
14:18:16 [adhawkins]
adhawkins
14:18:20 [adhawkins]
Funnily enough :)
14:20:10 [warp]
I made you "Project Lead" as I see no other way to assign a default user for tickets in that project.
14:20:29 [warp]
if anyone object someone more familiar with our jira setup will have to fix it :)
14:20:34 [warp]
s/object/objects/
14:21:00 [adhawkins]
Ok fair enough
14:24:31 [adhawkins]
Just looking at ticket lmb-11
14:24:51 [adhawkins]
That will apply to my interface too (I've pretty much duplicated the way Lukz did it)
14:25:10 [adhawkins]
Was wondering if a better way would be to allow the 'getter' to pass in a null pointer, and have the length a pointer too.
14:25:20 [adhawkins]
Then if a null str pointer is sent, the length is updated with the required length
14:27:05 [warp]
it all sounds a bit confusing
14:27:12 [adhawkins]
How so?
14:27:13 [warp]
I haven't used C properly in years :)
14:27:17 [adhawkins]
lol
14:27:22 [adhawkins]
The current mechanism is
14:27:30 [adhawkins]
char String[256];
14:27:39 [adhawkins]
mb_get_release_title(Release,String,256);
14:27:49 [adhawkins]
So you have to know how big to make the string you pass in
14:27:52 [adhawkins]
I'm suggesting
14:27:57 [adhawkins]
char *String
14:28:01 [adhawkins]
int Length=0;
14:28:12 [adhawkins]
mb_get_release_title(Release,0,&Length)
14:28:17 [adhawkins]
this would set length to the required length
14:28:25 [adhawkins]
You then do a malloc, and repeat the call passing in a valid pointer
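[Editor's sketch] The two-call pattern adhawkins outlines could be sketched as below. Everything here is an assumption for illustration: `demo_release` and `demo_get_release_title` are hypothetical stand-ins, not the libmusicbrainz interface, and the sketch assumes the reported length includes the terminating NUL.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a release object. */
typedef struct { const char *title; } demo_release;

/* Hypothetical getter: if str is NULL, only *len is updated with the number
 * of bytes required (including the NUL). Otherwise at most *len bytes are
 * written into str, always NUL-terminated. */
void demo_get_release_title(const demo_release *release, char *str, int *len)
{
    assert(release != NULL && len != NULL);
    if (str == NULL) {
        *len = (int)strlen(release->title) + 1;
        return;
    }
    if (*len > 0) {
        snprintf(str, (size_t)*len, "%s", release->title);
    }
}

/* Caller side: the first call asks for the size, the second fills the buffer.
 * Returns a malloc'ed string the caller must free(), or NULL on failure. */
char *demo_fetch_title(const demo_release *release)
{
    int length = 0;
    demo_get_release_title(release, NULL, &length);

    char *title = malloc((size_t)length);
    if (title != NULL) {
        demo_get_release_title(release, title, &length);
    }
    return title;
}
```

The caller keeps ownership of the buffer, so no library-side allocation is involved; the cost is the second call.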
14:28:45 [adhawkins]
He's suggesting that the malloc be done within the library, and the caller then be responsible for freeing the string
14:28:49 [warp]
ah right, I had to do something similar when calling libicu
14:29:10 [warp]
it seems completely bonkers to me to have to call the same thing twice for different purposes.
14:29:24 [adhawkins]
Right, that's common in the Windows API for example :)
14:29:47 [warp]
having the library malloc it and the caller be responsible for free()ing seems more logical to me.
14:30:24 [adhawkins]
Ok. Obviously I can do either
14:30:29 [adhawkins]
But now seems like as good a time as any to change it
14:30:36 [warp]
though then there are probably people who want to provide their own malloc(), so you may have to have some way for them to override it...
14:30:38 [adhawkins]
Given that the whole API is changing for NGS
14:30:47 [adhawkins]
Ah yes...
14:31:39 [adhawkins]
I'll chat it over with Lukas next time I see him online
14:33:04 [warp]
oh, in my code which calls libicu I only call the function again if there was enough space pre-allocated
14:33:45 [adhawkins]
Right. How does it tell you that?
14:34:26 [warp]
https://github.com/metabrainz/musicbrainz-server/blob/master/postgresql-musicbrainz-collate/musicbrainz_collate.c#L100
14:35:08 [adhawkins]
So the return value is the required size, not the size actually copied?
14:35:14 [warp]
yes
14:35:21 [adhawkins]
Ok, that sounds possible.
14:35:40 [adhawkins]
For strings, would you expect that to include the NULL?
14:36:00 [warp]
I'm not saying that's a particularly good way of doing it. it is just something which other libraries apparently are doing in production code :)
14:36:05 [adhawkins]
:)
14:36:35 [adhawkins]
Wouldn't it be more efficient to pass in NULL to the first call?
14:36:40 [warp]
adhawkins: oh, on that second question see the comment at line 65 in that same file.
14:36:41 [adhawkins]
Saving the first alloc?
14:37:00 [adhawkins]
Ok, so string lengths don't include the NULL.
14:37:06 [warp]
adhawkins: not if you expect that the pre-allocated amount is usually enough.
14:37:19 [adhawkins]
I guess so
14:38:15 [warp]
adhawkins: I think a release title for example can be longer than 256 bytes, but the vast majority will fit in 256 bytes -- so a caller could decide to prealloc that.
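[Editor's sketch] The approach warp describes (pre-allocate a size that usually suffices, and call again only when the returned length says it was too small) might look like this. The names are hypothetical, and the return convention is an assumption borrowed from `snprintf`: the required length is returned without counting the NUL, in line with the "lengths don't include the NULL" reading above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a release object. */
typedef struct { const char *title; } demo_release;

/* Hypothetical getter: copies at most size-1 bytes plus a NUL into str and
 * returns the full title length, not counting the NUL. */
int demo_get_release_title(const demo_release *release, char *str, int size)
{
    assert(release != NULL);
    if (str != NULL && size > 0) {
        snprintf(str, (size_t)size, "%s", release->title);
    }
    return (int)strlen(release->title);
}

/* Try a 256-byte guess first; grow and call again only if it was too small.
 * Returns a malloc'ed string the caller must free(), or NULL on failure. */
char *demo_fetch_title(const demo_release *release)
{
    int size = 256;
    char *title = malloc((size_t)size);
    if (title == NULL) return NULL;

    int needed = demo_get_release_title(release, title, size);
    if (needed >= size) {
        char *bigger = realloc(title, (size_t)needed + 1);
        if (bigger == NULL) {
            free(title);
            return NULL;
        }
        title = bigger;
        demo_get_release_title(release, title, needed + 1);
    }
    return title;
}
```

In the common case this costs one allocation and one call; the second call happens only for unusually long titles.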
14:38:25 [adhawkins]
Yeah
14:41:29 [adhawkins]
I've added a note to that ticket, see if the original reporter has any particular opinion
14:41:54 [adhawkins]
One question regarding the tracker, it has a tab for subversion commits
14:42:13 [adhawkins]
Is this filled in automatically? If so, presumably I need to put something in the correct format in the submit comments for SVN?
14:48:57 [warp]
erm, on tickets.musicbrainz.org? I have no idea, I've never used it. of course the musicbrainz server code is developed with git, not subversion.
14:50:29 [adhawkins]
Ah ok, looking at the documentation for jira
14:54:57 [luks]
the problem with calling free() in the application is that you have to call the right free()
14:55:05 [luks]
it must be the same standard library
14:55:13 [luks]
which is not so obvious if you are using a dll
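[Editor's sketch] One common way around the mismatched-`free()` problem luks raises is for the library to export its own release function, so that allocation and deallocation both happen inside the library. The sketch below only illustrates that idea; `demo_dup_release_title` and `demo_free_string` are hypothetical names, not part of libmusicbrainz.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a release object. */
typedef struct { const char *title; } demo_release;

/* Library side: allocate the copy with the library's own allocator.
 * Returns NULL if the allocation fails. */
char *demo_dup_release_title(const demo_release *release)
{
    assert(release != NULL);
    size_t size = strlen(release->title) + 1;
    char *copy = malloc(size);
    if (copy != NULL) {
        memcpy(copy, release->title, size);
    }
    return copy;
}

/* Library side: release the string with the matching deallocator, so the
 * caller never has to know which free() the library was linked against. */
void demo_free_string(char *str)
{
    free(str);
}
```

The caller then pairs every `demo_dup_release_title` with a `demo_free_string` instead of calling `free()` directly.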
14:55:42 [adhawkins]
How about the suggestion of having the getters return the required length?
14:55:52 [adhawkins]
That then allows the caller to not pass in a pointer just to retrieve the length
14:56:04 [adhawkins]
Or pass in a pre-allocated pointer but know if they didn't allocate enough space
14:57:13 [warp]
* warp happy luks showed up with better C knowledge :)
14:57:26 [adhawkins]
* adhawkins doesn't really 'do' C any more either :)
14:58:19 [luks]
hm, I'm thinking about always returning a pre-allocated pointer
14:58:33 [luks]
that will be available only until the next mb4 call
14:58:35 [adhawkins]
Then I'd need to supply a generic 'delete string'
14:58:41 [adhawkins]
Erk, that's nasty.
14:58:43 [luks]
no, you would never delete it
14:58:49 [luks]
pretty much what c_str() does
14:59:10 [adhawkins]
So that'd be what, a global variable internally?
14:59:21 [adhawkins]
Whenever you want to return a string, if that's non null you free it first
14:59:26 [luks]
yeah, that probably sucks
14:59:27 [adhawkins]
Then re-allocate it for the next one?
14:59:34 [luks]
because you can no longer use multiple threads
14:59:43 [luks]
it would have to live in the object instances, which is not that easy
14:59:48 [adhawkins]
Yes
15:00:10 [adhawkins]
Another option would be to have a get_length for each string too
15:00:18 [luks]
well, what about changing "int len" to "int *len"
15:00:25 [adhawkins]
I did consider that too :)
15:00:31 [luks]
you can pass a pointer and an int that says it's 256 bytes long
15:00:40 [adhawkins]
Then warp showed me the api he's using where the return code is the required length
15:00:44 [luks]
if it's too short, it will return an error but also the expected length
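[Editor's sketch] The `int *len` variant luks floats here could be sketched as an in/out parameter plus an error code. The names and the error values are assumptions made up for this illustration, and the sketch assumes the size written back includes the NUL.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a release object. */
typedef struct { const char *title; } demo_release;

/* Hypothetical status codes for this sketch. */
enum { DEMO_OK = 0, DEMO_TOO_SHORT = 1 };

/* On entry *len is the size of str in bytes; on exit it is the size the
 * title needs (including the NUL). Nothing is copied if str is too short. */
int demo_get_release_title(const demo_release *release, char *str, int *len)
{
    assert(release != NULL && len != NULL);
    int capacity = *len;
    int needed = (int)strlen(release->title) + 1;

    *len = needed;
    if (str == NULL || capacity < needed) {
        return DEMO_TOO_SHORT;
    }
    memcpy(str, release->title, (size_t)needed);
    return DEMO_OK;
}
```

A caller passing a 256-byte buffer gets `DEMO_OK` in the usual case, and on `DEMO_TOO_SHORT` already knows how much to allocate for the retry.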
15:00:54 [adhawkins]
https://github.com/metabrainz/musicbrainz-server/blob/master/postgresql-musicbrainz-collate/musicbrainz_collate.c#L100
15:01:15 [reosarevok_]
reosarevok_ has joined #musicbrainz-devel
15:01:43 [luks]
oh, these functions currently return void?
15:02:00 [luks]
then yes, returning the length makes sense
15:03:15 [adhawkins]
Yes, all the C string getters return void
15:03:27 [adhawkins]
So, is that a 'decision'? :)
15:04:34 [luks]
well, it's your choice :)
15:04:56 [adhawkins]
Yes, but I'm seeking informed opinion :)
15:05:24 [luks]
I'm happy with returning the string length
15:06:02 [adhawkins]
Ok, I'll do that then
15:09:33 [pecastro]
luks: Hi luks. Did you have a chance of reading some of the comments I left you on the private window ?
15:11:49 [luks]
sorry, I must have missed it
15:11:52 [luks]
* luks reads
15:13:29 [luks]
well, I'm not sure, the guys wanted to rename the functions
15:13:44 [luks]
then warp gave it a shipit
15:14:06 [luks]
I haven't done any server work recently, so I'm really not sure
15:14:09 [pecastro]
I think I can do that, as in, I think I can squeeze some time to perform that .. I just need some guidance on the process...
15:14:49 [luks]
I'm personally happy with the changes as they are
15:14:55 [pecastro]
luks: Would you mind having a look at my latest comments here -> http://codereview.musicbrainz.org/r/1291/ and assert if this is enough / If I'm in the right path ?
15:15:45 [luks]
sorry, but I'm really not the right person to do that
15:15:50 [pecastro]
The bits I'm less certain about is how you guys manage schema changes which would undoubtedly be necessary in this case...
15:16:30 [luks]
I did write the code, but I don't work on the server anymore
15:16:41 [luks]
I'd merge the code as it is if it was my decision
15:18:45 [pecastro]
Do you think it could benefit if you state that as a fact in the review ticket? That might originate either some traction for the work to be merged or a more decisive answer regarding the changes required...
15:23:44 [adhawkins2]
adhawkins2 has joined #musicbrainz-devel
15:25:31 [luks]
I think the only reason why it's not merged is that everybody is afraid to break everything
15:25:32 [voiceinsideyou]
voiceinsideyou has joined #musicbrainz-devel
15:25:39 [luks]
which will be even more likely if you rename the functions
15:30:52 [adhawkins2]
How quickly will a subversion commit appear in JIRA?
15:31:45 [luks]
never :)
15:32:09 [adhawkins2]
Really?
15:32:11 [adhawkins2]
Oh, ok.
15:32:33 [luks]
yeah, it's not setup
15:32:49 [luks]
the guys are mostly using git and I guess that doesn't work in general
15:33:59 [luks]
hm, what is CNoneMBTrack?
15:34:16 [adhawkins2]
No idea. It's in the schema :)
15:34:34 [adhawkins2]
Comes back with a freedbdisc I think
15:34:36 [luks]
oh, wow :)
15:35:25 [adhawkins2]
I think the library is pretty much ready for release now.
15:35:32 [hawke_]
hawke_ has joined #musicbrainz-devel
15:35:39 [adhawkins2]
Luks, any chance you can check out the latest version of my branch and have a look over it and the documentation?
15:35:47 [adhawkins2]
See if there's anything that's unclear or could be done better?
15:36:10 [luks]
I wonder if None vs Non was intentional or it's a typo
15:36:37 [adhawkins2]
Oh, could be my typo, one sec
15:36:55 [luks]
it's called def_nonmb-track in the schema
15:37:10 [adhawkins2]
Whoops, it's mine.
15:37:13 [adhawkins2]
I'll fix that.
15:48:29 [adhawkins2]
Ok, that's done
15:48:37 [adhawkins2]
If you could cast a quick eye over it all I'd appreciate it
15:48:41 [adhawkins2]
Then we can merge and announce!
15:49:24 [luks]
I'll read the code later today
15:49:52 [adhawkins2]
Excellent.
15:50:01 [adhawkins2]
There are doxygen docs too, so if you could have a look over them it'd be appreciated
15:57:26 [ijabz]
ijabz has joined #musicbrainz-devel
17:03:33 [ppawel]
ppawel has joined #musicbrainz-devel
17:09:50 [reosarevok]
reosarevok has joined #musicbrainz-devel
17:22:38 [BarryNorton]
BarryNorton has joined #musicbrainz-devel
17:27:00 [ruaok]
ruaok has joined #musicbrainz-devel
17:54:35 [rt_luckie]
rt_luckie has joined #musicbrainz-devel
18:30:28 [Atsby]
Atsby has joined #musicbrainz-devel
18:31:01 [batsy_]
batsy_ has joined #musicbrainz-devel
18:45:25 [reosarevok]
reosarevok has joined #musicbrainz-devel
18:45:55 [ijabz_]
ijabz_ has joined #musicbrainz-devel
18:46:30 [Batsy]
Batsy has joined #musicbrainz-devel
18:55:47 [pecastro]
pecastro has joined #musicbrainz-devel
18:58:03 [ruaok]
ruaok has joined #musicbrainz-devel
18:59:39 [pronik]
pronik has joined #musicbrainz-devel
19:00:07 [ruaok]
<BANG>
19:00:11 [ruaok]
meeting time!
19:00:12 [warp]
eep!
19:00:28 [ruaok]
since ocharles is still out, you get to start warp!
19:00:37 [warp]
yay!
19:00:57 [warp]
well, there's nothing particularly interesting to report.
19:01:08 [warp]
just fixing more bugs.
19:01:38 [ruaok]
how did the updated bug list look to you?
19:01:41 [warp]
I worked on two webservice tickets, one of which I need some feedback from ocharles.
19:01:45 [ruaok]
the top 5 result list?
19:01:56 [warp]
the rest was mostly release editor stuff.
19:02:11 [warp]
ruaok: yes, looks much more manageable :)
19:02:16 [ruaok]
ok, good.
19:02:40 [warp]
ruaok: out of those, four tickets are "in review"
19:02:48 [ruaok]
I'm guessing we don't have enough stuff to do a release today.
19:02:51 [warp]
one is discussion required, perhaps that should go in the topic today.
19:03:13 [ruaok]
go ahead and put that up.
19:03:19 [warp]
warp has changed the topic to: agenda: review (inc gsoc), MBS-2194
19:03:44 [ruaok]
next week will be a holiday here in the US.
19:03:53 [ruaok]
I personally will be out on monday and tuesday.
19:04:00 [warp]
ruaok: I started looking into the performance issues with the release editor.
19:04:06 [ruaok]
can you please lead the meeting, including sending out the reminder?
19:04:21 [warp]
(for reference, that is MBS-2126)
19:04:30 [warp]
ruaok: yes, sure.
19:04:30 [ruaok]
warp: great, I'm sure lots of people will be glad to hear that.
19:04:35 [ruaok]
ok, thanks.
19:05:24 [warp]
I've only looked at the first page (Release Information) today, there are a few code paths which are executed several times (which shouldn't be necessary)
19:05:39 [warp]
and a few queries which can be cached in memcached.
19:06:09 [ruaok]
good, sounds like there is some low-hanging fruit to get some immediate improvements.
19:06:14 [warp]
ruaok: I expect I can fill the entire week with this stuff, but hopefully we'll get some solid improvements.
19:06:33 [ruaok]
agreed.
19:06:37 [ruaok]
keep hammering on then!
19:06:45 [ruaok]
as for a release next monday, it would probably be good to get something out, even if it's not the usual pile of fixes.
19:06:59 [warp]
it's not _that_ low hanging, I expect it will take at least some minor refactoring to fix the issues.
19:07:02 [ruaok]
you and ocharles should just take care of that and I can tag the release upon my return.
19:07:10 [ruaok]
* ruaok nods
19:07:26 [warp]
ruaok: ok, ocharles is back this week?
19:07:37 [ruaok]
yes. let me check.
19:07:53 [ruaok]
he is out through wednesday. back thu then.
19:08:05 [ruaok]
and djce is also out this week.
19:08:07 [warp]
ok, and you're gone on just monday and tuesday? or more?
19:08:17 [warp]
(er, next week monday and tuesday)
19:08:23 [ruaok]
which means that you and ocharles will need to look after the servers from friday morning - tuesday evening.
19:08:45 [ruaok]
I'll be in the desert with zero communications.
19:08:57 [ruaok]
for a much needed "vacation". yeah, right.
19:09:01 [warp]
:)
19:09:07 [ruaok]
the desert is always very relaxing.
19:09:16 [warp]
oh, on that topic.
19:09:20 [ruaok]
shoot.
19:09:35 [warp]
I plan to spend a few weeks in ecuador after italy.
19:09:53 [ruaok]
all vacation or some working vacation?
19:09:59 [warp]
I intend to work there, so I can stay there a month or so. but we'll have to see how that goes.
19:10:07 [ruaok]
and \o/ for going out to ecuador!
19:10:30 [ruaok]
it will be weird to have you 3 time zones away. might be nice for collaboration.
19:10:33 [warp]
I just purchased a beefy laptop to use in italy and ecuador, installed mb_server on it, looking good so far :)
19:10:39 [ruaok]
excellent.
19:10:48 [ruaok]
* ruaok needs a new lappy as well.
19:10:54 [ruaok]
the macbook isn't doing so hot.
19:10:56 [reosarevok]
Yay for Ecuador
19:11:13 [ruaok]
anyways, last week for me: it was mostly spent dealing with echoprints one way or another.
19:11:18 [ruaok]
I
19:11:18 [ruaok]
'
19:11:33 [warp]
haha
19:11:36 [ruaok]
I'm not certain how all that will play out, but I'm encouraged that we have new/open solutions.
19:11:54 [warp]
indeed.
19:12:05 [ruaok]
I've got lots of email to deal with still and some search index fussing to do.
19:12:20 [warp]
the focus of echo print seems a bit different from acoustid. so it would be ideal to just support both in musicbrainz.
19:12:25 [ruaok]
on thursday evening, I'm going to meet with Adrian from last.fm to touch base about data between last.fm and MB.
19:12:32 [ianmcorvidae]
they use fairly different technologies, yeah
19:12:50 [ruaok]
we're finally re-kindling our relationship, which had pretty much fizzled after RJ & co left last.fm.
19:12:56 [ianmcorvidae]
yay :D
19:13:00 [ruaok]
warp: that's the plan.
19:13:01 [warp]
ruaok: ah, sounds good.
19:13:22 [ruaok]
though luks tells me that MB needs to do nothing to support acoustid, it's all handled on the acoustid server.
19:13:34 [ianmcorvidae]
yeah, acoustid already understands mbids
19:13:55 [ruaok]
and on another note, digital west has bumped us from 6mbit commit to 10mbit commit for the same price.
19:14:00 [warp]
ruaok: well, picard needs support even if we don't have to touch our server.
19:14:09 [ruaok]
which is awesome and brings the real life impact of NGS on our bills down to zero. :)
19:14:17 [ruaok]
warp: true that.
19:14:44 [ruaok]
this week will be a bunch of little stuff for me, I don't expect to get anything earth shattering done.
19:14:58 [ruaok]
also, Cory is still out and thus we have no word from him on the cover art archive.
19:15:04 [reosarevok]
:(
19:15:10 [ruaok]
so, we need to sit tight still.
19:15:25 [ruaok]
that's all the news I can think of.
19:15:38 [ruaok]
nikki, ijabz_, murdos: anything from you?
19:16:14 [nikki]
not really
19:16:19 [ijabz_]
ruaok: There would be big advantages to putting acoustid into mb server but I guess that's a discussion for another day
19:16:20 [reosarevok]
ruaok: while you're at it, could you kindly, politely tell them "COULD YOU DIFFERENTIATE ARTISTS WITH THE SAME NAME ALREADY YOU DAMNED LAZY GUYS?"
19:16:25 [reosarevok]
(but really politely)
19:16:29 [reosarevok]
;)
19:16:40 [ijabz_]
Not the server itself, but the ids (like puids)
19:16:41 [warp]
lol
19:17:01 [reosarevok]
ruaok: I imagine you're aware that we have 4! autoeditor elections going on already
19:17:05 [ijabz_]
Just fixed search index bug
19:17:12 [reosarevok]
(that being a happy 4, not factorial of 4)
19:17:21 [ruaok]
ijabz_: luks prefers to manage all that in acoustid, so that discussion is not ours to have. you'd need to have it with him.
19:17:24 [warp]
* warp was not aware.
19:17:48 [ruaok]
reosarevok: I'll make a note of it. :)
19:17:59 [ijabz_]
i'll do so
19:18:05 [ruaok]
ok.
19:18:12 [ianmcorvidae]
huh, 4, wow; I only knew of two
19:18:22 [bitmap]
reosarevok: possibly more soon, there were a couple people I wanted to elect too. :)
19:18:28 [ruaok]
nice, it's good to see those elections back up. :)
19:18:40 [ruaok]
* ruaok will vote in a bit.
19:18:48 [ruaok]
ianmcorvidae: how is your gsoc stuff?
19:18:49 [reosarevok]
bitmap: nice! are they Europeans or classical editors?
19:19:18 [ianmcorvidae]
it is going well
19:19:23 [ianmcorvidae]
I've been having an inflection-point week
19:19:31 [ianmcorvidae]
mostly bugfixing, styling, refactoring, that sort of thing
19:19:37 [ianmcorvidae]
seemed like a good week to do that, with ocharles on vacation
19:19:45 [bitmap]
reosarevok: not classical, but one edits a lot of european stuff
19:19:54 [reosarevok]
:)
19:20:03 [reosarevok]
* reosarevok will stop interrupting now
19:20:09 [ianmcorvidae]
so I have one gigantic review up, some changes to fix after warp's review of that, and that's about all
19:20:28 [ianmcorvidae]
I also found a fix for the problems nikki has been having (canvas text support things), still tinkering to make it work properly though
19:20:33 [ruaok]
ianmcorvidae: great. I'm having fun playing with your improvements after each release. :)
19:20:38 [ianmcorvidae]
:)
19:20:46 [ruaok]
bitmap: how about you?
19:21:25 [bitmap]
I didn't get as much done because my cat was missing for a few days :( but I was working on http://codereview.musicbrainz.org/r/1371/ most of the time which is a fairly large review
19:21:58 [bitmap]
and I was working on more collection management stuff, which I think I have to discuss with warp
19:22:28 [bitmap]
collections-related stuff will definitely be pushed to my gsoc-ngs branch this week.
19:22:49 [warp]
ok
19:22:51 [ruaok]
awesome.
19:23:16 [ruaok]
Batsy: how about you?
19:23:23 [ruaok]
we're overdue for touching base.
19:23:40 [Batsy]
Carrying on pretty well for the most part. Definitely getting there re: making the webservice do its calls
19:23:47 [Batsy]
have a small roadblock that I'm working through re: data types
19:23:57 [Batsy]
biggest thing is I couldn't get hold of djce, is there anyone else that can set me up on git?
19:24:17 [warp]
Batsy: probably ocharles
19:24:23 [ianmcorvidae]
heh
19:24:27 [ianmcorvidae]
who comes back from vacation first? :P
19:24:30 [ruaok]
who is also out this week.
19:24:33 [Batsy]
yeah
19:24:34 [ruaok]
ocharles does.
19:24:34 [warp]
Batsy: can you use github in the meantime?
19:24:43 [Batsy]
yeah, I could do that.
19:24:49 [ruaok]
I can probably figure out how to get you setup.
19:24:54 [ruaok]
let me look into that after the meeting.
19:25:08 [warp]
we have many people contributing things through github, in my experience that works just as well.
19:25:22 [Batsy]
cool.
19:25:31 [ruaok]
warp: ok, I won't try too hard then. :)
19:26:03 [ruaok]
Batsy: let's just do that. submit your work to github for the time being and ping me with a url when you have one.
19:26:07 [Batsy]
okay, will do.
19:26:21 [Batsy]
may be asking for some advice in here soon re: datatypes in perl if I can't sort that out, but that shouldn't be a major issue
19:26:27 [ruaok]
please ask now. :)
19:26:46 [ruaok]
(our agenda isn't very crowded so we can all help for a change)
19:26:52 [Batsy]
okay
19:27:30 [Batsy]
the issue is, for the artist widget right now, it pulls the info directly related to the artist from mbid, and then for the artist's releases to attach to the widget an array gets populated
19:27:59 [Batsy]
but that info only gets returned as an array instead of hash info within the array, couldn't get it to output correctly, though I'm pretty sure the array is getting populated with what it needs
19:28:33 [ianmcorvidae]
I glanced at this the other day; I think the release stuff is returning an array of an array of hashes and an integer
19:28:51 [ianmcorvidae]
not sure what the integer is, and not totally sure of syntax, so I couldn't really coach Batsy as to how to get to the Release hashrefs
19:29:07 [ruaok]
this sounds like it might be paging info or somesuch.
19:29:19 [ianmcorvidae]
possibly
19:29:26 [ianmcorvidae]
it's the same calls that the XML webservice for artist is using
19:29:36 [ianmcorvidae]
(to get releases for an artist at /ws/2/artist)
19:29:38 [ruaok]
ok.
19:29:42 [Batsy]
...yup
19:29:46 [ruaok]
shouldn't be too hard to fix. :)
19:29:52 [ruaok]
but it would be better to look at the code.
19:30:14 [ruaok]
so, get the code in and then hit me with the URL and a link to the specific stuff that is giving you issues.
19:30:25 [ruaok]
* ruaok will be in the office til 4pm pdt
19:30:28 [Batsy]
Yeah, I'll get that pushed somewhere today and we can go over it.
19:30:32 [ruaok]
great.
19:30:36 [warp]
Batsy: if you have some code up on github we can more easily take a look at things.
19:31:21 [ruaok]
ok, anyone else for review?
19:31:28 [warp]
Batsy: you can always poke me, ocharles or ruaok with your questions. that should cover most time zones too ;)
19:31:44 [ruaok]
esp with warp going to ecuador. :)
19:31:54 [ruaok]
not sure how you're going to survive the heat, warp. :(
19:31:56 [Batsy]
haha, yup
19:32:18 [ruaok]
right then, onward to MBS-2194
19:32:21 [ruaok]
warp?
19:32:27 [warp]
ruaok: yeah, that's the scary bit. and probably shoddy internet connections.
19:32:31 [warp]
right
19:32:43 [warp]
MBS-2194 deals with the behaviour of the advanced view of the tracklist page in the release editor.
19:32:47 [ruaok]
ugh. double whammy.
19:32:54 [warp]
haha
19:33:18 [warp]
that view has from the start been a view where you just edit the data in the database, without the release editor second guessing you.
19:34:08 [warp]
so if you swap two tracks by editing the track title fields, their recording associations won't be swapped, because the editor assumes you know what you're doing if you're on that view.
19:34:48 [ruaok]
could we maybe detect this and give a polite nudge to the user?
19:34:58 [ruaok]
I see that you ..., you sure you want that you dolt?
19:35:00 [warp]
In MBS-2194 users ask for that behaviour to change, and for the release editor to start guessing recording associations somewhat similar to what it does with the track parser.
19:36:41 [reosarevok]
I don't think it should guess
19:36:44 [warp]
(which, as you may expect from how I worded the previous sentences is something I don't really agree with)
19:36:52 [reosarevok]
But maybe it should de-assign
19:36:56 [ruaok]
yes, neither do I.
19:37:02 [reosarevok]
I mean
19:37:10 [ruaok]
but I think it might be nice for it to detect possible problems and alert the user.
19:37:19 [reosarevok]
There is a threshold after which it asks you to re-confirm the recording
19:37:28 [ruaok]
but the user can just say: "I know, I'm aware of that, piss off".
19:37:33 [reosarevok]
Maybe we just need to change that threshold
19:37:43 [warp]
yes, reconfirming the recording sounds fine, and shouldn't be at all difficult to implement there.
19:38:29 [reosarevok]
I think even murdos would agree that's enough
19:39:36 [ruaok]
warp: want to see if that is sufficient for murdos? if not, come back and discuss more?
19:39:57 [warp]
ok, I'll update the ticket with that preliminary decision. I don't expect to work on it this week, if anyone disagrees we can discuss it again at the next meeting.
19:40:07 [ruaok]
sounds good.
19:40:18 [ruaok]
ok, anyone else with a topic?
19:40:30 [pecastro]
hrrr
19:40:38 [ruaok]
yes, pecastro?
19:40:39 [warp]
pecastro :)
19:40:45 [luks]
I have something about echoprints, if it's not offtopic
19:40:51 [ruaok]
luks: go
19:40:53 [pecastro]
Can I talk about the postgres unaccent library work ?
19:41:10 [pecastro]
I can go after luks.
19:41:13 [ruaok]
pecastro: sure. go after luks/echoprint
19:41:16 [ruaok]
ok
19:41:24 [luks]
ok, so there are echoprint track IDs (that start with TR) and there are echoprint song IDs (that start with SO)
19:41:36 [luks]
I think the intention was to track the TR ones in MB
19:41:37 [warp]
warp has changed the topic to: agenda: echoprint, unaccent
19:41:53 [ruaok]
and I've already seen SO echoprints.
19:41:55 [luks]
but the script that alastairp wrote submits the SO IDs
19:42:22 [ruaok]
alastairp: prod
19:42:43 [luks]
ok, so my questions are: which ones do we want to track? if we want SO, it would be nice to have the mapping from TR to SO publicly available
19:42:53 [luks]
because it's currently only in the private echo nest database
19:42:58 [alastairp]
hey
19:43:02 [alastairp]
sorry, a little busy now
19:43:35 [ruaok]
luks: can you please post these questions to mb-devel?
19:43:40 [ijabz_]
luks have you gleaned this from sourcecode, or is all this explained anywhere? because I'm struggling to get a handle on the echoprint stuff
19:43:49 [ruaok]
alastairp: please answer luks' email in mb-devel when you get time.
19:44:10 [ruaok]
ijabz_: this was mostly from discussions with alastairp.
19:44:15 [ruaok]
but the docs are scant at best.
19:44:15 [luks]
ijabz_: I've been following the development since it appeared on github a few weeks ago :)
19:44:28 [luks]
ok, I'll send a mail to mb-devel
19:44:33 [ruaok]
luks: thanks.
19:44:37 [ruaok]
pecastro: you're up.
19:45:05 [pecastro]
Tkx.... In a nutshell...
19:45:28 [pecastro]
According to the requirements of the ticket, the work, which was to make sure that installing the extensions wouldn't clash with the libraries shipped with pg, is done and ready to be committed. In the meantime some digression arose regarding potentially also changing the names of the sql functions being called.
19:45:48 [alastairp]
ruaok: k, will do
19:46:16 [pecastro]
Well... I do agree that adding that extra bit would make the work totally clean and idempotent of any other stuff but at the same time as luks points out it raises the risk as we'll have to grep instances of the code for the usage of those functions.
19:47:09 [pecastro]
Also, warp has flagged it as good to go.. so I'm wondering if there is something missing, if we should do everything .... or if what is already there is good enough for what we set out to do...
19:47:34 [warp]
I flagged as shippit because it improves things, and there is nothing wrong with the changes as is.
19:47:40 [pecastro]
aye.
19:47:45 [ruaok]
do you have an example of what the functions names are in both extensions?
19:47:56 [warp]
but I do think we also need to rename the functions, and preferably right now, because we're messing with it now anyway.
19:48:22 [pecastro]
select unaccent('blá') for instance.
19:48:27 [ruaok]
the greater question in my mind is this: does our extension still need to exist?
19:48:30 [warp]
ruaok: both have "unaccent()"
19:48:56 [ruaok]
so, we *have* to rename something if a user ever wants to have both loaded at the same time.
19:49:11 [pecastro]
I don't think I mind trying to do that work of replacing the functions ...
19:49:15 [pecastro]
No not really.
19:49:29 [warp]
ruaok: both loaded at the same time in the same database as I understand it. what pecastro has done should already allow both to be installed in the same postgresql instance.
19:49:38 [ruaok]
no not really, our extension no longer has a purpose?
19:49:46 [pecastro]
As they are they would still work exactly as before. I just renamed the library that they use so it wouldn't clash with a similar library shipped with postgres-contrib
19:50:20 [ruaok]
how do the libraries differ? could we ditch ours and use the contrib one?
19:50:35 [ruaok]
less code to maintain == better
19:50:59 [luks]
if you define a dictionary of what should be replaced with what
19:51:06 [pecastro]
hrr hmm that's something I think luks is better prepared to answer than I am... actually I think luks was the original maintainer of the contrib ones ?
19:51:19 [pecastro]
But that's a good point.
19:51:31 [ruaok]
not sure I understand, luks.
19:51:56 [luks]
the extension in contrib is basically just a generic way to do text replacements
19:52:07 [ruaok]
ah.
19:52:11 [luks]
you define a text file with all the rules and it will do the replacements
19:52:18 [ruaok]
whereas ours is a complete out of the box solution?
19:52:25 [luks]
yes
19:52:34 [ruaok]
ok, then lets keep ours.
19:52:40 [luks]
I think it does provide some trivial mapping
19:52:48 [ruaok]
I agree with warp: since we're messing with it, lets fix it properly.
19:52:52 [luks]
but that doesn't cover all the characters that we want to replace
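The distinction luks draws here can be sketched roughly as follows. This is only an illustration of a generic rule-driven replacer, not the contrib extension's or MusicBrainz's actual code; the `RULES` table and the `apply_rules` name are assumptions made up for the example. The point is that such a replacer only changes characters that appear in its rules, so coverage depends entirely on the rules supplied.

```python
# Hypothetical rule table: each source character maps to its replacement.
# A real rules file would be far longer; this one is deliberately incomplete.
RULES = {"á": "a", "é": "e", "ö": "o"}


def apply_rules(text, rules):
    """Replace every character that has a rule; leave all others untouched."""
    return "".join(rules.get(ch, ch) for ch in text)


print(apply_rules("blá", RULES))    # "á" is covered by a rule
print(apply_rules("Życie", RULES))  # "Ż" has no rule, so it passes through unchanged
```

A complete out-of-the-box solution, by contrast, would ship with rules for every character the project cares about.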
19:53:02 [pecastro]
In the end.... I think I'm good/can do the replacements and hence the mb extensions would become something like mb_unaccent or musicbrainz_ or whatever... I'm just looking for some agreement...guidance..
19:53:05 [ruaok]
pecastro: can you please change the names and submit an updated review?
19:53:23 [warp]
pecastro: I would say musicbrainz_unaccent, that's what I used for the collate extension.
19:53:34 [ruaok]
on July 11 we have schema change release scheduled.
19:53:44 [pecastro]
ruaok: you mean. Change the names in postgres-unaccent, amend the code where that SQL is called ?! Right?
19:53:44 [ruaok]
in order to combine dbs and fix some replicated tables and all that.
19:53:52 [ruaok]
so, we can make this change at the same time.
19:54:22 [ruaok]
pecastro: change our version of the lib to prepend musicbrainz_ and then amend the code where that SQL is called.
19:54:32 [pecastro]
aye. I'm on it.
19:54:33 [ruaok]
(I think we said the same thing. :) )
19:54:36 [ruaok]
pecastro: thanks!
19:54:43 [ruaok]
righto.
19:54:46 [pecastro]
Tkx, that's all I required to know.
19:54:52 [ruaok]
last call for topics, otherwise we're done!
19:55:18 [ruaok]
ok, lets close the meeting then.
19:55:22 [ruaok]
</BANG>
19:55:22 [MBChatLogger]
MBChatLogger has changed the topic to: http://musicbrainz.org/#devel
19:55:25 [ruaok]
thanks for your time!
19:55:29 [ruaok]
warp: please post a link.
19:55:29 [warp]
thank you, ruaok!
19:55:33 [warp]
yes, sir
19:58:30 [Batsy]
got some code up now on github if anyone wants to take a look
19:59:35 [warp]
Batsy: sure, where can I fetch it and what is your question?
19:59:42 [Batsy]
https://github.com/lordbatsy/musicbrainz-server/commit/dbf5ce3d68e483ff7e4e5e2ece70d701e3060219#L0R114
19:59:51 [Batsy]
the stuff that's currently commented out is where we get the release info
20:00:01 [Batsy]
(this is an intermediate step, too, to see how data flows - eventually this'll go through a serializer)
20:00:21 [Batsy]
and then below where the "@releases_various[0]
20:00:23 [Batsy]
" is plugged in
20:00:30 [Batsy]
(that was just to see if it worked, as it happens, it didn't)
20:01:35 [warp]
ok
20:02:24 [warp]
Batsy: how do I run this code? /ws/widgets/artistwidget ?
20:02:31 [Batsy]
yes
20:04:21 [warp]
you can see what's in a variable with Data::Dumper.
20:04:44 [warp]
so for example I'm inspecting @releases_various by adding this code:
20:04:48 [warp]
use Data::Dumper;
20:04:52 [warp]
warn Dumper (\@releases_various)."\n";
20:04:57 [Batsy]
ok
20:06:01 [warp]
with that I can see that what find_for_various_artists returns is something like this: [ [ ], 5 ]
20:07:05 [Batsy]
hmm, ok
20:07:14 [warp]
nonvarious returns the same structure, but is empty.
20:07:37 [warp]
er, nonvarious has the same structure, but is empty.
20:07:44 [warp]
$VAR1 = [
20:07:44 [warp]
[],
20:07:44 [warp]
0
20:07:44 [warp]
];
20:08:32 [Batsy]
ok. nonvarious should be empty in this specific example, since the artist in question only has various artist releases
20:09:43 [warp]
yep, just replaced it with bjork and I get a whole bunch of results there.
20:09:50 [hawke_]
hawke_ has joined #musicbrainz-devel
20:11:40 [Batsy]
I need to figure out how to get at that data
20:12:45 [warp]
ok, I see the problem.
20:13:35 [warp]
find_by_artist returns a reference to a list with two things inside it: 1) a reference to the list of releases 2) a scalar with the number of hits
20:13:58 [warp]
no, I'm wrong.
20:14:03 [warp]
find_by_artist returns a list with two things inside it: 1) a reference to the list of releases 2) a scalar with the number of hits
20:14:06 [warp]
there.
20:14:32 [ianmcorvidae]
heh, references
20:14:35 [warp]
$releases_various[0] returns the first element, which is a reference to a list. you either need to make that a regular list, or store it in a scalar.
20:14:44 [warp]
so one of these:
20:14:54 [warp]
my $foo = $releases_various[0];
20:14:56 [Batsy]
ok
20:15:03 [warp]
my @foo = @{ $releases_various[0] };
20:15:20 [warp]
if you choose the $foo option, you can use $foo as a list by writing it as @$foo.
20:15:44 [warp]
@$foo is really just a shorthand for @{ $foo }
20:15:44 [Batsy]
* Batsy nod
20:16:12 [warp]
Batsy: that should give you enough info to continue, right?
20:16:21 [Batsy]
yeah, I think so.
20:16:44 [warp]
great. i'll be afk, if you have more questions i'll be online again tomorrow :)
20:16:52 [Batsy]
ok :)
20:20:06 [ruaok]
thanks for helping Batsy out, warp!
20:23:56 [ruaok]
* ruaok heads for food
20:24:27 [adhawkins]
adhawkins has joined #musicbrainz-devel
20:25:10 [ianmcorvidae]
ruaok: at some point when you're not busy with more pressing things (so, probably just about anything else :P) we could chat about Batsy and I's vacation again (presuming you now have a better idea of when you're heading out for Bologna, etc.)
20:25:59 [Batsy]
oh, yeah, cause I think we've got our dates set
20:36:39 [Mooky]
Mooky has joined #musicbrainz-devel
20:40:30 [Mooky]
Mooky has left #musicbrainz-devel
20:41:49 [Mooky953]
Mooky953 has joined #musicbrainz-devel
20:45:24 [bitmap]
luks: to use disambiguation comments in Picard scripting, would recordingcomment and releasecomment be suitable names?
20:46:42 [luks]
do you want to save them to tags?
20:47:07 [bitmap]
no, I wouldn't really see the use of that
20:47:17 [bitmap]
unless you think it would be useful
20:47:33 [luks]
no, I definitely do not want that :)
20:47:44 [bitmap]
okay, good :)
20:47:55 [luks]
but then you have to use the "private" names
20:48:08 [luks]
e.g. ~recordingcomment
20:48:17 [luks]
which is available as %_recordingcomment%
20:48:29 [bitmap]
ah, right.
20:49:49 [bitmap]
luks: I guess there's no good way to support artist comments though, with artist credits
20:49:51 [luks]
I'd use disambiguation instead of comment if it wasn't so long
20:49:58 [luks]
yeah
20:50:17 [adhawkins]
* adhawkins hates disambiguation
20:50:34 [luks]
well, the fields are not really general purpose comments
20:50:41 [luks]
they have a special meaning
20:50:45 [bitmap]
I think "disambiguation" has a clearer meaning then "comment"
20:50:49 [bitmap]
s/then/than
20:50:52 [adhawkins]
But it's a pain to type :)
20:50:59 [luks]
disambig :)
20:54:03 [ruaok]
ijabz_: ping
20:54:22 [ruaok]
ianmcorvidae, Batsy: what dates are you going to be in cali?
20:54:37 [ijabz_]
ruaok: pong
20:54:43 [Batsy]
probably something like July 9 - 17
20:55:02 [ruaok]
for the cdstubs fix, does the servlet need to be updated as well, or just the index builder?
20:55:27 [ijabz_]
both, but you can update searchservlet now before indexes are rebuilt
20:55:36 [ruaok]
I should be around for a good chunk of that, Batsy .
20:55:47 [ruaok]
probably heading out on the 13th.
20:55:57 [ruaok]
ijabz_: ok.
20:57:27 [Batsy]
ruaok: sounds good
20:57:40 [ianmcorvidae]
cool
20:58:07 [ruaok]
that weekend is my only weekend in town for most of june and all of july.
20:58:08 [ruaok]
sigh.
20:58:23 [ianmcorvidae]
heh
20:58:36 [ruaok]
I really screwed my summer. big time.
20:58:54 [ianmcorvidae]
well, at least you get vacation in the fall, if tripit is true
20:59:13 [ruaok]
yes. :)
20:59:45 [ruaok]
at some point I need to go on a real vacation in my timezone.
21:00:00 [ianmcorvidae]
hehe
21:01:25 [Mooky953]
hi - is now a good time to ask a question about musicbrainz?
21:01:36 [ianmcorvidae]
Mooky953: any time is a good time to ask a question about musicbrainz, here :)
21:01:44 [Mooky953]
cool
21:01:56 [Mooky953]
I've posted on musicbrainz-devel a couple of times but got no bites.
21:02:08 [Mooky953]
I was wondering what the situation/outlook is with discId lookups?
21:02:59 [ruaok]
what is the issue exactly?
21:03:14 [ruaok]
a bug report would be the best way to tell us what is up.
21:03:19 [Mooky953]
Since NGS, most lookups give lots of results.
21:03:34 [ruaok]
yes.
21:03:36 [Mooky953]
So the usefulness of the lookups isn't so good anymore :(
21:03:40 [Mooky953]
(imho)
21:03:54 [ruaok]
why?
21:04:11 [ruaok]
its a matter of picking the right medium that matches what you're looking for.
21:04:32 [Mooky953]
Previously, before NGS, if I put a CD in my computer, most of the time I would automatically get the right result
21:04:36 [Mooky953]
that was really cool
21:04:45 [Mooky953]
Now I get a list and have to pick the right one...
21:05:03 [MBChatLogger]
Ponies and sunshine and myspace and glitter!!
21:05:03 [Mooky953]
Things like Gracenote and all that stuff also seem to get the right result most of the time.
21:05:15 [ruaok]
are you the author of the lookup app?
21:05:23 [Mooky953]
yes
21:05:29 [ruaok]
which app?
21:05:33 [Mooky953]
ripright
21:05:49 [Mooky953]
http://mcternan.co.uk/ripright/
21:06:12 [Mooky953]
but I think this is an issue for all apps
21:06:13 [ruaok]
ok, lets see if we can work out some heuristics to help you along.
21:06:32 [ianmcorvidae]
can you use something like ISRCs and CD-TEXT as additional info, perhaps? that would be my super-hacky solution :P
21:06:35 [ruaok]
yes, because even after us talking about NGS for two years NO ONE BOTHERED TO CHECK THEIR APPS WITH NGS.
21:06:51 [ruaok]
its not that NGS was a surprise to anyone.
21:07:08 [Mooky953]
I did not know about musicbrainz until about 2 months ago, so I'm sorry for that.
21:07:22 [Mooky953]
I don't mind the NGS has made it bad though...
21:07:23 [ruaok]
Mooky953: do you have an example WS query that your app makes that results in having to pick the right results?
21:07:50 [Mooky953]
17% of discIds will return more than 1 medium at the moment
21:07:57 [ruaok]
if so, give us the URL and we'll see if we can help you make the right decisions.
21:08:04 [ruaok]
* ruaok nods
21:08:10 [Mooky953]
Lookup discid dyxEcw.E6UX5lqK2IycFcMURyFE-
21:08:27 [ruaok]
what is the WS query you use to look that up?
21:08:47 [Mooky953]
you can look it up any way you like. You will still get 10 mediums.
21:08:53 [Mooky953]
I have a copy of the db on a server here...
21:09:00 [ruaok]
humor me, please.
21:09:05 [Mooky953]
sure.. hang on
21:09:28 [Mooky953]
http://musicbrainz.org/cdtoc/dyxEcw.E6UX5lqK2IycFcMURyFE-
21:09:37 [ruaok]
no, web service query.
21:09:57 [Mooky953]
oh - you want the XML output..... that may take me longer
21:10:06 [ruaok]
just the XML URL
21:10:12 [ruaok]
I want to see how you request the data.
21:10:19 [ruaok]
and which version of the WS you're using.
21:10:39 [Mooky953]
I've been using libmusicbrainz3 and 4
21:11:10 [ruaok]
ah, ok.
21:11:39 [ruaok]
* ruaok does it by hand
21:11:53 [Mooky953]
libmb3 does a lookup like this
21:11:54 [Mooky953]
GET /ws/1/release/?type=xml&discid=dyxEcw.E6UX5lqK2IycFcMURyFE-
21:11:55 [ianmcorvidae]
both WSes return 10 results, anyway, and I suspect they're the same
21:14:25 [ruaok]
why not just pick the first result?
21:14:48 [Mooky953]
there are a couple of problems with that...
21:15:07 [Mooky953]
first is that I can't then tag with MUSICBRAINZ_ALBUMID correctly
21:15:25 [Mooky953]
also, if the discId had a clash (and there are a few), then I might get totally the wrong artist.
21:15:35 [ruaok]
why not? the XML provides the MBID for the release.
21:15:54 [ruaok]
for the artist case, compare the artist ids returned.
21:16:06 [ruaok]
in that case you will need to ask the user, like you had to before.
21:16:21 [ruaok]
so, the heuristic would be:
21:16:32 [ruaok]
1. check to see if the artist ids are all the same.
21:16:41 [Mooky953]
comparing artist Id is good yes.
21:16:43 [ruaok]
2. if so, pick the first one, or the earliest date.
21:17:02 [ruaok]
3. If not, make a list of artists and then ask the user to choose an artist.
21:17:13 [ruaok]
4. go back to 2.
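The four steps above could be sketched like this. It is a minimal illustration under stated assumptions, not code from ripright or libmusicbrainz: each lookup result is assumed to be a dict with hypothetical keys `id`, `artist_id` and `date`, and `ask_user_for_artist` is a hypothetical callback standing in for whatever prompt the application provides.

```python
def pick_release(releases, ask_user_for_artist):
    """Pick one release from a disc ID lookup, following the four steps."""
    if not releases:
        return None
    # Step 1: check whether all results share the same artist id.
    artist_ids = sorted({r["artist_id"] for r in releases})
    if len(artist_ids) > 1:
        # Step 3: a disc ID clash across artists, so the user has to choose.
        chosen = ask_user_for_artist(artist_ids)
        releases = [r for r in releases if r["artist_id"] == chosen]
    # Steps 2 and 4: one artist remains; take the earliest date.
    # Releases with no date sort last.
    return min(releases, key=lambda r: r["date"] or "9999")


# Made-up sample data, only to show the call shape.
results = [
    {"id": "rel-a", "artist_id": "artist-x", "date": "2000"},
    {"id": "rel-b", "artist_id": "artist-x", "date": "1999-05-17"},
    {"id": "rel-c", "artist_id": "artist-x", "date": ""},
]
print(pick_release(results, lambda ids: ids[0])["id"])
```

Comparing dates as strings works here only because the sample dates are year-first; partial dates such as a bare year sort before fuller dates in the same year.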
21:17:21 [Mooky953]
2. -> that may or may not match the release I have
21:17:34 [Mooky953]
I don't think I can tag with MUSICBRAINZ_ALBUMID anymore, which is a shame.
21:17:35 [ruaok]
then you need to get more info from the user.
21:17:40 [ruaok]
why not?
21:17:52 [Mooky953]
ripright doesn't ask the user *anything*
21:18:01 [ruaok]
the mbid to put into MUSICBRAINZ_ALBUMID is returned by the query.
21:18:01 [Mooky953]
it's like autorip.
21:18:10 [ruaok]
ah, if it doesn't ask anything then expect bad data.
21:18:31 [ruaok]
what do you do in the case of a discid clash?
21:18:38 [ruaok]
just pick one and give the user bad data?
21:18:51 [Mooky953]
eject the CD. there is an option to rip and tag as all choices if the user wants.
21:19:03 [Mooky953]
clashes are quite rare, although I have had a few.
21:19:57 [Mooky953]
I wonder if it is really the case that all 10 releases of Moby's Play have the same discId...?
21:20:06 [ruaok]
yes, they do.
21:20:15 [ruaok]
but they have been released a bunch of times.
21:20:24 [ruaok]
different places in the world with different dates.
21:20:30 [ruaok]
different products, but the same CD.
21:20:50 [ianmcorvidae]
my recommendations would be: add a preferred release country option to your ripper, for the case of things that have been released a bunch of times in different countries with the same disc
21:21:00 [ruaok]
ianmcorvidae: +1
21:21:08 [ruaok]
I was about to suggest that.
21:21:14 [ianmcorvidae]
2.) use ISRCs and anything you can find in CD-TEXT to try to disambiguate
21:21:30 [ianmcorvidae]
or, more generally, try to use EVERYTHING you can find on the disc
21:21:41 [ruaok]
the thing is that we're returning better data and people are complaining about that.
21:21:44 [ruaok]
I don't get it.
21:21:49 [ianmcorvidae]
and then it's just a matter of figuring out the best cutoff for the heuristic
21:21:54 [Mooky953]
yep
21:21:57 [ruaok]
I liked my bad data that was easy!
21:22:59 [Mooky953]
does anyone know of a CD-TEXT library for Linux?
21:23:19 [Mooky953]
I didn't find much to read CD-TEXT. I guess someone here would know!
21:23:36 [ianmcorvidae]
I don't necessarily know of a library
21:23:43 [ianmcorvidae]
there's a bunch of programs that do, though
21:23:53 [ianmcorvidae]
icedax, k3b
21:24:10 [hawke_]
icedax is pretty buggy though, I think
21:24:11 [ianmcorvidae]
brasero
21:24:11 [ianmcorvidae]
etc
21:24:25 [hawke_]
Also, not too many CDs have CD-TEXT in my experience
21:24:35 [Mooky953]
I think this is what I've also found... :(
21:24:37 [hawke_]
and there isn’t really anything useful to compare to, is there?
21:24:39 [ianmcorvidae]
huh
21:24:54 [ianmcorvidae]
well, I mean, depends what all is included in the CD-TEXT
21:24:58 [hawke_]
I forget, does CD-TEXT have the release country embedded in it anywhere?
21:25:19 [Mooky953]
https://secure.wikimedia.org/wikipedia/en/wiki/CD-TEXT
21:25:20 [ruaok]
I would add a command line option to specify the country and then pick the first release date.
21:25:27 [ruaok]
done.
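A rough sketch of that suggestion, assuming each lookup result is a dict with hypothetical `id`, `country` and `date` keys and that the preferred country arrives from a command line option: filter to the preferred country, fall back to every result if none match, then take the earliest date. The function name is made up for the example.

```python
def pick_by_country(releases, preferred_country):
    """Prefer releases from one country, then take the earliest date."""
    if not releases:
        return None
    in_country = [r for r in releases if r.get("country") == preferred_country]
    # Fall back to all results when nothing matches the preferred country.
    candidates = in_country or releases
    # Releases with no date sort last.
    return min(candidates, key=lambda r: r.get("date") or "9999")


# Made-up sample data, only to show the call shape.
results = [
    {"id": "rel-us", "country": "US", "date": "1999-06-01"},
    {"id": "rel-gb-late", "country": "GB", "date": "2000-10-23"},
    {"id": "rel-gb-early", "country": "GB", "date": "1999-05-17"},
]
print(pick_by_country(results, "GB")["id"])
```

The fallback matters for discs that were never released in the user's country; without it the lookup would return nothing even though usable results exist.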
21:25:33 [ianmcorvidae]
what wikipedia shows is arranger, composer, disk_id (whatever that is), genre, isrcs, messages, performer, songwriter, title, toc_info, toc_info2, upc_ean, and size_info
21:25:45 [ianmcorvidae]
upc_ean would certainly help, ISRCs I already mentioned
21:25:51 [Mooky953]
UPC code might be good...
21:25:56 [ianmcorvidae]
arranger/composer/performer/songwriter stuff isn't that likely to help, but it might
21:26:03 [hawke_]
ianmcorvidae: The problem is that it will *possibly* let you differentiate between two completely different releases, which one is right…but I think a lot of the time different countries have the exact same mastering.
21:26:15 [ianmcorvidae]
well, yes
21:26:17 [Mooky953]
and... few CDs have this. which is why we need MusicBrainz!!!
21:26:19 [ianmcorvidae]
it's a heuristic
21:26:38 [hawke_]
ianmcorvidae: Yeah, but that means that in the majority of cases it’s not helpful. :-)
21:26:40 [ianmcorvidae]
I'm certainly not trying to say this will solve the whole problem, but I think it'll help choose better
21:26:50 [ianmcorvidae]
this is possible
21:26:51 [ianmcorvidae]
heh
21:27:11 [ianmcorvidae]
I think the preferred release country thing will be the most help, since that's a matter of musicbrainz data rather than a matter of how the CD was mastered
21:27:13 [hawke_]
Whereas before, release events let you pretty much ignore the problem completely because the tracklist would be the same anyway
21:27:38 [Mooky953]
yes. releaseevents were a better API, although I understand the problem in the database.
21:28:15 [Mooky953]
that said, I have discIds often link to multiple mediums, each of which then links back to a single track listing.
21:29:00 [Mooky953]
so the current queries are sometimes a little fictitious in returning multiple releases and tracklistings...
21:29:33 [Mooky953]
the used bandwidth for queries must have gone up with NGS, right?
21:29:57 [ruaok]
http://stats.musicbrainz.org/mrtg/95th-percentile/201105.png
21:30:14 [ruaok]
our ISP luckily gave us 10mbit commit for the price of 6mbit.
21:30:28 [ruaok]
oh, that reminds me, I gotta sign the new contract.
21:31:29 [Mooky953]
the x scale is days?
21:31:45 [ruaok]
yep
21:32:00 [Mooky953]
kinda needs month too. when was the NGS switch on that scale?
21:32:13 [Mooky953]
is it that spike in the middle?
21:32:23 [ruaok]
http://stats.musicbrainz.org/mrtg/95th-percentile/
21:32:30 [ruaok]
a graph for each month
21:32:44 [Mooky953]
ooooh :-)
21:32:47 [ruaok]
the spike in the middle is us moving data around.
21:33:27 [Mooky953]
then going to NGS, right?
21:33:54 [ruaok]
yeah.
21:34:00 [ruaok]
18th onward is NGS
21:34:56 [Mooky953]
heh
21:35:22 [Mooky953]
since it's using this bandwidth, it seems a shame for me to combine and throw that data away :(
21:36:17 [ruaok]
agreed, but better data requires more bandwidth.
21:36:26 [Mooky953]
:-D
21:36:33 [ruaok]
but CD lookups are a drop in the bucked compared to the other lookups.
21:36:38 [ruaok]
bucket
21:37:16 [Mooky953]
how important is CD lookup to musicbrainz? is it something many people have taken commercial licences on?
21:37:34 [ruaok]
not a single license for CD lookups that I know of.
21:37:46 [ruaok]
then again, all of our web service is still free to use for anyone.
21:38:05 [ruaok]
we'll make it non-commercial soon and offer a paid commercial service that has no rate limiting
21:38:25 [Mooky953]
okay
21:39:23 [Mooky953]
I'm just throwing this out there, and I know on the list of things this is quite far down, but would it be worth adding a query on discId which ignores different releases and returns only unique Artists and tracklists?
21:39:56 [ruaok]
well, that would be returning bad data, which is what we're trying to avoid.
21:40:04 [Mooky953]
bad data? how?
21:40:27 [ruaok]
we'd be conflating a bunch of data into one query.
21:40:34 [ruaok]
and that in our book is bad data.
21:40:44 [ruaok]
that is why we worked hard to break it out and make it accurate.
21:41:24 [Mooky953]
sure - but in this case, I don't want details of the specific release, just the track listing, artist and title.
21:41:37 [Mooky953]
so it's not really conflation, it's just a different request type.
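What Mooky953 is asking for can also be approximated on the client side. The sketch below is only an illustration, not an existing web service query or library call: it assumes each lookup result is a dict with hypothetical `id`, `artist`, `title` and `tracks` keys, and groups results that share the same artist, title and track listing while remembering which release IDs fell into each group.

```python
def unique_tracklists(releases):
    """Group lookup results by (artist, title, track titles)."""
    groups = {}
    for r in releases:
        key = (r["artist"], r["title"], tuple(r["tracks"]))
        # Keep the release ids so the specific releases are not lost.
        groups.setdefault(key, []).append(r["id"])
    return groups


# Made-up sample data, only to show the call shape.
results = [
    {"id": "rel-1", "artist": "Moby", "title": "Play", "tracks": ["Honey", "Find My Baby"]},
    {"id": "rel-2", "artist": "Moby", "title": "Play", "tracks": ["Honey", "Find My Baby"]},
]
for (artist, title, tracks), ids in unique_tracklists(results).items():
    print(artist, title, len(tracks), ids)
```

If the grouping yields a single entry, an unattended ripper has an unambiguous artist, title and track listing even though it still cannot tell which specific release the disc belongs to.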
21:42:10 [ruaok]
I understand what you mean.
21:42:30 [ruaok]
but that is not consistent with our data model and thus going to run into trouble with the dev team.
21:42:52 [Mooky953]
are you the dev team?
21:43:07 [ruaok]
I'm the dev team manager.
21:43:20 [ruaok]
but they are much more critical about these things than I am.
21:43:24 [Mooky953]
sure.
21:43:26 [ruaok]
I'm a sloppy hack, really.
21:43:41 [ruaok]
* ruaok imagines lots of heads around the world bobbing up and down
21:44:38 [Mooky953]
:) I think it would be worth giving this some thought. Potentially it's easier for you to query the db to get the specific results I want than me then filtering and condensing the values after I've got them.
21:45:16 [Mooky953]
It would also save a bit of bandwidth, and give an easier upgrade path for apps which didn't sort themselves out over the NGS migration.
21:45:18 [MBChatLogger]
http://tickets.musicbrainz.org
21:45:18 [ruaok]
if you want, go ahead and open a ticket on tickets.mb.org
21:45:27 [ruaok]
and the dev team will consider it.
21:45:27 [nikki]
Mooky953: btw, what max does is pick the first result but give the user a way to list all the results if they think it's wrong
21:45:32 [ruaok]
and probably close it as wontfix. :)
21:46:21 [Mooky953]
I think it would be quickly closed. It's not really an outright bug either. It's one of those whiney nice to haves which depend on where you are sitting....
21:46:36 [ruaok]
* ruaok nods
21:46:42 [nikki]
* nikki finally let her phone upgrade and wonders what's up with the radioactive green colour
21:46:53 [Mooky953]
maybe I should make libmusicbrainz-conflate :)
21:46:58 [ruaok]
nikki: I'm still wondering.
21:47:06 [ruaok]
but the upgrade was well worth it.
21:47:13 [Mooky953]
anyway - this has been great discussion.
21:47:26 [nikki]
oh? what did they add?
21:47:31 [Mooky953]
thanks for musicbrainz, it's really cool, and thanks for your time
21:47:32 [nikki]
* nikki mostly wants them to add more characters to the fonts
21:47:44 [ruaok]
nikki: I forget. things are faster and overall better.
21:47:49 [ruaok]
Mooky953: you're welcome.
21:47:56 [ruaok]
sorry we're not making things easy for you.
21:48:10 [ruaok]
but we care a lot about good data.
21:48:21 [ruaok]
and often times that means that things are hard.
21:48:46 [Mooky953]
if I could query it using SQL, I could get what I want much more easily :)
21:49:01 [ruaok]
well, you can totally do that.
21:49:12 [ruaok]
you can use the replication to have your server stay up to date.
21:49:22 [ruaok]
and then you can hack your own support into the server.
21:49:41 [Mooky953]
sure - I've got a server here, but other people that may want to use ripright won't want to go to that trouble...
21:49:54 [Mooky953]
libmusicbrainz-conflate here we come!
21:50:02 [ruaok]
I meant that you should point your users to your server.
21:51:12 [Mooky953]
yeah - it starts to rapidly get complicated then. And that's too much like forking, when instead I hope that some feedback to you guys will push things along helpfully.
21:51:23 [ruaok]
nikki: search servers have been updated.
21:51:31 [nikki]
good to know
21:51:36 [ruaok]
a new cdstub index should be out in about 5 hours.
21:51:57 [ruaok]
but the other indexes have been up to date since the time I mentioned yesterday.
21:56:06 [Mooky953]
thanks again, bye!
21:56:10 [Mooky953]
Mooky953 has left #musicbrainz-devel
22:23:58 [flamingspinach]
flamingspinach has joined #musicbrainz-devel
22:25:09 [nikki]
hmm... ref_count should be automatically updated, right?
22:39:25 [nikki]
'cause in my db, there are 19,151 urls where ref_count is 0, but only 889 urls which aren't used
22:39:52 [ruaok]
I forget where that was left, but I think some ref counting was abandoned.
22:40:03 [ruaok]
better ask luks or ocharles when he returns
22:40:07 [nikki]
ok
22:41:58 [Mooky679]
Mooky679 has joined #musicbrainz-devel
22:42:29 [ruaok]
back for more abuse, Mooky679 ? :)
22:42:43 [Mooky679]
yes. it's hot hot hot here - can't sleep
22:42:50 [ruaok]
where is that?
22:42:59 [Mooky679]
London, UK
22:43:01 [Mooky679]
where are you?
22:43:30 [ruaok]
san luis obispo, california.
22:43:42 [Mooky679]
nice.
22:43:49 [Mooky679]
so... i have 1 more question if that's okay?
22:43:53 [ruaok]
shoot
22:44:45 [Mooky679]
This Moby CD has a pile of results, and each result has a pile of discIds. Previously you said something along the lines of these being genuine duplicate IDs for the CDs. But is that really the case????
22:44:59 [Mooky679]
I think it follows that many of the releases could have been different pressings.
22:45:22 [Mooky679]
So we may be able to distinguish the different releases by their different disc IDs.
22:45:33 [ruaok]
I haven't looked at it closely, but it might very well be the case.
22:45:57 [Mooky679]
So... the data isn't accurate. Well, that's my hypothesis...
22:45:59 [ruaok]
I would not be surprised that different pressings have also been released multiple times.
22:46:07 [ruaok]
I think the data is accurate.
22:46:09 [Mooky679]
true
22:46:18 [ruaok]
some of our must serious editors are moby fans.
22:46:21 [ruaok]
most
22:46:38 [nikki]
but have they cleaned the disc ids up since ngs was released? ;)
22:46:59 [Mooky679]
probably not if their collection is already tagged.
22:47:06 [Mooky679]
My hypothesis is that the data is inaccurate.
22:47:13 [Mooky679]
But we need a way to test that....
22:47:20 [Mooky679]
3 thoughts come to mind....
22:47:44 [Mooky679]
1) wipe all discIds from this darn Moby Play album, and then watch to see what discIds get rebuilt by the community
22:47:53 [Mooky679]
this is a bit damaging, maybe uncool....
22:48:11 [nikki]
before we only had one tracklist with numerous release events and disc ids linked to it, but no way to say a specific disc id belongs to a specific release event, now we can so there's cleanup to be done
22:48:38 [Mooky679]
nikki: sure sure sure, but we need to accept the data is inaccurate for that to be done.
22:49:05 [Mooky679]
2) can we use GeoIP to trace queries from clients against discId? Then we could see that all the discIds coming from Japan are for the JP release...
22:49:13 [nikki]
hah!
22:49:30 [nikki]
I bet that wouldn't work :P
22:49:51 [Mooky679]
why?
22:50:47 [nikki]
because people don't care that much about country borders and because mb is more popular in certain countries than others
22:51:04 [ijabz]
ijabz has joined #musicbrainz-devel
22:51:42 [Mooky679]
Moby's Play was massive. I think there are enough listeners world wide that we would get reasonable stats on discId vs geography over a month or so.
22:52:05 [Mooky679]
I'm not saying it will work for every CD, but it proves the hypothesis that the discIds are basically all jumbled up and need fixing.
22:52:14 [Mooky679]
I think?
22:52:20 [ruaok]
can you appeal to your users to come to MB and help us clean up the data?
22:52:26 [nikki]
* nikki hasn't been disputing that the disc ids are jumbled up
22:52:30 [ruaok]
that would be helpful and insightful.
22:52:47 [ruaok]
agreed, I'm not certain that things are all that wrong.
22:53:03 [Mooky679]
I'm not sure I have many users. I made ripright about 2 months ago, then released it about 1 month ago, then NGS broke it :-D
22:53:36 [ruaok]
lol
22:54:12 [nikki]
somewhere on the forum someone asked about fixing disc ids
22:54:38 [nikki]
and my response then was to check the history and if any of the editors who added disc ids were still active, and if they are, ask them if they know which release their disc id belongs to
22:55:40 [Mooky679]
So, option 3) we could look at the database and see for each release how many mediums it has. For releases with 2, 3, 4 etc... mediums, we could then see on average how many discIds are associated with each medium.
22:55:40 [Mooky679]
I think we know, from how NGS migration was done, that option 3 will show something along the lines of CDs with 2 releases having 2 mediums each with 2 discIds... maybe not?
22:56:43 [Mooky679]
I made some stats before, and found that on average each release has 1.07 associated mediums. Most stuff is released once with 1 pressing I guess.
22:56:52 [nikki]
(just removing them all seems a bit too drastic, but it should be possible to try and identify some of them)
22:56:53 [Mooky679]
But then, each medium has 1.4 discIds....
22:57:05 [Mooky679]
I don't think we can remove them all.
22:58:00 [nikki]
it'd be nice if we could send a message to an artist's subscribers...
22:58:08 [nikki]
* nikki wonders where the ticket for that was
22:58:22 [Mooky679]
that would be cool
22:59:02 [Mooky679]
It's probably a lot of work, but I wondered if a cdstub-like system could be done with discIds... so unverified ids could eventually get weeded out or something.
22:59:53 [Mooky679]
The real problem with discIds is that I only know about the ones I have. I can't legitimately be sure that the other discIds are wrong and remove them.
23:00:36 [Mooky679]
I could go to Moby's Play and then delete the Ids from the UK release which I think are wrong, but I might be making a mistake, and I'm not sure editors would agree.
23:01:32 [Mooky679]
Can I ask, is there an acknowledged problem on disc id accuracy? Nikki seems more accepting that they need cleaning up than ruaok. Any other views?
23:01:42 [nikki]
that was why I suggested to this other person who asked that they should try and find out where the other disc ids came from
23:02:14 [Mooky679]
Short of going on ebay and trying to buy every release of an album, it's an impossible task.
23:02:30 [nikki]
if one person has disc id 1 for a 2000 release and the other person has disc id 2 for a 2005 release, I think it's reasonable enough to remove disc id 2 from the 2000 release and disc id 1 from the 2005 release (and if someone later comes along and re-adds one of them, we can ask them about it)
23:02:36 [ruaok]
Mooky679: there isn't a major problem with our disc ids.
23:02:45 [ruaok]
we have some cleanup to do, but it's not a major problem
23:03:28 [Mooky679]
17% of all discids are associated with more than 1 medium. I think that shows a significant problem.
23:03:38 [Mooky679]
17.1%
23:04:31 [Mooky679]
the average release has just 1.07 associated mediums...
23:05:42 [Mooky679]
ruaok: doesn't that hint of a small problem?
23:06:21 [ruaok]
uhm, ok sure.
23:06:44 [nikki]
* nikki doesn't understand the bit about releases having 1.07 mediums on average
23:07:35 [Mooky679]
I queried the database and counted up how many mediums are associated with each release. The average release is associated with 1.07 different mediums.
23:07:47 [Mooky679]
i.e. most releases have a single medium in the db
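
The three figures quoted in this exchange (mediums per release, discIds per medium, and the share of discIds attached to more than one medium) could be computed roughly as follows. This is only a sketch: the function name `disc_id_stats`, the tuple layout, and the sample rows are made up for illustration and are not the actual MusicBrainz schema or Mooky679's query.

```python
from collections import defaultdict


def disc_id_stats(mediums, links):
    """Compute rough disc ID statistics.

    mediums: iterable of (medium_id, release_id) pairs (hypothetical layout).
    links:   iterable of (disc_id, medium_id) pairs (hypothetical layout).
    """
    mediums_per_release = defaultdict(set)
    for medium_id, release_id in mediums:
        mediums_per_release[release_id].add(medium_id)

    disc_ids_per_medium = defaultdict(set)
    mediums_per_disc_id = defaultdict(set)
    for disc_id, medium_id in links:
        disc_ids_per_medium[medium_id].add(disc_id)
        mediums_per_disc_id[disc_id].add(medium_id)

    # A disc ID counts as "shared" when it is attached to more than one medium.
    shared = sum(1 for m in mediums_per_disc_id.values() if len(m) > 1)

    return {
        "avg_mediums_per_release":
            sum(len(m) for m in mediums_per_release.values()) / len(mediums_per_release),
        # Averaged only over mediums that have at least one disc ID.
        "avg_disc_ids_per_medium":
            sum(len(d) for d in disc_ids_per_medium.values()) / len(disc_ids_per_medium),
        "shared_disc_id_fraction": shared / len(mediums_per_disc_id),
    }


# Invented sample: single-disc "uk" and "us" releases share disc IDs A and B,
# while the two-disc "jp" release has one distinct disc ID per medium.
mediums = [(1, "uk"), (2, "us"), (3, "jp"), (4, "jp")]
links = [("A", 1), ("A", 2), ("B", 1), ("B", 2), ("C", 3), ("D", 4)]
stats = disc_id_stats(mediums, links)
print(stats)
```

Note that the first figure only measures how many discs a release has. As the replies that follow point out, it is the other two figures that say anything about disc IDs being attached to the wrong release.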
23:08:13 [ianmcorvidae]
so most releases are single-disc releases
23:08:19 [ianmcorvidae]
what does that have to do with DiscIDs?
23:08:22 [nikki]
yeah
23:08:25 [nikki]
that's what I was thinking
23:08:53 [reosarevok]
I mean, even if all the releases were 2xCD
23:09:04 [reosarevok]
They'd still have a different discID for each one of the two
23:09:19 [Mooky679]
If we look at the discIds....
23:09:37 [reosarevok]
The discID issue exists
23:09:46 [Mooky679]
we find that each medium is associated with about 1.4 discids.
23:09:47 [reosarevok]
It is just not related to the number of mediums per release
23:10:04 [reosarevok]
But to the number of re-releases of said release
23:10:20 [Mooky679]
so, do we really think that most CD releases are pressed 1.4 times?
23:10:35 [reosarevok]
Nope
23:11:11 [Mooky679]
so, my hypothesis is that releases have extra discIds which is the obvious outcome of NGS.
23:11:17 [reosarevok]
DiscIDs were associated to several re-releases, and probably only one or two of those are correct
23:11:18 [Mooky679]
don't get me wrong though....
23:11:32 [Mooky679]
NGS is better, and the data wasn't there before, so this is one option.
23:11:36 [ianmcorvidae]
I think there's some bad data, but I don't know that it's NGS that's to blame -- I think NGS is just what's making it clear
23:11:40 [reosarevok]
Yep
23:11:42 [Mooky679]
agree
23:11:43 [reosarevok]
What ian says
23:11:54 [reosarevok]
Now, how do we fix the bad data?
23:12:11 [ianmcorvidae]
disc ID quality metrics? :P
23:12:14 [Mooky679]
well, if we agree it's bad, then the discussion can move on to that.
23:12:31 [nikki]
who said it's not?
23:12:55 [Mooky679]
I wasn't sure about ruaok's response. ruaok: r u a ok?
23:13:04 [nikki]
he said "we have some cleanup to do" :P
23:13:21 [reosarevok]
It's bad, it's just not worse than before… just more obvious :)
23:13:26 [ruaok]
* ruaok has one foot out the door to go to digital west
23:13:38 [reosarevok]
Now, the problem with cleanup is that we need to recheck all disc IDs for it
23:13:49 [Mooky679]
yes
23:14:04 [reosarevok]
As much tempted as I am to say "delete all discIDs which are in more than one release", that might not be desirable
23:14:05 [reosarevok]
So
23:14:08 [reosarevok]
Ideas?
23:14:16 [Mooky679]
yup. And the problem is, that it's hard to test a negative
23:14:18 [nikki]
* nikki already gave one suggestion
23:14:22 [Mooky679]
I can only verify what I have.
23:14:43 [Mooky679]
emailing editors would be cool.
23:14:49 [reosarevok]
Oh, checking history
23:14:52 [reosarevok]
That might work
23:14:53 [nikki]
you can leave edit notes on their edits
23:15:10 [reosarevok]
Maybe we should automate it
23:15:27 [Mooky679]
being able to determine a verified discId<->medium association from one which was assumed at NGS time would be a good start I think.
23:15:35 [reosarevok]
Like, make modbot post on all "add disc ID" edits for these discIDs
23:15:47 [nikki]
if I suddenly get 2000 or so emails from modbot asking me to check my disc ids, I'm sending them straight to the spam folder :P
23:16:33 [reosarevok]
I doubt there are 2000 discIDs you added that are shared by several mediums
23:16:35 [reosarevok]
Or are they?
23:16:48 [reosarevok]
"If you remember to which release this disc ID belongs, please remove it from the rest"
23:16:57 [nikki]
probably not all of them, but quite a few of them
23:17:06 [reosarevok]
Meh, ok
23:17:12 [reosarevok]
It could do it in small batches
23:17:18 [reosarevok]
Dunno
23:17:31 [nikki]
and the problem is that often the disc id was added by someone who just added it and vanished, and then the active editors didn't need to add it
23:17:47 [nikki]
so emailing subscribers seems to make more sense
23:18:35 [ianmcorvidae]
yeah
23:18:38 [nikki]
and often a disc id will be shared by multiple releases, so it needs more collaborative effort than just removing the disc id from any other release :/
23:18:57 [reosarevok]
Well
23:19:05 [reosarevok]
I'd rather have the right discID in one release
23:19:10 [reosarevok]
Than the right one in 2 and wrong in 4
23:19:27 [reosarevok]
But that might be just me
23:20:09 [ianmcorvidae]
hah, clever
23:20:51 [reosarevok]
:p
23:22:04 [Mooky679]
I like the idea of emailing subscribers. But there needs to be an easy way for them to select "this discId goes with this release" and then knock that discId out of all the other releases.
23:22:37 [Mooky679]
If later someone then finds another discId for the same release, they can add it back in.
23:23:08 [Mooky679]
It's still a bit destructive, but less so. And each release never ends up with 0 discIds.
23:23:20 [Mooky679]
Er, when I said release there, I really mean medium :)
23:25:29 [Mooky679]
Some editing guidelines would also be needed on the wiki I guess. Who's responsible for that?
23:25:42 [reosarevok]
Everyone is :)
23:26:40 [ianmcorvidae]
congratulations, you've set yourself up to be introduced to mb-style :P
23:28:03 [Mooky679]
maybe not entirely by accident... :)
23:29:28 [reosarevok]
Actually, mb-style is probably a better place for this discussion anyway
23:30:03 [Mooky679]
if we need a change to the website functionality, I presume we'd need a discussion here as well though?
23:30:37 [reosarevok]
Kinda
23:30:45 [reosarevok]
But the standard procedure is
23:30:50 [Mooky679]
I mean, it's not just a style thing... it's *very* difficult to verify that some medium does *not* have some discId. So it needs a bit of tooling I think...
23:30:54 [reosarevok]
Get style to agree on something
23:31:06 [reosarevok]
Then come and make sure it can be done
23:31:18 [reosarevok]
Then go back and get style to confirm it
23:31:27 [reosarevok]
Then wait until it is done
23:32:29 [Mooky679]
will that be harder or easier than trying to explain that discIds need a cleanup? :D
23:32:56 [reosarevok]
Making style agree on something?
23:33:00 [reosarevok]
It can be easy
23:33:12 [Mooky679]
oh - okay. it sounded impossible for a moment there.
23:33:16 [reosarevok]
Or harder than condemning Israel in the UN security council
23:33:17 [reosarevok]
:p
23:33:29 [reosarevok]
Depends a lot
23:33:33 [Mooky679]
I see.
23:34:21 [reosarevok]
Worth a try anyway, I imagine
23:34:51 [Mooky679]
I think ideas for cleaning up the data still need consideration. A mailout is good, I'm still not sure that a bunch of IDs couldn't be fixed by geoIP, although are IP addresses logged, and would people like that?
23:35:32 [Mooky679]
There's probably other ways too... I'm not sure if there's any other databases to cross-reference.
23:36:29 [bitmap]
geoip wouldn't work for people who buy a lot of imports, like me
23:37:34 [Mooky679]
my thought would be to count queries for each DiscId by region. then you can just look for a spike in one country, and make an informed guess about the release and discId relationship.
23:37:57 [Mooky679]
bitmap: your habits would end up in the noise I suspect
23:38:23 [bitmap]
it sounds rather complicated, still
23:38:56 [Mooky679]
you could parse the server logs and do it I think.
23:39:01 [Mooky679]
there might be a lot of logs though...
23:39:40 [Mooky679]
of course, it's still not going to fix every Id as some discIds won't get enough lookups to give any confidence in the relationship with a geography
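
The per-region counting idea discussed above might look something like the sketch below. Everything here is an assumption made for illustration: the function name `dominant_country`, the input format of (disc ID, country code) pairs, and both thresholds. Nothing in the log confirms that lookup logs with country information exist or are kept.

```python
from collections import Counter, defaultdict


def dominant_country(lookups, min_lookups=20, min_share=0.8):
    """Return {disc_id: country} for disc IDs whose lookups cluster in one country.

    lookups:     iterable of (disc_id, country_code) pairs (hypothetical input).
    min_lookups: skip disc IDs with too few lookups to say anything (assumed value).
    min_share:   fraction of lookups the top country must reach (assumed value).
    """
    counts = defaultdict(Counter)
    for disc_id, country in lookups:
        counts[disc_id][country] += 1

    result = {}
    for disc_id, by_country in counts.items():
        total = sum(by_country.values())
        country, top = by_country.most_common(1)[0]
        if total >= min_lookups and top / total >= min_share:
            result[disc_id] = country
    return result


# Invented sample: A spikes in JP, B is spread across countries, C has too few lookups.
lookups = (
    [("A", "JP")] * 18 + [("A", "US")] * 2
    + [("B", "GB")] * 6 + [("B", "US")] * 5 + [("B", "DE")] * 9
    + [("C", "JP")] * 3
)
guesses = dominant_country(lookups)
print(guesses)
```

A result like this would only be a hint for an editor to check, not proof. Imports, as bitmap notes, and uneven MusicBrainz usage between countries, as nikki notes, would both skew the counts.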
23:40:08 [reosarevok]
Aaaanyway
23:40:14 [reosarevok]
* reosarevok is going to bed
23:40:20 [reosarevok]
time-of-day y'all
23:40:41 [Mooky679]
I'm going to go to bed too.
23:41:19 [Mooky679]
these thoughts need sleeping on I think, but hopefully this has been useful.
23:42:18 [Mooky679]
I'll pop by maybe next Monday.... bye
23:42:23 [Mooky679]
Mooky679 has left #musicbrainz-devel
23:59:18 [ruaok]
ruaok has joined #musicbrainz-devel